It's a valid argument though. If Czechia has increased testing 2x, then positive case counts should increase 2x.
This whole international number comparison is actually bollocks, and it's starting to annoy me. The WHO needs to step in and standardise the data. We need test counts, positives per 1000 tests, and clearly defined bands for the different symptoms. We need standards on what counts as a death and when it gets recorded. This needs to be in place for every future epidemic. Regional data obfuscates what's actually happening.
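As an illustration of why raw counts alone mislead, here is a minimal sketch of the positives-per-1000-tests metric. The function name and the figures are hypothetical: two countries reporting the same 500 positives look identical on raw counts but differ fivefold once testing volume is taken into account.

```python
def positives_per_1000_tests(positives: int, tests: int) -> float:
    """Positive results per 1,000 tests performed (test positivity x 1000)."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return 1000 * positives / tests

# Hypothetical figures for two countries, chosen only for illustration.
country_a = positives_per_1000_tests(positives=500, tests=10_000)  # 50.0
country_b = positives_per_1000_tests(positives=500, tests=50_000)  # 10.0
```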
"If you torture data long enough, it will confess"
- Ronald Coase
2x sampling doesn't necessarily mean positive case counts should increase 2x at all, and this is an assumption many people make (myself included).
For starters, a quick visual check shows the graph isn't linear.
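A toy model shows one way this can happen. It assumes (purely for illustration) that tests are prioritised toward the people most likely to be infected, so each extra test reaches someone with a lower chance of testing positive. The pool and its probabilities are invented.

```python
def expected_positives(probabilities: list[float], tests: int) -> float:
    """Expected positives when the `tests` highest-risk people are tested first."""
    ranked = sorted(probabilities, reverse=True)
    return sum(ranked[:tests])

# Hypothetical pool: risk of infection falls as testing expands to people
# with milder symptoms or no known exposure.
pool = [0.5] * 1000 + [0.25] * 1000 + [0.0625] * 8000

base = expected_positives(pool, 1000)     # 500.0
doubled = expected_positives(pool, 2000)  # 750.0
print(doubled / base)                     # 1.5, not 2.0
```

Under this assumption, doubling tests gives 1.5x the positives, not 2x. Real testing policies differ, which is exactly why the relationship has to be checked rather than assumed.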
In this case it is much easier to take a delayed (lagged) reading of deaths, since deaths trail infections, and divide it by the death rate to approximate the number of infected cases. So with a 1% death rate, 100 deaths ≈ 10,000 cases. Of course, things can influence your death rate, and that is why we have box-and-whisker plots, regressions with tolerances, and variance. All these extra statistical measures exist for exactly this purpose.
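The back-of-the-envelope calculation above can be sketched as dividing deaths by an infection fatality rate (IFR). This is a minimal sketch: the function name is hypothetical, and the 0.5% to 2% IFR band is an assumption chosen only to show how sensitive the estimate is to the rate you plug in.

```python
def estimate_infections(deaths: int, ifr: float) -> float:
    """Approximate infections by dividing (lagged) deaths by an infection fatality rate."""
    if not 0 < ifr < 1:
        raise ValueError("ifr must be a fraction between 0 and 1")
    return deaths / ifr

# Hypothetical IFR band: 0.5%, 1% and 2%. The 1% case reproduces
# "100 deaths = 10,000 cases"; the other two show the spread.
estimates = {ifr: estimate_infections(100, ifr) for ifr in (0.005, 0.01, 0.02)}
# roughly {0.005: 20000, 0.01: 10000, 0.02: 5000}
```

Note the direction: a higher death rate means fewer estimated infections for the same number of deaths, so an uncertain rate should give a band of estimates rather than a single number.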
My job is data science, and we see mistakes happen all the time at all levels of experience. Hell, I make mistakes all the time too. It's easy to make a claim; it's actually really hard to pinpoint the exact data that supports it. This is a key challenge in understanding and interpreting the data released out there: some people have agendas, others don't fully know what they are doing, and some are too excited by their findings to properly check if they're correct. This is why AI/ML is such an amazing field: if your AI works outside your own training cases, you have likely found the main critical factors, because otherwise it wouldn't work.
So typically in programming we do:
Rules + Inputs = Outputs
So we take data and apply some functions and we get results.
In machine learning, we take:
Outputs + Inputs = Rules
We have the inputs (features) and the results, and machine learning is supposed to give us back the set of rules that produces those answers.
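A minimal sketch of that contrast, using ordinary least squares as the "learning" step (one of the simplest possible models). The rule `2x + 1` and the function names are arbitrary examples: the first half writes the rule by hand, the second half recovers it from inputs and outputs alone.

```python
# Traditional programming: Rules + Inputs = Outputs.
def rule(x: float) -> float:
    return 2.0 * x + 1.0  # hand-written rule

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [rule(x) for x in inputs]

# Machine learning: Outputs + Inputs = Rules.
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(inputs, outputs)
print(slope, intercept)  # 2.0 1.0
```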
So in this case there are several factors I would consider key to testing his claim:
Which country
Do they wear Masks
Do they do social distancing
Climate
Health Care
Education
Government type
etc.
These are the types of inputs I would put into an ML algorithm. I would then feed in the resulting data for daily case increases and see what kind of regression the algorithm builds.
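Before any of this can go into a model, each factor has to become a number. A minimal sketch, assuming one row per country: the class, field names and proxies (temperature for climate, hospital beds for health care, years of schooling for education) are hypothetical choices for illustration, and categorical inputs such as government type are one-hot encoded.

```python
from dataclasses import dataclass

# Hypothetical encoding of the factors listed above; field names,
# units and proxies are assumptions chosen for illustration.
GOVERNMENT_TYPES = ["democracy", "hybrid", "authoritarian"]

@dataclass
class CountryFactors:
    mask_wearing: float            # share of population wearing masks, 0..1
    social_distancing: float       # share complying with distancing, 0..1
    mean_temperature_c: float      # crude proxy for climate
    hospital_beds_per_1000: float  # crude proxy for health care
    mean_years_schooling: float    # crude proxy for education
    government_type: str           # categorical, one-hot encoded below

def to_feature_vector(c: CountryFactors) -> list[float]:
    """Turn one country's factors into a numeric row for an ML model."""
    numeric = [c.mask_wearing, c.social_distancing, c.mean_temperature_c,
               c.hospital_beds_per_1000, c.mean_years_schooling]
    one_hot = [1.0 if c.government_type == g else 0.0 for g in GOVERNMENT_TYPES]
    return numeric + one_hot

# Made-up values; each row would be paired with that country's daily case increase.
row = to_feature_vector(CountryFactors(0.9, 0.6, 15.0, 4.3, 12.0, "hybrid"))
```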
If I were to take a list of variables, toss it into an ML algorithm and ask it to predict the number of cases (or fit some regression), would masks be the feature with 100% weighting over the case increase?
That doesn't make any sense. There is a 0% chance that 100% of the population wearing masks will stop the increase in cases.
China and other Asian countries wear masks everywhere, every day. They manufacture over 90% of the world's masks and are their biggest users.
And yeah, Wuhan blew up, and other countries in Asia continue to see cases grow.
So it's clear there are other factors.
No ML algorithm would put 100% weighting behind any single input, unless that input was effectively the result itself.
And if someone were bored enough to build the training set, we could toss it into a decision tree or any other basic ML model and prove his claim false.
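A toy illustration of the "100% weighting" point. It is not a decision tree: as a simpler stand-in, it scores each feature by the share of variance it explains on its own (squared correlation). The data is synthetic and the coefficients are invented, so the numbers say nothing about real masks. The point is structural: when several factors drive the outcome, none of them gets 100%, and the only feature that does is one that is the outcome.

```python
from itertools import product

def share_of_variance(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation: the share of ys's variance that a
    straight-line fit on xs alone can explain (0 = none, 1 = all)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Synthetic data: every combination of three yes/no factors.
# The growth formula and its coefficients are invented for illustration.
rows = list(product([0, 1], repeat=3))  # (masks, distancing, climate)
growth = [3 * (1 - m) + 2 * (1 - d) + c for m, d, c in rows]

features = {
    "masks": [r[0] for r in rows],
    "distancing": [r[1] for r in rows],
    "climate": [r[2] for r in rows],
    "leaked_target": growth,  # an "input" that is really the result
}
shares = {name: share_of_variance(xs, growth) for name, xs in features.items()}
# masks ~0.64, distancing ~0.29, climate ~0.07, leaked_target 1.0
```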