Friday, December 18, 2015

MDs and stats: probably not worth the trouble.

The meme

The mathematical abilities of medical doctors is a classic meme. It is not totally unfounded. This classic study "Statistical Literacy of Obstetrics Gynecology Residents" is often summarized by the catchy phrase "significantly less than 50% of doctors can answer a true false question about statistics". Indeed, if one reads the abstract, it appears that only 42% of the respondents in a large sample of MDs (n=4173) were able to recognize the definition of what a P Value is. Some abstained, some - just a bit less - got it right, the study can't be generalized...

The question was ‘‘True or False: The P value is the probability that the null hypothesis is correct.’’ 

The correct answer is false. Assuming some null hypothesis is true, the P value is the probability of obtaining a test result at least as extreme as the one obtained. Say you want to test a placebo and your study produces a P Value of 0.01 basically means that your data had a 1% chance to occur if the if the null hypothesis was true (see here for an explanation with charts). It means nothing more. Likewise, a P value of 0.95 means the result you obtained had a 95% chance to be observed if the the null hypothesis was true.

Some important things to note:

  • by definition, the math is based on a starting point where the null hypothesis is considered to be true. Therefore, you are NOT calculating a probability that it is valid or invalid. The probability is applied to the data, not the null hypothesis. (some will even argue that the data has no probability since it is there already, but that part is for the larger B vs F debate I will mention a bit later)
  • if your data has comes up with a low P value, you can't really tell distinguish between "the null hypothesis is false"  and "my sample is biased by something else"
  • the confidence value to which the P value is compared is a bit arbitrary and depends on the field. Medicine is often quite happy with 0.05, fundamental physics wants "5 sigmas" or a P Value of 0.0000003.

Let's stop for a minute to think about the first and last point: why such a discrepancy?  What is the chance of incorrectly rejecting a true null hypothesis? Well, it depends on several factors, selection bias (in the above case, only gynecologists for example), population size, prevalence of the observed effect, etc... Physicists can afford to characterize their testing environment, run as many tests as our taxes allow. Physicians are limited by who they enroll in their studies, vague null hypothesis based on previous studies, small sample sizes, different prevalences of what they test, etc... One of the main issue is the initial probability of the true null. It is assumed to be 100% in the calculations of the P value but, particularly in Medicine, it is almost always not the case! Addressing that question leads to another set of calculations that provide, you guessed it, another probability. What is the chance of incorrectly rejecting the null hypothesis? It turns out that a P value of 0.05 has at least a 23% chance of incorrectly rejecting a true null hypothesis. In practice, the probability is around 50%! (start here for a recursive exploration in the issue) but it can also simply be 100% if the null hypothesis is poorly chosen.

But there is more: even if an experiment has been set up and reported properly, it has a significant chance of being understood or reported incorrectly, even in the specialized press. In this case, a now famous Higgs Boson announcement was reported by both the Daily Telegraph and Nature. Nature got it wrong... Maybe both writers tossed a coin and one got lucky?  (medical analogy: after some procedure, you are either alive or dead, there is only one direction of interest. You can't be more alive or more dead. But you can be taller or smaller than average, or produce more or less TNF, etc...)

Does ignorance matter?

Going back to our gynecologists study, generalizing it as "MD's don't know shit about maths and stats" could be a mistake. But, even if it is not, does it matter? Is it actually a bad thing not to know about P values? Let's see what Nature (again) has to say on the issue and let's hope that, this time, the coin they tossed fell, by chance, on the correct side: Scientific method: Statistical Errors. But is there a consensus on the issue? The comments are priceless and quickly turn into an argument.

Assume a study reports a P value 0.05, does it really matter if 50% of the MDs who read it get it wrong when the study itself has a true error rate of 50% ? A question worth asking I believe.

In the subset of MDs answering the question correctly, how many will go down one level deeper and understand all the parameters that intervened in the generation of that P value? If they don't understand them, what is the chance that they falsely convince themselves that a result is likely true or a null hypothesis is false? Now, that is getting a bit hairy... But we are not done yet.

Assume they are of the rare breed that gets it fully (I certainly would not claim that I do myself even if I spent more time than the average MD looking at the issue), is that beneficial? They could, of course, start teaching and improve the statistical literacy of other MDs by helping them understand the correct definition and interpretation of P values. If they are particularly gifted, they could even start disagreeing with other particularly gifted statisticians who happen to fundamentally disagree with them. Or they could decide to jump in the big arena and join the never ending Bayesian vs Frequentist debate...

A typical Bayesian vs Frequentist fundamental debate could be summarized as follow

Bob to Fred: "The probability that you are wrong is extremely high"
Fred to Bob: "Don't bother, I observe that you are wrong"

or the other way round... who cares.


Meanwhile in the real world...

But that's not where the glory and money is... The main career path of statistically competent MD is in the industry, where his only goal will be tweak studies to achieve a nice P value for whatever drug or device study his employer wants to push on his statistically illiterate colleagues.

That's something he will achieve by slightly tweaking the null hypothesis, excluding some data, cherry picking the prevalence rate of the issue that he addressed... "We need P : 0.01 to push this through, get it."
More statistically competent MDs means more tweaked studies if they are on the dark side. Unfortunately, the dark side usually pays better.

Clearly, increasing the number of statistically competent MDs is NOT  good thing. Quite the opposite.

Disclaimer: since I am expressing a single opinion about a binary choice, there's a 50% chance that I am wrong. A priori. If my hypotheses are true. And I understood the issue. Got my prior right...

No comments:

Post a Comment