Double-dipped sundae with a picked cherry on top

Let’s say your hypothesis is that pitching in the 2010 baseball season is much stronger than it was in 2009. To support it, you take a sample of excellent starting pitchers, say the ten pitchers with the most complete games. The average ERA for this group (at the time of writing) is 3.17, and when you compare this to the 2009 MLB average ERA of 4.45, you say, “See, I told you that we’re entering a new era of pitcher-dominated baseball!”

Not so fast. Pitching a complete game is correlated with a low ERA (if batters were hitting you, you’d be pulled for a relief pitcher). The logic is circular: you are selecting the best pitchers to prove that pitchers are great. These pitchers are not representative of all pitchers, so their ERA cannot fairly be compared with a league-wide average.
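The circularity is easy to reproduce in a toy simulation. The sketch below is only an illustration under made-up assumptions: the league size, the spread of ERAs, and the formula linking ERA to complete games are all hypothetical, and only the 4.45 league average comes from the figures above. By construction, the simulated league is no better than 2009, yet the ten pitchers with the most complete games still look far stronger than average.

```python
import random
import statistics

rng = random.Random(2010)   # fixed seed so the run is repeatable
LEAGUE_ERA = 4.45           # the 2009 MLB average quoted above

# Hypothetical league in which pitching has NOT improved:
# ERAs are scattered around the 2009 average.
pitchers = []
for _ in range(150):                      # league size is an assumption
    era = rng.gauss(LEAGUE_ERA, 0.9)      # spread of 0.9 is an assumption
    # Made-up relationship: a lower ERA earns more complete games, plus noise.
    complete_games = max(0, round(2.0 * (LEAGUE_ERA - era) + rng.gauss(1.0, 1.0)))
    pitchers.append((era, complete_games))

league_mean = statistics.mean([era for era, _ in pitchers])

# The "excellent starting pitchers" sample: top ten by complete games.
top10 = sorted(pitchers, key=lambda p: p[1], reverse=True)[:10]
top10_mean = statistics.mean([era for era, _ in top10])

print(f"Whole league mean ERA:   {league_mean:.2f}")
print(f"Top-10 complete-game ERA: {top10_mean:.2f}")
```

The gap between the two numbers comes entirely from how the sample was chosen, not from any change in the quality of pitching.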

Unfortunately, this statistical mistake is not uncommon in science, and a couple of recent papers have addressed the practice, dubbed “voodoo” or “double dipping”.

The Neuroskeptic just pointed out a particularly egregious case: a paper advocating double dipping as a way of getting better results from clinical drug trials. Briefly, the method is to run clinical trials at many centers and then exclude the centers that show a strong placebo effect. Because the effect of a drug is measured as the benefit that participants in the drug condition get over participants in the placebo condition, centers with a strong placebo effect show a weaker drug effect. Throwing those centers out after seeing the data therefore inflates the apparent benefit of the drug.
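To see why excluding high-placebo centers is a problem, here is a minimal simulation sketch. It is not the procedure from the paper in question: the number of centers, the patients per arm, and the rule of dropping the half of centers with the strongest placebo response are all assumptions chosen for illustration. The simulated drug does nothing at all (its true effect is zero), so any benefit that appears after filtering is produced by the filtering itself.

```python
import random
import statistics

def simulate_centers(n_centers=200, n_per_arm=30, true_effect=0.0, seed=0):
    """Return one (placebo_mean, drug_mean) pair per center.

    Each patient's improvement is Gaussian noise; patients on the drug
    get `true_effect` added on top. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    centers = []
    for _ in range(n_centers):
        placebo = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        drug = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        centers.append((statistics.mean(placebo), statistics.mean(drug)))
    return centers

def mean_drug_effect(centers):
    """Average drug-minus-placebo improvement across the given centers."""
    return statistics.mean([drug - placebo for placebo, drug in centers])

centers = simulate_centers()            # a drug with no real effect
all_effect = mean_drug_effect(centers)

# Double dip: look at the results, then keep only the centers whose
# placebo response is at or below the median.
median_placebo = statistics.median([placebo for placebo, _ in centers])
kept = [c for c in centers if c[0] <= median_placebo]
kept_effect = mean_drug_effect(kept)

print(f"Drug effect, all centers:           {all_effect:+.3f}")
print(f"Drug effect, low-placebo centers:   {kept_effect:+.3f}")
```

In this setup the placebo arms that happen to come out low are mostly low by chance, so keeping only those centers makes a useless drug look beneficial.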

Not all placebos are created equal, and not all types of patients respond to placebos in the same way. For example, severely depressed people show very little placebo response in antidepressant trials, so antidepressants show a strong effect only in this population.

There have been many recent, hard-hitting criticisms of big pharma’s practices, and pharmaceutical companies have been known to cherry-pick studies for publication. Although only 50% of government-funded clinical drug trials find that a particular drug works, over 85% of industry-funded studies do.