Statistics, Statistics. Disappointing Indeed
February 16, 2015
At dinner on Saturday evening, a medical researcher professional mentioned that reproducing results from tests conducted in the researcher’s lab was tough. I think the buzzword for this is “non reproducibility.” The question was asked, “Perhaps the research is essentially random?” There were some furrowed brows. My reaction was, “How does one know what’s what with experiments, data, or reproducibility tests?” The table talk shifted to a discussion of Saturday Night Live’s 40th anniversary. Safer ground.
Navigate to “Science’s Significant Stat Problem.” The article makes clear that 2013 thinking may have some relevance today. Here’s a passage I highlighted in pale blue:
Scientists use elaborate statistical significance tests to distinguish a fluke from real evidence. But the sad truth is that the standard methods for significance testing are often inadequate to the task.
There you go. And the supporting information for this statement?
One recent paper found an appallingly low chance that certain neuroscience studies could correctly identify an effect from statistical data. Reviews of genetics research show that the statistics linking diseases to genes are wrong far more often than they’re right. Pharmaceutical companies find that test results favoring new drugs typically disappear when the tests are repeated.
For the math inclined the write up offers:
It’s like flipping coins. Sometimes you’ll flip a penny and get several heads in a row, but that doesn’t mean the penny is rigged. Suppose, for instance, that you toss a penny 10 times. A perfectly fair coin (heads or tails equally likely) will often produce more or fewer than five heads. In fact, you’ll get exactly five heads only about a fourth of the time. Sometimes you’ll get six heads, or four. Or seven, or eight. In fact, even with a fair coin, you might get 10 heads out of 10 flips (but only about once for every thousand 10-flip trials). So how many heads should make you suspicious? Suppose you get eight heads out of 10 tosses. For a fair coin, the chances of eight or more heads are only about 5.5 percent. That’s a P value of 0.055, close to the standard statistical significance threshold. Perhaps suspicion is warranted.
Now the kicker:
And there’s one other type of paper that attracts journalists while illustrating the wider point: research about smart animals. One such study involved a fish—an Atlantic salmon—placed in a brain scanner and shown various pictures of human activity. One particular spot in the fish’s brain showed a statistically significant increase in activity when the pictures depicted emotional scenes, like the exasperation on the face of a waiter who had just dropped his dishes. The scientists didn’t rush to publish their finding about how empathetic salmon are, though. They were just doing the test to reveal the quirks of statistical significance. The fish in the scanner was dead.
How are those Big Data analyses working out, folks?
Stephen E Arnold, February 16, 2015