
No one said this would be easy.
You've heard of MRI machines for diagnostic imaging: the subject slides into the center of a big, noisy, donut-shaped scanner while a powerful magnetic field and radio waves create a picture of their insides.
There are many types of MRI scans. Functional MRI (fMRI) looks at how different parts of the brain respond to stimuli; a common use is Alzheimer's neuroimaging. Some 40,000 published papers have used fMRI to delve into the human brain over the last 25 years.
However, until this study, the software packages used to analyze the data had never been validated with real data.
The study
In the paper "Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates," researchers Anders Eklund and Hans Knutsson, of Sweden, and Thomas E. Nichols, of the UK, ran almost three million random group analyses using real -- not simulated -- human data to compute actual false positive rates. They concluded:
. . . the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference.
The invalid techniques produced false-positive rates of up to 70 percent. That's bad. How bad?
These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.
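The core trick of the study is simple: take data from healthy subjects at rest, where no real group difference exists, randomly split them into fake "patient" and "control" groups, and see how often the software declares a significant difference. Any hit is a false positive by construction. Here's a minimal Python sketch of that idea, using one made-up summary number per subject instead of the full brain volumes and clusterwise inference the authors actually tested; all names and figures below are illustrative, not the authors' pipeline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Stand-in for real resting-state data: one summary statistic per
# subject. The subjects are all healthy controls, so no true group
# difference exists by construction.
n_subjects = 198
null_data = rng.standard_normal(n_subjects)

n_analyses = 10_000   # the paper ran almost 3 million group analyses
alpha = 0.05
false_positives = 0

for _ in range(n_analyses):
    # Randomly split subjects into two fake "groups" of 20, as if
    # comparing patients against controls. Any significant difference
    # is, by definition, a false positive.
    picked = rng.choice(n_subjects, size=40, replace=False)
    group_a = null_data[picked[:20]]
    group_b = null_data[picked[20:]]
    _, p = stats.ttest_ind(group_a, group_b)
    if p < alpha:
        false_positives += 1

print(f"Empirical false-positive rate: {false_positives / n_analyses:.3f}")
# A valid method should land near alpha (0.05). On real brain data,
# Eklund et al. found clusterwise methods reaching up to 0.70.
```

A sound statistical method run this way should cry wolf about 5 percent of the time. The alarming finding was how far above that mark the standard clusterwise tools landed.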
The Storage Bits take
In a world of Big Data, statistical quality needs to be taken seriously. But statistics are complex, so even highly educated professionals rely on packages whose assumptions they don't understand, trusting that the results are sound.
This study shows that, for at least one important area of research, that trust in statistical validity is misplaced. The packages were "validated" with synthetic data, but:
. . . it is obviously very hard to simulate the complex spatiotemporal noise that arises from a living human subject in an MR scanner.
This isn't only a problem in brain research. In data storage, for example, long-asserted RAID array data-loss rates assumed that drive failures were independent.
It took more than a decade for research to find that this wasn't true. Of course, during that decade, vendors sold billions of dollars' worth of underperforming RAID arrays, while collecting the service data that should have shown them the truth.
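Why does the independence assumption matter so much? Because drives in one array share a batch, an enclosure, and the stress of a rebuild, so a second failure during the rebuild window is far more likely than independent coin flips suggest. Here's a toy Monte Carlo in Python that makes the point; the probabilities are made up for illustration, not drawn from vendor failure data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 1_000_000
n_drives = 8
p_fail = 0.01  # illustrative per-drive failure probability per window

def loss_given_rebuild(fail_matrix):
    # RAID 5 loses data when a second drive dies in the same window
    # as the first, i.e. two or more failures per trial.
    n_failed = fail_matrix.sum(axis=1)
    return (n_failed >= 2).sum() / (n_failed >= 1).sum()

# Independence assumption: every drive flips its own coin.
indep = rng.random((n_trials, n_drives)) < p_fail

# Correlated model: a shared per-array stress factor (same batch,
# same enclosure, rebuild load) scales everyone's odds together,
# while keeping the average per-drive rate at roughly p_fail.
stress = rng.gamma(shape=0.5, scale=2.0, size=(n_trials, 1))
corr = rng.random((n_trials, n_drives)) < np.minimum(p_fail * stress, 1.0)

print("P(loss | rebuild), independent:", loss_given_rebuild(indep))
print("P(loss | rebuild), correlated: ", loss_given_rebuild(corr))
```

Under independence, a second failure during a rebuild is rare; once failures share a common cause, it stops being rare. Same average failure rate, very different data-loss rate, which is exactly how an invalid assumption quietly flatters the product.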
But as with the irresponsible lending practices that led to the Great Recession, the companies profiting from the invalid assumptions didn't want to spoil the party. As Will Rogers said, "It isn't what we don't know that gives us trouble, it's what we know that ain't so."
Even more true with Big Data.