How To Find Out If Scientists Are Screwing With You
The GRIM test is a new, simple formula that can detect whether psychology studies are just making stuff up. The best part? Anyone can use it
Scientific studies often produce results that seem improbable, but they shouldn’t produce results that are actually scientifically, mathematically, and logically impossible. Despite safeguards in place to prevent blatant falsehoods from being published, occasionally such a study somehow staggers through the peer review process, and then it takes us a while to notice that the data isn’t sitting quite right.
But now there’s a way to check that data, uncover fraudsters and draw your own red circles around sloppy research. It’s called the GRIM Test, and it’s delightful.
Here’s the theory behind the test. Data, by its nature, comes in discrete pieces and follows specific mathematical rules—at least, when it hasn’t been tampered with. To borrow an example from the authors of the paper themselves, imagine that a psychology study presents a sample of 12 students with an average age of 20.92. If, hypothetically, we were to make one of the students in that sample one year older, the average age would jump to 21. This thought experiment teaches us that the average in this sample can only change in increments of one twelfth, or about 0.083. The average ages could be 20.67 or 20.75 or 20.92 or 21. But not 20.68 or 20.81.
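That granularity is easy to verify yourself. A quick sketch in Python (using the ages from the example above, not any data from the paper) enumerates every two-decimal average that 12 whole-number ages near 21 can actually produce:

```python
# With n = 12, the mean must be (integer total) / 12, so only
# certain two-decimal averages are reachable.
n = 12
possible = sorted({round(total / n, 2) for total in range(248, 253)})
print(possible)  # [20.67, 20.75, 20.83, 20.92, 21.0]
```

Every achievable average lands on one of those steps; values like 20.68 or 20.81 fall between them and are therefore impossible for this sample size.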
So what happens when the paper is published, and the abstract describes twelve college students, with an average age of 20.72? “Well, usually, absolutely nothing whatsoever,” the authors write. “This looks plausible, and it would be a cold day in hell before anyone ever thought to check it.”
“But: if you do check it, you find it’s wrong. The ages are impossible.”
For the study, scientists James Heathers and Nicholas Brown went through roughly 70 psychology papers by hand, GRIM-testing each of them to their hearts’ content. There were rules, of course. The GRIM test only works on small samples of whole-number data, and it only tests averages. The acronym itself stands for “Granularity-Related Inconsistency of Means”, which more or less means “detecting when averages must be impossible, due to the discrete nature of data”. So it’s not exactly the stuff of higher math and physics. But it works remarkably well on the sort of numbers and averages you’d find tucked safely into the abstract of a psychology paper.
Upon examining the results, Heathers and Brown found serious problems—nearly half of the papers reported at least one impossible value, and one in five papers contained multiple impossible values. The findings suggest widespread fraud, incompetence or sloppiness in psychological research.
Why would psychology researchers make up data? To be sure, some of these cases of impossible averages were probably honest mistakes. But it is also likely that some researchers, under pressure to publish statistically significant results, aren’t opposed to fiddling with the numbers just a bit. Since the very game of statistical significance is a bit of a farce, one could almost imagine an otherwise upright scientist adjusting an average by some infinitesimal fraction, simply to get a decent paper published. Of course, such slight changes would be nearly impossible for the peer review system to catch—who’s going to notice an average of 20.72 that should read 20.68? Who’s even going to care?
The GRIM Calculator—that’s who!
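A minimal version of such a calculator takes only a few lines. This sketch is my own illustration of the idea (the authors’ actual tool may differ in detail): it asks whether any integer total of n whole-number observations could round to the reported mean.

```python
import math

def grim_check(reported_mean, n, decimals=2):
    """Return True if reported_mean is achievable as the average
    of n whole-number values, rounded to the given decimals."""
    target = round(reported_mean, decimals)
    implied_total = reported_mean * n
    # Only integer totals right around the implied sum can round
    # back to the reported mean, so a tiny search suffices.
    for total in range(math.floor(implied_total) - 1,
                       math.ceil(implied_total) + 2):
        if round(total / n, decimals) == target:
            return True
    return False

print(grim_check(20.92, 12))  # True: 251 / 12 = 20.9167, rounds to 20.92
print(grim_check(20.72, 12))  # False: no integer total of 12 ages works
```

Feed it any mean and sample size from a paper’s abstract, and it will flag the averages that no sample of whole numbers could ever produce.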
Now, besides providing the obvious powder keg that will kick off The Great Nerd War, the authors’ choice to target psychology papers will likely tear open old wounds in the science publishing community. In 2015, a landmark study found that most psychology papers could not be replicated, casting a suspicious shadow over the entire field. Although at least one subsequent study has since vindicated those very papers, scientists—especially those interested in the burgeoning field of meta-research, or “the study of studies”—remain skeptical. So when the authors of this new paper suggest that half of psychology studies may contain impossible averages, it’s bound to cause a stir.
Not that there was necessarily any foul play. The authors themselves confronted several of the researchers behind these impossible numbers, and found that many of them were amenable to issuing corrections and walking through their mistakes. Occasionally, an impossible number crept in due to a typo. Other times, it was due to incomplete reporting of the research methods.
But it’s hard to imagine that outright fraud didn’t play a role in at least a few of these disturbing cases. “Are we accusing anyone of anything? Not on your life,” the authors write. “But. Is it likely we found some [fraud]?”