When Correlation Actually Does Imply Causation
A revolutionary new statistical test teases apart the difference between cause and effect, sending correlation and causation into a tailspin
Correlation does not imply causation. It’s one of the bedrocks of science—of rationalism. And yet, the flow from cause to effect is sometimes quite obvious. We know that wind causes turbines to spin, not the other way around, and that cold weather causes snowfall; snow doesn’t lower the temperature. Frustratingly, even in scenarios like these, scientists are often obliged to run expensive, time-consuming studies just to demonstrate that an observed relationship between two variables is, in fact, causal.
But now, a new study in the Journal of Machine Learning Research may change that, with an innovative statistical trick that can determine cause and effect based solely on observational data. The results suggest that correlation not only implies causation, but that correlation can prove causation—under the right conditions.
It can be frustrating when the general public conflates correlation with causation. So frustrating, in fact, that we made an angry video about it. Studies have found “links” between Arizona’s divorce rate and deaths by lightning strike, and have correlated soda consumption with car accidents. Since you can correlate pretty much any two observed variables as long as you’re willing to twist the numbers a bit, “links” and “correlations” mean literally nothing to scientists unless they are subjected to a randomized, controlled study.
Acupuncture, for instance, has been correlated with cancer cures in a handful of observational studies. But until a randomized, controlled experiment demonstrates that acupuncture causes a cure rather than merely correlating with one, acupuncturists are no better than con artists.
But what if there were a mathematical way to prove that two variables (acupuncture and cancer remission, for instance) have a clear path of cause and effect—to use numbers alone to show that acupuncture causes cancer remission? Such a formula would save researchers billions of dollars that would otherwise be spent trying to prove causation. And it would deflate the Latin snobs of the post hoc ergo propter hoc camp, forever.
For this new study, statisticians focused on the simple case of X and Y, two lone variables that are definitely linked—except we do not know whether X causes Y or Y causes X. One way to solve that problem would be to run an expensive study that controls all outside variables and homes in on X and Y.
Or we could tap into additive noise model testing.
Additive noise model testing is based on the simple assumption that there is always some statistical noise clinging to the key variables in any experiment—places where the data becomes fuzzy and unreliable due to measurement error. Regardless of any link, each variable will have its own unique noise signature, with one caveat: if X causes Y, then the noise in X will be able to contaminate Y, but the noise in Y will not be able to do the same to X. That’s because a cause can affect its effect, but an effect cannot affect its cause (read that last line a few times).
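That asymmetry can be seen in a few lines of code. The sketch below is a toy model—the cubic function, the noise level, and every variable name are illustrative choices for this article, not taken from the study. It generates Y from X plus independent noise, then regresses Y on X. Because the noise really is additive and independent, the leftover residuals have the same spread whether X is small or large:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Toy forward model: X causes Y through a cubic, plus independent noise.
x = rng.normal(0.0, 1.0, n)
noise = rng.normal(0.0, 0.2, n)
y = x ** 3 + noise

# Regress Y on X (a cubic fit matches the true function class here).
coeffs = np.polyfit(x, y, 3)
residuals = y - np.polyval(coeffs, x)

# Under the additive-noise assumption the residuals recover the
# independent noise, so their spread should not depend on X.
small_x = residuals[np.abs(x) < np.median(np.abs(x))]
large_x = residuals[np.abs(x) >= np.median(np.abs(x))]
ratio = small_x.var() / large_x.var()
print(round(ratio, 2))  # hovers near 1.0: residual spread ignores X
```

Swap the roles of X and Y in this setup and that stability disappears—the regression run "backwards" leaves residuals whose size visibly tracks the input.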
The key, then, is to follow the noise contamination. If the noise from the acupuncture data causes noise in the cancer cures data, and the noise from the cancer cures data does not cause any noise in the acupuncture data, we can prove—without any expensive clinical trials—that acupuncture cures cancer.
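A bare-bones version of that procedure can be sketched directly. Everything below is a simplification for illustration: the regressions are plain polynomial fits, and the dependence "score" is a crude variance-split comparison of my own devising, whereas the published method relies on proper statistical independence tests. Still, on synthetic data where X truly causes Y, following the noise picks the right direction:

```python
import numpy as np

def residuals(a, b, degree=3):
    """Regress b on a with a polynomial fit; return what's left over."""
    coeffs = np.polyfit(a, b, degree)
    return b - np.polyval(coeffs, a)

def dependence_score(cause, resid):
    """Crude stand-in for an independence test: compare residual
    variance where |cause| is small versus large. A score near zero
    means the residual spread ignores the putative cause."""
    mask = np.abs(cause) < np.median(np.abs(cause))
    return abs(np.log(resid[mask].var() / resid[~mask].var()))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 4000)
y = x ** 3 + rng.normal(0.0, 0.2, 4000)   # ground truth: X causes Y

forward = dependence_score(x, residuals(x, y))    # predict Y from X
backward = dependence_score(y, residuals(y, x))   # predict X from Y

# The direction whose residuals look independent of the input wins.
inferred = "X->Y" if forward < backward else "Y->X"
print(inferred, round(forward, 2), round(backward, 2))
```

The forward fit leaves clean, evenly spread residuals; the backward fit cannot, because the noise that entered on X's side is smeared through Y and shows up as structure the backward regression can't absorb.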
The researchers tested their theory on 88 (decidedly less pressing) datasets. One dataset linked altitude to temperature, another linked temperature to snowfall. Others related the price of rent to the size of an apartment, and wind turbine speed to the local wind speed. In all of these cases the link is obvious, and which variable does the “causing” is pretty easy to parse. Altitude drives temperature—lower temperatures don’t make mountains grow taller. But the simplicity of the datasets was one of the strengths of this study. By focusing on links that are beyond dispute, the statisticians could test their model and see whether it came to the logical, established conclusion.
Remarkably, the study suggests that the additive noise model is quite accurate: it identified the correct direction of cause and effect in roughly 80 percent of the test cases. The findings suggest that this innovative method could be a boon to researchers across scientific disciplines.
But hey—it could just be a correlation.