When Witnesses Agree 100%, They’re Probably Wrong
Perfect agreement among witnesses is so unlikely that it should be considered a red flag, a study suggests
Let’s face it—there’s something inherently fishy about a panel of witnesses who each recall the exact same series of events. Humans are imperfect; we see things differently, forget minor details and recount stories in odd orders. So, when witnesses’ accounts don’t differ by a healthy margin, it’s actually a sign something might be wrong.
Now, in a new study published in Proceedings of the Royal Society A, scientists use statistics to prove this point, with findings that suggest the probability of perfect agreement between witnesses is almost zero.
“Getting a large group of unanimous witnesses in these circumstances is unlikely, according to the laws of probability,” said Professor Derek Abbott, a probability expert at the University of Adelaide in Australia and coauthor on the study, in a prepared statement. “It’s more likely the system itself is unreliable.”
The ancient Jewish court system had an odd quirk. Monetary cases were adjudicated by three judges and the majority opinion ruled. If all three judges were unanimous, the proceedings simply went faster. For capital cases, however, a panel of 23 judges would hear the testimony and then give their opinions. If a reasonable majority found the defendant guilty, he or she would be executed—but if every single judge on the panel handed down the same guilty verdict, the defendant walked free.
Until recently, that seemed counterintuitive. Then, statisticians discovered verschlimmbesserung.
That’s right. Verschlimmbesserung. It’s an incredibly inconvenient German word that roughly translates to “disimprovement” (bear with us). Statisticians use it to describe situations in which we can mathematically demonstrate that more evidence, or an apparent improvement, will actually make matters worse. The simplest example is when officials add multiple detours to a congested route. Logic dictates that more detours around the traffic hub should improve conditions, and yet traffic can actually get worse.
For this study, scientists set out to test whether the same rule of disimprovement applies to eyewitness testimony, specifically to witnesses picking a suspect out of a police lineup. After running a complex mathematical model, they found that three is the magic number for reliable testimony: up until three witnesses have confirmed the identity of a perpetrator, each witness makes it more likely that we’ve found the guilty party. But after three witnesses, that probability begins to trail off and, as each successive witness makes the same positive identification, it becomes less and less likely that the identification is correct.
“In our scenario, the probability that a suspect is guilty is strong after three positive identifications by witnesses,” Abbott says. “But our tests showed that the more positive confirmations you have beyond those three, the more it erodes our confidence that this is the right person being identified.”
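To see how unanimous agreement can backfire, here is a minimal sketch of the kind of calculation involved. It is not the model from the paper. It assumes a simple Bayesian setup in which a lineup is occasionally biased (a systemic failure), and in that case every witness picks the suspect whether or not the suspect is guilty. The function name `posterior_guilt` and every number in it (the prior, the hit rate, the false-identification rate and the failure rate) are illustrative assumptions chosen for this example.

```python
def posterior_guilt(n, prior=0.5, p_hit=0.6, p_false=0.1, p_fail=0.01):
    """Probability the suspect is guilty, given n witnesses who all pick the suspect.

    All defaults are made-up illustrative values, not figures from the study:
      prior   - belief in guilt before any witness speaks
      p_hit   - chance an honest witness picks a guilty suspect
      p_false - chance an honest witness picks an innocent suspect
      p_fail  - chance the lineup is biased, so everyone picks the suspect
    """
    # Likelihood of n unanimous identifications if the suspect is guilty:
    # either the lineup is sound and all n witnesses are right,
    # or the lineup is biased and agreement is automatic.
    like_guilty = (1 - p_fail) * p_hit ** n + p_fail
    # Same idea if the suspect is innocent.
    like_innocent = (1 - p_fail) * p_false ** n + p_fail
    # Bayes' rule.
    evidence = prior * like_guilty + (1 - prior) * like_innocent
    return prior * like_guilty / evidence


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(posterior_guilt(n), 3))
```

With these particular made-up numbers, confidence in guilt climbs for the first three witnesses and then slides back toward the starting 50 percent, because a long run of perfect agreement is better explained by a rigged lineup than by many independent, fallible people all getting it right. Different assumed rates would move the peak, and setting `p_fail` to zero removes the effect entirely.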
In other words, one of the ancient Jewish court system’s most puzzling loopholes is in fact supported by modern statistical modeling. Three judges who unanimously agree would, mathematically speaking, be more likely to have made the right call than 23 judges who unanimously agree. Abbott and his colleagues mention the ancient system in their paper, and note that it “indicates a surprising level of intuitive sophistication for the time, when such statistical tools would not have been at their disposal.”
But there are also some quite uncomfortable implications for scientific inquiry. In their study, Abbott and colleagues use the same mathematical model to demonstrate that, in isolated cases, additional confirmations that a piece of ancient pottery has been correctly identified can actually weaken the overall case for that identification. “Overwhelming evidence can itself be evidence of uncertainty, and thus be less convincing than more ambiguous data,” the authors write.
Now that’s disturbing, because the backbone of science is replication. We run the same drug trials over and over again and have multiple independent sources identify archaeological treasures because that’s how we make the evidence more robust. But Abbott and colleagues are suggesting that, in select cases, we may want to re-evaluate our scientific beliefs precisely where the evidence is overwhelming and, for that very reason, suspect. That would require a paradigm shift in how we conduct scientific studies.
Verschlimmbesserung isn’t just a mouthful—it’s a new way of looking at evidence. And it may become a major inconvenience.