A Revolutionary Algorithm To Clear Up Rape Kit Backlogs

Probabilistic genotyping makes it possible to identify offenders' DNA in minutes

Illustration: Tara Jacoby
Sep 10, 2016 at 12:18 PM ET

If you step into California’s Alameda County forensics laboratory where rape kits are processed, the first thing you’ll notice is the robots. These whirring desktop machines don’t have googly eyes or articulated, humanlike arms (they are less “Short Circuit,” more early Macintosh), but they are robots nonetheless. These machines have reduced the time needed to test kits, the sets of evidence collected from a victim after a sexual assault, by processing dozens of DNA samples at once. The innovation is vital, as the United States reckons with a backlog of hundreds of thousands of untested rape kits.

But it’s a piece of software, not hardware, that is currently being heralded as the future of rape kit testing. It’s algorithms, not robots, that could revolutionize this beleaguered area of DNA testing, and forensics in general.

“It’s kind of what everyone is talking about, where everyone is going,” said Kristi Lanzisera, a supervising criminalist in the Alameda County Sheriff’s forensic biology unit, after showing off an array of robotics. Her lab just squeezed the software into this year’s budget and is in the process of purchasing it.

This technology is called probabilistic genotyping, and it greatly simplifies the process of identifying an offender’s genotype from a DNA sample. Rape kits often contain samples with both the victim’s DNA and that of at least one perpetrator. These are known as mixtures and they’re incredibly difficult to analyze. In identifying a genotype, analysts typically compare variations, or alleles, on 13 points in the DNA. But when multiple people’s DNA is present in a sample, it can be difficult to tell which alleles belong together or, put another way, to the same person. Probabilistic genotyping software computes the statistical likelihood of the individual genotypes in a mixture, much faster and with greater accuracy than a human can.
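To give a rough sense of the kind of arithmetic involved, here is a minimal sketch in Python of the simplest possible case: a single DNA location, a two-person mixture, and a victim whose genotype is already known. It is an illustration only, not how STRMix, TrueAllele, or any other product actually works. The function name, the allele numbers, and the population frequencies are all invented for the example, and it assumes every allele in the mixture was detected cleanly. Real software has to account for much more, such as signal strength and missing or spurious alleles.

```python
from itertools import combinations_with_replacement


def perpetrator_genotype_posterior(observed, victim, freqs):
    """Toy single-locus deconvolution of a two-person mixture.

    observed: alleles seen in the mixed sample at one locus
    victim:   the victim's known pair of alleles at that locus
    freqs:    hypothetical population frequency of each allele

    Returns the probability of each candidate second-contributor
    genotype, assuming no dropped or spurious alleles.
    """
    observed = frozenset(observed)
    victim = frozenset(victim)
    weights = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Victim plus candidate must explain exactly the observed alleles.
        if victim | {a, b} != observed:
            continue
        # Hardy-Weinberg prior: p^2 for a homozygote, 2pq for a heterozygote.
        weights[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    total = sum(weights.values())
    if total == 0:
        return {}  # no candidate genotype explains the mixture
    return {genotype: w / total for genotype, w in weights.items()}


if __name__ == "__main__":
    # Made-up numbers: alleles are repeat counts, frequencies are illustrative.
    freqs = {12: 0.2, 14: 0.3, 15: 0.1, 16: 0.4}
    result = perpetrator_genotype_posterior({12, 14, 15}, (12, 14), freqs)
    for genotype, prob in sorted(result.items()):
        print(genotype, round(prob, 3))
```

In this made-up case the second person must carry allele 15, since the victim does not, but their other allele could be 12, 14, or 15, so three genotypes remain possible with different probabilities. Repeating that reasoning across every tested location, with more contributors and noisier data, is what makes the manual calculation so slow.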

What that means, according to some, is that it might take less time to test rape kits. Mind-shatteringly difficult mathematical calculations that might have taken several hours can now be done in minutes. It’s a significant refinement at the end phase of testing a rape kit. The robots have sped up everything — from extraction, which is typically where male DNA in a sample is separated from female DNA, to the Polymerase Chain Reaction (PCR) process, where the DNA is amplified, or copied, for testing. But it’s that final step of analyzing and calculating probability where the robots’ usefulness ends and the bottleneck begins.

“What people don’t realize is, yes, the laboratory is part of it, but then you have to take all that data and interpret it, and it’s not always easy and that takes a big chunk of time,” said Lanzisera. “Then you have to write a report, then someone has to review it, then a second person has to review it.” The software could greatly cut down the amount of time needed for analysis.

This, of course, carries with it a lot of excitement, given the massive backlog of rape kits. For a number of reasons, police departments around the country have piles of untested rape kits sitting around. Each one means that someone has been assaulted and has undergone the invasive procedure of getting tested soon thereafter, while the potential evidence that could convict the assailant just sits in a back room somewhere. The reasons for the backlog vary, from rape myths that cause officers to de-prioritize the testing of certain kits to insufficient laboratory staffing, but the evidence is clear: when backlogs get tested, rapists end up in prison.

In some cases, states have successfully cleared their backlogs. In California, for example, state-run labs brought rape kit testing turnaround to within 20 days — down from six months — and eliminated their backlog after instituting something known as the Rapid DNA Service Team (RADS). This protocol sees the three most vital and promising samples tested from every single rape kit in the state. This way, the very best, most promising evidence is fast-tracked. Some state labs have further streamlined the process by incorporating probabilistic genotyping.

Just earlier this year, the California Department of Justice’s Jan Bashinski Laboratory started using STRMix, a probabilistic genotyping software. Gary Sims, criminalist manager at the lab, says it saves several hours of time and frees up analysts to work on other cases. Cases with complex mixtures tend to be what bring down the laboratory’s average processing time for rape kits, so a streamlined testing process in those cases could dramatically impact the overall testing time.

That said, no one is suggesting that probabilistic genotyping will solve the backlog problem, and some argue it won’t actually speed up processing times, since the software’s results must be double-checked by humans. In fact, some say a potential downside is that it makes more samples analyzable. “Evidence that was previously discarded as uninterpretable is now giving positive or negative results, that is implicating or exonerating an accused [person] or being suitable for upload to a database,” said John Buckleton, co-creator of STRMix. Ultimately, that can mean more work, not less.

So, it streamlines part of the testing process, while also effectively creating more samples to test. But that added work could mean more convictions in sexual assault cases, using evidence that would otherwise be discarded. In general, Buckleton says probabilistic genotyping “increases the value of evidence gained from DNA analysis of sexual assault kits.”

Similarly, Steve Guroff, forensic biology supervisor at the San Diego Sheriff’s Department, which recently instituted probabilistic genotyping, says the software is most useful in cases with mixed DNA samples and, since that’s often the case in rape cases, it may “allow us to identify the perpetrators of sexual assaults more frequently than otherwise.”

Another benefit of the software is that it removes some of the room for human error in DNA analysis. As The Atlantic recently reported, there is growing awareness around the potential for lab workers to make missteps in these incredibly complicated calculations. Of course, the software also introduces the potential for errors both technical and human. In one incident, lab workers in Albany, New York, were accused of cheating on their qualification test for using TrueAllele, a probabilistic genotyping software, which raised questions about their qualifications for doing the job.

Defense attorneys have attempted, without success, to get TrueAllele to disclose its algorithm. As one such attorney told Ars Technica, “When you put data into computer and it spits out something, you’d like to know how it did it.” STRMix, on the other hand, is open source. The company’s management acknowledges that no software is free of errors, but they say any errors are small and reflected in the numerical value of the statistic.

For now, the forensics world is largely approaching probabilistic genotyping with excitement, even though it remains shrouded in some mystery. “It’s a little bit like this amorphous, magical unicorn thing,” said Alameda County Sheriff’s Lanzisera. “We’re all like, ‘Okay, what is it gonna do? How long will it really take?’” From the sounds of it, crime labs across the state, if not the country, are about to find out. As she put it, “Everybody is either in the process of purchasing or plans to purchase and will purchase it in the future.”