HEALTH

Genetic Research Has Been Compromised Because Scientists Suck At Excel

Scientists don't know how to stop the Microsoft program from autoformatting gene names into dates

Aug 25, 2016 at 4:00 PM ET

Some of our greatest scientific minds don’t know how to use Microsoft Excel, compromising their research and introducing time-consuming errors, according to a new study.

To be helpful to us common folk, Excel sometimes automatically formats dates. But some gene symbols, typically expressed as a short series of letters and numbers, look like dates, and, under its default settings, Excel goes ahead and changes those as well. So when scientists enter “SEPT2” and “MARCH1” into their spreadsheets, Excel thinks they mean September 2 and March 1, and helpfully changes them to 2-Sep and 1-Mar. But SEPT2 is actually Septin 2 and MARCH1 is, of course, Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase. You can see the problem here.

In an article for Genome Biology, three researchers analyzed a decade’s worth of articles from 18 scientific journals. They found that about a fifth of the papers that included gene lists in supplementary Excel files had those conversion errors: 987 files in 704 articles, the report says. And the rate of errors is actually getting worse as time goes on.
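For a sense of what checking a gene list for this damage might look like, here is a minimal Python sketch. It is not the researchers’ actual method; the function name, the pattern and the sample list are all hypothetical, and it assumes the mangled values show up as text like “2-Sep” or “Sep-02”.

```python
import re

# Month abbreviations as they commonly appear in Excel's short date display.
# (Assumption: English month names; other locales would need other names.)
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches day-first values such as "2-Sep" and month-first values such as "Sep-02".
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)

def find_suspect_symbols(column):
    """Return the entries of a gene-name column that look like dates."""
    return [value for value in column if DATE_LIKE.match(value.strip())]

# Hypothetical gene list in which two symbols were already converted.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT2"]
suspects = find_suspect_symbols(genes)
print(suspects)
```

A check like this only flags entries that already look like dates. It cannot say which gene symbol was originally typed, so anything it turns up still has to be corrected by hand.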

The paper points out that this is all despite the fact that the Excel quirk was made public over a decade ago: according to a 2004 paper, there are at least 30 gene names that Excel converts to dates. The BBC got a statement from Microsoft explaining that Excel’s default settings were designed for “most day-to-day scenarios” and that it’s a pretty simple affair to turn the automatic formatting off. Which means that there are fundamental errors in potentially thousands of scientific papers because the people writing them didn’t know this was a problem, but should have. Or perhaps they didn’t know how to fix it, though they should have (it’s really not that hard). It’s also possible researchers knew about the issue but couldn’t be bothered to fix it, though they really should have.

The best part is, the BBC interviewed a scientist who said the real problem here was that the scientists were using Excel at all, which he said should only be used for “lightweight scientific analysis.” Which means that not only do the scientists suck at Excel, but they aren’t even supposed to be using it in the first place. These are the people we trust to find cures for diseases.