OkCupid Could Sue Over Massive Data Leak
Who's to blame: the company that compiled this information, or the people who repackaged it?
The fact that three Danish researchers briefly published the extensive sexual preferences of nearly 70,000 OkCupid users isn’t just a major faux pas. The popular dating site says it could use a controversial U.S. anti-hacking law to try to sue their pants off.
OkCupid doesn’t do a great job of hiding its users’ information. Some profiles are easily searchable, and answers to personal profile questions—which range from kinky queries like “Do you like to be tied up during sex?” to inquiries about religious beliefs—are publicly available online. Specific user answers to these questions aren’t displayed on profiles; the site invisibly uses them to algorithmically calculate the likelihood that two individuals will be a good match.
However, the data associated with each account is stored in such a way that a skilled programmer who builds a “scraping” tool could access and compile it without permission from OkCupid or any of its users. That’s what the three researchers, led by Aarhus University graduate student Emil Ole William Kirkegaard, did. Afterward, the trio posted an enormous .csv file online for anybody to download and view. It included usernames and the associated answers for nearly 70,000 users.
By Thursday, the research group had replaced the open file with one that was password protected, though a number of individuals, including several Vocativ employees, had already downloaded the original. While the file doesn’t include users’ real names, it does match personal information with OkCupid usernames, some of which consist of a first and last name. Others, meanwhile, match public accounts on sites like Twitter or Flickr.
Needless to say, OkCupid isn’t taking this breach lightly. “This is a clear violation of our terms of service—and the Computer Fraud and Abuse Act—and we’re exploring legal options,” OkCupid spokesperson Matthew Traub told Vocativ.
The CFAA, signed by President Ronald Reagan in 1986, is notorious as a broadly written hacking law that criminalizes “unauthorized access” to a computer or network. It’s often used to prosecute criminal hackers or hacktivists (internet activist Aaron Swartz died by suicide while facing CFAA charges), but it can be used for civil cases, too.
Likely the most famous civil case dates from several years ago, when data scraping company 3Taps began collecting real-estate postings from classified ads website Craigslist and selling the data in bulk to other companies. One buyer was Padmapper, which used the data to let users view a map of an area populated with recent Craigslist real estate posts.
In 2012, Craigslist sued the two companies, saying that they’d violated the CFAA because scraping those public posts, which Craigslist’s terms of service specifically prohibit, constituted “unauthorized access” to a computer or network. Last year, 3Taps agreed to settle for $1 million.
Courts, however, don’t always agree that breaking the rules for using a computer system amounts to a CFAA violation. Take, for example, the case of the notorious “cannibal cop,” NYPD officer Gilberto Valle. He was convicted of violating the CFAA for using police software to find women he could fantasize about kidnapping and eating. The Second Circuit Court of Appeals later overturned that conviction, noting that people frequently use computers in ways they’re not explicitly allowed to, and that prosecuting everyone who did would be a nightmare. “While the Government might promise that it would not prosecute an individual for checking Facebook at work, we are not at liberty to take prosecutors at their word in such matters,” the court wrote.
Lawyer Tor Ekeland, known for defending hackers accused of violating the CFAA, told Vocativ that the law is trending away from aggressively prosecuting those who violate a site’s terms of service. But there isn’t yet a set standard across the U.S.
Kirkegaard told Vocativ he had not received legal threats from the dating site, and he has defended his research on his Twitter account. He maintains that data already openly available to the public is fair game to be repackaged for research. The “data is anonymous,” he wrote.
“I question the judgment of these researchers,” Ekeland said, but “seriously, this company left information about its subscribers’ sexual preferences on the internet? That’s mind blowing to me.”