1.8 Million Floppy Disks: Visualizing The Giant Panama Papers Leak
Try 30 elephants' weight in paper
The massive Panama Papers data leak released by an unknown source to the International Consortium of Investigative Journalists (ICIJ) has been deemed perhaps “the biggest…in the history of data journalism,” amounting to 2.6 terabytes of data. The trove includes 4.8 million e-mails, 3 million database entries, 2 million PDFs, 1 million images, and over 300,000 additional text files on offshore bank accounts connected to the Panama law firm Mossack Fonseca. It’s so big that it could never have been leaked in the past.
“As information flows are increasing through the internet, over the last couple years we’ve seen these kinds of monumental data dumps of information that are really raising questions about what’s happening behind the shadows,” said Craig Fagan, senior policy coordinator at Transparency International. “At every [instance] we’ve thought, ‘This is a watershed moment,’ but up until now this has been the biggest dump that there has been.”
A leak of 2.6 terabytes might not seem that big in a digital world, but it’s an unprecedented size for a single leak. Not that long ago, a leak this size would have been impossible to pull off, purely because getting that amount of information out of a building, undetected, would have been so difficult. In paper terms, even excluding the database entries, it would have taken at least 16,200 reams, a weight of 300,000 pounds, to print out the Panama Papers. That’s over 1,000 times as many pages as Pentagon Papers whistleblower Daniel Ellsberg had to photocopy.
Not that it would have been much easier in the 1980s or ’90s, when leaking 2.6TB of data would mean hauling off your data on 1.8 million floppy disks. Laid flat, that’s enough floppy disks to cover 54 tennis courts. Saved on CD-ROMs, the Panama Papers would amount to a 16-foot-tall stack of discs.
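For readers who want to check the math, here is a rough sketch of the floppy-disk and CD-ROM figures. The media specs are assumptions, not numbers from the ICIJ: a 3.5-inch high-density floppy holding 1,474,560 bytes, an early CD-ROM holding 650MB, a disc thickness of 1.2 millimeters, and decimal terabytes.

```python
# Back-of-the-envelope check of the storage-media comparison.
# All media specs below are assumptions made for illustration.
LEAK_BYTES = 2_600_000_000_000   # 2.6 TB, using decimal terabytes
FLOPPY_BYTES = 1_474_560         # assumed: 3.5-inch high-density floppy
CD_BYTES = 650_000_000           # assumed: early 650 MB CD-ROM
CD_THICKNESS_MM = 1.2            # assumed: thickness of one disc
MM_PER_FOOT = 304.8


def units_needed(total_bytes: int, unit_bytes: int) -> int:
    """Number of whole units needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // unit_bytes)


floppies = units_needed(LEAK_BYTES, FLOPPY_BYTES)
cds = units_needed(LEAK_BYTES, CD_BYTES)
cd_stack_feet = cds * CD_THICKNESS_MM / MM_PER_FOOT

print(f"Floppy disks needed: {floppies:,}")
print(f"CD-ROMs needed: {cds:,}")
print(f"CD stack height: {cd_stack_feet:.1f} feet")
```

Under these assumptions the floppy count lands near 1.8 million and the bare stack of discs just under 16 feet. Different capacity assumptions, such as 700MB discs, would shift the totals somewhat.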
But now that 2.6TB of data can fit in a pocket-sized drive, or be sent as packets of encrypted files, it’s a different story. To deal with this trove, the ICIJ developed a protected search engine to allow 300+ journalists to trawl the cache. Other leaks pale in comparison. The Panama Papers data trove was bigger than the combined content of Wikileaks’ Cablegate (1.7GB), the Luxembourg Leaks (4GB), the Swiss Leaks (3.3GB), the 260GB Offshore Leaks, and Edward Snowden’s NSA flash drive (60GB) nearly eight times over.
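The “nearly eight times over” comparison can be reproduced from the sizes quoted above. This is only a quick sketch: the variable names are illustrative, and it assumes 1TB equals 1,000GB.

```python
# Compare the Panama Papers with the earlier leaks named in the article.
# Sizes are the figures quoted in the text, in gigabytes.
earlier_leaks_gb = {
    "Cablegate": 1.7,
    "Luxembourg Leaks": 4.0,
    "Swiss Leaks": 3.3,
    "Offshore Leaks": 260.0,
    "Snowden NSA flash drive": 60.0,
}

PANAMA_PAPERS_GB = 2.6 * 1000    # assumed: 1 TB = 1,000 GB

combined_gb = sum(earlier_leaks_gb.values())
ratio = PANAMA_PAPERS_GB / combined_gb

print(f"Earlier leaks combined: {combined_gb:.0f} GB")
print(f"Panama Papers vs. combined: {ratio:.1f}x")
```

The five earlier leaks add up to roughly 329GB, so the 2.6TB trove is a little under eight times their combined size.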
The source behind this leak has remained anonymous so far, having told the Süddeutsche Zeitung journalists that his life was in danger and that communication would only occur via encrypted files. This process, ideal for both protecting his identity and supplying the hundreds of journalists brought in on the project, reflects the ever-increasing capabilities of whistleblowers and the possibility of mega-leaks such as this soon becoming commonplace.
“If we look at technology trends and how information flows have changed, we’ve had access to huge amounts of data that we never had before and it’s only going to continue to grow exponentially,” Fagan said. “Which means that there are only going to continue to be more electronic trails of what’s happening, and, as we’ve seen in recent cases of trying to crack people’s iPhone codes and things like that to get at bits of information, we’re going to continue to see this friction between privacy, security, and global good. That’s where we’re going to continue to see challenges as to where the boundaries [are] on each of these issues.”