Massive Internet Outage Had A Pretty Dumb Cause: A Typo
Pity the poor Amazon programmer and their errant finger
Tuesday’s massive internet outage wasn’t the fault of malicious hackers or major equipment failure. It was all thanks to a typo.
The partial blackout stemmed from a severe disruption at Amazon Web Services, where a portion of its Simple Storage Service (S3), a cloud storage operation, mysteriously went down for several hours. That rendered numerous sites that relied on S3 inoperative, including portions of popular apps like Slack, Giphy and Trello, as well as a number of news sites, including Vocativ. At the time, the reason was unknown, save that it appeared to be confined to Amazon’s Northern Virginia facility.
The cause, according to Amazon’s postmortem, published Thursday, was essentially a programmer’s typing error. The S3 team was working on a routine debugging task that involved taking a handful of servers offline. Disaster struck when a “command was entered incorrectly,” the company said in a statement, and a programmer inadvertently took down far more systems than intended, forcing Amazon to undergo a massive reset that took several hours before everything was working normally again.
Tuesday’s disruption was the largest known interruption of its kind at Amazon, and the company hopes to avoid such embarrassment in the future. It said it had already implemented safeguards to prevent so many servers from being taken offline together, and that it’s trying to figure out how to shorten the time a full reset, if one happens again, will take to complete.