
A Typo Took Amazon S3 Offline



Companies around the world rely on Amazon Web Services (AWS) to form the backbone of their presence on the Internet. So when AWS suffers a problem, everybody notices.

On Tuesday, February 28, 2017, the Amazon Simple Storage Service (S3), which is the cloud storage part of AWS, was disrupted. Websites and online services started going offline and returning errors to visitors. It took Amazon several hours to get a handle on the problem, but we now know the cause: a typo.

In a summary of the disruption, Amazon explained that the S3 engineering team was investigating an issue that was causing the S3 billing system to run slowly. To fix the problem, a small number of servers for one S3 subsystem needed to be taken offline. However, the command to take them offline was entered incorrectly, and far more servers were taken down than intended.

That in itself shouldn’t have caused a major outage, but some of these additional servers supported two other S3 subsystems. One of those was the index subsystem, which handles metadata and location information for all S3 objects in the US-EAST-1 region. The other was the placement subsystem, which handles storage allocation for new S3 objects.

Both subsystems had to be restarted, and while that was happening the S3 APIs couldn’t be accessed. Other parts of AWS started to fail as well, including the Amazon S3 console, Elastic Compute Cloud (EC2), Elastic Block Store (EBS), and AWS Lambda. So basically, a major meltdown that took several hours to fix, all because of a mistyped command.


It should come as no surprise that Amazon is now going to make several changes to the way AWS operates to avoid this ever happening again. But it just goes to show that no matter how big and robust a service becomes, it only takes one human with admin privileges to bring it all crashing down.
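To make the failure mode concrete, here is a minimal sketch of the kind of safeguard that can stop a mistyped number from removing too much capacity. It is an illustration only, not Amazon's actual tooling: the `Subsystem` class, the `plan_removal` function, the per-command cap, and the minimum-capacity figures are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


class UnsafeRemovalError(Exception):
    """Raised when a removal request would violate a safety limit."""


@dataclass
class Subsystem:
    # All fields are hypothetical; real capacity tooling will differ.
    name: str
    active_servers: int
    min_required: int  # assumed minimum needed to keep the subsystem serving


def plan_removal(subsystem: Subsystem, requested: int, max_per_command: int = 5) -> int:
    """Validate a removal request and return the servers that would remain.

    Two assumed guards: a cap on how many servers one command may remove,
    and a floor below which the subsystem's capacity must never drop.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive number of servers")
    if requested > max_per_command:
        raise UnsafeRemovalError(
            f"{requested} servers exceeds the per-command cap of {max_per_command}"
        )
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise UnsafeRemovalError(
            f"{subsystem.name} would drop to {remaining} servers, "
            f"below its minimum of {subsystem.min_required}"
        )
    return remaining


if __name__ == "__main__":
    billing = Subsystem(name="billing-helper", active_servers=100, min_required=80)
    print(plan_removal(billing, 3))  # an intended, small removal passes
    try:
        plan_removal(billing, 30)  # a typo such as 30 instead of 3 is rejected
    except UnsafeRemovalError as err:
        print(f"refused: {err}")
```

The point of the design is that the operator's input is never trusted on its own: a request has to pass both checks before any server is touched, so a slip of the finger produces a refusal rather than an outage.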


Source link : https://www.pcmag.com/news/a-typo-took-amazon-s3-offline