StatusCake

Lessons Learned From Amazon’s S3 Outage

Over 150,000 businesses rely on Amazon’s Simple Storage Service (S3) for backend cloud-based services for their websites. In March of this year, many of those businesses found out how dependent they were on the cloud when Amazon S3 experienced an outage for almost four hours. Many websites slowed to a crawl and some were unable to load at all.

The outage occurred while Amazon was trying to fix a problem with a payment and billing system. An engineer executed a command that was supposed to remove a few servers from one of S3’s subsystems. However, the command was entered incorrectly and removed many support servers, which disrupted the websites of many S3 users. Restoring those support servers took much longer than expected.

The outage had a major impact on large e-commerce retailers. Of the top 100 online retailers, 54 saw their loading speed drop by 20% or more. Across the affected sites, load times increased by an average of 29.7 seconds, with pages taking an average of 42.7 seconds to load. In the world of e-commerce, when page loading speed declines, so does revenue. If a site fails to load, it’s the equivalent of closing the doors of a high street retailer.

The main lesson from this incident is not to put all your eggs into one basket. You at least need to have a contingency plan for how to handle an outage at a third-party provider, such as storing backup data and images on local servers that you can use if needed.
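A contingency plan like this can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `fetch_with_fallback`, `cloud_store` and `local_backup` are hypothetical names, and the two source functions are stand-ins that simulate a provider outage and a local copy.

```python
from typing import Callable, Sequence


def fetch_with_fallback(key: str, sources: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each source in order and return the first successful result."""
    errors = []
    for source in sources:
        try:
            return source(key)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise RuntimeError(f"all {len(sources)} sources failed for {key!r}: {errors}")


# Hypothetical stand-in for a third-party cloud provider that is currently down.
def cloud_store(key: str) -> bytes:
    raise ConnectionError("provider outage")


# Hypothetical stand-in for a backup copy kept on a local server.
def local_backup(key: str) -> bytes:
    return b"cached copy of " + key.encode()


result = fetch_with_fallback("logo.png", [cloud_store, local_backup])
print(result)
```

The order of the list sets the priority: the cloud provider is tried first, and the local backup is only used when the provider raises an error.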

It may cost more, but using more than one source for cloud services and connecting them with automatic failovers can keep your site running smoothly. If you take that approach with two sources, you should not utilize more than 40% of the capacity of each source, so that the remaining source has enough capacity if the other experiences an outage.
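The arithmetic behind the 40% guideline can be checked with a short sketch. It assumes the sources are the same size and that load is spread evenly across them, which is a simplification. The function names are illustrative only.

```python
def utilization_after_failure(utilization: float, sources: int, failed: int = 1) -> float:
    """Utilization of each surviving source once the failed ones drop out.

    Assumes equally sized sources with load spread evenly across them.
    """
    if not 0 < failed < sources:
        raise ValueError("failed must be at least 1 and less than sources")
    return utilization * sources / (sources - failed)


def max_safe_utilization(sources: int, failed: int = 1) -> float:
    """Highest per-source utilization that survivors can absorb at 100% load."""
    if not 0 < failed < sources:
        raise ValueError("failed must be at least 1 and less than sources")
    return (sources - failed) / sources


# Two sources at 40% each: the survivor ends up at 80% if the other fails.
print(utilization_after_failure(0.40, sources=2))

# With two sources the hard ceiling is 50% each, so 40% leaves some headroom.
print(max_safe_utilization(sources=2))
```

Under these assumptions, two sources running at 40% leave the survivor at 80% after a failure, so there is still a margin for traffic spikes during the outage.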

Netflix is a good example of the effectiveness of using multiple sources for cloud services. In 2012, an electrical storm caused a power outage at Amazon and Netflix went down for about three hours, costing the company an estimated $600,000 (£480,000) in revenue. After that incident, Netflix decided to base its cloud services in 12 locations worldwide that were designed to roll over automatically should one or more locations fail. That proved to be a wise decision, as Netflix did not experience any performance degradation during the recent Amazon S3 outage.

No third-party service can or will guarantee 100% uptime. Most offer 99.99% uptime, so you do need to worry about that 0.01% possibility of downtime. As Murphy’s Law states, anything that can go wrong will go wrong. Be prepared for the worst: build redundancy into your operations, back up your data, and test for vulnerabilities.
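To put that 0.01% in perspective, an uptime percentage can be converted into a downtime allowance. The sketch below assumes a 365-day year and treats the uptime figure as a simple fraction of total time. Real service agreements often measure it differently, so this is a rough guide only.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 365) -> float:
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (1 - uptime_percent / 100) * days * 24 * 60


budget = allowed_downtime_minutes(99.99)
print(f"99.99% uptime allows about {budget:.1f} minutes of downtime a year")

# An outage of almost four hours (about 240 minutes) compared with that budget.
outage_minutes = 4 * 60
print(f"A four-hour outage is about {outage_minutes / budget:.1f} times the yearly budget")
```

At 99.99%, the allowance works out to under an hour per year, so a single outage of almost four hours uses up several years’ worth.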

One last lesson you should take away from this incident applies to any critical operation you undertake, not just to potential cloud problems – always double-check before you implement a major action. Had Amazon followed that advice, this incident would not have happened – a typo in the command instruction caused the outage.
