StatusCake

Lessons Learned From Amazon’s S3 Outage

Over 150,000 businesses rely on Amazon’s Simple Storage Service (S3) for backend cloud-based services for their websites. On 28 February 2017, many of those businesses found out how dependent they were on the cloud when Amazon S3 experienced an outage lasting almost four hours. Many websites slowed to a crawl and some were unable to load at all.

The outage occurred while Amazon was trying to fix a problem with a payment and billing system. An engineer executed a command that was supposed to remove a few servers from one of S3’s subsystems. The command was entered incorrectly and removed many more servers than intended, including support servers, which disrupted the websites of many S3 users. Restoring those support servers took much longer than expected.

The outage had a major impact on large e-commerce retailers. Of the top 100 online retailers, 54 saw their loading times increase by 20% or more. Across the affected sites, loading time increased on average by 29.7 seconds, with sites taking an average of 42.7 seconds to load. In the world of e-commerce, when page loading speed declines, so does revenue. If a site fails to load, it’s the equivalent of closing the doors of a high street retailer.

The main lesson from this incident is not to put all your eggs into one basket. You at least need to have a contingency plan for how to handle an outage at a third-party provider, such as storing backup data and images on local servers that you can use if needed.
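As a rough illustration of that contingency plan, here is a minimal Python sketch that tries the third-party provider first and falls back to a local backup copy if the provider is unavailable. The function names, the `LOCAL_BACKUP` dictionary and the simulated failure are all hypothetical stand-ins, not part of any real storage API.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(key: str, sources: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each source in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for fetch in sources:
        try:
            return fetch(key)
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all sources failed for {key!r}") from last_error


# Hypothetical stand-in for a cloud provider that is currently down.
def fetch_from_cloud(key: str) -> bytes:
    raise ConnectionError("third-party storage unavailable")


# Hypothetical stand-in for backup data and images kept on local servers.
LOCAL_BACKUP = {"logo.png": b"local copy of logo"}


def fetch_from_local(key: str) -> bytes:
    return LOCAL_BACKUP[key]


# The cloud source fails, so the local backup serves the request.
result = fetch_with_fallback("logo.png", [fetch_from_cloud, fetch_from_local])
```

The order of the list is the order of preference, so adding a second cloud provider ahead of the local copy only means adding another function to it.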

It may cost more, but using more than one source for cloud services and connecting them with automatic failovers can keep your site running smoothly. If you take that approach with two sources, you should not utilise more than 40% of the capacity of each source, so that the remaining source can absorb the full load if one source should experience an outage.
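The arithmetic behind the 40% guideline can be checked with a short Python sketch. It assumes identical sources and a load that is spread evenly across whichever sources survive, which is a simplification; the function name is illustrative only.

```python
def utilisation_after_failure(per_source_utilisation: float,
                              sources: int,
                              failed: int = 1) -> float:
    """Utilisation of each surviving source once `failed` sources drop out.

    Assumes identical sources and an evenly redistributed load.
    """
    if not 0 < failed < sources:
        raise ValueError("failed must be at least 1 and less than sources")
    total_load = per_source_utilisation * sources
    return total_load / (sources - failed)


# Two sources at 40% each: the survivor carries the combined load.
at_forty = utilisation_after_failure(0.40, sources=2)

# Two sources at 55% each: the survivor would be asked for more than it has.
at_fifty_five = utilisation_after_failure(0.55, sources=2)
overloaded = at_fifty_five > 1.0
```

At 40% each, the surviving source ends up at about 80% utilisation, which leaves some headroom for traffic spikes. Anything above 50% each means a single outage pushes the survivor past its capacity.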

Netflix is a good example of the effectiveness of using multiple sources for cloud services. In 2012, an electrical storm caused a power outage at Amazon and Netflix went down for about three hours, costing the company an estimated $600,000 (£480,000) in revenue. After that incident, Netflix decided to base its cloud services in 12 locations worldwide, designed to roll over automatically should one or more locations fail. That proved to be a wise decision, as Netflix did not experience any performance degradation during the recent Amazon S3 outage.

No third-party service can or will guarantee 100% uptime. Most offer 99.99% uptime, but you do need to worry about that 0.01% possibility of downtime. As Murphy’s Law states, anything that can go wrong will go wrong. Be prepared for the worst: build redundancy into your operations, back up your data, and test for vulnerabilities.
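To put that 0.01% into perspective, the following Python sketch converts an uptime percentage into a downtime allowance. The 99.99% figure comes from the paragraph above; real service agreements define their measurement periods and exclusions differently, so treat the result as a rough guide.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 365) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    minutes_in_period = period_days * 24 * 60
    return (1 - uptime_percent / 100) * minutes_in_period


per_year = allowed_downtime_minutes(99.99)        # allowance over 365 days
per_month = allowed_downtime_minutes(99.99, 30)   # allowance over 30 days

# An outage of almost four hours, taken here as 240 minutes,
# expressed as a multiple of the yearly allowance.
years_of_budget = 240 / per_year
```

A 99.99% target allows roughly 53 minutes of downtime a year, so a single outage of almost four hours uses up more than four years’ worth of that allowance.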

One last lesson you should take away from this incident applies to any critical operation you undertake, not just to potential cloud problems – always double-check before you implement a major action. Had Amazon followed that advice, this incident would not have happened – a typo in the command instruction caused the outage.
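One way to build that double-check into a tool, rather than relying on the operator alone, is a safety limit that rejects unusually large requests. The Python sketch below is a hypothetical illustration: the function name, the 5% threshold and the fleet sizes are assumptions, and it does not describe Amazon’s actual tooling.

```python
def check_removal(requested: int, fleet_size: int, max_fraction: float = 0.05) -> None:
    """Raise ValueError if a removal request exceeds the allowed share of the fleet."""
    if requested < 0 or fleet_size <= 0:
        raise ValueError("requested must be >= 0 and fleet_size must be > 0")
    if requested > fleet_size * max_fraction:
        raise ValueError(
            f"refusing to remove {requested} of {fleet_size} servers: "
            f"limit is {max_fraction:.0%} per command"
        )


# A small, intended removal passes the check.
check_removal(3, fleet_size=100)

# A mistyped value that would remove far more servers is rejected.
try:
    check_removal(30, fleet_size=100)
    typo_blocked = False
except ValueError:
    typo_blocked = True
```

A guard like this does not replace a second pair of eyes, but it turns a single mistyped number into an error message instead of an outage.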
