Want to know how much website downtime costs, and the impact it can have on your business?
Find out everything you need to know in our new uptime monitoring whitepaper 2021
Over 150,000 businesses rely on Amazon’s Simple Storage Service (S3) for backend cloud-based services for their websites. On February 28, 2017, many of those businesses found out how dependent they were on the cloud when Amazon S3 experienced an outage that lasted almost four hours. Many websites slowed to a crawl and some were unable to load at all.
The outage occurred while Amazon was attempting to fix a problem with a payment and billing system. An engineer executed a command that was supposed to remove a few servers from one of S3’s subsystems. However, the command was entered incorrectly and removed many more support servers than intended, which disrupted the websites of many S3 users. Restoring those support servers took much longer than expected.
The outage had a major impact on large e-commerce retailers. Of the top 100 online retailers, 54 saw their page loading times increase by 20% or more. On the affected sites, loading time increased by an average of 29.7 seconds, with sites taking an average of 42.7 seconds to load. In the world of e-commerce, when page loading speed declines, so does revenue. If a site fails to load, it’s the equivalent of closing the doors of a high street retailer.
The main lesson from this incident is not to put all your eggs in one basket. At a minimum, you need a contingency plan for handling an outage at a third-party provider, such as storing backup data and images on local servers that you can switch to if needed.
It may cost more, but using more than one source for cloud services and connecting them with automatic failovers can keep your site running smoothly. If you take that approach with two sources, you should not use more than 40% of the capacity of each source. That way, if one source experiences an outage, the other can absorb the full load at roughly 80% of its capacity and still have some headroom.
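The arithmetic behind the 40% guideline can be checked with a short Python sketch. It assumes traffic is spread evenly across sources and is redistributed evenly when one fails. The function name and parameters are hypothetical and used only for illustration.

```python
def utilization_after_failover(utilization: float, sources: int, failed: int = 1) -> float:
    """Return the utilization of each surviving source after a failure.

    Assumes every source carries the same share of load before the outage
    and that the failed sources' load is split evenly among the survivors.
    `utilization` is a fraction of capacity, e.g. 0.40 for 40%.
    """
    if sources < 2:
        raise ValueError("failover needs at least two sources")
    if not 0 < failed < sources:
        raise ValueError("at least one source must survive")
    total_load = utilization * sources      # load measured in "source capacities"
    return total_load / (sources - failed)  # share carried by each survivor


# Two sources at 40% each: the survivor ends up at about 80% of capacity.
after = utilization_after_failover(0.40, sources=2)
print(f"Survivor utilization: {after:.0%}")

# Two sources at 60% each would overload the survivor (more than 100%).
overloaded = utilization_after_failover(0.60, sources=2) > 1.0
print(f"Overloaded at 60% each: {overloaded}")
```

With more than two sources, the same calculation shows how much higher the everyday utilization can safely be, since a single failure is shared among several survivors.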
Netflix is a good example of the effectiveness of using multiple sources for cloud services. In 2012, an electrical storm caused a power outage at Amazon and Netflix went down for about three hours, costing the company an estimated $600,000 (£480,000) in revenue. After that incident, Netflix decided to base its cloud services in 12 locations worldwide, designed to fail over automatically should one or more locations go down. That proved to be a wise decision, as Netflix did not experience any performance degradation during the Amazon S3 outage.
No third-party service can or will guarantee 100% uptime. Most offer 99.99% uptime, but you do need to worry about that 0.01% possibility of downtime. As Murphy’s Law states, anything that can go wrong will go wrong. Prepare for the worst: build redundancy into your operations, back up your data, and test for vulnerabilities.
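To put that 0.01% in concrete terms, the following Python sketch converts an uptime percentage into a downtime allowance. It assumes a 365-day year and treats the percentage as an annual average. The function name is hypothetical, and real service agreements may measure uptime over a different period.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) that an uptime percentage still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (1 - uptime_percent / 100)


budget = allowed_downtime_minutes(99.99)
print(f"99.99% uptime allows about {budget:.1f} minutes of downtime per year")

# An outage of almost four hours (roughly 240 minutes) is several times
# larger than that annual allowance.
outage_minutes = 4 * 60
print(f"A four-hour outage is {outage_minutes / budget:.1f}x the yearly budget")
```

Under these assumptions, 99.99% uptime still leaves room for a little under an hour of downtime each year, and a single outage of almost four hours uses up more than four years of that allowance.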
One last lesson you should take away from this incident applies to any critical operation you undertake, not just to potential cloud problems: always double-check before you carry out a major action. Had Amazon followed that advice, this incident would not have happened, because a typo in the command caused the outage.