Want to know how much website downtime costs, and the impact it can have on your business?
Find out everything you need to know in our new uptime monitoring whitepaper 2021



Virgin Money Giving experienced a website crash during the London Marathon, and that crash was both embarrassing and costly. In the short term, the crash prevented people from providing support to the marathon participants promptly. In the long term, Virgin has taken a hit in brand reputation that may take a while to recover from.
Of course, Virgin is not the only organization to fall victim to website crashes or slowdowns. During Black Friday last year, many large online retailers suffered the same fate. Even a degradation in site loading time can have effects as serious as a website crash. Customers will abandon a site that is slow to load and take their business elsewhere, and search engines will downgrade the ranking of sites that have a track record of frequent crashes or slow loading times.
You need to be proactive to keep your site up and running. Here are four steps that you should take:
Most businesses can predict when they will experience peak traffic based on previous experience. If you are an online retailer, you know what volume you saw on previous peak days such as Black Friday. Use that figure as your starting point when planning how much traffic your site should be capable of handling during a major spike.
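As a rough illustration of turning previous experience into a number, the sketch below counts requests per minute from a list of request timestamps and reports the busiest minute. The function name `peak_requests_per_minute` and the sample data are hypothetical; a real site would pull timestamps from its own access logs or analytics.

```python
from collections import Counter
from datetime import datetime


def peak_requests_per_minute(timestamps):
    """Return (busiest_minute, request_count) for a list of datetimes.

    Hypothetical helper: assumes each entry is the arrival time of one request.
    """
    if not timestamps:
        return None, 0
    # Truncate each timestamp to the minute and count requests per bucket.
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    busiest_minute, count = max(per_minute.items(), key=lambda item: item[1])
    return busiest_minute, count


# Made-up sample: three requests at 09:00 and five at 09:01.
sample = (
    [datetime(2020, 11, 27, 9, 0, s) for s in (1, 15, 40)]
    + [datetime(2020, 11, 27, 9, 1, s) for s in (2, 9, 21, 33, 58)]
)
minute, count = peak_requests_per_minute(sample)
print(minute, count)
```

The busiest minute from your last peak day gives you a baseline; the figure you plan for should be at least that high.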
Once you determine the peak traffic flow that you wish to accommodate, identify any bottlenecks on your website that might prevent you from handling it. Then, load test each to see if any of them fail, and make appropriate changes to eliminate those bottlenecks. Be sure to do this well in advance of when you expect your peak traffic to hit.
After evaluating the individual potential bottlenecks, conduct a complete load and stress test on your site and apps. Use the maximum traffic you anticipate plus an additional amount to give you a margin of safety. A complete professional load test will simulate peak traffic quickly and easily, and will show you exactly what failed if your site does not pass the test. Once your site passes the final check, you can be confident that it is ready.
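To make the idea of "peak traffic plus a margin of safety" concrete, here is a minimal sketch of a load-test harness. It is not a substitute for a professional load-testing tool: threads on one machine do not reproduce real-world traffic. The names `target_load`, `run_load_test`, and `fake_request`, and the 25% margin, are assumptions chosen for illustration. In practice you would replace `fake_request` with a call to a test copy of your own site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def target_load(peak_requests, safety_margin=0.25):
    """Peak traffic plus a safety margin (25% is an arbitrary example)."""
    return int(round(peak_requests * (1 + safety_margin)))


def run_load_test(send_request, total_requests, concurrency=20):
    """Call send_request() total_requests times across worker threads.

    send_request is any zero-argument callable that raises on failure.
    Returns a summary with the failure count and the slowest call in seconds.
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    failures = sum(1 for ok, _ in results if not ok)
    slowest = max((elapsed for _, elapsed in results), default=0.0)
    return {"requests": total_requests, "failures": failures, "slowest_s": slowest}


def fake_request():
    """Stand-in for a real HTTP call so the sketch runs offline."""
    time.sleep(0.001)


# A previous peak of 400 requests becomes a test of 500 with the margin added.
summary = run_load_test(fake_request, target_load(400), concurrency=20)
print(summary)
```

Any non-zero failure count, or a slowest call well above what visitors will tolerate, points to a bottleneck worth investigating before the peak arrives.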
Sometimes, circumstances beyond your control can thwart even the most comprehensive plan, and your site will still crash. Therefore, it’s best to have a plan to help mitigate the damage if your site does go down. Consider using a website monitoring service so that you will know promptly if your site does crash. Prepare a communications plan so that you can inform your visitors and customers why your site went down, what steps you are taking to get the site back online, and how long you expect it will take for you to resume normal operations.
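For a sense of what a monitoring service does on each check, the sketch below probes a URL once and classifies the result as up, slow, or down. It uses only the Python standard library. The function names, the 3-second "slow" threshold, and the choice to treat any 4xx or 5xx status as down are assumptions for illustration, and a real service would also check from several locations and send alerts.

```python
import time
import urllib.error
import urllib.request


def classify(status_code, elapsed_s, slow_after_s=3.0):
    """Turn one probe result into 'up', 'slow', or 'down'.

    status_code is None when no response arrived at all.
    """
    if status_code is None or status_code >= 400:
        return "down"
    if elapsed_s > slow_after_s:
        return "slow"
    return "up"


def probe(url, timeout_s=10.0):
    """Fetch url once and return (status_code_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None  # no usable response: DNS failure, refused, timeout
    return status, time.perf_counter() - start


def check(url):
    """Probe url and return its state; a real monitor would alert on 'down'."""
    status, elapsed = probe(url)
    return classify(status, elapsed)
```

Calling `check("https://example.com")` on a schedule, and notifying someone whenever the result is not `"up"`, is the core of what such a service automates.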
When your website goes down, it’s the equivalent of a brick-and-mortar store locking its front door. Taking steps to keep your website up and running during peak traffic flows is crucial in maintaining your reputation and keeping your customers from going elsewhere.