StatusCake

Website downtime: the cost, the impact, and the solution

Planned website downtime vs. unplanned website downtime

Unplanned downtime is the hardest situation to prepare for because it can happen at any time. The only way to plan for it is to have a website monitoring solution in place that alerts you as soon as your website goes down. If you’re the first to know, you can act before your customers start reacting, especially on social media.

Internally, it’s important to have someone on call to handle the situation (a developer, product manager, or customer support manager). Ideally, the problem is one the team has seen before, which makes it easier to fix.

Planning for website downtime

The best way to prepare for planned downtime is to put up a maintenance notice as early as possible, so customers know what inconvenience to expect. Update your public status page to tell customers in real time what is happening, when the website will be down, and when it is expected to be back up.

Internally, make sure you and your team do a ‘run-through’ of the planned work in a sandbox before the downtime, so you have good reason to expect it to work on the live system. It’s also important to have a support team on standby to handle any queries customers may have during the downtime. Getting every department on the “same page” for the messaging matters too: you want one aligned message with the same tone across all communications, whether on the public status page or in an email.

The cost of website downtime

The major cost is the loss of customers (and therefore revenue) who are unforgiving of the downtime, especially if it lasts a long time. There are similar costs around marketing, and potentially technical costs from overworked servers or from third-party services you use within your solution.

The solution to this: have a set plan and a checklist of the work to carry out should unplanned downtime arise. Make sure everyone on your team is aware of the plan and able to carry it out themselves if needed. It almost has to become second nature, so that they know how to fix issues during a downtime event.

Budgeting for website downtime

It is advisable to keep some sort of reserve for periods of downtime, whether that’s internal staff time or the ability to outsource the fix. Budgets for downtime depend heavily on the state of the system in question. As a technical person in a company, you’d have to tally up the weak points of the architecture to best judge the costs that downtime might bring.
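As a rough illustration of that tallying exercise, here is a minimal Python sketch. The `WeakPoint` fields, the example components, and every figure in it are hypothetical placeholders rather than benchmarks; the idea is only that each weak point gets an estimated number of outage hours per year and an hourly cost, and the reserve is the sum.

```python
from dataclasses import dataclass


@dataclass
class WeakPoint:
    """One fragile part of the architecture (all figures are illustrative guesses)."""
    name: str
    expected_outage_hours_per_year: float
    revenue_lost_per_hour: float
    recovery_cost_per_hour: float  # e.g. on-call staff time or outsourced help


def expected_annual_cost(point: WeakPoint) -> float:
    """Estimated yearly cost of outages caused by a single weak point."""
    hourly_cost = point.revenue_lost_per_hour + point.recovery_cost_per_hour
    return point.expected_outage_hours_per_year * hourly_cost


def downtime_budget(points: list[WeakPoint]) -> float:
    """Sum the expected yearly cost across every listed weak point."""
    return sum(expected_annual_cost(p) for p in points)


# Hypothetical inputs: replace with estimates for your own system.
weak_points = [
    WeakPoint("single database server", 4.0, 500.0, 100.0),
    WeakPoint("third-party payment API", 2.0, 800.0, 50.0),
]

for p in weak_points:
    print(f"{p.name}: {expected_annual_cost(p):.2f}")
print(f"reserve to consider: {downtime_budget(weak_points):.2f}")
```

The estimate is only as good as the guesses fed into it, so it is best treated as a starting point for a budget conversation rather than a forecast.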

Developers, especially, sometimes consider a problem to be a lot simpler to solve than it turns out to be once they get stuck in. With that mindset, you often end up attempting a change during the downtime itself. This can lengthen the downtime and often leads to panic-led decisions because the work was not well planned.

Preventing the cost of website downtime 

The number one way to mitigate downtime expenses is to have website monitoring in place that alerts you to downtime as soon as it happens. This reduces the risk of lost revenue and customers, as well as customer complaints and social media backlash. A common misconception is that it will be easy to notice when your website is down. This doesn’t account for the fact that downtime isn’t binary: if you’re up in the UK, that doesn’t mean your website is up across the 30 other locations you’re supposed to be live in.
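To make the "downtime isn’t binary" point concrete, here is a small Python sketch that summarizes check results gathered from several locations. The location names and the `summarize_availability` and `failing_locations` helpers are assumptions made for illustration; this is not how any particular monitoring product reports status.

```python
def summarize_availability(results: dict[str, bool]) -> str:
    """Return 'up', 'down', or 'partial' from per-location check results."""
    if not results:
        raise ValueError("no check results supplied")
    up_count = sum(results.values())  # True counts as 1, False as 0
    if up_count == len(results):
        return "up"
    if up_count == 0:
        return "down"
    return "partial"


def failing_locations(results: dict[str, bool]) -> list[str]:
    """List the locations whose check failed, in alphabetical order."""
    return sorted(location for location, ok in results.items() if not ok)


# Hypothetical results: reachable from the UK, but not from every region.
checks = {"uk": True, "us-east": False, "singapore": True}

print(summarize_availability(checks))
print(failing_locations(checks))
```

A check run only from the UK would report this site as up, while customers in the failing region would already be seeing an outage.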

Having an architecture that can support itself during an outage is also a huge way to mitigate downtime. If you can have a failover in place to carry on business functions as usual, this is strongly advised. On top of a well-structured architecture, a competent team that can return systems to working health is a great advantage. They are the very people who engineered the system in the first place, which makes them best suited to the role of maintainer.
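The failover idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the backend names are made up, and `is_healthy` stands in for whatever health check your system really uses. Real failover usually lives in a load balancer or DNS layer rather than in application code.

```python
from typing import Callable, Optional, Sequence


def first_healthy(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first backend that passes its health check, or None."""
    for backend in backends:
        try:
            if is_healthy(backend):
                return backend
        except Exception:
            # A health check that raises is treated the same as a failed one.
            continue
    return None


# Hypothetical setup: the primary is down, the standby is healthy.
status = {"primary.example.com": False, "standby.example.com": True}


def stub_health_check(backend: str) -> bool:
    """Stand-in for a real probe; raises KeyError for unknown backends."""
    return status[backend]


# Backends are listed in order of preference.
backends = ["primary.example.com", "standby.example.com"]
print(first_healthy(backends, stub_health_check))
```

If no backend passes, the function returns `None`, which is the point at which the on-call team and the public status page take over.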

A lack of proper setup, contingencies, and people who know how to solve the issues that arise is often the cause of unplanned downtime. If not enough people on the team know how to address an issue, the downtime can last a lot longer while you wait for the right people. Bad architectural choices in your system can make this worse by making a downtime event harder to fix. Following practices and standards that are widely used across the technology industry helps keep complexity in check regardless of the service you’re offering, which can lead to better fix times. The ultimate solution? Dare we say it – StatusCake.
