Want to know how much website downtime costs, and the impact it can have on your business?
Find out everything you need to know in our new uptime monitoring whitepaper 2021



We recently sent out a customer survey in which we asked users what they thought of every aspect of StatusCake, and the single aspect that came out top was the reliability and trustworthiness of alerts. It's the core of our product, so it makes sense that we want to get it right, and the survey made it clear we were hitting the nail on the head in almost all cases. But there was a niggling 2% of users who rated it under 5/5, and it's only right that we don't ignore that.
Over the past few days we've been making micro-improvements to speed up the delivery of downtime alerts, and together they add up to a powerful set of changes.
Firstly, we've made changes to the Alert Trigger Rate. The vast majority of our users set their trigger rate to around 5 minutes so they aren't bothered by small periods of downtime. But what exactly happens at the 5-minute mark?
Previously, your checks would continue at their normal check rate, and on each check the system would see whether the span between the first detection of downtime and the current check was greater than the trigger rate; if so, it would send out an alert. Sorry if that sounds confusing – it is! We've simplified things to improve how quickly you get alerted: with a 5-minute trigger rate you will now get an alert on that 5th minute, no matter your check rate. As soon as your site is detected as down we check every 30 seconds until the trigger rate is hit, and we also run an extra check 20 seconds before your set trigger rate.
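To make the new schedule concrete, here is a minimal sketch (not StatusCake's actual implementation) of the check offsets that scheme produces: a recheck every 30 seconds from the moment downtime is first detected, plus one extra check 20 seconds before the trigger rate, with the alert firing at the trigger rate itself. The function name and parameters are illustrative only.

```python
def recheck_offsets(trigger_rate_secs, interval=30, pre_check=20):
    """Return the offsets (seconds after downtime is first detected) at
    which confirmation checks run: one check every `interval` seconds,
    plus an extra check `pre_check` seconds before the trigger rate.
    The alert itself fires at `trigger_rate_secs`."""
    offsets = list(range(interval, trigger_rate_secs, interval))
    early = trigger_rate_secs - pre_check
    if early > 0 and early not in offsets:
        offsets.append(early)
    return sorted(offsets)

# With a 5-minute (300 s) trigger rate the site is rechecked every
# 30 seconds and once more at 280 s; the alert then fires at 300 s.
print(recheck_offsets(300))
# [30, 60, 90, 120, 150, 180, 210, 240, 270, 280]
```

The key point is that the recheck cadence is now independent of your configured check rate, so a check set to run every 15 minutes still gets its downtime confirmed on a 30-second cycle.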
We've also introduced a better system for detecting the type of downtime and adjusting the confirmation servers accordingly. If one system detects the downtime as a content-match failure, then rather than making a single attempt to see whether the content match has failed, each confirmation server will make 3 attempts at loading the test. This makes it much more likely to catch micro-issues that only appear every so often.
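The retry logic described above can be sketched roughly as follows. This is an illustrative simplification, not StatusCake's code; `fetch_page` is a hypothetical callable that returns the page body, and the site is only confirmed down if the expected content is missing from every attempt.

```python
def confirm_content_match(fetch_page, expected, attempts=3):
    """Per-confirmation-server behaviour for content-match downtime:
    instead of a single fetch, try up to `attempts` times and treat the
    test as down only if `expected` is absent from every response."""
    for _ in range(attempts):
        body = fetch_page()
        if expected in body:
            return True  # content found on at least one attempt: site is up
    return False  # all attempts missed the content: confirm downtime

# A flaky page that only serves the content on its second response
# would previously have been flagged as down, but now passes:
responses = iter(["<html>error</html>", "<html>Welcome!</html>"])
print(confirm_content_match(lambda: next(responses), "Welcome"))  # True
```

Retrying on the confirmation servers rather than the originating one is what lets intermittent micro-issues be distinguished from sustained content-match failures.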
I hope this helps explain the improvements we've rolled out today, but in case I've rambled on and made no sense, I'll summarise: we've got even better at sending alerts!