StatusCake

Downtime Alert Improvements!


We recently sent out a customer survey in which we asked users what they thought of every aspect of StatusCake. The aspect that scored highest was the reliability and trustworthiness of alerts. It’s the core of our product, so it makes sense that we want to get it right, and the survey showed we were hitting the nail on the head in almost all cases. But there was a niggling 2% of users who rated it under 5/5, and it’s only right that we don’t ignore that.

Over the past few days we’ve been making micro improvements to speed up the delivery of downtime alerts, and together they add up to a powerful set of changes.

The changes we have put into place and why.

Firstly, we’ve made changes to the Alert Trigger Rate. The vast majority of our users have set their trigger rate to around 5 minutes, so they don’t get bothered about short periods of downtime. But what exactly happens at 5 minutes?

Previously, your checks would continue at their normal check rate, and on each check the system would see whether the time between the point downtime was first detected and the current check was greater than the trigger rate; if so, it sent out an alert. Sorry if that sounds confusing. It is! But we’ve simplified things to improve how quickly you get alerted: now, when you have a 5 minute trigger rate, you will get an alert on that 5th minute, no matter your check rate. As soon as your site is detected as down, we will now check every 30 seconds until the trigger rate is hit. We’ll also check 20 seconds before your set trigger rate.
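To make the timing concrete, here is a rough sketch of the two behaviours described above. It is an illustration only, not our production code: the function names are made up for this example, it assumes a final check lands exactly on the trigger rate, and it reads the old "greater than" rule as strictly greater.

```python
from typing import List


def recheck_offsets(trigger_rate_s: int, interval_s: int = 30, early_s: int = 20) -> List[int]:
    """Seconds after downtime is first detected at which a recheck runs.

    Hypothetical helper: rechecks every `interval_s` seconds, one extra
    recheck `early_s` seconds before the trigger rate, and (an assumption)
    a final check at the trigger rate itself.
    """
    offsets = set(range(interval_s, trigger_rate_s + 1, interval_s))
    if trigger_rate_s > early_s:
        offsets.add(trigger_rate_s - early_s)
    offsets.add(trigger_rate_s)
    return sorted(offsets)


def old_alert_delay(check_rate_s: int, trigger_rate_s: int) -> int:
    """Delay before an alert under the old rule, as we read it.

    The alert went out on the first regular check at which the time since
    detection was strictly greater than the trigger rate.
    """
    return (trigger_rate_s // check_rate_s + 1) * check_rate_s


if __name__ == "__main__":
    # 5 minute trigger rate, new behaviour: rechecks up to the 300 second mark.
    print(recheck_offsets(300))
    # 5 minute trigger rate with a 5 minute check rate, old behaviour.
    print(old_alert_delay(300, 300))
```

With a 5 minute trigger rate, the new schedule rechecks every 30 seconds, adds one check at 280 seconds, and finishes at 300 seconds. Under the old rule as sketched here, a 5 minute check rate would not have alerted until the 600 second mark.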

We’ve also introduced a better system for detecting the type of downtime and adjusting the confirmation servers to match. If one system detects the downtime as a content match failure, then rather than attempting the test just once, each confirmation server will take 3 attempts at loading the test. This way it’s much more likely to catch downtime such as micro issues that only appear every so often.
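As a minimal sketch of that idea (again with made-up names, and assuming that a single failed attempt is enough for a confirmation server to confirm the downtime), the retry logic might look something like this:

```python
from typing import Callable


def attempts_for(downtime_type: str) -> int:
    """Number of attempts a confirmation server makes.

    The "content_match" label is a hypothetical identifier for this sketch.
    """
    return 3 if downtime_type == "content_match" else 1


def confirms_downtime(run_test: Callable[[], bool], downtime_type: str) -> bool:
    """Return True if any attempt fails, i.e. the downtime is reproduced.

    `run_test` returns True when the test passes and False when it fails.
    """
    for _ in range(attempts_for(downtime_type)):
        if not run_test():
            return True
    return False


if __name__ == "__main__":
    # An intermittent fault that only shows up on the third load.
    results = iter([True, True, False])
    print(confirms_downtime(lambda: next(results), "content_match"))
```

A single attempt would have missed the intermittent failure in this example, while three attempts catch it.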

I hope this helps explain some of the improvements we’ve rolled out today, but if I’ve rambled on and made no sense, I’ll summarise: we’ve got even better at sending alerts!
