We recently sent out a customer survey in which we asked users what they thought of every aspect of StatusCake. The aspect that came out on top in terms of marks was the reliability and trustworthiness of alerts. It’s the core of our product, so it makes sense that we want to get it right, and the survey showed we were hitting the nail on the head in almost all cases. But there was a niggling 2% of users who rated it under 5/5, and it’s only right we don’t ignore that.
Over the past few days we’ve been making small changes to speed up the delivery of downtime alerts, and together they add up to a powerful set of improvements.
Firstly, we’ve made changes to the Alert Trigger Rate. The vast majority of our users have set their trigger rate to around 5 minutes so they don’t get bothered about small periods of downtime, but what exactly happens at 5 minutes?
Until now, your checks would continue at their normal check rate, and on each check the system would look at the time between the point downtime was first detected and the current check; if that was greater than the trigger rate, it would send out an alert. That meant the alert could arrive well after the trigger rate had passed, depending on your check rate. Sorry if that sounds confusing, it is! But we’ve simplified things to improve how quickly you get alerted. Now when you have a 5 minute trigger rate you will get an alert on that 5th minute, no matter your check rate. As soon as your site is detected as down we will check every 30 seconds until the trigger rate is hit, and we’ll also check 20 seconds before your set trigger rate.
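To make the difference concrete, here is a rough sketch in Python of the two schedules as described above. It is an illustration only, not our production code: the function names are made up for this post, and the strict "greater than" comparison in the old behaviour is an assumption based on the description.

```python
def old_alert_offset(trigger_rate_s: int, check_rate_s: int) -> int:
    """Old behaviour (as described): seconds after downtime is first
    detected at which the alert goes out. Checks keep running at the
    normal check rate, and the alert fires on the first check where the
    elapsed time is greater than the trigger rate (assumed strict)."""
    elapsed = check_rate_s
    while elapsed <= trigger_rate_s:
        elapsed += check_rate_s
    return elapsed


def new_recheck_offsets(trigger_rate_s: int,
                        interval_s: int = 30,
                        early_s: int = 20) -> list[int]:
    """New behaviour (as described): seconds after downtime is first
    detected at which a re-check runs. One check every 30 seconds, one
    extra check 20 seconds before the trigger rate, and a final check on
    the trigger rate itself, which is when the alert is sent."""
    offsets = set(range(interval_s, trigger_rate_s + 1, interval_s))
    if trigger_rate_s - early_s > 0:
        offsets.add(trigger_rate_s - early_s)
    offsets.add(trigger_rate_s)
    return sorted(offsets)


if __name__ == "__main__":
    # 5 minute trigger rate with a 15 minute check rate.
    print(old_alert_offset(300, 900))
    print(new_recheck_offsets(300))
```

With a 5 minute trigger rate and a 15 minute check rate, the old logic in this sketch would not alert until the next scheduled check at 900 seconds, whereas the new schedule ends on the 300 second mark regardless of check rate.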
We’ve also introduced a better system for detecting the type of downtime and adjusting the confirmation servers to suit. If one system detects the downtime as a content match failure, then rather than attempting just once to see whether the content match has failed, each confirmation server will take 3 attempts at loading the test. This way it’s much more likely to catch downtime such as micro issues that only appear every so often.
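As a rough illustration of that idea, the Python sketch below picks the number of attempts from the detected downtime type. Everything named here is hypothetical: the type labels, the single attempt used for other downtime types, and the rule that any one failed attempt confirms the downtime are assumptions made for the example, not a description of our actual confirmation servers.

```python
from typing import Callable

# Attempts per downtime type. Only the 3 attempts for content match come
# from the post; the default of 1 for other types is an assumption.
ATTEMPTS_BY_DOWNTIME_TYPE = {"content_match": 3}
DEFAULT_ATTEMPTS = 1


def confirm_downtime(downtime_type: str, run_test: Callable[[], bool]) -> bool:
    """Return True if this confirmation server confirms the downtime.

    run_test() loads the test once and returns True when it passes.
    Downtime is treated as confirmed if any attempt fails (assumed rule),
    so an intermittent fault is more likely to be caught with 3 attempts.
    """
    attempts = ATTEMPTS_BY_DOWNTIME_TYPE.get(downtime_type, DEFAULT_ATTEMPTS)
    return any(not run_test() for _ in range(attempts))


if __name__ == "__main__":
    # A flaky page that passes, then fails, then passes again.
    results = iter([True, False, True])
    print(confirm_downtime("content_match", lambda: next(results)))
```

In this sketch a single attempt against the same flaky page would pass and miss the problem, while three attempts catch the failure on the second load.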
These are just two of the changes we’ve made to improve our backend check system, and I say with some confidence that we have one of the most advanced check systems in the space, if not the most advanced.
I hope this helps explain a bit about the improvements we’ve rolled out today, but if I’ve rambled on and made no sense I’ll summarise: we’ve got even better at sending alerts!