In a previous post, The Incident Checklist: Reducing Cognitive Load When It Matters Most, we explored how incidents stop being purely technical problems and become human ones. These are moments where decision-making under pressure and cognitive load matter more than perfect root cause analysis. When systems don’t support people clearly in those moments, teams compensate.
They add process.
They add people.
They add noise.
Alerting is one of the most visible places where this shows up.
Most engineering leaders have seen some version of this. An alert fires. It goes to the on-call engineer, a few named individuals added "just in case", the wider team channel, and a mailing list. On paper, this looks robust: many eyes, many paths, plenty of redundancy. In practice, however, response slows. People hesitate. Everyone waits to see who will act first.
The problem isn’t that the alert was wrong. It’s that responsibility was unclear.
Many teams can recall an incident where dozens of people were notified at once.
Messages start appearing in chat. A few engineers begin investigating quietly. Others wait. Some assume someone more senior has it covered. Everyone can see the problem, but no one is certain they’re the one who should act first.
Minutes then pass. Not because the team lacks skill, but because the system hasn’t made ownership obvious. This isn’t a failure of individuals. It’s a predictable outcome when responsibility is diffuse.
Alert noise is rarely the result of carelessness. It’s almost always the result of good intentions layered over time.
A name gets added after one incident where someone wasn’t notified. Another after a near-miss. A team mailing list after an escalation felt slow. Each decision makes sense in isolation.
Years later, the alert goes everywhere, but action goes nowhere. What began as a safety measure quietly becomes a source of delay and confusion.
When alerting doesn’t feel safe, teams compensate socially.
If you’re not confident that: (i) the signal is reliable; (ii) the right person will see it; or (iii) someone will take ownership quickly, then widening the audience feels like the least risky option.
Noise is easier to tolerate than silence. Being interrupted feels better than missing something critical. From that perspective, over-alerting isn’t negligence; it’s a coping mechanism.
In human factors research, this pattern is well understood. Under uncertainty, people optimise for safety as they perceive it in the moment, even when that creates new risks elsewhere.
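The structural alternative to widening the audience is making escalation itself trustworthy: notify one responder at a time, and escalate only on silence. A minimal sketch of that idea (all names are hypothetical, and acknowledgement is simulated rather than driven by a real timer):

```python
from dataclasses import dataclass, field

@dataclass
class EscalationPolicy:
    """Page exactly one responder at a time; escalate only on silence."""
    chain: list                       # ordered responders, first is primary on-call
    ack_timeout_s: int = 300          # in a real system, escalate after this; here acks are simulated
    notified: list = field(default_factory=list)

    def dispatch(self, acknowledged_by=None):
        """Walk the chain until someone takes ownership.

        `acknowledged_by` simulates who (if anyone) eventually acks."""
        for responder in self.chain:
            self.notified.append(responder)   # one clear owner per step, never a broadcast
            if responder == acknowledged_by:
                return responder              # ownership established, stop escalating
        return None                           # chain exhausted: a visible failure, not silent noise

policy = EscalationPolicy(chain=["primary", "secondary", "manager"])
owner = policy.dispatch(acknowledged_by="secondary")
# primary was paged first; secondary took ownership; manager was never interrupted
```

The point of the sketch is the invariant, not the implementation: at every moment, exactly one person holds responsibility, and everyone else is left undisturbed unless that person stays silent.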
Broadcast alerting comes with predictable side effects: duplicated investigation, hesitation while everyone waits for someone else, and gradual alert fatigue as interruptions lose their meaning. None of this shows up in alert metrics. It shows up in incident timelines, stress levels, and postmortems that say, “We weren’t sure who should act.”
Imagine a cockpit where every warning light alerts every crew member at once, with no clear division of responsibility. Everyone sees the issue. No one is sure who should act first. That wouldn’t be considered a robust safety system. It would be recognised as a design flaw. Software systems are no different.
Teams often try to fix alert noise by tuning thresholds, reducing volume, or adding documentation. Those things can help, but they don’t address the underlying issue.
Alerting systems encode assumptions about ownership, trust, and how humans behave under pressure. When alerts go to “everyone,” the system is implicitly saying: “We don’t know who should act here.” That’s not a notification problem. It’s an organisational one.
High-performing teams design alerting with a different goal in mind – not maximum reach, nor maximum redundancy, but clear action.
They optimise for a clear, fast handoff from signal to action: one owner, one acknowledgement, one response. That often means fewer recipients, clearer ownership, and signals people trust enough to act on without debate. It can feel riskier at first. Over time, it creates calmer, faster responses and far less noise.
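In practice, this usually starts with an explicit ownership map: every alert resolves to exactly one responsible team, and anything unowned lands in a single visible triage queue rather than being broadcast. A minimal sketch, with hypothetical service and team names:

```python
# Explicit ownership map: each service resolves to exactly one owning rotation.
OWNERSHIP = {
    "payments-api": "payments-oncall",
    "checkout-web": "storefront-oncall",
}

def route(alert: dict) -> list[str]:
    """Return the single recipient responsible for this alert.

    The fallback is one triage queue, not 'everyone' -- unowned alerts
    become visible gaps in the ownership map instead of ambient noise."""
    owner = OWNERSHIP.get(alert.get("service"))
    if owner:
        return [owner]
    return ["unowned-alerts-triage"]
```

The design choice worth noticing is the fallback: when ownership is unknown, the system says so explicitly instead of compensating by paging a crowd.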
In the next post, we’ll look at why large notification lists rarely create accountability, and why being “included” is not the same as being responsible. Because when everyone is alerted, no one is responsible. And that’s a design choice worth examining.