We’ve all heard the jokes about how us Brits can’t handle the hot weather, but when the UK hit record highs in July this year, we have to admit that we really did struggle. None more so than our friends over at Google.
Google is no stranger to the occasional outage, having seen Google Maps go down in May earlier this year. This time, though, the outage was apparently down to the soaring temperatures we were experiencing.
Luckily for Google, this wasn’t a global outage but a UK-based one, and it was the Google Cloud data center in the capital that first reported the issue.
Google acknowledged that there was an outage and attributed it to a “cooling failure”, which we can only assume was brought on by the unprecedented 38°C heatwave.
This was the full description Google gave in their incident report:
Description:
On Tuesday, 19 July 2022, a cooling failure in one of the buildings that hosts the zone europe-west2-a impacted multiple Google Cloud services. This resulted in some customers experiencing service unavailability for impacted products. The cooling system was repaired at 14:13 PDT, and we restored our services by 2022-07-20, 04:28 PDT. A small number of customers experienced residual effects which were fully mitigated by 2022-07-20, 21:20 PDT when we fully closed the incident. Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.
https://status.cloud.google.com/incidents/XVq5om2XEDSqLtJZUvcH
According to Google’s official report, the outage lasted 1 day and 14 hours, making it one of the longest outages the search goliath has ever experienced, although relatively few users were affected.
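As a quick aside on that headline figure: “1 day and 14 hours” works out to 38 hours. The excerpt above only gives the closure time (21:20 PDT on 20 July), so the short sketch below assumes a start of 07:20 PDT on 19 July, inferred by working backwards from the reported duration rather than taken from the report itself.

from datetime import datetime
from zoneinfo import ZoneInfo

pdt = ZoneInfo("America/Los_Angeles")

# Closure time taken from Google's incident report (20 July 2022, 21:20 PDT).
fully_closed = datetime(2022, 7, 20, 21, 20, tzinfo=pdt)

# The start time is not given in the excerpt above; 07:20 PDT on 19 July is
# assumed here purely so the arithmetic lines up with the reported duration.
assumed_start = datetime(2022, 7, 19, 7, 20, tzinfo=pdt)

elapsed = fully_closed - assumed_start
print(elapsed)                                   # 1 day, 14:00:00
print(elapsed.total_seconds() / 3600, "hours")   # 38.0 hours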
Surprisingly, Oracle also experienced a similar situation due to the rising temperatures.
An Oracle Cloud status message read:
“As a result of unseasonal temperatures in the region, a subset of cooling infrastructure within the UK South (London) Data Centre experienced an issue. This led to a subset of our service infrastructure needing to be powered down to prevent uncontrolled hardware failures.”