StatusCake

Google’s outage on the UK’s hottest day of the year


We’ve all heard the jokes about how we Brits can’t handle hot weather, but when the UK hit record highs in July this year, we have to admit that we really did struggle. None more so than our friends over at Google.

Google is no stranger to the occasional outage and website downtime; Google Maps went down as recently as May this year. This time, though, the outage was apparently caused by the soaring temperatures we were all experiencing.

Where did this outage happen? 

Luckily for Google, this wasn’t a global outage; it was confined to the UK. The Google Cloud data center in the capital was the first to report the issue.

Why did Google experience the outage? 

Google did acknowledge the outage and attributed it to a “cooling failure”, which we can only assume was caused by the unprecedented 38°C heatwave.

This was Google’s full description as part of their incident report:

Description:

On Tuesday, 19 July 2022, a cooling failure in one of the buildings that hosts the zone europe-west2-a impacted multiple Google Cloud services. This resulted in some customers experiencing service unavailability for impacted products. The cooling system was repaired at 14:13 PDT, and we restored our services by 2022-07-20, 04:28 PDT. A small number of customers experienced residual effects which were fully mitigated by 2022-07-20, 21:20 PDT when we fully closed the incident. Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.

https://status.cloud.google.com/incidents/XVq5om2XEDSqLtJZUvcH

How long did the outage last? 

According to Google’s official report, the outage lasted for 1 day and 14 hours, making it one of the longest outages the search goliath has ever experienced, although relatively few users were affected.
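The “1 day and 14 hours” figure is straightforward timestamp arithmetic against the close time in Google’s report. As a minimal sketch (the incident start time of 07:20 PDT on 19 July is our assumption, inferred from the stated duration; it isn’t in the quoted excerpt):

```python
from datetime import datetime, timedelta

fmt = "%Y-%m-%d %H:%M"

# Assumed incident start (not stated in the quoted excerpt)
start = datetime.strptime("2022-07-19 07:20", fmt)
# Incident fully closed, per Google's incident report (PDT)
closed = datetime.strptime("2022-07-20 21:20", fmt)

elapsed = closed - start
print(elapsed)  # 1 day, 14:00:00
```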

Was Google the only one to experience an outage due to the heatwave? 

Surprisingly, Oracle also experienced a similar situation due to the rising temperatures.

An Oracle Cloud status message read:

“As a result of unseasonal temperatures in the region, a subset of cooling infrastructure within the UK South (London) Data Centre experienced an issue. This led to a subset of our service infrastructure needing to be powered down to prevent uncontrolled hardware failures.”
