
Google’s outage on the UK’s hottest day of the year


We’ve all heard the jokes about how us Brits can’t handle the hot weather, but when the UK hit record highs in July this year, we have to admit that we really did struggle. None more so than our friends over at Google.

Google is no stranger to the occasional outage and bout of website downtime, having already seen Google Maps go down in May this year. But this time, the outage was apparently due to the soaring temperatures the UK was experiencing.

Where did this outage happen? 

Luckily for Google, this wasn’t a globally felt outage; it was limited to the UK. The Google Cloud data center in the capital was the first to report the issue.

Why did Google experience the outage? 

Google did acknowledge that there was an outage and put it down to a “cooling failure”, which we can only assume was caused by the unprecedented 38°C heatwave.

This was Google’s full description in its incident report:

Description:

On Tuesday, 19 July 2022, a cooling failure in one of the buildings that hosts the zone europe-west2-a impacted multiple Google Cloud services. This resulted in some customers experiencing service unavailability for impacted products. The cooling system was repaired at 14:13 PDT, and we restored our services by 2022-07-20, 04:28 PDT. A small number of customers experienced residual effects which were fully mitigated by 2022-07-20, 21:20 PDT when we fully closed the incident. Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.

https://status.cloud.google.com/incidents/XVq5om2XEDSqLtJZUvcH
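Incidentally, if you’d rather not refresh the status page by hand during an incident like this, Google’s status dashboard also exposes a machine-readable feed of incidents (at the time of writing, https://status.cloud.google.com/incidents.json). The rough Python sketch below polls that feed and flags anything mentioning the europe-west2 region; the field names it prints (“begin”, “external_desc”) are assumptions about the feed’s schema rather than documented guarantees, hence the defensive lookups.

import requests  # third-party HTTP client

# Public Google Cloud status feed (a JSON array of incident objects).
STATUS_FEED = "https://status.cloud.google.com/incidents.json"

def incidents_mentioning(region: str) -> list:
    """Return raw incident entries whose payload mentions the given region."""
    response = requests.get(STATUS_FEED, timeout=10)
    response.raise_for_status()
    # Crude substring match across the whole entry, so we don't have to
    # rely on any particular field name for the affected locations.
    return [incident for incident in response.json() if region in str(incident)]

if __name__ == "__main__":
    for incident in incidents_mentioning("europe-west2"):
        # "begin" and "external_desc" are assumed field names; fall back
        # gracefully if the schema differs.
        print(incident.get("begin", "unknown start"), "-",
              incident.get("external_desc", "no description"))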

How long did the outage last? 

According to Google’s official report, the outage lasted for 1 day and 14 hours, which makes it one of the longest outages the search goliath has ever experienced, although relatively few customers were affected.

Was Google the only one to experience an outage due to the heatwave? 

Surprisingly, Oracle also experienced a similar situation due to the rising temperatures.

An Oracle Cloud status message read:

“As a result of unseasonal temperatures in the region, a subset of cooling infrastructure within the UK South (London) Data Centre experienced an issue. This led to a subset of our service infrastructure needing to be powered down to prevent uncontrolled hardware failures.”

