

Even Facebook faced downtime

At present it seems likely that this downtime was caused by a cache server failure followed by a load balancing issue. This means requests are not being redirected to an available server and are therefore failing at the point of request.
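To make that failure mode concrete, here is a toy sketch of a load balancer that only forwards requests to backends it believes are healthy and answers with a 503 when none remain. This is purely illustrative and assumes nothing about Facebook's actual architecture; the backend names are hypothetical.

import random

class ToyLoadBalancer:
    def __init__(self, backends):
        # Map of backend address -> healthy flag, normally kept up to
        # date by periodic health checks.
        self.backends = {addr: True for addr in backends}

    def mark_unhealthy(self, addr):
        # Called when a health check against this backend fails.
        self.backends[addr] = False

    def route(self, request_path):
        healthy = [addr for addr, ok in self.backends.items() if ok]
        if not healthy:
            # No applicable server left: the request fails at the point
            # of request with a 503 Service Unavailable.
            return 503, None
        # Otherwise forward to any healthy backend.
        return 200, random.choice(healthy)

lb = ToyLoadBalancer(["cache-1", "cache-2"])
lb.mark_unhealthy("cache-1")
lb.mark_unhealthy("cache-2")   # simulate the cache tier failing
print(lb.route("/home"))       # -> (503, None)

In this simplified model, once the cache tier drops out of the healthy pool there is nowhere left to send traffic, which matches the 503 responses described below.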

Update 09:24 BST: After around 29 minutes of downtime, Facebook has started to recover for many users and our global uptime monitoring servers have started to receive status code 200 responses. There are still some lingering speed issues, and in some countries the service continues to fluctuate.

Original: If you thought downtime was just an issue that small independent stores have to deal with, think again. The world’s largest social network is currently experiencing a global blackout. Since around 08:54 BST the social network has been returning 503 errors (Service Unavailable, indicating no server was available to handle the request).

“We got alerted to this issue within seconds and are working on a resolution. I don’t expect it will take any longer than half an hour.” – Facebook source
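From the outside, a third-party monitor sees exactly what our global checks saw this morning: 503 responses during the outage and 200 responses once recovery began. As a minimal sketch of that kind of external check (illustrative only, not StatusCake’s actual checker), a script might look like this:

import requests

def check_uptime(url: str, timeout: float = 10.0) -> bool:
    """Return True if the site responds with HTTP 200, False otherwise."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        # DNS failures, connection errors and timeouts count as downtime too.
        print(f"DOWN: {url} ({exc})")
        return False

    if response.status_code == 200:
        print(f"UP: {url} returned 200 OK")
        return True

    # A 503 means no server was available to handle the request.
    print(f"DOWN: {url} returned {response.status_code}")
    return False

if __name__ == "__main__":
    check_uptime("https://www.facebook.com")

A real monitoring service runs checks like this from many locations on a schedule and alerts you the moment the status code changes, rather than waiting for customers to report the problem.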

This isn’t the first time the social network has experienced downtime, and it won’t be the last. The outage does, however, highlight the importance of having a third-party monitoring service and, even more importantly, a remotely hosted status page.


