StatusCake

Fortnite, AWS, and the Importance of Monitoring

statuscake

The Battle Royale game Fortnite has become a sensation amongst online gamers in no time at all. To explain it in simple terms, 100 players are simultaneously dropped into a battleground measuring several (in-game) square kilometers. Alone or as part of a team, they must head towards a random central point on the map whilst avoiding or confronting the other players. The last man or team standing takes the top spot and wins the game. It all adds up to an intense and at times hilarious experience that can last anywhere from 1 to 20 minutes.


The game’s growth in popularity has been epic: from 60,000 players at launch last July to 3,200,000 players in under nine months. Suddenly, keeping the game up and running was going to require some pretty serious infrastructure.

From day one, Epic, the publisher behind Fortnite, has relied on Amazon Web Services (AWS) to keep it online, like so many other large businesses such as Airbnb, Unilever, and Netflix.
AWS gives Epic the ability to cope when player numbers spike; the infrastructure workload at the peaks can be up to ten times that in the troughs.

Epic also takes advantage of AWS’s “availability zones”. These 55 zones are isolated locations within AWS regions, designed so that a problem in any one zone doesn’t drag web services down with it: where one zone fails, another simply takes up the baton. Fortnite currently runs across 24 of these zones.

This isn’t to say that AWS and the use of availability zones are infallible. In February of this year Fortnite experienced multiple outages which even AWS’s availability zoning couldn’t prevent.

It’s also worth remembering that, whilst companies such as Epic rely on AWS for its reliability and stability, Amazon itself can still have problems.

Just last month, on Amazon’s Prime Day, the rush for bargains not only brought Amazon down but also affected AWS. Whilst the AWS service itself continued to operate normally, AWS customers were unable to log in to their accounts.

More serious, however, was the four-hour outage in AWS’s US-East-1 region in February this year, which affected over half of the top 100 internet retailers. Many websites saw their performance suffer severely (Disney’s store took over 1000% longer to load than normal), and many others went down completely. The same region had similar issues again in May.

All of this highlights that, even if you’re using a cloud service provider such as AWS or Google Cloud, monitoring your website is as important as ever.
