HTTP Error Codes Explained: Common 4xx & 5xx Errors, Root Causes, and Fixes

Last updated: 4 February 2026

HTTP error codes are more than just messages shown to users. They're signals from distributed systems. For engineers, SREs, and platform teams, these codes are often the first indicator of degraded reliability, misconfiguration, or upstream failure.

This guide focuses on how HTTP error codes behave in real production environments: behind CDNs, load balancers, reverse proxies, APIs, and microservices. Rather than just defining error codes, we explain why they happen, what they usually mean operationally, and what to check first when you encounter one.

Quick reference: common HTTP error codes

Code | Class | Client or server | Typical production meaning
400 | 4xx | Client | Malformed or invalid request
401 | 4xx | Client/Auth | Missing or invalid authentication
403 | 4xx | Client/Auth | Permission denied
404 | 4xx | Client/App | Route or resource not found
408 | 4xx | Client/Network | Server timed out waiting for the client's request
500 | 5xx | Server | Unhandled server-side failure
502 | 5xx | Server/Infra | Bad gateway or upstream failure
503 | 5xx | Server/Infra | Service unavailable (overload or maintenance)
504 | 5xx | Server/Infra | Gateway timeout

Client errors vs server errors & why this matters operationally

At a high level:

  • 4xx errors indicate requests the server chose not to fulfil, usually because of a problem with the request itself
  • 5xx errors indicate requests the server failed to fulfil

Operationally, this distinction is critical:

  • A spike in 4xx errors usually does not indicate an outage.
  • A spike in 5xx errors often does indicate SLO risk.

Misclassifying these errors can hide real incidents, trigger unnecessary alerts, and cause retries that amplify failures. Understanding the differences allows teams to design better alerts, retries, and incident responses.
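
The client/server split above can be encoded directly in a retry policy. The following is a minimal sketch rather than a recommended standard: the set of retryable codes and the rule of only retrying idempotent methods are assumptions we chose for illustration, and the function names are hypothetical.

```python
# Assumption: which codes are worth retrying is a policy choice, not part of HTTP.
RETRYABLE_STATUSES = {408, 502, 503, 504}

# Methods that HTTP defines as idempotent, so repeating them is normally safe.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def error_class(status: int) -> str:
    """Return 'client' for 4xx, 'server' for 5xx, and 'none' otherwise."""
    if 400 <= status <= 499:
        return "client"
    if 500 <= status <= 599:
        return "server"
    return "none"


def should_retry(status: int, method: str) -> bool:
    """Retry only transient-looking failures on idempotent requests."""
    return status in RETRYABLE_STATUSES and method.upper() in IDEMPOTENT_METHODS
```

Under this policy a 400 is never retried, because resending the same malformed request cannot succeed, while a 503 on a GET is retried. A 503 on a POST is not, to avoid duplicating side effects.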

400 Bad Request

What this error means

The server could not process the request because it was malformed or invalid.

Common real-world causes

  • Invalid JSON or request body
  • Missing required parameters
  • Incorrect headers (e.g. Content-Type)

What to check first

  • Request validation logs
  • API schema mismatches
  • Client-side serialization
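
As a rough illustration of the causes listed above, a client-side pre-flight check might look like the sketch below. The function name and the choice of required fields are hypothetical, and the header lookup is simplified (real header names are case-insensitive).

```python
import json


def find_request_problems(headers: dict, body: bytes, required: tuple) -> list:
    """Return a list of likely reasons a JSON request would be rejected."""
    problems = []

    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    content_type = headers.get("Content-Type", "")
    if content_type.split(";")[0].strip().lower() != "application/json":
        problems.append("Content-Type is not application/json")

    try:
        payload = json.loads(body)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        problems.append("body is not valid JSON")
        return problems

    if not isinstance(payload, dict):
        problems.append("body is not a JSON object")
        return problems

    for field in required:
        if field not in payload:
            problems.append(f"missing required field: {field}")
    return problems
```

Running a check like this in client tests can catch serialization mistakes before they appear as 400s in production logs.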

401 Unauthorized

What this error means

Authentication is required, but the request lacks valid credentials.

Common real-world causes

  • Expired tokens
  • Missing Authorization headers
  • Clock skew affecting token validity

What to check first

  • Identity provider health
  • Token expiry and refresh logic
  • Authentication middleware
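
Clock skew is easier to reason about with a concrete check. The sketch below assumes tokens carry expiry and optional not-before times as Unix timestamps (as JWT `exp` and `nbf` claims do) and that a 30-second leeway is acceptable. Both the function name and the leeway value are illustrative assumptions.

```python
import time
from typing import Optional


def token_is_valid(
    exp: float,
    nbf: Optional[float] = None,
    now: Optional[float] = None,
    leeway: float = 30.0,
) -> bool:
    """Check token time bounds, tolerating `leeway` seconds of clock skew."""
    current = time.time() if now is None else now

    # Expired, even after allowing for a slightly fast local clock.
    if current > exp + leeway:
        return False

    # Not yet valid, even after allowing for a slightly slow local clock.
    if nbf is not None and current < nbf - leeway:
        return False

    return True
```

With zero leeway, a server whose clock runs ten seconds ahead of the identity provider would reject a token that is still valid, which shows up as intermittent 401s.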

403 Forbidden

What this error means

The server understood the request but refuses to authorise it.

Common real-world causes

  • Incorrect IAM or RBAC rules
  • IP allowlists or geo-blocking
  • CDN or WAF rules

What to check first

  • Permission policies
  • Security logs
  • Recent access control changes

404 Not Found

What this error means

The requested resource does not exist or cannot be located.

Common real-world causes

  • Broken links or outdated routes
  • Deployment drift between environments
  • Misconfigured rewrite rules

What to check first

  • Application routing
  • CDN cache behaviour
  • Deployment artifacts

408 Request Timeout

What this error means

The server timed out waiting for the client to send the request.

Common real-world causes

  • Slow or unstable client connections
  • Large payload uploads
  • Network congestion

What to check first

  • Client network metrics
  • Load balancer idle timeout settings
  • Request size limits

500 Internal Server Error

What this error means

The server encountered an unexpected condition that prevented it from fulfilling the request.

Common real-world causes

  • Unhandled exceptions
  • Dependency failures
  • Misconfigured environment variables

What to check first

  • Application error logs
  • Recent deployments
  • Dependency health

502 Bad Gateway

What this error means

A server acting as a gateway or proxy received an invalid response from an upstream server.

In modern architectures, this usually means one service could not successfully communicate with another.

What it usually means in production

  • The upstream service is down or unreachable
  • The upstream returned a malformed response
  • The connection was reset mid-request

This error is most commonly generated by:

  • Load balancers
  • Reverse proxies (Nginx, Envoy)
  • CDNs

Common real-world causes

  • Crashed backend containers or VMs
  • DNS resolution failures between services
  • TLS handshake failures
  • Timeout mismatches between proxy layers

How to diagnose

  • Check upstream service health and error rates
  • Inspect proxy and load balancer logs
  • Compare timeout configurations across layers
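
Comparing timeouts across layers can be partly automated. The sketch below applies one common rule of thumb for request timeouts: each outer layer should wait longer than the layer behind it, so the inner layer gets the chance to return its own error. The layer names and values are hypothetical, and idle or keep-alive timeouts follow different rules that this check does not cover.

```python
def find_timeout_mismatches(layers):
    """Flag adjacent layers where the outer timeout does not exceed the inner one.

    `layers` is a list of (name, timeout_seconds) ordered from edge to origin.
    """
    mismatches = []
    for (outer_name, outer_t), (inner_name, inner_t) in zip(layers, layers[1:]):
        if outer_t <= inner_t:
            mismatches.append((outer_name, inner_name))
    return mismatches


# Hypothetical stack: the load balancer gives up before the proxy behind it does.
stack = [("cdn", 60), ("load_balancer", 30), ("nginx", 60), ("app", 25)]
print(find_timeout_mismatches(stack))
```

For the example stack this reports the load balancer and Nginx pair, because the load balancer stops waiting after 30 seconds while Nginx is still prepared to wait 60.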

Is this transient or serious?

  • Transient: brief spikes during deploys or autoscaling
  • Serious: sustained error rate increase across regions

Prevention and monitoring

  • Health checks on upstream services
  • Synthetic monitoring from multiple regions
  • Alerting on error rate and duration
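
Alerting on both rate and duration can be sketched as follows. The 5% threshold and the three-interval window are assumptions for illustration, not recommended values, and the function name is hypothetical.

```python
def should_alert(windows, threshold=0.05, sustained=3):
    """Alert only when the 5xx rate exceeds `threshold` in each of the last
    `sustained` intervals.

    `windows` is a list of (total_requests, server_errors), oldest first.
    """
    recent = windows[-sustained:]
    if len(recent) < sustained:
        return False  # Not enough history to call the increase sustained.
    return all(
        total > 0 and errors / total > threshold
        for total, errors in recent
    )
```

A single bad interval during a deploy does not page anyone under this rule, while three consecutive intervals above the threshold do.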

503 Service Unavailable

What this error means

The server is currently unable to handle the request.

Common real-world causes

  • Planned maintenance
  • Autoscaling lag
  • Resource exhaustion

What to check first

  • Capacity metrics
  • Deployment status
  • Maintenance windows
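
On the client side, a 503 is often worth retrying after a pause. The sketch below honours the standard `Retry-After` response header when the server sends one and otherwise falls back to capped exponential backoff. The function name, base delay, and cap are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, attempt, now=None, base=0.5, cap=30.0):
    """Return how many seconds to wait before retrying a 503.

    `retry_after` is the raw Retry-After header value (or None);
    `attempt` is the zero-based retry count.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            # Retry-After given as a number of seconds.
            return min(float(value), cap)
        try:
            # Retry-After given as an HTTP date.
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return min(max((when - current).total_seconds(), 0.0), cap)

    # No usable header: capped exponential backoff.
    return min(base * (2 ** attempt), cap)
```

In production, adding random jitter to the fallback delay helps avoid many clients retrying at the same instant, which matters most when the 503s are caused by overload.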

504 Gateway Timeout

What this error means

A gateway did not receive a timely response from an upstream server.

Common real-world causes

  • Slow backend services
  • Network latency
  • Database query bottlenecks

What to check first

  • Upstream response times
  • Timeout thresholds
  • Slow query logs

Frequently asked questions (FAQ)

What is the difference between 502 and 503?

A 502 indicates an invalid response from an upstream service, while a 503 indicates the service is unavailable (often due to overload or maintenance).

Are 4xx errors bad for SEO?

Generally no. Search engines expect some 4xx responses. Persistent 404s on important pages, however, should be addressed.

Which error codes should trigger alerts?

Most teams alert on sustained increases in 5xx error rates, not individual errors.

Can CDNs change error codes?

Yes. CDNs often generate their own 5xx responses when origin servers fail to respond correctly.

How monitoring tools help

HTTP error codes are most useful when combined with uptime monitoring, error rate alerting, and regional checks. Tools like StatusCake help teams detect, classify, and respond to these failures before users notice.
