Health Checks & Circuit Breakers: Preventing cascading failures
Preventing Cascading Failures
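In a microservices architecture, if Service A is slow, every request in Service B that calls it has to wait. Those waiting requests hold on to threads, connections, and memory, so Service B eventually runs out of resources and fails too. The failure then spreads to whatever depends on Service B, and soon the whole system is offline. This is a Cascading Failure.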
1. Health Checks
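The Load Balancer sends a health-check request to each of your servers at a regular interval (e.g., GET /health). If a server fails a set number of checks in a row (3 in this example; the threshold is usually configurable), the Load Balancer automatically removes it from the pool and stops routing traffic to it. Once the server starts passing checks again, it can be added back. This is "Self-Healing" infrastructure.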
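To make the removal rule concrete, here is a minimal sketch of the bookkeeping a load balancer might do. It is an illustration, not how any particular load balancer is implemented: `ServerPool`, `probe`, and the server names are hypothetical, the failures are assumed to be counted consecutively, and the real GET /health request is replaced by a callable so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

FAILURE_THRESHOLD = 3  # failed checks in a row before removal, as in the text


@dataclass
class ServerPool:
    """Tracks consecutive health-check failures per server (hypothetical helper)."""
    probe: Callable[[str], bool]  # stand-in for GET /health; True means healthy
    servers: List[str]
    failures: Dict[str, int] = field(default_factory=dict)

    def run_checks(self) -> None:
        """Run one round of health checks against every known server."""
        for server in self.servers:
            if self.probe(server):
                self.failures[server] = 0  # a healthy response resets the count
            else:
                self.failures[server] = self.failures.get(server, 0) + 1

    def healthy(self) -> List[str]:
        """Servers still in rotation: those below the failure threshold."""
        return [s for s in self.servers
                if self.failures.get(s, 0) < FAILURE_THRESHOLD]


# Simulated cluster: app-2 is down for three rounds, then recovers.
status = {"app-1": True, "app-2": False}
pool = ServerPool(probe=lambda s: status[s], servers=["app-1", "app-2"])
for _ in range(3):
    pool.run_checks()
print(pool.healthy())  # ['app-1']

status["app-2"] = True
pool.run_checks()
print(pool.healthy())  # ['app-1', 'app-2']
```

Re-admitting a server after a single passing check is a simplification made here; real load balancers often require several passing checks before sending traffic again.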
2. The Circuit Breaker Pattern
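If an external service (like a Payment Gateway) is failing, don't keep retrying: every retry ties up your own threads and connections while adding load to a service that is already struggling. Instead, after a number of failures, "trip" the circuit. For a cooldown period (say, the next 60 seconds), all calls to that service fail immediately or return a cached default, without touching the network. This gives the failing service time to recover and keeps your app alive.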
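The trip-and-fail-fast behaviour can be sketched in a few lines. This is a deliberately simplified illustration, not a production implementation: `SimpleBreaker`, `max_failures`, `charge`, and the fallback value are all hypothetical, the failure threshold is an assumption (the text does not give one), and after the cooldown the breaker simply lets calls through again.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SimpleBreaker:
    """Fails fast for `cooldown` seconds after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # assumed threshold; not from the text
        self.cooldown = cooldown          # the 60-second window from the text
        self.clock = clock                # injectable so the timing can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()         # tripped: skip the remote call entirely
            self.opened_at = None         # cooldown over: allow calls again
            self.failures = 0
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the circuit
            return fallback()
        self.failures = 0                 # any success resets the count
        return result


def charge() -> str:
    raise ConnectionError("payment gateway unreachable")  # simulated outage


breaker = SimpleBreaker(max_failures=3)
for _ in range(5):
    # After the third failure, charge() is no longer invoked at all.
    print(breaker.call(charge, fallback=lambda: "queued for retry"))
```

Passing the fallback per call keeps the breaker generic: one caller might return a cached value while another raises a fast, descriptive error.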
3. Interview Mastery
Q: "What are the three states of a Circuit Breaker?"
Architect Answer: "1) **Closed**: Everything is normal, requests go through. 2) **Open**: Too many failures were detected, so requests fail immediately without calling the service. 3) **Half-Open**: After a timeout, we allow a limited number of test requests (often just *one*). If the test succeeds, we close the circuit. If it fails, we go back to Open. This state machine is the foundation of libraries like **Polly** for .NET and **Hystrix** for Java (Hystrix is now in maintenance mode; **Resilience4j** is the usual choice for new Java projects)."
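The three states and their transitions fit in a small lookup table, which can be a handy way to sketch the answer on a whiteboard. The names below (`State`, `Event`, `next_state`) are hypothetical and chosen for this illustration; libraries such as Polly or Resilience4j expose their own APIs and additional states.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Event(Enum):
    THRESHOLD_REACHED = "threshold_reached"  # enough failures while closed
    TIMEOUT_ELAPSED = "timeout_elapsed"      # the open-state wait has passed
    TRIAL_SUCCEEDED = "trial_succeeded"      # test request in half-open worked
    TRIAL_FAILED = "trial_failed"            # test request in half-open failed


# Every legal transition of the breaker; anything else keeps the current state.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_REACHED): State.OPEN,
    (State.OPEN, Event.TIMEOUT_ELAPSED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.TRIAL_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.TRIAL_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    """Return the state that follows `state` when `event` occurs."""
    return TRANSITIONS.get((state, event), state)


# Walk through one failed recovery attempt followed by a successful one.
state = State.CLOSED
for event in (Event.THRESHOLD_REACHED, Event.TIMEOUT_ELAPSED, Event.TRIAL_FAILED,
              Event.TIMEOUT_ELAPSED, Event.TRIAL_SUCCEEDED):
    state = next_state(state, event)
    print(event.name, "->", state.name)
```

Keeping the transitions in a table rather than nested conditionals makes it easy to see that Open never jumps straight back to Closed: recovery always passes through Half-Open.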