In distributed systems, calls fail. Retries with backoff recover from transient errors; a circuit breaker stops you from hammering a dependency that is genuinely down. They're complementary: retry the blip, break on the outage.
For transient failures (timeouts, brief network blips, 503s), retry — but back off exponentially and add jitter so retries spread out instead of synchronizing into a thundering herd.
attempt 1 → wait ~1s (+ random jitter)
attempt 2 → wait ~2s (+ random jitter)
attempt 3 → wait ~4s (+ random jitter)
→ cap at maxRetries (e.g. 3) and a max delay → don't retry forever
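The schedule above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: `TransientError`, `backoff_delay`, and `call_with_retry` are hypothetical names, the 1 s base and 30 s cap are assumed defaults, and "up to 50% extra" is just one of several common jitter strategies.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Delay after failed attempt `attempt` (1-based): base * 2^(attempt-1) plus jitter, capped."""
    exp = min(cap, base * 2 ** (attempt - 1))
    # Add up to 50% random jitter so clients do not retry in lockstep.
    return min(cap, exp + rng() * exp * 0.5)


def call_with_retry(fn, max_retries=3, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on TransientError retry up to max_retries times, then re-raise."""
    for attempt in range(1, max_retries + 2):  # 1 initial call + max_retries retries
        try:
            return fn()
        except TransientError:
            if attempt > max_retries:
                raise  # retry budget exhausted: surface the error
            sleep(backoff_delay(attempt, base, cap))
```

Only `TransientError` is retried here; any other exception propagates immediately, which keeps non-retryable failures (bad input, authorization errors) out of the retry loop. The `sleep` parameter is injectable so the delays can be checked without actually waiting.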
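Key cautions: cap both the retry count and the delay, and retry only idempotent operations, so that a repeated request cannot apply the same effect twice.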
A circuit breaker tracks failures to a dependency and, after a threshold, trips open — failing fast (or returning a fallback) instead of calling a dead service. After a cooldown it goes half-open to probe recovery.
CLOSED → calls pass through; count failures
too many failures → trip → OPEN
OPEN → fail fast immediately (no call); start cooldown timer
cooldown elapsed → HALF-OPEN
HALF-OPEN → allow a few probe calls
success → CLOSED (recovered) ; failure → back to OPEN
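The state machine above could look roughly like this in Python. It is a sketch under stated assumptions: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the threshold counts consecutive failures, a single successful probe closes the circuit (a real breaker might require several), and there is no locking, so it is not thread-safe as written.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.cooldown = cooldown                    # seconds to stay OPEN
        self.clock = clock                          # injectable for testing
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown:
                # Fail fast: do not touch the dependency at all.
                raise CircuitOpenError("failing fast; dependency marked down")
            self.state = "HALF_OPEN"  # cooldown elapsed: let a probe through
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open probe) closes the circuit.
        self.state = "CLOSED"
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens immediately; otherwise trip at the threshold.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would typically catch `CircuitOpenError` and return a fallback (a cached value or a degraded response) rather than propagate it.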
Retry → transient, likely-to-succeed-soon errors (e.g. one slow call)
Circuit breaker → repeated/sustained failures (the dependency is down)
→ use together: breaker prevents retries from piling onto a dead service
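To illustrate the "use together" point, here is a small Python sketch in which the retry loop runs through a breaker and gives up as soon as the breaker opens. All names are hypothetical, and `CountingBreaker` is a deliberately tiny stand-in (no cooldown, no half-open state) so the interaction stays visible; jitter is also omitted for brevity.

```python
import time


class CircuitOpenError(Exception):
    """Hypothetical error meaning 'breaker is open, call skipped'."""


class CountingBreaker:
    """Tiny stand-in: opens after `threshold` consecutive failures and stays open."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:
            raise CircuitOpenError("open")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0
        return result


def retry_through_breaker(breaker, fn, max_retries=3, sleep=time.sleep):
    """Retry fn via the breaker; stop immediately once the breaker is open."""
    for attempt in range(max_retries + 1):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not retry: the breaker says the dependency is down
        except Exception:
            if attempt == max_retries:
                raise
            sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
```

With a threshold of 3 and a retry budget of 5, a dependency that always fails is called only three times; the fourth attempt is rejected by the breaker without reaching the dependency.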
Without these, one failing dependency takes the caller down with it: requests pile up on timeouts, retries amplify the load, and the failure cascades across services. Exponential backoff with jitter recovers from blips without a retry storm; capping and idempotency keep retries safe; and the circuit breaker stops wasting resources on a dead dependency and gives it room to recover. Together they turn a dependency failure into a contained, self-healing event.