What resilience patterns protect microservices (circuit breaker, retry, timeout, bulkhead)?

Accepted Answer

In a distributed system, **everything fails eventually**. Resilience patterns stop a single failure from cascading into a full outage.

## The core patterns

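- **Timeout**: never wait forever for a response; put an upper bound on every remote call.
- **Retry**: retry only transient failures, with exponential backoff and jitter.
- **Circuit breaker**: after repeated failures, stop calling the failing service for a while (fail fast) so it can recover.
- **Bulkhead**: isolate resources (thread pools, connection pools, concurrency limits) so one slow dependency can't exhaust the capacity the rest need.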

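In JavaScript, a timeout can be sketched by racing the call against a timer. This is a minimal sketch; `withTimeout` and `slowCall` are hypothetical helpers, not part of any library. Note that it only stops *waiting*: the underlying operation keeps running unless it can be cancelled. For `fetch`, passing `signal: AbortSignal.timeout(ms)` cancels the request itself in modern browsers and recent Node.js versions.

```javascript
// Hypothetical helper: reject if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it can't fire later
  // or keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical fake dependency that answers 'ok' after `delay` ms.
const slowCall = (delay) => new Promise((resolve) => setTimeout(() => resolve('ok'), delay));

withTimeout(slowCall(10), 100).then(console.log);                        // prints "ok"
withTimeout(slowCall(500), 100).catch((err) => console.log(err.message)); // prints "timed out after 100ms"
```
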
## Circuit breaker example

```js
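// Assumes the opossum library (npm install opossum), whose API this matches.
const CircuitBreaker = require('opossum');

// callPaymentService: your async function that calls the payment API (hypothetical here).
const breaker = new CircuitBreaker(callPaymentService, {
  timeout: 3000,                 // fail the call after 3s
  errorThresholdPercentage: 50,  // open if >50% of calls in the rolling window fail
  resetTimeout: 10000            // after 10s, try one request (half-open)
});
breaker.fallback(() => ({ status: 'queued' })); // graceful degradation

// Route calls through fire(); its arguments are forwarded to callPaymentService.
breaker.fire(order).then(console.log); // `order` is a placeholder argument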
```

## Circuit breaker states

```text
CLOSED ──(failures exceed threshold)──▶ OPEN ◀────────────┐
  ▲                                      │                │
  │                                      │ (after         │ (trial fails)
  │ (trial succeeds)                     ▼ resetTimeout)  │
  └───────────────────────────────── HALF-OPEN ───────────┘
                                (one trial request)
```

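The state diagram maps directly onto a small state machine. The sketch below is a hypothetical `SimpleBreaker` class (not a library API) that assumes a simpler trip rule than the percentage threshold in the example above: it opens after `failureThreshold` consecutive failures. The clock is injected through a hypothetical `now` option so the transitions can be tested without real waiting.

```javascript
// Hypothetical minimal circuit breaker (not a library API). It trips after
// `failureThreshold` consecutive failures instead of a failure percentage.
class SimpleBreaker {
  constructor(action, { failureThreshold = 5, resetTimeout = 10000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now;                        // injectable clock so tests can fake time
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
    this.trialInFlight = false;
  }

  async fire(...args) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('circuit open');   // fail fast, don't touch the dependency
      }
      this.state = 'HALF-OPEN';            // resetTimeout elapsed: allow a trial
    }
    if (this.state === 'HALF-OPEN') {
      if (this.trialInFlight) throw new Error('circuit half-open: trial in progress');
      this.trialInFlight = true;           // only one trial request at a time
    }
    try {
      const result = await this.action(...args);
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess() {
    if (this.state === 'OPEN') return;     // late reply from a call made before the trip
    this.state = 'CLOSED';                 // normal success, or the trial succeeded
    this.failures = 0;
    this.trialInFlight = false;
  }

  onFailure() {
    if (this.state === 'OPEN') return;     // late failure; don't extend the open period
    this.failures += 1;
    // A failed trial reopens immediately; when CLOSED, open once the threshold is hit.
    if (this.state === 'HALF-OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
      this.trialInFlight = false;
    }
  }
}
```

Production libraries layer rolling-window statistics, events, and fallbacks on top of this core, but the three states and four transitions are the same.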
## Bulkhead

```text
[ pool A: 10 threads ]  → payment calls
[ pool B: 10 threads ]  → search calls
If search hangs, it drains pool B only — payments keep working.
```

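Node.js runs JavaScript on a single thread, so there a bulkhead is usually a per-dependency concurrency limit rather than a thread pool. A minimal sketch, assuming a fail-fast policy (reject when the compartment is full instead of queueing); `Bulkhead`, `maxConcurrent`, and the two instances are hypothetical names.

```javascript
// Hypothetical bulkhead: caps the number of in-flight calls to one dependency.
class Bulkhead {
  constructor(name, maxConcurrent) {
    this.name = name;
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
  }

  async run(fn) {
    if (this.active >= this.maxConcurrent) {
      // Compartment full: fail fast instead of letting callers pile up.
      throw new Error(`bulkhead "${this.name}" full (${this.maxConcurrent} in flight)`);
    }
    this.active += 1;
    try {
      return await fn();
    } finally {
      this.active -= 1; // free the slot whether fn succeeded or failed
    }
  }
}

// One compartment per dependency, mirroring pool A / pool B above.
const paymentBulkhead = new Bulkhead('payment', 10);
const searchBulkhead = new Bulkhead('search', 10);
```

If search calls hang, they fill only `searchBulkhead`; further search calls are rejected immediately while `paymentBulkhead` still has all of its slots.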
## Pitfall

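**Retries without backoff** amplify load on an already-struggling service: every failed call comes straight back, multiplied across all clients (a retry storm). Always add exponential backoff, jitter (so clients don't retry in lockstep), and a cap on the number of attempts.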

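Backoff, jitter, and a retry cap fit in a few lines. A minimal sketch using the "full jitter" strategy (sleep a random time between 0 and an exponentially growing, capped ceiling); `retry`, `backoffDelay`, `isTransient`, and the default numbers are illustrative assumptions, not tuned values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Full jitter: random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(attempt, { baseMs = 100, capMs = 5000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical helper: retries `fn` up to `maxRetries` times, but only for
// errors that `isTransient` accepts; other options are passed to backoffDelay.
async function retry(fn, { maxRetries = 3, isTransient = () => true, ...backoff } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on permanent errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      await sleep(backoffDelay(attempt, backoff));
    }
  }
}
```

With a real client you would pass something like `isTransient: (err) => err.status >= 500`, where `err.status` is an assumption about your HTTP client's error shape, so that client errors such as a 400 are never retried.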
## Why it matters

These patterns are what turn an inevitable single-service failure into a degraded feature instead of a site-wide outage.

They work together: timeouts bound waiting, circuit breakers stop hammering dead services, bulkheads contain blast radius, and retries recover from blips — omit any one and failures still cascade.