Where and how do you configure rate limiting?

Question

Accepted Answer

Rate limiting caps how many requests a client can make in a time window. You apply it at **multiple layers** because each sees something different, and you key it by **whatever identifies the abuser**.

## Where to configure it

- **Edge / CDN** — the first line, before traffic reaches you. Cheapest to enforce (the attacker never touches your origin) but coarse, usually keyed by IP.
- **Reverse proxy** (nginx, Envoy) — protects the origin from floods that pass the CDN, with fine control over zones and bursts.
- **Application layer** — the smartest layer: it knows the **user, API key, or token**, so it can apply per-account quotas and protect expensive business operations a proxy cannot see.
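
As a rough illustration of why the application layer can key more precisely than a proxy, the sketch below picks the most specific identity available for a request. The function name `rate_limit_key` and the `user:` / `key:` / `ip:` prefixes are hypothetical, chosen only for this example.

```python
from typing import Optional


def rate_limit_key(client_ip: str,
                   api_key: Optional[str] = None,
                   user_id: Optional[str] = None) -> str:
    """Return the most specific identity available for rate limiting.

    Hypothetical helper: prefers the authenticated user, then the API key,
    and falls back to the client IP for anonymous traffic.
    """
    if user_id:
        return f"user:{user_id}"
    if api_key:
        return f"key:{api_key}"
    return f"ip:{client_ip}"
```

Prefixing the key keeps the three identity spaces separate, so a user ID can never collide with an IP address that happens to look the same.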

## How to key and shape it

- **Key by** IP (anonymous), API key (partners), or authenticated user (per-account fairness).
- **Token bucket vs leaky bucket** — token bucket allows short **bursts** by accumulating tokens, then steadies out; leaky bucket smooths to a constant rate. Most APIs want token bucket so legitimate bursts are not punished.
- **Pick limits from baseline + headroom** — measure normal peak per client, then set the cap comfortably above it so real users never hit it.
- **Return `429 Too Many Requests` with `Retry-After`** so clients back off politely instead of hammering.
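
The points above can be sketched as a small in-process token bucket. This is a minimal illustration under stated assumptions, not a production limiter: the names `TokenBucket`, `check`, and `buckets` are hypothetical, the 10/second rate and capacity of 20 are example values, and state lives in a plain dict, so it only works for a single process.

```python
import math


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = now

    def take(self, now: float) -> float:
        """Try to consume one token.

        Returns 0.0 if the request is allowed, otherwise the number of
        seconds until one token will be available.
        """
        elapsed = max(now - self.updated, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


# One bucket per key (IP, API key, or user). Example values only.
buckets: dict = {}


def check(key: str, now: float, rate: float = 10.0, capacity: float = 20.0):
    """Return (status, headers) for one request from `key` at time `now`."""
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = TokenBucket(rate, capacity, now)
    wait = bucket.take(now)
    if wait == 0.0:
        return 200, {}
    # Retry-After takes whole seconds, so round the wait up.
    return 429, {"Retry-After": str(math.ceil(wait))}
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the logic easy to test. A multi-instance deployment would need shared state instead of the dict, which this sketch does not attempt.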

## Example: nginx limit_req

```nginx
# Define a shared-memory zone keyed by client IP (goes in the http context).
# rate=10r/s = the steady rate; nginx enforces it with a leaky bucket.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    location /api/ {
        # burst=20: tolerate up to 20 requests above the steady rate
        # nodelay: serve the burst immediately instead of spacing it out
        limit_req zone=api burst=20 nodelay;

        # Return 429 (not the default 503) so clients see a rate-limit signal
        limit_req_status 429;

        # Assumes an upstream named "backend" is defined elsewhere
        proxy_pass http://backend;
    }
}
```

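Here each IP is held to a steady 10 requests/second, which nginx tracks at millisecond granularity as one request per 100 ms. Up to 20 requests beyond that rate are served immediately because of `nodelay`, and anything past the burst gets a `429`. Note that `limit_req` is a leaky bucket; `burst` combined with `nodelay` is what gives it the burst tolerance of a token bucket.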
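
To see what this configuration does to a sudden spike, here is a simplified model of the per-key accounting. It is a sketch based on the documented behaviour of `limit_req` with `nodelay`, not nginx's actual code, and the function name `simulate_limit_req` is hypothetical.

```python
def simulate_limit_req(arrivals_ms, rate_per_s=10, burst=20):
    """Model one client's requests against rate=10r/s burst=20 nodelay.

    `arrivals_ms` is a sorted list of arrival times in milliseconds.
    Returns one status (200 or 429) per arrival.
    """
    excess = 0       # requests above the steady rate, in thousandths
    last_ms = None   # time of the last accepted request
    statuses = []
    for now_ms in arrivals_ms:
        if last_ms is None:
            candidate = 0  # the first request for a key carries no excess
        else:
            # The bucket drains at rate_per_s, then this request adds 1.
            drained = rate_per_s * (now_ms - last_ms)
            candidate = max(excess - drained + 1000, 0)
        if candidate > burst * 1000:
            statuses.append(429)  # over the burst: rejected, state unchanged
            continue
        excess, last_ms = candidate, now_ms
        statuses.append(200)
    return statuses


spike = simulate_limit_req([0] * 25)
print(spike.count(200), spike.count(429))  # prints: 21 4
```

In this model, 25 simultaneous requests yield 21 accepted (the first request plus the burst of 20) and 4 rejected. A request arriving 100 ms later is accepted again, because one slot has drained by then.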

## Why it matters

Rate limiting is your cheapest, always-on defense against Layer 7 floods, credential stuffing, and runaway scrapers. Layering it (edge for volume, proxy for the origin, app for business logic) and keying it correctly stops abusers while real users — and legitimate bursts — pass through untouched. Setting limits from real baselines is what keeps it from becoming an outage of your own making.