Rate limiting caps how many requests a client can make in a time window. Apply it at multiple layers (edge, reverse proxy, application) because each sees different traffic, and key it by whatever best identifies the abuser, such as client IP, API key, or account.
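To make "key it by whatever identifies the abuser" concrete, here is a minimal Python sketch of choosing a rate-limit key at the application layer. The function name, the `X-Api-Key` header, and the fallback order are illustrative assumptions, not something the configuration in this article defines.

```python
from typing import Mapping, Optional


def rate_limit_key(
    headers: Mapping[str, str],
    user_id: Optional[str],
    remote_addr: str,
) -> str:
    """Pick the most specific identity available for rate limiting.

    Assumed preference order (hypothetical): API key, then the
    authenticated user, then the client IP as a last resort.
    """
    api_key = headers.get("X-Api-Key")  # hypothetical header name
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{remote_addr}"
```

Falling back to the IP only when nothing better is available matters because many real users can share one address behind NAT, while a single abuser can rotate through many addresses.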
Respond with 429 Too Many Requests and a Retry-After header so clients back off politely instead of hammering.
# Define a shared-memory zone keyed by client IP.
# rate=10r/s = the sustained rate, enforced with nginx's leaky-bucket method.
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
server {
    location /api/ {
        # burst=20: allow a short spike of 20 queued requests
        # nodelay: serve the burst immediately instead of spacing it out
        limit_req zone=api burst=20 nodelay;
        # Return 429 (not the default 503) so clients see a rate-limit signal
        limit_req_status 429;
        proxy_pass http://backend;
    }
}
Here each client IP is allowed a sustained 10 requests per second, may burst up to 20 requests above that rate, and anything beyond the burst is rejected with a 429.
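The behaviour described above can be modelled in a few lines of Python. This is a simplified leaky-bucket sketch under stated assumptions, not nginx's actual implementation: the `LeakyBucket` class, its field names, and the `retry_after` helper are all hypothetical, and timestamps are passed in explicitly so the model is easy to reason about.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeakyBucket:
    rate: float                    # sustained requests per second
    burst: int                     # requests tolerated above the sustained rate
    excess: float = 0.0            # how far above the rate this client currently is
    last: Optional[float] = None   # timestamp of the last accepted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at `now` should be served."""
        if self.last is None:
            # First request from this client: nothing to drain yet.
            self.last = now
            return True
        # Drain at `rate` since the last accepted request, then count this one.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False  # over the burst allowance: reject (a 429 at the proxy)
        self.excess = excess
        self.last = now
        return True

    def retry_after(self, now: float) -> int:
        """Whole seconds (at least 1) until one more request would fit."""
        elapsed = now - self.last if self.last is not None else 0.0
        projected = self.excess - self.rate * elapsed + 1.0
        return max(1, math.ceil((projected - self.burst) / self.rate))
```

With `rate=10.0` and `burst=20`, an instantaneous spike of 30 requests admits 21 in this model (the first request plus the 20-request burst allowance) and rejects the remaining 9. The `retry_after` value is the kind of number an application could place in a Retry-After header when it issues the 429 itself.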
Rate limiting is your cheapest, always-on defense against Layer 7 floods, credential stuffing, and runaway scrapers. Layering it (edge for volume, proxy for the origin, app for business logic) and keying it correctly throttles abusers while real users, including those sending legitimate bursts, pass through untouched. Setting limits from real baselines is what keeps it from becoming an outage of your own making.
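One way to derive a limit from a real baseline is to take a high percentile of observed per-client peak rates and add headroom. The sketch below assumes you have already extracted each client's peak requests per second from your logs; the function names, the 99th percentile, and the 2x headroom are illustrative choices, not recommendations from this article.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def suggest_limit(
    peak_rps_per_client: Sequence[float],
    pct: float = 99.0,      # assumed percentile of "normal" clients to cover
    headroom: float = 2.0,  # assumed safety multiplier above that baseline
) -> int:
    """Suggest a per-client requests-per-second limit from observed peaks."""
    return math.ceil(percentile(peak_rps_per_client, pct) * headroom)
```

For example, if the observed per-client peaks are the integers 1 through 100, the 99th percentile is 99 and the suggested limit is 198 requests per second. A limit chosen this way sits above what nearly all legitimate clients do, which is the property that keeps the limiter from throttling normal traffic.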