A key microservices benefit is scaling each service independently to match its own load, instead of scaling the whole app. Finding bottlenecks then comes down to measuring each service and each hop between services.
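# Kubernetes HPA (autoscaling/v2): scale the orders Deployment on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: {name: orders}
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: orders}
  minReplicas: 3
  maxReplicas: 20
  metrics:  # hold avg CPU near 70% of requested CPU: add pods above it, remove below
  - {type: Resource, resource: {name: cpu, target: {type: Utilization, averageUtilization: 70}}}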
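The Kubernetes documentation describes the HPA's core rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A minimal Python sketch of that arithmetic, assuming the values in the config above (target 70%, 3 to 20 replicas) and the default 10% tolerance; the function name `desired_replicas` is hypothetical, and the real controller also applies stabilization windows and pod-readiness rules that this sketch omits:

```python
import math


def desired_replicas(current_replicas: int, current_util: float,
                     target_util: float = 70.0, min_replicas: int = 3,
                     max_replicas: int = 20, tolerance: float = 0.1) -> int:
    """Simplified HPA rule (hypothetical helper): scale in proportion to how
    far average utilization is from the target, then clamp to [min, max].
    Omits stabilization windows, readiness handling, and other controller logic."""
    ratio = current_util / target_util
    # Within the tolerance band (default 10%), the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 140))   # → 8  (2x the target: double the pods)
    print(desired_replicas(4, 35))    # → 3  (half the target, but never below min)
    print(desired_replicas(10, 210))  # → 20 (3x the target wants 30, capped at max)
```

Because the rule is proportional, a service at twice its target roughly doubles in one step instead of creeping up one pod at a time.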
1. Metrics: which service shows high latency or saturation? (RED: rate, errors, duration; USE: utilization, saturation, errors)
2. Traces: which span in the request is slow?
3. Drill in: slow DB query? Lock contention? N+1 calls? GC pauses?
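Step 1 can be sketched by computing RED figures (rate, errors, duration) per service from raw request records. A minimal Python sketch, assuming a hypothetical `Request` record (service name, duration, error flag) already parsed from logs; in practice these numbers usually come from a metrics backend rather than hand-rolled code:

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record, e.g. parsed from access logs.
    service: str
    duration_ms: float
    error: bool


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def red_summary(requests, window_s):
    """Per-service RED: Rate (req/s), Errors (fraction failed), Duration (p99 ms)."""
    by_service = defaultdict(list)
    for r in requests:
        by_service[r.service].append(r)
    summary = {}
    for svc, reqs in by_service.items():
        summary[svc] = {
            "rate": len(reqs) / window_s,
            "error_ratio": sum(r.error for r in reqs) / len(reqs),
            "p99_ms": percentile([r.duration_ms for r in reqs], 99),
        }
    return summary


if __name__ == "__main__":
    # Made-up sample: 100 requests per service over a 10 s window.
    reqs = ([Request("orders", 40, False)] * 98 + [Request("orders", 60, True)] * 2
            + [Request("payments", 300, False)] * 99 + [Request("payments", 900, True)])
    stats = red_summary(reqs, window_s=10)
    worst = max(stats, key=lambda s: stats[s]["p99_ms"])
    print(worst, stats[worst])
```

Ranking by p99 rather than the mean keeps a slow tail from hiding behind many fast requests.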
Gateway    ──┤
Orders     ──┤
Payments   ████████████ ← 80% of latency here
Inventory  ─┤
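Reading a trace like this amounts to finding the span with the most exclusive (self) time, i.e. time not spent waiting on its children. A minimal Python sketch over a hypothetical list of spans (id, parent id, name, start/end in ms); the `Span` shape is an assumption rather than any particular tracer's data model, and real tracing backends compute this for you. Overlapping child spans are merged so parallel calls are not subtracted twice:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span shape; span names are assumed unique within the trace.
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: float


def covered(intervals):
    """Total length of the union of (start, end) intervals."""
    total, cur_start, cur_end = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping (parallel) child: extend
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def self_times(spans):
    """Exclusive time per span name: duration minus time covered by children,
    with children clipped to the parent's window."""
    children = {}
    for sp in spans:
        children.setdefault(sp.parent_id, []).append(sp)
    result = {}
    for sp in spans:
        kids = [(max(c.start, sp.start), min(c.end, sp.end))
                for c in children.get(sp.span_id, [])
                if c.end > sp.start and c.start < sp.end]
        result[sp.name] = (sp.end - sp.start) - covered(kids)
    return result


if __name__ == "__main__":
    # Made-up timings; payments and inventory run in parallel under orders.
    trace = [
        Span("1", None, "gateway", 0, 100),
        Span("2", "1", "orders", 5, 95),
        Span("3", "2", "payments", 10, 90),
        Span("4", "2", "inventory", 10, 20),
    ]
    root = next(s for s in trace if s.parent_id is None)
    total = root.end - root.start
    for name, t in sorted(self_times(trace).items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} {t:5.1f} ms  {t / total:.0%}")
```

With these invented timings, payments holds 80 of the 100 ms of self time, the same pattern the waterfall makes visible at a glance.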
⚠️ Chatty synchronous calls (fan-out per request)
⚠️ Shared/overloaded database
⚠️ Missing or cold cache
⚠️ Unbounded retries amplifying load
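The usual remedy for the last warning is to cap attempts and space them out. A minimal Python sketch of bounded retries with capped exponential backoff and full jitter, a common general pattern rather than anything specific to this system; the function name and defaults (3 attempts, 0.1 s base, 2 s cap) are hypothetical. `sleep` and `rng` are injectable so the behavior can be checked without real waiting:

```python
import random
import time


def call_with_retries(fn, max_attempts=3, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), making at most max_attempts attempts in total.

    Before retry n (0-based) the wait is rng() * min(max_delay, base_delay * 2**n),
    i.e. uniform in [0, backoff) with the default rng ("full jitter"). Capping
    attempts bounds how much extra load a failing downstream can receive.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure, don't retry forever
            backoff = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * backoff)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("payments unavailable")
        return "ok"

    print(call_with_retries(flaky, sleep=lambda s: None), calls["n"])  # → ok 3
```

Jitter spreads retries from many clients over time, so a brief outage does not turn into synchronized retry waves.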
Scaling a service whose bottleneck is a shared database just moves more load onto the DB — scale the actual constraint, not the symptom.
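A deliberately simplified model makes this concrete: if every request passes through each stage of a synchronous path once and each stage has a fixed capacity, end-to-end throughput is the minimum of those capacities. A minimal Python sketch under that assumption; the per-pod and database capacities are hypothetical numbers chosen for illustration:

```python
def throughput(stages):
    """End-to-end requests/s of a serial path: capped by its slowest stage."""
    return min(stages.values())


def bottleneck(stages):
    """Name of the stage with the lowest capacity (the actual constraint)."""
    return min(stages, key=stages.get)


if __name__ == "__main__":
    per_pod_rps = 100  # hypothetical capacity of one orders pod
    db_rps = 600       # hypothetical ceiling of the shared database
    for pods in (4, 8, 16):
        stages = {"orders": pods * per_pod_rps, "database": db_rps}
        print(pods, throughput(stages), bottleneck(stages))
```

In this model, going from 8 to 16 orders pods leaves throughput at 600 req/s: once the database is the minimum, only raising its capacity (or reducing load on it) moves the number.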
Independent scaling lets you spend capacity precisely where the load is, which is far cheaper than scaling a monolith wholesale.
But scaling blindly wastes money and can make the bottleneck worse; per-service metrics and per-hop traces are what reveal the real constraint to fix.