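Zero-downtime deployment means releasing a new version without interrupting users: the application stays available throughout the deployment. Achieving it requires a careful deployment strategy, backward-compatible changes, health checks, and graceful handling of in-flight requests.
What zero-downtime requires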
GOAL: deploy a new version with NO user-facing downtime (always-available service):
→ never take the whole service offline to deploy
→ always have healthy instances serving while updating others
→ handle in-flight requests gracefully (don't drop them mid-request)
→ Combine several techniques (below).
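The goal above (some healthy instances always serving while others are updated) can be sketched as a rolling update over an in-memory pool. This is a minimal illustration, not a real orchestrator: `Instance`, `rolling_update`, `batch_size`, and `min_serving` are hypothetical names, and the `ready` flag stands in for a real readiness check.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    ready: bool = True  # True when the instance passes its readiness check


def rolling_update(pool, new_version, batch_size=1, min_serving=1):
    """Update instances batch by batch; return the serving count seen at each step."""
    serving_history = []
    for start in range(0, len(pool), batch_size):
        batch = pool[start:start + batch_size]
        # Refuse to continue if removing this batch would leave too few serving instances.
        serving_after = sum(i.ready for i in pool) - sum(i.ready for i in batch)
        if serving_after < min_serving:
            raise RuntimeError("batch would drop serving capacity below min_serving")
        for inst in batch:
            inst.ready = False  # load balancer stops routing to this instance
        serving_history.append(sum(i.ready for i in pool))
        for inst in batch:
            inst.version = new_version  # stand-in for swapping the binary or container
            inst.ready = True           # rejoin only once it is ready again
    return serving_history


pool = [Instance(f"app-{n}", "v1") for n in range(3)]
history = rolling_update(pool, "v2")
print(history)                      # serving instances while each batch was out
print([i.version for i in pool])
```

With a single-instance pool the function raises instead of taking the only instance offline, which is the reason zero-downtime rollouts need spare capacity.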
Key techniques
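✓ DEPLOYMENT STRATEGY → rolling, blue-green, or canary (never an all-at-once stop-and-replace):
→ rolling: update instances in small batches (the others keep serving)
→ blue-green: bring up the new version in a parallel environment, then switch all traffic to it at once
→ canary: send a small share of traffic to the new version first, then widen it if it stays healthy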
✓ LOAD BALANCER + HEALTH CHECKS → route traffic only to healthy/ready instances;
new instances join only when READY (readiness checks)
✓ GRACEFUL SHUTDOWN → draining: stop sending new requests to an instance, let it FINISH
in-flight requests, THEN stop it (handle SIGTERM) → no dropped requests
✓ BACKWARD-COMPATIBLE changes → old and new versions coexist during the rollout
(API and DATABASE compatibility: expand-contract migrations add the new schema first
and remove the old one only after every instance runs the new version)
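
The GRACEFUL SHUTDOWN item above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production server: `DrainingWorker` and its methods are hypothetical names, and the request handling is reduced to a counter. Only `signal.signal` and `signal.SIGTERM` come from the Python standard library.

```python
import signal
import threading


class DrainingWorker:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False
        self._idle = threading.Event()
        self._idle.set()  # nothing in flight yet

    def try_begin_request(self):
        """Return True if the request is accepted, False if the worker is draining."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            self._idle.clear()
            return True

    def end_request(self):
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def begin_drain(self, signum=None, frame=None):
        """Usable as a signal handler: only sets a flag, never takes the lock."""
        self._draining = True

    def wait_until_idle(self, timeout=None):
        """Block until in-flight requests finish; return False on timeout."""
        with self._lock:
            pass  # let any request that was mid-acceptance finish registering
        return self._idle.wait(timeout)


worker = DrainingWorker()
accepted_before = worker.try_begin_request()          # one request now in flight
worker.begin_drain()                                  # what SIGTERM would trigger
accepted_after = worker.try_begin_request()           # refused: worker is draining
timed_out = not worker.wait_until_idle(timeout=0.05)  # request still running
worker.end_request()                                  # in-flight request finishes
drained = worker.wait_until_idle(timeout=1)           # now safe to stop the process

if __name__ == "__main__":
    # CPython only allows installing signal handlers from the main thread.
    signal.signal(signal.SIGTERM, worker.begin_drain)
```

The handler does nothing but flip a flag, so it cannot deadlock against request threads; the process exits only after `wait_until_idle` returns, which is the "finish in-flight requests, THEN stop" order described above.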
