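Observability is the ability to understand a system's internal state from its external outputs: logs, metrics, and traces. It is essential for operating, debugging, and maintaining systems, especially distributed ones, because you cannot manage what you cannot see.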
The three pillars of observability
LOGS → timestamped records of events (what happened) → detailed, for debugging specific issues
METRICS → numerical measurements over time (CPU, latency, request rate, error rate) →
aggregate health/performance; dashboards; alerting
TRACES → follow a request's path through the system (across services) → understand flows,
find bottlenecks/failures in DISTRIBUTED systems (which service was slow?)
→ together: understand WHAT happened, the OVERALL state, and the PATH of requests.
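
A minimal sketch of how the three signals can come out of a single request, using only the Python standard library. The names here (`handle_request`, `span`, `metrics`, `spans`, `requests_total`, `errors_total`) are illustrative assumptions, not part of any real observability library; a production system would more likely use a dedicated metrics and tracing stack.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("demo")

metrics = Counter()  # METRICS: aggregate counters (hypothetical in-memory store)
spans = []           # TRACES: one timing record per unit of work


@contextmanager
def span(trace_id, name):
    """Record how long a named step took, tagged with the request's trace id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        spans.append({"trace_id": trace_id, "name": name,
                      "duration_ms": duration_ms})


def handle_request(fail=False):
    """Handle one simulated request and emit a log, metrics, and spans."""
    trace_id = uuid.uuid4().hex          # ties logs and spans together
    metrics["requests_total"] += 1       # METRICS: request rate
    with span(trace_id, "handle_request"):
        try:
            with span(trace_id, "db_query"):
                if fail:
                    raise RuntimeError("db unavailable")
        except RuntimeError as exc:
            metrics["errors_total"] += 1  # METRICS: error rate
            # LOGS: timestamped record of what happened
            log.error("request failed trace_id=%s error=%s", trace_id, exc)
        else:
            log.info("request ok trace_id=%s", trace_id)
    return trace_id
```

Calling `handle_request()` and then `handle_request(fail=True)` leaves `metrics` showing the overall state (two requests, one error), the log lines saying what happened, and `spans` showing which step each request passed through and how long it took. The shared `trace_id` is what lets the three signals be correlated.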
