这些是可观测性的三大支柱。它们回答不同的问题:指标告诉你存在问题,日志告诉你发生了什么,追踪告诉你在分布式流中哪里消耗了时间或出现了错误。
三大支柱
text
METRICS aggregate numbers over time (counters, gauges, histograms)
→ cheap, low cardinality, great for trends & ALERTING
→ e.g. error rate = 2%, p99 latency = 800ms
LOGS discrete, timestamped events with detail (often structured JSON)
→ rich context for DEBUGGING a specific request
→ e.g. {"level":"error","user":123,"msg":"payment declined"}
TRACES the path of one request across services, with timing per span
→ shows latency BREAKDOWN and where a call fails
→ e.g. checkout 800ms = api 50ms + db 700ms + email 50ms
