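These are the three pillars of observability, and each answers a different question: metrics tell you that something is wrong, logs tell you what happened, and traces tell you where in a distributed flow the time was spent or the error occurred.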
Why it matters
METRICS  aggregate numbers over time (counters, gauges, histograms)
         → cheap, low cardinality, great for trends & ALERTING
         → e.g. error rate = 2%, p99 latency = 800ms

LOGS     discrete, timestamped events with detail (often structured JSON)
         → rich context for DEBUGGING a specific request
         → e.g. {"level":"error","user":123,"msg":"payment declined"}

TRACES   the path of one request across services, with timing per span
         → shows latency BREAKDOWN and where a call fails
         → e.g. checkout 800ms = api 50ms + db 700ms + email 50ms
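
The three examples above can be reproduced in a few lines of plain Python. This is only a minimal sketch using the standard library, not a real telemetry setup: `Span`, `error_rate`, `log_event`, and `slowest_span` are hypothetical names invented for illustration, and the numbers are the ones from the summary above.

```python
import json
from dataclasses import dataclass


@dataclass
class Span:
    """One timed step of a request (hypothetical, not a real tracing API)."""
    name: str
    duration_ms: int


def error_rate(errors: int, total: int) -> float:
    """METRIC: an aggregate number, cheap to store and to alert on."""
    return errors / total if total else 0.0


def log_event(level: str, **fields) -> str:
    """LOG: one discrete event serialized as compact structured JSON."""
    return json.dumps({"level": level, **fields}, separators=(",", ":"))


def slowest_span(spans: list[Span]) -> Span:
    """TRACE: find the step that dominates the request's latency."""
    return max(spans, key=lambda s: s.duration_ms)


# The metric says that something is wrong: 2 failures in 100 requests.
rate = error_rate(errors=2, total=100)

# The log says what happened for one specific request.
line = log_event("error", user=123, msg="payment declined")

# The trace says where the time went inside that request.
trace = [Span("api", 50), Span("db", 700), Span("email", 50)]
total_ms = sum(s.duration_ms for s in trace)
worst = slowest_span(trace)

print(f"error rate = {rate:.0%}")
print(line)
print(f"checkout {total_ms}ms, slowest span: {worst.name} ({worst.duration_ms}ms)")
```

In practice each signal would likely go to a dedicated backend through a telemetry library; the point of the sketch is only that the same request yields three different views, each suited to a different question.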
