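The monitoring tool landscape is organized by pillar (metrics, logs, traces), plus all-in-one managed platforms. The choice comes down to self-hosted vs. managed, driven by team size, budget, and scale.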
Tools by pillar
METRICS     Prometheus → pull-based scraping, time-series DB, PromQL query language
            Grafana    → dashboards on top of Prometheus (and many other sources)
LOGS        ELK        → Elasticsearch + Logstash + Kibana (powerful, heavy to run)
            Loki       → "Prometheus for logs": cheap, indexes labels not full text
TRACES      Jaeger     → distributed tracing, OpenTelemetry-compatible
            Tempo      → trace backend that pairs with Grafana/Loki
ALL-IN-ONE  Datadog    → managed metrics + logs + traces + APM in one product
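
To make "pull-based scraping" concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical service that exposes a counter named `demo_requests_total` at `/metrics` in the Prometheus text exposition format, and a stand-in `scrape_once` function that plays the role of the scraper with a single HTTP GET. The metric name, the counter values, and all function names are illustrative assumptions; a real service would more likely use an official client library, and Prometheus itself would do the scraping on a schedule.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; the values are made up for illustration.
REQUEST_COUNTS = {"GET": 3, "POST": 1}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by HTTP method.",
        "# TYPE demo_requests_total counter",
    ]
    for method in sorted(counts):
        # One sample per label value: name{label="value"} number
        lines.append(f'demo_requests_total{{method="{method}"}} {counts[method]}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """The monitored target: it only answers when someone asks."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in the demo


def scrape_once(host, port):
    """Stand-in for the scraper: pull the current values with one GET."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


def demo():
    """Start the target on a free local port, scrape it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return scrape_once("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo(), end="")
```

The point of the sketch is the direction of the request: the target never pushes anything, it just publishes its current state, and the collector decides when to read it. That is the design choice that distinguishes the pull model from agents that push metrics to a backend.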
