Observability rests on three pillars (logs, metrics, and traces), and its goal is to answer "what is wrong and why" for systems too large to inspect by hand. At scale, the strategy comes down to correlation, sampling, and cost.
| Pillar | Answers | Tools |
|---|---|---|
| Metrics | Is something wrong? (rate, latency) | Prometheus, Grafana |
| Traces | Where in the flow? | OpenTelemetry, Jaeger |
| Logs | What exactly happened? | ELK, Loki |
Metrics alert (broad) ─▶ trace pinpoints the slow service (path) ─▶ logs explain the cause (detail)
The trace/correlation ID must propagate through metrics (for example as exemplars or labels), log lines, and spans, so that you can pivot between them.
log line: level=error trace_id=abc123 service=payments msg="gateway timeout"
                      ^^^^^^^^^^^^^^^ same id appears in the trace + metrics
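A minimal sketch of how the log line above could be emitted as structured JSON with the trace ID attached, using only Python's standard `logging` module. The names `JsonFormatter`, `make_logger`, and `demo`, and the logger name `payments-demo`, are hypothetical; in a real service the `trace_id` would come from the active span (for example via OpenTelemetry) rather than being passed in by hand.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields are queryable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            # Fields passed through `extra=` become attributes on the record.
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
            "msg": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("payments-demo")  # hypothetical name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo() -> dict:
    """Emit one error line and parse it back, as a log backend would."""
    buf = io.StringIO()
    logger = make_logger(buf)
    logger.error(
        "gateway timeout",
        extra={"trace_id": "abc123", "service": "payments"},
    )
    return json.loads(buf.getvalue())


if __name__ == "__main__":
    print(demo())
```

Because `trace_id` is its own field rather than part of the message text, a log backend can filter on it directly when you arrive from a trace view.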
✓ Standardize: OpenTelemetry across all services
✓ Use structured (JSON) logs — queryable, not grep-only
✓ Sample traces (e.g. keep all errors + 1% of success) to control cost
✓ Define SLOs and alert on symptoms (latency/error rate), not noise
✓ RED/USE methods for dashboards: RED (Rate, Errors, Duration) for services, USE (Utilization, Saturation, Errors) for resources
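The sampling rule in the checklist (keep all errors plus 1% of successes) can be sketched as a small decision function. This is an illustration under stated assumptions, not any particular collector's API: `keep_trace` and `SUCCESS_SAMPLE_RATE` are hypothetical names, and because the decision depends on whether the trace ended in an error, it only works where the outcome is already known (tail-based sampling). Hashing the trace ID makes the decision deterministic, so every service that sees the same ID reaches the same verdict.

```python
import hashlib

SUCCESS_SAMPLE_RATE = 0.01  # assumed policy: keep 1% of successful traces


def keep_trace(trace_id: str, is_error: bool,
               rate: float = SUCCESS_SAMPLE_RATE) -> bool:
    """Return True if this trace should be stored."""
    if is_error:
        # Errors are always kept: they are rare and the most valuable.
        return True
    # Map the trace ID to a stable value in [0, 1) and compare to the rate.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate


if __name__ == "__main__":
    kept = sum(keep_trace(f"trace-{i}", is_error=False) for i in range(100_000))
    print(f"kept {kept} of 100000 successful traces")
```

The kept fraction only approximates the configured rate, which is usually acceptable for cost control.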
Logging everything at 100% is unaffordable and buries the signal. Sample, structure, and alert on SLOs instead.
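One common way to alert on an SLO is to compare the observed error ratio against the error budget, often called the burn rate. The sketch below assumes a simple availability SLO; the `Slo` class, the function names, and the paging threshold of 14.4 are illustrative assumptions rather than values taken from this text, and a production setup would normally evaluate this in the metrics system over several time windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    target: float  # e.g. 0.999 means 99.9% of requests must succeed

    @property
    def error_budget(self) -> float:
        """Fraction of requests that is allowed to fail."""
        return 1.0 - self.target


def burn_rate(errors: int, total: int, slo: Slo) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0  # no traffic, nothing burned
    return (errors / total) / slo.error_budget


def should_page(errors: int, total: int, slo: Slo,
                threshold: float = 14.4) -> bool:
    """Page only when the budget burns much faster than planned.

    The default threshold is an assumed example value; tune it per service.
    """
    return burn_rate(errors, total, slo) >= threshold


if __name__ == "__main__":
    slo = Slo(target=0.99)
    print(should_page(errors=200, total=1000, slo=slo))
    print(should_page(errors=50, total=1000, slo=slo))
```

Alerting on the burn rate ties the page to user-visible symptoms (failed requests) rather than to individual noisy signals.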
With hundreds of services, you cannot SSH in and look around; observability is the only way to understand production behavior.
The winning strategy is correlated, sampled, and SLO-driven: surface real problems quickly without bankrupting yourself on telemetry storage or drowning on-call in noise.