Observability | AI Insight Note

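Observability is the degree to which a system's internal state can be understood from its external outputs alone. As microservices and distributed systems have grown more complex, the standard approach has moved beyond simple monitoring to analyzing three signals together, the so-called Three Pillars: metrics, logs, and traces.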

Three Pillars

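Metrics

Numeric data measured over time. Prometheus with Grafana is the de facto standard stack: Prometheus collects and stores the metrics, and Grafana visualizes them.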

python

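from prometheus_client import Counter, Histogram, start_http_server

# Counter: a value that only ever increases, split by label values.
request_count = Counter('http_requests_total',
    'Total HTTP requests', ['method', 'endpoint', 'status'])
# Histogram: observations (in seconds) counted into cumulative buckets.
request_duration = Histogram('http_request_duration_seconds',
    'HTTP request duration', buckets=[0.01, 0.05, 0.1, 0.5, 1.0])

@request_duration.time()  # records how long each call takes
def handle_request(method, endpoint):
    # ... handle the request ...
    request_count.labels(method, endpoint, '200').inc()

if __name__ == '__main__':
    # Serves /metrics for Prometheus to scrape; port 8000 is an assumed example value.
    start_http_server(8000)
    # A real service would keep running here and call handle_request per request.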
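
To make the buckets argument of the Histogram above more concrete, here is a minimal standard-library sketch of how a Prometheus-style histogram counts observations cumulatively: each bucket counts every observation less than or equal to its upper bound, and an implicit +Inf bucket holds the total. The bucket_counts helper and the sample durations are hypothetical and are not part of prometheus_client.

```python
import math

# Same upper bounds (in seconds) as the Histogram in the example above.
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1.0]

def bucket_counts(durations, bounds=BUCKETS):
    """Return cumulative counts per upper bound ("le" = less than or equal)."""
    all_bounds = list(bounds) + [math.inf]  # the +Inf bucket always holds the total
    return {le: sum(1 for d in durations if d <= le) for le in all_bounds}

if __name__ == "__main__":
    observed = [0.004, 0.03, 0.03, 0.2, 2.5]  # made-up request durations
    for le, count in bucket_counts(observed).items():
        print(f"le={le}: {count}")
```

Because the counts are cumulative, a slow request such as the 2.5 s one only shows up in the +Inf bucket, which is why the largest finite bound should sit above the latency you care about.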

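Logs

A chronological record of events. Structured logs (for example JSON) are easier to analyze because each field can be filtered and aggregated without parsing free text.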

python

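import structlog

# The default renderer is console-style; request JSON output and timestamps.
structlog.configure(processors=[
    structlog.processors.TimeStamper(fmt="iso"),
    structlog.processors.JSONRenderer()])

log = structlog.get_logger()
log.info("user_login", user_id=123, ip="192.168.1.1", success=True)
# Output (key order may differ): {"event": "user_login", "user_id": 123, "ip": "...", "success": true, "timestamp": ...}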
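
For comparison, the same idea of one JSON object per event can be sketched with only the standard library. This is a minimal illustration, not how structlog works internally: the JsonFormatter class, the make_logger helper, and the convention of passing fields through extra={"fields": ...} are all assumptions made for this example.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object (hypothetical helper)."""

    def format(self, record):
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the top level.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

def make_logger(name="app"):
    """Build a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

if __name__ == "__main__":
    log = make_logger()
    log.info("user_login", extra={"fields": {"user_id": 123, "ip": "192.168.1.1", "success": True}})
```

Either way, the payoff is the same: a log backend can filter on user_id or success directly instead of matching patterns in free text.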

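Traces

Tracking the full path of a single request across distributed services. A common setup instruments code with OpenTelemetry and sends the resulting spans to a backend such as Jaeger or Zipkin.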

Client → [API gateway 10ms] → [Order service 50ms] → [DB 30ms]
                            → [Payment service 80ms]
Trace ID abc123 links the entire flow together.

SLI / SLO / SLA

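• SLI (Service Level Indicator): the measured metric (availability, latency, error rate)
• SLO (Service Level Objective): the target value for an SLI (for example 99.9% availability)
• SLA (Service Level Agreement): a contract with customers (for example a refund clause that applies when the SLO is missed)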
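
The relationship between an SLI and an SLO can be sketched as simple arithmetic. This is a minimal illustration with made-up request counts: the function names, the 30-day window, and the choice to treat zero traffic as fully available are assumptions made for this example.

```python
def availability_sli(good_requests, total_requests):
    """SLI: fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_requests / total_requests

def allowed_downtime_minutes(slo, window_days=30):
    """Downtime the SLO tolerates over the window (often called the error budget)."""
    return (1.0 - slo) * window_days * 24 * 60

if __name__ == "__main__":
    slo = 0.999  # 99.9% availability, as in the SLO example above
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    print(f"SLI={sli:.4%}, SLO met: {sli >= slo}")
    print(f"Allowed downtime over 30 days: {allowed_downtime_minutes(slo):.1f} min")
```

With these numbers the measured availability is 99.95%, which meets the 99.9% objective, and the objective itself leaves roughly 43 minutes of downtime per 30 days.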
