Explain Metrics vs Logs vs Traces.
In observability, metrics are numeric time-series that summarize system health, logs are timestamped event records with detailed context, and traces map a request’s path across services to reveal latency and dependencies.
When to Use
- Use metrics to track service level indicators (SLIs) such as latency, error rate, and throughput.
- Use logs for debugging, auditing, and investigating specific failures.
- Use traces to analyze distributed systems, bottlenecks, or cross-service delays.
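To make the distinction concrete, here is a minimal sketch of what one record of each signal might look like. The class and field names (`MetricPoint`, `LogRecord`, `Span`) are hypothetical and chosen for illustration; real tools define their own schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetricPoint:
    """One sample in a numeric time series (hypothetical shape)."""
    name: str
    value: float
    timestamp: float
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class LogRecord:
    """One timestamped event with free-form context (hypothetical shape)."""
    timestamp: float
    level: str
    message: str
    context: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    """One unit of work within a request's trace (hypothetical shape)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start: float
    duration_ms: float


# Illustrative values only.
metric = MetricPoint("checkout_latency_ms", 182.0, 1700000000.0, {"service": "checkout"})
log = LogRecord(1700000000.2, "ERROR", "gateway timeout", {"trace_id": "abc123"})
span = Span("abc123", "s2", "s1", "payment", 1700000000.0, 2950.0)
```

The metric carries a number and a few labels, the log carries a message plus context, and the span carries identifiers that link it to a parent and to the rest of the trace. A shared `trace_id` in the log context is what lets logs and traces be joined later.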
Example
If checkout latency spikes, metrics raise an alert, a trace shows the cart → payment bottleneck, and logs reveal a gateway timeout.
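The checkout scenario can be sketched in code as a three-step investigation. Everything here is an assumption made for illustration: the sample latencies, the 500 ms threshold, the span and log dictionaries, and the function names are hypothetical and do not come from any particular tool.

```python
import math


def p95_nearest_rank(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def investigate(latencies_ms, spans, logs, threshold_ms):
    """Walk the alert -> trace -> log workflow on in-memory sample data."""
    # Step 1 (metrics): does the aggregate latency breach the threshold?
    if p95_nearest_rank(latencies_ms) <= threshold_ms:
        return None

    # Step 2 (traces): which span in the slow request took the longest?
    slowest = max(spans, key=lambda s: s["duration_ms"])

    # Step 3 (logs): what did that service report for the same trace?
    errors = [
        entry["message"]
        for entry in logs
        if entry["trace_id"] == slowest["trace_id"]
        and entry["service"] == slowest["service"]
        and entry["level"] == "ERROR"
    ]
    return slowest["service"], errors


# Hypothetical sample data for one slow checkout request.
latencies_ms = [120, 130, 125, 140, 135, 128, 132, 138, 2900, 3100]
spans = [
    {"trace_id": "t1", "service": "cart", "duration_ms": 40},
    {"trace_id": "t1", "service": "payment", "duration_ms": 2950},
    {"trace_id": "t1", "service": "inventory", "duration_ms": 25},
]
logs = [
    {"trace_id": "t1", "service": "payment", "level": "ERROR", "message": "gateway timeout"},
    {"trace_id": "t2", "service": "cart", "level": "INFO", "message": "cart loaded"},
]

result = investigate(latencies_ms, spans, logs, threshold_ms=500)
```

Each step narrows the search: the metric says something is wrong, the trace says where, and the log says why.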
Why Is It Important
Picking the right signal shortens mean time to recovery (MTTR), avoids data overload, and improves reliability by turning raw data into actionable insights.
Interview Tips
Start with the one-line definition, then explain the “alert → trace → log” workflow. Mention tools like Prometheus (metrics), the ELK stack (logs), and OpenTelemetry (vendor-neutral instrumentation for all three signals). Highlight trade-offs like metric cardinality and trace sampling.
Trade-offs
- Metrics: cheap, fast, trendable; limited detail.
- Logs: rich context; costly at scale.
- Traces: show causality; may miss rare events if sampled.
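The sampling trade-off can be illustrated with a simple head-based sampler that decides from the trace ID alone. This is a sketch under assumptions: the hashing scheme and the `keep_trace` name are hypothetical and not the algorithm of any specific tracing library.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of all traces.

    The trace ID is hashed into a number in [0, 1). Every service that
    sees the same ID makes the same decision, so a trace is either kept
    whole or dropped whole.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical traffic: 10,000 traces sampled at 1%.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.01)]
```

At a 1% rate, only about one trace in a hundred survives, so a failure that occurs in a single request has roughly a 1% chance of being captured. That is the sense in which sampled traces may miss rare events.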
Pitfalls
- Over-alerting on individual log lines instead of aggregated metrics.
- Not controlling metric label cardinality.
- Dropping trace context headers between services, which breaks a trace into disconnected fragments.
- Assuming one signal replaces the others.
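The cardinality pitfall is easy to quantify: in the worst case, the number of time series for one metric is the product of the distinct values of each label. The label names and counts below are hypothetical, and the result is an upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod
from typing import Dict


def max_series_count(distinct_values_per_label: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of label sizes."""
    return prod(distinct_values_per_label.values())


# Bounded labels keep the series count manageable.
bounded = {"service": 20, "endpoint": 50, "status_class": 5}

# Adding a per-user label multiplies the count by the number of users.
unbounded = {**bounded, "user_id": 100_000}

bounded_series = max_series_count(bounded)      # 20 * 50 * 5
unbounded_series = max_series_count(unbounded)  # 20 * 50 * 5 * 100_000
```

Here the bounded set yields at most 5,000 series, while adding `user_id` raises the bound to 500,000,000. High-cardinality identifiers like this usually belong in logs or trace attributes rather than in metric labels.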