Explain Watermarks vs Checkpoints.
Watermarks track event-time progress so a streaming system can decide when a window is complete and how to treat late-arriving events, while checkpoints are periodic snapshots of state that let the system recover after a failure.
When to Use
Use watermarks when dealing with out-of-order or late events, ensuring correct windowed results. Use checkpoints when fault tolerance is critical, so progress is not lost during failures.
Example
If you’re counting user clicks per minute, a watermark holds each one-minute window open long enough for clicks that arrive slightly late to still be counted, while a checkpoint saves the running count so a system crash won’t reset it.
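The click-count example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular streaming engine implements watermarks: the function name `count_clicks`, the 10-second tolerance, and the rule "watermark = largest timestamp seen minus the tolerance" are all assumptions chosen for the sketch.

```python
from collections import defaultdict

WINDOW_SECONDS = 60          # one-minute tumbling windows
MAX_OUT_OF_ORDERNESS = 10    # assumed tolerance for late clicks, in seconds


def count_clicks(event_times):
    """Count clicks per minute, closing a window once the watermark passes its end.

    event_times: integer click timestamps (seconds) in arrival order,
    which may differ from event-time order.
    Returns (results, dropped, still_open).
    """
    open_windows = defaultdict(int)   # window start -> running count
    results = {}                      # window start -> finalized count
    dropped = []                      # clicks that arrived after their window closed
    watermark = float("-inf")

    for ts in event_times:
        window_start = ts - ts % WINDOW_SECONDS
        if window_start + WINDOW_SECONDS <= watermark:
            # The watermark already passed this window's end: too late to count.
            dropped.append(ts)
            continue
        open_windows[window_start] += 1
        # The watermark trails the largest timestamp seen and never moves backwards.
        watermark = max(watermark, ts - MAX_OUT_OF_ORDERNESS)
        # Finalize every window whose end the watermark has reached.
        for start in sorted(open_windows):
            if start + WINDOW_SECONDS <= watermark:
                results[start] = open_windows.pop(start)

    return results, dropped, dict(open_windows)
```

With arrivals `[5, 30, 62, 58, 75, 40, 130]`, the click at second 58 arrives after the one at 62 but is still counted, because the watermark has not yet reached 60. The click at second 40 arrives after the watermark has passed 60, so it is reported as dropped.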
Why Is It Important
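Without watermarks, the system cannot tell when a window is complete: it either emits results too early and misses late data, or waits indefinitely. Without checkpoints, a crash can erase accumulated state and progress. Together, they ensure accuracy and reliability in streaming.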
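To make the recovery side concrete, here is a small Python sketch of checkpointing a per-minute click counter. It is an illustration under stated assumptions: the helper names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the `crash_at` parameter used to simulate a failure are all hypothetical, and the input is assumed to be replayable from a saved offset.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"offset": 0, "counts": {}}
    with open(path) as f:
        return json.load(f)


def run(events, path, checkpoint_every=3, crash_at=None):
    """Count clicks per minute, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path)
    for i in range(state["offset"], len(events)):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated crash")
        minute = str(events[i] // 60)   # JSON object keys must be strings
        state["counts"][minute] = state["counts"].get(minute, 0) + 1
        state["offset"] = i + 1
        if state["offset"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["counts"]


if __name__ == "__main__":
    clicks = [5, 30, 62, 58, 75, 40, 130]
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "clicks.json")
        try:
            run(clicks, ckpt, crash_at=5)
        except RuntimeError:
            pass                         # state after the last checkpoint is lost
        print(run(clicks, ckpt))         # resumes from offset 3, not from zero
```

Because the offset and the counts are saved together, the restarted run replays only the events after the last checkpoint and ends with the same totals as a run that never crashed.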
Interview Tips
Be ready to explain both clearly:
- Watermarks = event-time & late arrivals.
- Checkpoints = state & recovery.
Use a simple click-count example to stand out.
Trade-offs
A more lenient watermark waits longer for late data, which adds latency but improves completeness. Checkpoints add runtime and storage overhead but provide resilience.
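The watermark side of this trade-off can be seen by replaying the same arrivals with different tolerances. The following Python sketch assumes one-minute windows and a watermark defined as the largest timestamp seen minus a tolerance. The function name `late_drops` and the sample timestamps are made up for illustration.

```python
WINDOW_SECONDS = 60


def late_drops(event_times, max_out_of_orderness):
    """Count events that arrive after the watermark has passed their window's end."""
    watermark = float("-inf")
    dropped = 0
    for ts in event_times:
        window_end = ts - ts % WINDOW_SECONDS + WINDOW_SECONDS
        if window_end <= watermark:
            dropped += 1          # window already finalized: the event is lost
        else:
            # The watermark trails the largest timestamp seen by the tolerance.
            watermark = max(watermark, ts - max_out_of_orderness)
    return dropped


arrivals = [5, 30, 62, 58, 75, 40, 95, 20, 130]   # arrival order, not event-time order
for tolerance in (0, 10, 30, 60):
    print(f"tolerance={tolerance:>2}s dropped={late_drops(arrivals, tolerance)}")
```

On this input the drop count falls from 3 to 2 to 1 to 0 as the tolerance grows from 0 to 60 seconds. The cost is latency: in this sketch a window's result cannot be emitted until an event arrives whose timestamp is at least the window end plus the tolerance.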
Pitfalls
Common mistakes:
- Confusing watermarks with checkpoints.
- Setting watermarks too strictly, so valid late data is dropped.
- Skipping checkpoints to save resources, risking data loss.