Explain Reservoir Sampling.
Reservoir sampling is an algorithm for randomly selecting k items from a data stream of unknown or very large size, ensuring each element has an equal probability of being chosen.
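A minimal single-pass sketch (the classic Algorithm R; the function name and stream interface here are illustrative, not fixed by any library):

```python
import random

def reservoir_sample(stream, k):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Keep the i-th item with probability k/i, evicting a uniformly random slot.
            j = random.randint(1, i)
            if j <= k:
                reservoir[j - 1] = item
    return reservoir
```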
When to Use
Use reservoir sampling when handling large or infinite data streams that can’t fit in memory — such as logs, telemetry data, or clickstreams. It’s ideal for stream analytics, monitoring, or unbiased sampling in distributed systems.
Example
Suppose you’re analyzing billions of website clicks and want a random subset of 1,000 clicks without knowing the total count. Reservoir sampling maintains this subset efficiently as data flows in.
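As a rough usage sketch, reusing the reservoir_sample function above (the log path and one-click-per-line format are hypothetical):

```python
def click_stream(path):
    # Hypothetical reader: one click record per line.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

sample = reservoir_sample(click_stream("clicks.log"), k=1_000)
print(len(sample))  # at most 1,000, no matter how many clicks the log holds
```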
Want to go deeper?
Learn how scalable systems handle massive data streams in Grokking Data Structures & Algorithms for Coding Interviews, Grokking System Design Fundamentals, Grokking the Coding Interview, or try Mock Interviews with ex-FAANG engineers for real interview practice.
Why Is It Important?
It’s memory-efficient (O(k)), unbiased, and supports fair sampling from any data stream — a key concept in big data and distributed systems.
Interview Tips
Be ready to explain:
- How each new element has a k/i chance of being selected (a short derivation follows this list).
- The role of random replacement once the reservoir is full.
- Edge cases (when n < k, the reservoir simply holds every element seen so far).
- Complexity: O(n) time, O(k) space.
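One way to sketch why the sample stays uniform (using the same k, i, and n, with n the total stream length): an element arriving at position i > k is accepted with probability k/i and must then avoid being displaced by every later arrival, so

$$
\Pr[\text{element } i \text{ is in the final reservoir}]
= \frac{k}{i}\prod_{j=i+1}^{n}\Bigl(1 - \frac{k}{j}\cdot\frac{1}{k}\Bigr)
= \frac{k}{i}\prod_{j=i+1}^{n}\frac{j-1}{j}
= \frac{k}{i}\cdot\frac{i}{n}
= \frac{k}{n}.
$$

The same telescoping product, taken from j = k+1, shows that each of the first k elements also survives with probability k/n.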
Trade-offs
Efficient and scalable, but not deterministic — every run yields different results. Weighted or biased sampling needs modifications.
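If run-to-run variation matters (for example in tests), one option is to seed the RNG so the same stream yields the same sample; a small sketch reusing reservoir_sample from above:

```python
import random

random.seed(42)  # fix the RNG state so this run is reproducible
sample_a = reservoir_sample(range(1_000_000), k=10)

random.seed(42)
sample_b = reservoir_sample(range(1_000_000), k=10)

assert sample_a == sample_b  # identical samples with the same seed
```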
Pitfalls
Common errors include keeping the acceptance probability fixed instead of lowering it to k/i as the stream grows, and reaching for reservoir sampling when the dataset size is already known and fits in memory, where simpler methods (such as shuffling and taking the first k) would work.