Explain Reservoir Sampling.
Reservoir sampling is an algorithm for randomly selecting k items from a data stream of unknown or very large size, ensuring each element has an equal probability of being chosen.
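A minimal single-pass sketch (the classic Algorithm R; the function name and stream interface here are illustrative, not fixed by any library):

```python
import random

def reservoir_sample(stream, k):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Keep the i-th item with probability k/i, evicting a uniformly random slot.
            j = random.randint(1, i)
            if j <= k:
                reservoir[j - 1] = item
    return reservoir
```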
When to Use
Use reservoir sampling when handling large or infinite data streams that can’t fit in memory — such as logs, telemetry data, or clickstreams. It’s ideal for stream analytics, monitoring, or unbiased sampling in distributed systems.
Example
Suppose you’re analyzing billions of website clicks and want a random subset of 1,000 clicks without knowing the total count. Reservoir sampling maintains this subset efficiently as data flows in.
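As a rough usage sketch, reusing the reservoir_sample function above (the log path and one-click-per-line format are hypothetical):

```python
def click_stream(path):
    # Hypothetical reader: one click record per line.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

sample = reservoir_sample(click_stream("clicks.log"), k=1_000)
print(len(sample))  # at most 1,000, no matter how many clicks the log holds
```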
Want to go deeper?
Learn how scalable systems handle massive data streams in Grokking Data Structures & Algorithms for Coding Interviews, Grokking System Design Fundamentals, Grokking the Coding Interview, or try Mock Interviews with ex-FAANG engineers for real interview practice.
Why Is It Important?
It’s memory-efficient (O(k)), unbiased, and supports fair sampling from any data stream — a key concept in big data and distributed systems.
Interview Tips
Be ready to explain:
- How each new element has a k/i chance of being selected (a short derivation follows this list).
- The role of random replacement once the reservoir is full.
- Edge cases (when n < k, the reservoir simply holds every element seen so far).
- Complexity: O(n) time, O(k) space.
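One way to sketch why the sample stays uniform (using the same k, i, and n, with n the total stream length): an element arriving at position i > k is accepted with probability k/i and must then avoid being displaced by every later arrival, so

$$
\Pr[\text{element } i \text{ is in the final reservoir}]
= \frac{k}{i}\prod_{j=i+1}^{n}\Bigl(1 - \frac{k}{j}\cdot\frac{1}{k}\Bigr)
= \frac{k}{i}\prod_{j=i+1}^{n}\frac{j-1}{j}
= \frac{k}{i}\cdot\frac{i}{n}
= \frac{k}{n}.
$$

The same telescoping product, taken from j = k+1, shows that each of the first k elements also survives with probability k/n.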
Trade-offs
Efficient and scalable, but not deterministic — every run yields different results. Weighted or biased sampling needs modifications.
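If run-to-run variation matters (for example in tests), one option is to seed the RNG so the same stream yields the same sample; a small sketch reusing reservoir_sample from above:

```python
import random

random.seed(42)  # fix the RNG state so this run is reproducible
sample_a = reservoir_sample(range(1_000_000), k=10)

random.seed(42)
sample_b = reservoir_sample(range(1_000_000), k=10)

assert sample_a == sample_b  # identical samples with the same seed
```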
Pitfalls
Common errors include keeping the acceptance probability fixed instead of lowering it to k/i as the stream grows, and reaching for reservoir sampling when the dataset size is already known and fits in memory, where simpler methods (such as shuffling and taking the first k) would work.