What is Read Amplification?

Read amplification occurs when a system reads more data from storage than the caller requested, so a single logical read can trigger several physical disk or storage reads.

Where It Occurs

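Read amplification is most common in LSM-tree databases (such as RocksDB and LevelDB), where a lookup may have to check the in-memory table and several on-disk files before it finds a key, and on SSDs and other block devices, which read whole pages even when only a few bytes are needed. In both cases the design favors write efficiency or data organization at the expense of extra reads.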
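
To make the LSM-tree case concrete, here is a minimal Python sketch of a lookup that searches newest data first. It is a toy model, not how RocksDB or LevelDB are implemented: the function name lsm_get, the dict-based memtable, and the list of dict "SSTables" are all assumptions made for illustration, and each probe stands in for at least one storage read.

```python
from typing import Dict, List, Optional, Tuple


def lsm_get(
    key: str,
    memtable: Dict[str, str],
    sstables: List[Dict[str, str]],
) -> Tuple[Optional[str], int]:
    """Look up key, newest data first.

    Returns (value, number of on-disk tables probed).
    Hypothetical toy model: each dict in sstables represents one on-disk file.
    """
    if key in memtable:
        return memtable[key], 0  # served from memory, no storage read

    probes = 0
    for table in sstables:  # ordered newest to oldest
        probes += 1  # each probe stands in for at least one storage read
        if key in table:
            return table[key], probes
    return None, probes  # a missing key costs a probe of every table


memtable = {"c": "3"}
sstables = [{"b": "2"}, {"a": "1-new"}, {"a": "1-old", "z": "26"}]

hit_in_memory = lsm_get("c", memtable, sstables)
hit_second_table = lsm_get("a", memtable, sstables)
hit_oldest_table = lsm_get("z", memtable, sstables)
miss = lsm_get("missing", memtable, sstables)
```

In this sketch the older a key's latest version is, the more tables one logical read has to touch, and a lookup for a key that does not exist touches all of them.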

Example

If retrieving 1KB of data requires 4KB of actual disk reads, the read amplification factor is 4×.
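
The same arithmetic can be sketched in Python. The helper names below (bytes_read_for, read_amplification) and the 4096-byte block size are assumptions chosen to match the example, not part of any particular storage API. The sketch assumes the device always reads whole, aligned blocks.

```python
def bytes_read_for(offset: int, length: int, block_size: int = 4096) -> int:
    """Bytes physically read when only whole, aligned blocks can be fetched."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


def read_amplification(bytes_requested: int, bytes_read: int) -> float:
    """Ratio of physical bytes read to logical bytes requested."""
    if bytes_requested <= 0:
        raise ValueError("bytes_requested must be positive")
    return bytes_read / bytes_requested


# 1 KB requested at the start of a block: one 4 KB block is read.
aligned = read_amplification(1024, bytes_read_for(0, 1024))

# The same 1 KB straddling a block boundary: two blocks are read.
straddling = read_amplification(1024, bytes_read_for(3584, 1024))
```

Under these assumptions the aligned read has a factor of 4.0, and the read that crosses a block boundary has a factor of 8.0, which shows that alignment alone can change the result.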

Why Is It Important

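High read amplification increases latency and consumes more I/O bandwidth, leaving less capacity for other requests. Unlike write amplification, it adds little wear to flash devices, because wear comes mainly from program/erase cycles rather than reads. At scale, the extra I/O directly affects throughput and cost.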

Interview Tips

Explain it simply: “Read amplification happens when more data is read than requested.” Then mention LSM-trees or SSDs as examples, and highlight the trade-off with write amplification for bonus points.

Trade-offs

Optimizing for fast writes usually increases read amplification. Reducing it typically costs more memory (for caches or Bloom filters) or more write work (for example, more aggressive compaction in an LSM-tree). You can’t minimize read, write, and space amplification all at once.
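
As a rough illustration of trading memory for fewer storage reads, the sketch below wraps a simulated block read in Python's functools.lru_cache. The function names, the 4096-byte block, and the access pattern are hypothetical; a real system would use its own block cache rather than this helper.

```python
from functools import lru_cache

storage_reads = 0  # counts simulated physical reads


def read_block_from_storage(block_id: int) -> bytes:
    """Stand-in for a physical block read (hypothetical, returns dummy data)."""
    global storage_reads
    storage_reads += 1
    return bytes([block_id % 256]) * 4096


@lru_cache(maxsize=2)  # memory budget: at most two cached blocks
def read_block_cached(block_id: int) -> bytes:
    return read_block_from_storage(block_id)


logical_reads = [0, 1, 0, 1, 2, 0]
for block_id in logical_reads:
    read_block_cached(block_id)

cache_stats = read_block_cached.cache_info()
```

With a two-block cache, the six logical reads in this access pattern result in four physical reads. A larger maxsize would absorb more of them, at the cost of more memory.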