Explain Compaction and GC Impact on Latency.

Compaction and garbage collection (GC) are background processes in systems like Apache Cassandra: compaction merges on-disk storage files, and GC reclaims unused heap memory. Both can cause latency spikes: compaction by consuming CPU and disk I/O, and GC by consuming CPU and pausing application threads.

When to Use

Monitor compaction and GC in LSM-tree databases (e.g., Cassandra, RocksDB) and JVM-based microservices under high write load.

Compaction merges SSTables to speed up reads and discard overwritten or deleted data, while GC reclaims unused memory so the process does not run out of heap. Both are vital but can temporarily slow queries or writes.

Example

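A Cassandra node receiving heavy writes flushes memtables to disk frequently, producing many SSTables that must then be compacted. During a major compaction or a full GC, latency can rise from milliseconds to seconds: compaction I/O competes with live requests for the disk, and a full GC pauses application threads until it finishes.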

Why Is It Important

Latency-sensitive apps (e.g., fintech, gaming) rely on predictable performance.

Poorly tuned compaction or GC leads to tail-latency spikes, hurting user experience. Tuning them makes latency more predictable and lowers the risk of request timeouts.

Interview Tips

Explain that both compaction and GC are background tasks that affect P99 latency.
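To make the P99 point concrete, here is a minimal sketch of why a small number of stalled requests dominates the tail while barely moving the median. The workload numbers (980 fast requests, 20 stalled ones) are hypothetical, and the nearest-rank method is just one common way to compute a percentile.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical workload: 980 requests served in 5 ms, and 20 requests
# that stalled for 800 ms behind a GC pause or a compaction burst.
latencies_ms = [5.0] * 980 + [800.0] * 20

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  mean={mean} ms  p99={p99} ms")
```

In this made-up sample only 2% of requests are slow, yet the P99 lands on the stalled value while the median stays at the fast one. That gap is why averages tend to hide compaction and GC problems.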

Discuss monitoring tools (like nodetool compactionstats and GC logs), JVM tuning, and heap sizing trade-offs.
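As one way to work with GC logs, the sketch below pulls pause durations out of log lines and flags the long ones. It assumes the JVM writes unified-logging output (the `-Xlog:gc` style used by JDK 9 and later), where pause lines end in a duration such as `35.120ms`. The sample lines and the 200 ms threshold are illustrative, not taken from a real system, and other log formats would need a different pattern.

```python
import re

# Assumption: each pause line contains the word "Pause" and ends with
# a duration in milliseconds, as in JDK unified GC logging.
PAUSE_RE = re.compile(r"\bPause\b.*?(\d+(?:\.\d+)?)ms\s*$")

def pause_durations_ms(lines):
    """Return the pause duration (ms) of every GC pause line."""
    durations = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations

# Illustrative log lines (hypothetical values).
sample_log = [
    "[0.005s][info][gc] Using G1",
    "[12.345s][info][gc] GC(42) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 35.120ms",
    "[20.001s][info][gc] GC(43) Pause Full (G1 Compaction Pause) 1020M->600M(1024M) 1850.300ms",
]

durations = pause_durations_ms(sample_log)
THRESHOLD_MS = 200.0  # hypothetical latency budget for a single pause
slow = [d for d in durations if d > THRESHOLD_MS]

print(f"pauses={durations}  over {THRESHOLD_MS} ms: {slow}")
```

Comparing the timestamps of long pauses with the timestamps of latency spikes is a quick check on whether GC is actually responsible for them.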

Trade-offs

Higher compaction throughput reduces SSTable buildup but consumes more I/O. Larger heaps reduce GC frequency but can lengthen individual pauses, depending on the collector. The key is balancing throughput vs. latency.
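The compaction side of this trade-off can be sketched with back-of-envelope arithmetic. The function below is a deliberately simplified model with hypothetical numbers: it assumes a fixed backlog with no new data arriving, and it treats the throttle as the total disk bandwidth compaction consumes, whereas real compaction both reads and writes.

```python
def compaction_estimate(backlog_mb, throttle_mb_per_s, disk_mb_per_s):
    """Estimate the time to drain a compaction backlog and the share
    of disk bandwidth compaction takes while doing so (simplified)."""
    if throttle_mb_per_s <= 0 or disk_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    seconds_to_drain = backlog_mb / throttle_mb_per_s
    disk_share = min(throttle_mb_per_s / disk_mb_per_s, 1.0)
    return seconds_to_drain, disk_share

# Hypothetical node: 20 GB of pending compaction, 400 MB/s of disk bandwidth.
for throttle in (16, 64):
    seconds, share = compaction_estimate(20_000, throttle, 400)
    print(f"throttle={throttle} MB/s  drain={seconds:.0f} s  disk share={share:.0%}")
```

Under these assumptions, quadrupling the throttle clears the backlog four times sooner but also takes four times as much of the disk away from live requests.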

Pitfalls

Common mistakes include leaving default settings unreviewed, ignoring disk I/O limits, and blaming GC for every latency issue. Proper monitoring and load testing reveal the true cause.
