Strategies for designing low-latency, high-throughput data storage solutions
Low-latency, high-throughput data storage is the practice of architecting data persistence layers that simultaneously minimize response time (sub-millisecond to single-digit milliseconds per operation) and maximize the number of operations processed per second (tens of thousands to millions). These goals pull against each other and against cost: techniques that reduce latency (such as keeping everything in memory) drive up cost, while techniques that increase throughput (such as batching writes) can increase latency for individual operations. In system design interviews, storage architecture is the most consequential decision you make—the wrong database choice or storage engine creates bottlenecks that no amount of caching, scaling, or optimization can fix. When an interviewer asks "How would you store this data?", they are testing whether you understand the trade-offs between storage engines, memory hierarchies, replication strategies, and data modeling decisions that determine whether your system meets its latency and throughput SLOs.
Key Takeaways
- The storage hierarchy dictates latency: L1 cache (~1ns), RAM (~100ns), SSD (~100μs), HDD (~10ms), network to same-AZ (~0.5ms), cross-region (~50–150ms). Every storage decision is a trade-off between speed and cost along this hierarchy.
- Two storage engine families dominate: B-tree (read-optimized, used by PostgreSQL, MySQL, MongoDB) and LSM-tree (write-optimized, used by Cassandra, RocksDB, LevelDB, DynamoDB). Choosing wrong means either slow reads or slow writes at scale.
- In-memory databases (Redis, Memcached) provide sub-millisecond latency but are limited by RAM capacity and cost. Use them for hot data (sessions, caches, leaderboards), not as primary storage for large datasets.
- Sharding distributes data across nodes to increase both throughput (more nodes handle more operations) and capacity (each node stores a subset). The shard key determines data distribution, query routing, and hotspot risk.
- Tiered storage (hot/warm/cold) is the production-standard cost optimization: Redis for hot data, PostgreSQL/DynamoDB for warm data, S3/Glacier for cold data. Each tier trades latency for cost savings.
The Storage Latency Hierarchy
Every storage decision trades speed for cost. This hierarchy should guide every data placement decision in your system design.
| Storage Level | Typical Latency | Cost (relative) | Capacity | Use Case |
|---|---|---|---|---|
| L1/L2 CPU cache | ~1–10ns | Highest (on-chip) | KB–MB | CPU-level; not directly addressable |
| RAM (in-memory DB) | ~100ns | Very high (~$5/GB/month) | GB–TB | Sessions, caches, leaderboards, counters |
| NVMe SSD | ~100μs | Moderate (~$0.10/GB/month) | TB | Primary databases, hot storage |
| Standard SSD (EBS gp3) | ~1ms | Lower (~$0.08/GB/month) | TB | Standard database volumes |
| HDD | ~10ms | Low (~$0.02/GB/month) | PB | Archival, sequential reads, logs |
| Network (same AZ) | ~0.5ms | Transfer costs | N/A | Service-to-service calls |
| Network (cross-region) | ~50–150ms | Higher transfer | N/A | Multi-region replication |
Interview application: "The feed service requires p99 read latency under 10ms. At this target, the data must be served from either RAM or SSD—HDD is eliminated. I would cache the most frequently accessed feeds in Redis (sub-ms) and store the full feed data in DynamoDB on SSD (single-digit ms). Cold feed data older than 30 days moves to S3 for cost savings."
Storage Engine Internals: B-Tree vs LSM-Tree
This is the deepest technical layer interviewers probe. Understanding why PostgreSQL reads fast and Cassandra writes fast comes down to their storage engines.
B-Tree (Read-Optimized)
How it works: Data is stored in sorted pages organized as a balanced tree. Reads follow the tree from root to leaf, typically requiring 3–4 disk reads for billions of records. Writes update pages in-place, which requires finding the correct page, modifying it, and flushing to disk.
Performance: Reads are fast (O(log N) with cached upper tree levels). Writes are slower because each write requires a random disk I/O to update the correct page, plus write-ahead log (WAL) for durability.
Used by: PostgreSQL, MySQL (InnoDB), MongoDB (WiredTiger), SQL Server.
Best for: Read-heavy workloads, point lookups, range queries, analytical queries.
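The claim of 3–4 disk reads for billions of records follows directly from the tree's fanout. A quick sketch, assuming a few hundred keys fit in each page, illustrates why the tree stays so shallow:

```python
import math

def btree_height(num_rows: int, keys_per_page: int = 500) -> int:
    """Tree levels needed to index num_rows when each page holds ~keys_per_page entries.
    The height equals the number of page reads for a point lookup (root down to leaf)."""
    return math.ceil(math.log(num_rows, keys_per_page))

for rows in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{rows:>14,} rows -> ~{btree_height(rows)} page reads per lookup")

# With ~500 keys per page, even 10 billion rows need only 4 levels, and the
# upper levels usually stay cached in RAM, so most lookups pay for 1-2 disk reads.
```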
LSM-Tree (Write-Optimized)
How it works: Writes go to an in-memory buffer (memtable). When the buffer fills, it is flushed to disk as an immutable sorted file (SSTable). Background compaction merges SSTables to maintain read performance. Reads may need to check multiple SSTables, making reads slower than B-tree.
Performance: Writes are fast (sequential disk I/O—appending to the memtable and flushing sorted files). Reads are slower because they may need to check the memtable plus multiple SSTables before finding the data. Bloom filters mitigate this by quickly eliminating SSTables that do not contain the key.
Used by: Cassandra, RocksDB, LevelDB, HBase, DynamoDB (internal storage), ScyllaDB.
Best for: Write-heavy workloads, time-series data, event logging, high-ingest-rate systems.
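A toy sketch of the write and read paths just described: a mutable memtable, immutable sorted runs standing in for SSTables, and reads that check the newest data first. A real engine adds a write-ahead log, Bloom filters, and background compaction, all omitted here:

```python
class ToyLSMTree:
    """Illustrative only: an in-memory memtable plus a list of immutable, sorted runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}            # mutable buffer that absorbs all writes
        self.sstables = []            # immutable sorted runs ("SSTables"), newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value            # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Flushing writes the whole memtable out as one sorted, immutable run,
        # which on a real disk is pure sequential I/O.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:              # newest data wins
            return self.memtable[key]
        for run in reversed(self.sstables):   # then each run, newest to oldest
            if key in run:
                return run[key]
        return None                           # a real engine would consult Bloom filters first
```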
| Dimension | B-Tree | LSM-Tree |
|---|---|---|
| Read performance | Fast (O(log N), in-place) | Slower (check multiple SSTables) |
| Write performance | Slower (random I/O, in-place update) | Fast (sequential I/O, append-only) |
| Write amplification | Lower | Higher (compaction rewrites data) |
| Space amplification | Lower (one copy of data) | Higher (multiple SSTables before compaction) |
| Range queries | Excellent (sorted pages) | Good (sorted SSTables, but cross-file) |
| Typical databases | PostgreSQL, MySQL, MongoDB | Cassandra, RocksDB, HBase, DynamoDB |
Interview application: "Our system ingests 100,000 events per second with relatively few reads. I would use an LSM-tree-based database like Cassandra because write throughput is the bottleneck. LSM-trees convert random writes into sequential I/O, achieving 5–10x higher write throughput than B-trees at this scale. The trade-off is slower point reads, which I would mitigate with Bloom filters and a Redis caching layer for hot data."
In-Memory Databases: When RAM Is the Answer
In-memory databases store the entire dataset in RAM, providing sub-millisecond latency for both reads and writes. They are the fastest storage option but the most expensive per GB.
Redis: Supports rich data structures (strings, lists, sets, sorted sets, hashes, streams). Provides optional persistence (RDB snapshots, AOF log). Supports pub/sub, Lua scripting, and transactions. Single-threaded event loop achieves 100,000+ operations per second per instance.
Memcached: Pure key-value cache. Multi-threaded, so it scales across CPU cores and offers slightly lower latency for simple get/set operations. No persistence and no rich data structures; values are opaque byte strings. Simpler to operate than Redis.
When to use in-memory storage: Session data (sub-ms lookup for every authenticated request). Caching hot data from primary databases (95% cache hit ratio reduces DB load 20x). Leaderboards and counters (sorted sets in Redis provide O(log N) ranked retrieval). Rate limiting (atomic increment with TTL). Real-time features (presence indicators, typing indicators, live counts).
When NOT to use in-memory storage: Primary storage for datasets larger than available RAM (cost-prohibitive). Data requiring ACID transactions across multiple keys (use PostgreSQL). Long-term storage (RAM is volatile—power loss means data loss without persistence).
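As one concrete example of the rate-limiting use case above, here is a minimal fixed-window limiter built on Redis's atomic INCR plus a TTL, using the redis-py client. The key format, limit, and window are illustrative assumptions:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window rate limit: allow at most `limit` requests per user per window."""
    key = f"rate:{user_id}"
    count = r.incr(key)                # atomic increment; creates the key at 1 if absent
    if count == 1:
        r.expire(key, window_seconds)  # the first request in a window starts the TTL
    # Note: a crash between INCR and EXPIRE leaves a key without a TTL; production
    # implementations usually wrap both steps in a Lua script to make them atomic.
    return count <= limit
```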
Sharding: Scaling Beyond a Single Node
A single database node has finite throughput and storage capacity. Sharding distributes data across multiple nodes to increase both.
Shard Key Selection
The shard key determines how data is distributed. It is the most consequential sharding decision—a bad shard key creates hotspots that negate the benefits of sharding.
Hash-based sharding: Apply a hash function to the shard key and distribute based on hash value. Provides even distribution but makes range queries expensive (must query all shards). Used by DynamoDB, Cassandra (default), MongoDB (hashed shard key).
Range-based sharding: Assign key ranges to shards (users A–M on shard 1, N–Z on shard 2). Supports efficient range queries but risks uneven distribution and hotspots on popular ranges.
Interview application: "I would shard the user activity table by user_id using hash-based sharding. This ensures even distribution—no single shard gets disproportionate load because user IDs are uniformly distributed after hashing. The trade-off is that queries like 'find all users who logged in today' require a scatter-gather across all shards. For this analytics query, I would use a separate denormalized read model rather than querying the sharded operational database."
Consistent Hashing
Standard hash-based sharding (hash % N) forces nearly every key to move when nodes are added or removed, because changing N changes almost every key's hash % N result. Consistent hashing minimizes data movement: on average only K/N keys need to move when a node is added (K = total keys, N = total nodes).
DynamoDB, Cassandra, and Akamai CDN use consistent hashing with virtual nodes. Each physical node is assigned multiple positions on the hash ring, improving distribution balance.
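A compact sketch of a consistent-hash ring with virtual nodes; the vnode count and MD5 hash are arbitrary choices for illustration. Running it shows roughly a quarter of keys moving when a fourth node joins, in line with the K/N claim:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []                       # sorted list of (position, physical_node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100):
        # Each physical node owns many positions on the ring, which evens out the load.
        for i in range(vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's position, wrapping at the end.
        index = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
before = {f"user-{i}": ring.get_node(f"user-{i}") for i in range(10_000)}
ring.add_node("db-4")
moved = sum(1 for key, node in before.items() if ring.get_node(key) != node)
print(f"{moved / len(before):.1%} of keys moved")   # roughly 25%, i.e. about K/N
```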
Tiered Storage: Optimizing Cost Without Sacrificing Latency
Production systems use multiple storage tiers, placing data where it provides the best latency-to-cost ratio based on access frequency.
| Tier | Storage | Latency | Cost | Access Pattern |
|---|---|---|---|---|
| Hot | Redis / Memcached | <1ms | Highest | Every request; most recent data |
| Warm | PostgreSQL / DynamoDB (SSD) | 1–10ms | Moderate | Frequent queries; active dataset |
| Cold | S3 Standard | 50–100ms | Low | Occasional access; recent archives |
| Archive | S3 Glacier Deep Archive | Minutes–hours | Lowest | Compliance, regulatory retention |
Interview application: "User messages from the last 7 days are in Redis (hot tier) for sub-ms delivery. Messages from the last 90 days are in DynamoDB (warm tier) for single-digit ms access when users scroll back. Messages older than 90 days are in S3 (cold tier) loaded on demand. Messages older than 2 years are in Glacier (archive tier) for compliance. Lifecycle policies automate the transitions."
Write Optimization Patterns
Write-Ahead Log (WAL)
Write data to a sequential log before updating the primary data structure. If the system crashes, the log is replayed on restart to recover changes that were durably logged but not yet applied to the main data files. PostgreSQL, MySQL, and MongoDB all use a WAL. The log converts random writes into sequential writes, reducing write latency.
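A toy sketch of the pattern, assuming an illustrative file path and a JSON-lines record format: append and fsync each change to the sequential log before touching the primary structure, and replay the log on restart:

```python
import json
import os

WAL_PATH = "store.wal"   # illustrative path
data = {}                # stand-in for the primary data structure

def put(key: str, value: str) -> None:
    # 1. Append the change to the sequential log and force it to disk (durability).
    with open(WAL_PATH, "a") as wal:
        wal.write(json.dumps({"key": key, "value": value}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())
    # 2. Only then apply it to the primary structure (random I/O, can be deferred).
    data[key] = value

def recover() -> None:
    """On restart, replay every logged change so nothing durable is lost."""
    if not os.path.exists(WAL_PATH):
        return
    with open(WAL_PATH) as wal:
        for line in wal:
            record = json.loads(line)
            data[record["key"]] = record["value"]
```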
Batch Writes
Accumulate multiple writes in memory and flush them to disk in a single batch operation. Kafka achieves millions of writes per second by batching records into segments. The trade-off: batching increases latency for individual writes (they wait for the batch) but dramatically increases aggregate throughput.
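A sketch of the trade-off in code: writes queue in memory and are flushed together once the batch fills or a deadline passes. The thresholds are illustrative, and a real implementation would also flush from a background timer:

```python
import time

class BatchWriter:
    """Accumulate writes and flush them together: higher aggregate throughput,
    at the cost of individual writes waiting up to max_delay_s before they are durable."""

    def __init__(self, flush_fn, max_batch: int = 500, max_delay_s: float = 0.01):
        self.flush_fn = flush_fn        # e.g. a bulk insert or multi-put call
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self.buffer = []
        self.oldest = None

    def write(self, record) -> None:
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_batch or \
                time.monotonic() - self.oldest >= self.max_delay_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one sequential bulk write instead of many random writes
            self.buffer = []
```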
Append-Only Storage
Write new data by appending to the end of a file rather than updating in place. This converts random I/O into sequential I/O—the fastest possible disk access pattern. Kafka, Cassandra (SSTables), and event sourcing systems use append-only storage.
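A sketch of the idea in the spirit of a Kafka segment or an SSTable: records are only ever appended, and a small in-memory index maps each key to the byte offset of its latest value. The file path and tab-separated record format are illustrative:

```python
import os

class AppendOnlyLog:
    """Every write appends to the end of the file (pure sequential I/O);
    reads seek directly to the offset recorded in the in-memory index."""

    def __init__(self, path: str = "segment.log"):
        self.path = path
        self.index = {}                       # key -> byte offset of its most recent record

    def append(self, key: str, value: str) -> None:
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)   # appends only; never update in place
            f.write(f"{key}\t{value}\n".encode())
        self.index[key] = offset

    def get(self, key: str):
        if key not in self.index:
            return None
        with open(self.path, "rb") as f:
            f.seek(self.index[key])           # one seek, then a sequential read of the record
            _key, value = f.readline().decode().rstrip("\n").split("\t", 1)
            return value
```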
For structured practice applying storage architecture decisions across complete system design problems, Grokking the System Design Interview covers database selection and storage trade-offs in every design solution. For advanced storage patterns including LSM-tree internals, distributed consensus for replicated storage, and production-scale tiered architectures, Grokking the Advanced System Design Interview builds the depth required for L6+ interviews. For a comprehensive roadmap covering storage alongside all other system design fundamentals, Grokking System Design maps the complete preparation journey.
Frequently Asked Questions
How do I choose between B-tree and LSM-tree storage engines?
Choose B-tree (PostgreSQL, MySQL) for read-heavy workloads with complex queries, range scans, and point lookups. Choose LSM-tree (Cassandra, RocksDB) for write-heavy workloads with high ingest rates. B-trees read fast but write slower (random I/O). LSM-trees write fast but read slower (multiple SSTables to check).
When should I use an in-memory database like Redis?
For data requiring sub-millisecond latency: session storage, caching hot database rows, leaderboards, rate limiting counters, and real-time features. Do not use Redis as primary storage for datasets exceeding available RAM or for data requiring ACID transactions across multiple keys.
What is the most important sharding decision?
The shard key. A good shard key distributes data evenly (no hotspots), aligns with your most common query pattern (avoid scatter-gather), and does not need to change over time. Hash-based sharding on a unique identifier (user_id, order_id) is the safest default.
How does consistent hashing improve sharding?
Standard hash-based sharding requires rehashing all data when nodes are added or removed. Consistent hashing minimizes data movement to K/N keys (K = total keys, N = nodes). Virtual nodes improve distribution balance. DynamoDB, Cassandra, and CDNs use this approach.
What is tiered storage and why does it matter?
Tiered storage places data in different storage systems based on access frequency: Redis for hot data (<1ms), DynamoDB for warm data (1–10ms), S3 for cold data (50–100ms), Glacier for archives (minutes–hours). Each tier trades latency for cost savings, optimizing the total cost of ownership.
How do I achieve high write throughput?
Use LSM-tree-based databases (sequential I/O), batch writes (accumulate and flush), append-only storage (no in-place updates), and write-ahead logging (sequential log before random update). These patterns convert random I/O into sequential I/O—the key to high write throughput.
What is write amplification and why does it matter?
Write amplification is the ratio of data written to disk versus data written by the application. LSM-trees have high write amplification because compaction rewrites data multiple times. B-trees have lower write amplification. High write amplification reduces SSD lifespan and consumes I/O bandwidth.
How do I reduce read latency for a database?
Add a caching layer (Redis) for hot data. Use read replicas to distribute read load. Ensure proper indexing on frequently queried columns. Choose a B-tree storage engine for read-optimized access. Co-locate compute and storage in the same availability zone to minimize network latency.
Should I shard my database from the start?
No. Sharding adds significant complexity (cross-shard queries, distributed transactions, rebalancing). Start with vertical scaling (larger instance) and read replicas. Shard only when a single node cannot handle the write throughput or storage capacity requirements. Most systems do not need sharding until they reach millions of daily active users.
How do I discuss storage architecture in a system design interview?
During estimation, calculate storage requirements (users × data/user × retention). During database selection, justify your choice with the workload profile (read-heavy → B-tree, write-heavy → LSM-tree). During the deep dive, discuss caching strategy, sharding approach, and tiered storage. During trade-offs, compare latency, throughput, cost, and operational complexity.
TL;DR
Low-latency, high-throughput data storage requires matching your data to the right level of the storage hierarchy: RAM for sub-ms hot data, SSD for single-digit ms warm data, and S3/Glacier for cold archives. Two storage engine families drive the core trade-off: B-tree (PostgreSQL, MySQL—read-optimized, random I/O writes) vs LSM-tree (Cassandra, RocksDB—write-optimized, sequential I/O writes). Choose based on your workload's read/write ratio. Use in-memory databases (Redis, Memcached) for sessions, caches, and counters—not as primary storage for large datasets. Shard by hashing a unique key (user_id) with consistent hashing to distribute load and minimize data movement during rebalancing. Implement tiered storage with lifecycle policies that automatically move data from hot to cold tiers as access frequency decreases. Optimize writes with WAL, batching, and append-only patterns that convert random I/O into sequential I/O. In interviews, justify every storage decision with the workload profile: "I chose Cassandra because we ingest 100K events/second and LSM-trees handle write-heavy workloads at 5–10x the throughput of B-trees."