Strategies for designing low-latency, high-throughput data storage solutions
Low-latency, high-throughput data storage is the practice of architecting data persistence layers that simultaneously minimize response time (sub-millisecond to single-digit milliseconds per operation) and maximize the number of operations processed per second (tens of thousands to millions). These goals pull against each other and against cost: techniques that reduce latency (such as keeping everything in memory) drive up cost, while techniques that increase throughput (such as batching writes) can increase latency for individual operations. In system design interviews, storage architecture is the most consequential decision you make—the wrong database choice or storage engine creates bottlenecks that no amount of caching, scaling, or optimization can fix. When an interviewer asks "How would you store this data?", they are testing whether you understand the trade-offs between storage engines, memory hierarchies, replication strategies, and data modeling decisions that determine whether your system meets its latency and throughput SLOs.
Key Takeaways
- The storage hierarchy dictates latency: L1 cache (~1ns), RAM (~100ns), SSD (~100μs), HDD (~10ms), network to same-AZ (~0.5ms), cross-region (~50–150ms). Every storage decision is a trade-off between speed and cost along this hierarchy.
- Two storage engine families dominate: B-tree (read-optimized, used by PostgreSQL, MySQL, MongoDB) and LSM-tree (write-optimized, used by Cassandra, RocksDB, LevelDB, DynamoDB). Choosing wrong means either slow reads or slow writes at scale.
- In-memory databases (Redis, Memcached) provide sub-millisecond latency but are limited by RAM capacity and cost. Use them for hot data (sessions, caches, leaderboards), not as primary storage for large datasets.
- Sharding distributes data across nodes to increase both throughput (more nodes handle more operations) and capacity (each node stores a subset). The shard key determines data distribution, query routing, and hotspot risk.
- Tiered storage (hot/warm/cold) is the production-standard cost optimization: Redis for hot data, PostgreSQL/DynamoDB for warm data, S3/Glacier for cold data. Each tier trades latency for cost savings.
The Storage Latency Hierarchy
Every storage decision trades speed for cost. This hierarchy should guide every data placement decision in your system design.
| Storage Level | Typical Latency | Cost (relative) | Capacity | Use Case |
|---|---|---|---|---|
| L1/L2 CPU cache | ~1–10ns | Highest (on-chip) | KB–MB | CPU-level; not directly addressable |
| RAM (in-memory DB) | ~100ns | Very high (~$5/GB/month) | GB–TB | Sessions, caches, leaderboards, counters |
| NVMe SSD | ~100μs | Moderate (~$0.10/GB/month) | TB | Primary databases, hot storage |
| Standard SSD (EBS gp3) | ~1ms | Lower (~$0.08/GB/month) | TB | Standard database volumes |
| HDD | ~10ms | Low (~$0.02/GB/month) | PB | Archival, sequential reads, logs |
| Network (same AZ) | ~0.5ms | Transfer costs | N/A | Service-to-service calls |
| Network (cross-region) | ~50–150ms | Higher transfer | N/A | Multi-region replication |
Interview application: "The feed service requires p99 read latency under 10ms. At this target, the data must be served from either RAM or SSD—HDD is eliminated. I would cache the most frequently accessed feeds in Redis (sub-ms) and store the full feed data in DynamoDB on SSD (single-digit ms). Cold feed data older than 30 days moves to S3 for cost savings."
Storage Engine Internals: B-Tree vs LSM-Tree
This is the deepest technical layer interviewers probe. Understanding why PostgreSQL reads fast and Cassandra writes fast comes down to their storage engines.
B-Tree (Read-Optimized)
How it works: Data is stored in sorted pages organized as a balanced tree. Reads follow the tree from root to leaf, typically requiring 3–4 disk reads for billions of records. Writes update pages in-place, which requires finding the correct page, modifying it, and flushing to disk.
Performance: Reads are fast (O(log N) with cached upper tree levels). Writes are slower because each write requires a random disk I/O to update the correct page, plus write-ahead log (WAL) for durability.
Used by: PostgreSQL, MySQL (InnoDB), MongoDB (WiredTiger), SQL Server.
Best for: Read-heavy workloads, point lookups, range queries, analytical queries.
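The claim of 3–4 disk reads for billions of records follows directly from the tree's fanout. A quick sketch, assuming a few hundred keys fit in each page, illustrates why the tree stays so shallow:

```python
import math

def btree_height(num_rows: int, keys_per_page: int = 500) -> int:
    """Tree levels needed to index num_rows when each page holds ~keys_per_page entries.
    The height equals the number of page reads for a point lookup (root down to leaf)."""
    return math.ceil(math.log(num_rows, keys_per_page))

for rows in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{rows:>14,} rows -> ~{btree_height(rows)} page reads per lookup")

# With ~500 keys per page, even 10 billion rows need only 4 levels, and the
# upper levels usually stay cached in RAM, so most lookups pay for 1-2 disk reads.
```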
LSM-Tree (Write-Optimized)
How it works: Writes go to an in-memory buffer (memtable). When the buffer fills, it is flushed to disk as an immutable sorted file (SSTable). Background compaction merges SSTables to maintain read performance. Reads may need to check multiple SSTables, making reads slower than B-tree.
Performance: Writes are fast (sequential disk I/O—appending to the memtable and flushing sorted files). Reads are slower because they may need to check the memtable plus multiple SSTables before finding the data. Bloom filters mitigate this by quickly eliminating SSTables that do not contain the key.
Used by: Cassandra, RocksDB, LevelDB, HBase, DynamoDB (internal storage), ScyllaDB.
Best for: Write-heavy workloads, time-series data, event logging, high-ingest-rate systems.
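A toy sketch of the write and read paths just described: a mutable memtable, immutable sorted runs standing in for SSTables, and reads that check the newest data first. A real engine adds a write-ahead log, Bloom filters, and background compaction, all omitted here:

```python
class ToyLSMTree:
    """Illustrative only: an in-memory memtable plus a list of immutable, sorted runs."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}            # mutable buffer that absorbs all writes
        self.sstables = []            # immutable sorted runs ("SSTables"), newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value            # writes only touch memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Flushing writes the whole memtable out as one sorted, immutable run,
        # which on a real disk is pure sequential I/O.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:              # newest data wins
            return self.memtable[key]
        for run in reversed(self.sstables):   # then each run, newest to oldest
            if key in run:
                return run[key]
        return None                           # a real engine would consult Bloom filters first
```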
| Dimension | B-Tree | LSM-Tree |
|---|---|---|
| Read performance | Fast (O(log N), in-place) | Slower (check multiple SSTables) |
| Write performance | Slower (random I/O, in-place update) | Fast (sequential I/O, append-only) |
| Write amplification | Lower | Higher (compaction rewrites data) |
| Space amplification | Lower (one copy of data) | Higher (multiple SSTables before compaction) |
| Range queries | Excellent (sorted pages) | Good (sorted SSTables, but cross-file) |
| Typical databases | PostgreSQL, MySQL, MongoDB | Cassandra, RocksDB, HBase, DynamoDB |
Interview application: "Our system ingests 100,000 events per second with relatively few reads. I would use an LSM-tree-based database like Cassandra because write throughput is the bottleneck. LSM-trees convert random writes into sequential I/O, achieving 5–10x higher write throughput than B-trees at this scale. The trade-off is slower point reads, which I would mitigate with Bloom filters and a Redis caching layer for hot data."
In-Memory Databases: When RAM Is the Answer
In-memory databases store the entire dataset in RAM, providing sub-millisecond latency for both reads and writes. They are the fastest storage option but the most expensive per GB.
Redis: Supports rich data structures (strings, lists, sets, sorted sets, hashes, streams). Provides optional persistence (RDB snapshots, AOF log). Supports pub/sub, Lua scripting, and transactions. Single-threaded event loop achieves 100,000+ operations per second per instance.
Memcached: Pure key-value cache. Multi-threaded, so it scales across CPU cores and offers slightly lower latency for simple get/set operations. No persistence and no rich data structures; values are opaque byte strings. Simpler to operate than Redis.
When to use in-memory storage: Session data (sub-ms lookup for every authenticated request). Caching hot data from primary databases (95% cache hit ratio reduces DB load 20x). Leaderboards and counters (sorted sets in Redis provide O(log N) ranked retrieval). Rate limiting (atomic increment with TTL). Real-time features (presence indicators, typing indicators, live counts).
When NOT to use in-memory storage: Primary storage for datasets larger than available RAM (cost-prohibitive). Data requiring ACID transactions across multiple keys (use PostgreSQL). Long-term storage (RAM is volatile—power loss means data loss without persistence).
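As one concrete example of the rate-limiting use case above, here is a minimal fixed-window limiter built on Redis's atomic INCR plus a TTL, using the redis-py client. The key format, limit, and window are illustrative assumptions:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window rate limit: allow at most `limit` requests per user per window."""
    key = f"rate:{user_id}"
    count = r.incr(key)                # atomic increment; creates the key at 1 if absent
    if count == 1:
        r.expire(key, window_seconds)  # the first request in a window starts the TTL
    # Note: a crash between INCR and EXPIRE leaves a key without a TTL; production
    # implementations usually wrap both steps in a Lua script to make them atomic.
    return count <= limit
```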
Sharding: Scaling Beyond a Single Node
A single database node has finite throughput and storage capacity. Sharding distributes data across multiple nodes to increase both.
Shard Key Selection
The shard key determines how data is distributed. It is the most consequential sharding decision—a bad shard key creates hotspots that negate the benefits of sharding.
Hash-based sharding: Apply a hash function to the shard key and distribute based on hash value. Provides even distribution but makes range queries expensive (must query all shards). Used by DynamoDB, Cassandra (default), MongoDB (hashed shard key).
Range-based sharding: Assign key ranges to shards (users A–M on shard 1, N–Z on shard 2). Supports efficient range queries but risks uneven distribution and hotspots on popular ranges.
Interview application: "I would shard the user activity table by user_id using hash-based sharding. This ensures even distribution—no single shard gets disproportionate load because user IDs are uniformly distributed after hashing. The trade-off is that queries like 'find all users who logged in today' require a scatter-gather across all shards. For this analytics query, I would use a separate denormalized read model rather than querying the sharded operational database."
Consistent Hashing
Standard hash-based sharding (hash % N) forces nearly every key to move when nodes are added or removed, because changing N changes almost every key's hash % N result. Consistent hashing minimizes data movement: on average only K/N keys need to move when a node is added (K = total keys, N = total nodes).
DynamoDB, Cassandra, and Akamai CDN use consistent hashing with virtual nodes. Each physical node is assigned multiple positions on the hash ring, improving distribution balance.
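A compact sketch of a consistent-hash ring with virtual nodes; the vnode count and MD5 hash are arbitrary choices for illustration. Running it shows roughly a quarter of keys moving when a fourth node joins, in line with the K/N claim:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.ring = []                       # sorted list of (position, physical_node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100):
        # Each physical node owns many positions on the ring, which evens out the load.
        for i in range(vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's position, wrapping at the end.
        index = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
before = {f"user-{i}": ring.get_node(f"user-{i}") for i in range(10_000)}
ring.add_node("db-4")
moved = sum(1 for key, node in before.items() if ring.get_node(key) != node)
print(f"{moved / len(before):.1%} of keys moved")   # roughly 25%, i.e. about K/N
```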
Tiered Storage: Optimizing Cost Without Sacrificing Latency
Production systems use multiple storage tiers, placing data where it provides the best latency-to-cost ratio based on access frequency.
| Tier | Storage | Latency | Cost | Access Pattern |
|---|---|---|---|---|
| Hot | Redis / Memcached | <1ms | Highest | Every request; most recent data |
| Warm | PostgreSQL / DynamoDB (SSD) | 1–10ms | Moderate | Frequent queries; active dataset |
| Cold | S3 Standard | 50–100ms | Low | Occasional access; recent archives |
| Archive | S3 Glacier Deep Archive | Minutes–hours | Lowest | Compliance, regulatory retention |
Interview application: "User messages from the last 7 days are in Redis (hot tier) for sub-ms delivery. Messages from the last 90 days are in DynamoDB (warm tier) for single-digit ms access when users scroll back. Messages older than 90 days are in S3 (cold tier) loaded on demand. Messages older than 2 years are in Glacier (archive tier) for compliance. Lifecycle policies automate the transitions."
Write Optimization Patterns
Write-Ahead Log (WAL)
Write data to a sequential log before updating the primary data structure. If the system crashes, the log is replayed on restart to recover changes that were durably logged but not yet applied to the main data files. PostgreSQL, MySQL, and MongoDB all use a WAL. The log converts random writes into sequential writes, reducing write latency.
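A toy sketch of the pattern, assuming an illustrative file path and a JSON-lines record format: append and fsync each change to the sequential log before touching the primary structure, and replay the log on restart:

```python
import json
import os

WAL_PATH = "store.wal"   # illustrative path
data = {}                # stand-in for the primary data structure

def put(key: str, value: str) -> None:
    # 1. Append the change to the sequential log and force it to disk (durability).
    with open(WAL_PATH, "a") as wal:
        wal.write(json.dumps({"key": key, "value": value}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())
    # 2. Only then apply it to the primary structure (random I/O, can be deferred).
    data[key] = value

def recover() -> None:
    """On restart, replay every logged change so nothing durable is lost."""
    if not os.path.exists(WAL_PATH):
        return
    with open(WAL_PATH) as wal:
        for line in wal:
            record = json.loads(line)
            data[record["key"]] = record["value"]
```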
Batch Writes
Accumulate multiple writes in memory and flush them to disk in a single batch operation. Kafka achieves millions of writes per second by batching records into segments. The trade-off: batching increases latency for individual writes (they wait for the batch) but dramatically increases aggregate throughput.
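A sketch of the trade-off in code: writes queue in memory and are flushed together once the batch fills or a deadline passes. The thresholds are illustrative, and a real implementation would also flush from a background timer:

```python
import time

class BatchWriter:
    """Accumulate writes and flush them together: higher aggregate throughput,
    at the cost of individual writes waiting up to max_delay_s before they are durable."""

    def __init__(self, flush_fn, max_batch: int = 500, max_delay_s: float = 0.01):
        self.flush_fn = flush_fn        # e.g. a bulk insert or multi-put call
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self.buffer = []
        self.oldest = None

    def write(self, record) -> None:
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_batch or \
                time.monotonic() - self.oldest >= self.max_delay_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one sequential bulk write instead of many random writes
            self.buffer = []
```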
Append-Only Storage
Write new data by appending to the end of a file rather than updating in place. This converts random I/O into sequential I/O—the fastest possible disk access pattern. Kafka, Cassandra (SSTables), and event sourcing systems use append-only storage.
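A sketch of the idea in the spirit of a Kafka segment or an SSTable: records are only ever appended, and a small in-memory index maps each key to the byte offset of its latest value. The file path and tab-separated record format are illustrative:

```python
import os

class AppendOnlyLog:
    """Every write appends to the end of the file (pure sequential I/O);
    reads seek directly to the offset recorded in the in-memory index."""

    def __init__(self, path: str = "segment.log"):
        self.path = path
        self.index = {}                       # key -> byte offset of its most recent record

    def append(self, key: str, value: str) -> None:
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)   # appends only; never update in place
            f.write(f"{key}\t{value}\n".encode())
        self.index[key] = offset

    def get(self, key: str):
        if key not in self.index:
            return None
        with open(self.path, "rb") as f:
            f.seek(self.index[key])           # one seek, then a sequential read of the record
            _key, value = f.readline().decode().rstrip("\n").split("\t", 1)
            return value
```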
For structured practice applying storage architecture decisions across complete system design problems, Grokking the System Design Interview covers database selection and storage trade-offs in every design solution. For advanced storage patterns including LSM-tree internals, distributed consensus for replicated storage, and production-scale tiered architectures, Grokking the Advanced System Design Interview builds the depth required for L6+ interviews. For a comprehensive roadmap covering storage alongside all other system design fundamentals, Grokking System Design maps the complete preparation journey.
Frequently Asked Questions
How do I choose between B-tree and LSM-tree storage engines?
Choose B-tree (PostgreSQL, MySQL) for read-heavy workloads with complex queries, range scans, and point lookups. Choose LSM-tree (Cassandra, RocksDB) for write-heavy workloads with high ingest rates. B-trees read fast but write slower (random I/O). LSM-trees write fast but read slower (multiple SSTables to check).
When should I use an in-memory database like Redis?
For data requiring sub-millisecond latency: session storage, caching hot database rows, leaderboards, rate limiting counters, and real-time features. Do not use Redis as primary storage for datasets exceeding available RAM or for data requiring ACID transactions across multiple keys.
What is the most important sharding decision?
The shard key. A good shard key distributes data evenly (no hotspots), aligns with your most common query pattern (avoid scatter-gather), and does not need to change over time. Hash-based sharding on a unique identifier (user_id, order_id) is the safest default.
How does consistent hashing improve sharding?
Standard hash-based sharding requires rehashing all data when nodes are added or removed. Consistent hashing minimizes data movement to K/N keys (K = total keys, N = nodes). Virtual nodes improve distribution balance. DynamoDB, Cassandra, and CDNs use this approach.
What is tiered storage and why does it matter?
Tiered storage places data in different storage systems based on access frequency: Redis for hot data (<1ms), DynamoDB for warm data (1–10ms), S3 for cold data (50–100ms), Glacier for archives (minutes–hours). Each tier trades latency for cost savings, optimizing the total cost of ownership.
How do I achieve high write throughput?
Use LSM-tree-based databases (sequential I/O), batch writes (accumulate and flush), append-only storage (no in-place updates), and write-ahead logging (sequential log before random update). These patterns convert random I/O into sequential I/O—the key to high write throughput.
What is write amplification and why does it matter?
Write amplification is the ratio of data written to disk versus data written by the application. LSM-trees have high write amplification because compaction rewrites data multiple times. B-trees have lower write amplification. High write amplification reduces SSD lifespan and consumes I/O bandwidth.
How do I reduce read latency for a database?
Add a caching layer (Redis) for hot data. Use read replicas to distribute read load. Ensure proper indexing on frequently queried columns. Choose a B-tree storage engine for read-optimized access. Co-locate compute and storage in the same availability zone to minimize network latency.
Should I shard my database from the start?
No. Sharding adds significant complexity (cross-shard queries, distributed transactions, rebalancing). Start with vertical scaling (larger instance) and read replicas. Shard only when a single node cannot handle the write throughput or storage capacity requirements. Most systems do not need sharding until they reach millions of daily active users.
How do I discuss storage architecture in a system design interview?
During estimation, calculate storage requirements (users × data/user × retention). During database selection, justify your choice with the workload profile (read-heavy → B-tree, write-heavy → LSM-tree). During the deep dive, discuss caching strategy, sharding approach, and tiered storage. During trade-offs, compare latency, throughput, cost, and operational complexity.
TL;DR
Low-latency, high-throughput data storage requires matching your data to the right level of the storage hierarchy: RAM for sub-ms hot data, SSD for single-digit ms warm data, and S3/Glacier for cold archives. Two storage engine families drive the core trade-off: B-tree (PostgreSQL, MySQL—read-optimized, random I/O writes) vs LSM-tree (Cassandra, RocksDB—write-optimized, sequential I/O writes). Choose based on your workload's read/write ratio. Use in-memory databases (Redis, Memcached) for sessions, caches, and counters—not as primary storage for large datasets. Shard by hashing a unique key (user_id) with consistent hashing to distribute load and minimize data movement during rebalancing. Implement tiered storage with lifecycle policies that automatically move data from hot to cold tiers as access frequency decreases. Optimize writes with WAL, batching, and append-only patterns that convert random I/O into sequential I/O. In interviews, justify every storage decision with the workload profile: "I chose Cassandra because we ingest 100K events/second and LSM-trees handle write-heavy workloads at 5–10x the throughput of B-trees."