On this page

Why Database Storage Matters

The Traditional Approach: B-Trees

Why B-Trees Are So Good at Reads

The Cost of Updating B-Trees

Enter the LSM Tree

How an LSM Tree Works

Step 1: Write to Memory

Step 2: Write Ahead Log

Step 3: Flush to Disk

Step 4: Compaction

Why LSM Trees Are Extremely Fast for Writes

Reads Become More Complicated

The Hidden Cost of Compaction

Range Queries: B-Tree Wins

Write Throughput: LSM Tree Wins

Space Amplification vs Write Amplification

Which Databases Use Which?

B-Tree Based

LSM Tree Based

Choosing the Right Storage Engine

What Interviewers Expect

Learn Database Internals Through System Design

Final Thoughts

LSM Trees vs. B-Trees: How Modern Databases Handle Reads and Writes

Arslan Ahmad

June 26th, 2026

Understand how modern databases optimize reads and writes, why PostgreSQL and Cassandra use different storage engines, and how these trade-offs influence performance, scalability, and system design decisions.

On This Page

Why Database Storage Matters

The Traditional Approach: B-Trees

Why B-Trees Are So Good at Reads

The Cost of Updating B-Trees

Enter the LSM Tree

How an LSM Tree Works

Step 1: Write to Memory

Step 2: Write Ahead Log

Step 3: Flush to Disk

Step 4: Compaction

Why LSM Trees Are Extremely Fast for Writes

Reads Become More Complicated

The Hidden Cost of Compaction

Range Queries: B-Tree Wins

Write Throughput: LSM Tree Wins

Space Amplification vs Write Amplification

Which Databases Use Which?

B-Tree Based

LSM Tree Based

Choosing the Right Storage Engine

What Interviewers Expect

Learn Database Internals Through System Design

Final Thoughts

When engineers compare databases, the discussion usually revolves around SQL versus NoSQL, consistency models, replication, or sharding. Those are important architectural decisions, but they often hide a more fundamental question:

How does the database actually organize data on disk?

The answer has a profound impact on performance.

Whether your application serves millions of reads per second, ingests a constant stream of IoT data, or powers a social media feed, much of its performance depends on the underlying storage engine. Two of the most influential data structures in modern databases are B-Trees and Log-Structured Merge Trees (LSM Trees).

Although they solve the same problem—efficiently storing and retrieving large amounts of data—they take dramatically different approaches. B-Trees optimize for balanced read and write performance by updating data in place, while LSM Trees optimize for high write throughput by treating storage as an append-only system and reorganizing data later.

Whether you're preparing for a system design interview or designing production systems, understanding storage engines gives you a much deeper appreciation of why databases behave differently under different workloads. If you're new to these concepts, Grokking System Design Fundamentals provides an excellent introduction to database internals, caching, replication, partitioning, and other building blocks that every software engineer should understand before diving into advanced distributed systems.

In this article, we'll explore how B-Trees and LSM Trees work, why they were designed differently, their strengths and weaknesses, and when each one is the better choice.

Why Database Storage Matters

Every database must solve the same basic challenge.

Applications continuously issue operations like:

Insert a new row.
Update an existing record.
Delete obsolete data.
Find a specific key.
Retrieve a range of values.

These operations sound simple until you consider scale.

A production database may store billions of records, far exceeding available memory. Most of the data lives on disk, where reads and writes are significantly slower than in-memory operations. The storage engine's primary job is to minimize expensive disk I/O while keeping data organized for efficient access.

This is why database internals matter. Two databases may expose similar SQL syntax or APIs, yet perform very differently because their storage engines optimize for different workloads.

Before choosing a database, it's important to understand the workload you're optimizing for. Modern system design interviews often expect candidates to explain not only which database they would choose, but also why that storage engine fits the application's read/write characteristics. This type of architectural reasoning is covered extensively in Grokking the System Design Interview, where database selection is always discussed in the context of scalability, latency, and real-world trade-offs rather than as isolated technologies.

The Traditional Approach: B-Trees

For decades, B-Trees have been the dominant indexing structure for relational databases.

The idea is elegant.

Instead of storing every key sequentially, a B-Tree organizes keys into balanced pages. Each page contains multiple sorted keys along with pointers to child pages. Because every leaf node sits at roughly the same depth, searching for any key requires only a small number of page reads.

Imagine looking up a customer ID in a table with hundreds of millions of rows. Rather than scanning the entire dataset, the database traverses the tree from the root to the appropriate leaf node, narrowing the search space at each level.

A simplified B-Tree might look like this:

B-Tree

Searching for 60 follows a predictable path:

50 → 75 → 60

Even though the actual tree may contain billions of records, the number of disk accesses remains relatively small because each node stores many keys rather than just one.

Why B-Trees Are So Good at Reads

B-Trees excel because their data remains sorted at all times.

Suppose you need every order placed between 10:00 AM and 11:00 AM.

A B-Tree can quickly locate the first matching record and then read neighboring leaf pages sequentially. Since nearby keys are stored close together, range scans become extremely efficient.

This is one reason traditional relational databases rely heavily on B-Tree indexes for queries involving:

Date ranges
Alphabetical ordering
Numeric intervals
Prefix searches
ORDER BY clauses

The sorted structure naturally supports these operations without requiring additional work.

The Cost of Updating B-Trees

The same structure that makes reads efficient also makes writes more expensive.

Consider inserting a new key into the middle of an already full page.

The database may need to:

Read the existing page from disk.
Insert the new key while maintaining sorted order.
Split the page if necessary.
Update parent pointers.
Write multiple modified pages back to disk.

These random disk writes are significantly more expensive than sequential writes.

As write rates increase, maintaining the tree becomes increasingly costly.

For workloads dominated by continuous writes, another approach performs much better.

Enter the LSM Tree

Log-Structured Merge Trees were designed around a different assumption.

Disk writes are expensive.

Sequential writes are cheap.

Instead of modifying existing data immediately, an LSM Tree accumulates changes in memory and writes them sequentially to disk later.

The philosophy is simple:

Never rewrite existing files if you can append instead.

This dramatically reduces random disk I/O.

How an LSM Tree Works

The write path consists of several stages.

Step 1: Write to Memory

Incoming writes are stored inside an in-memory structure called a MemTable.

Because memory is fast, inserts complete quickly.

Step 2: Write Ahead Log

Before acknowledging success, the database also appends the operation to a Write-Ahead Log (WAL).

If the server crashes, the WAL allows the MemTable to be reconstructed.

Step 3: Flush to Disk

Once the MemTable reaches a threshold, it is written sequentially to disk as an immutable file called an SSTable (Sorted String Table).

No existing data is modified.

A new file is simply created.

LSM Write Pipeline

Step 4: Compaction

Over time, multiple SSTables accumulate.

Background compaction merges overlapping files, removes obsolete versions, and discards deleted records.

This keeps the storage engine manageable while preserving efficient lookups.

Why LSM Trees Are Extremely Fast for Writes

Imagine receiving one million sensor readings every minute.

A B-Tree would continuously modify existing pages.

An LSM Tree simply keeps appending new records.

Sequential disk writes are much faster than scattered random writes, particularly on traditional hard drives and still beneficial on SSDs.

This is why many write-heavy systems choose LSM Trees.

Examples include:

Cassandra
RocksDB
LevelDB
ScyllaDB
HBase

These databases are optimized for continuous ingestion rather than complex analytical queries.

Reads Become More Complicated

The trade-off appears during reads.

Suppose a key exists.

Where should the database look?

It may be:

In the MemTable.
In the newest SSTable.
In an older SSTable.
In an even older SSTable.

Without optimization, every lookup could require checking multiple files.

To reduce unnecessary disk reads, LSM Trees use Bloom Filters.

A Bloom Filter quickly answers:

"This key definitely isn't here."

"This key might be here."

If the filter says the key cannot exist inside a particular SSTable, the database skips reading that file entirely.

Bloom Filters dramatically improve lookup performance while consuming very little memory.

The Hidden Cost of Compaction

Compaction is both the strength and weakness of LSM Trees.

Eventually, scattered SSTables must be merged.

During compaction the database:

Reads existing files.
Merges records.
Removes duplicates.
Deletes obsolete versions.
Writes new SSTables.

Although this happens in the background, it consumes CPU, memory, and disk bandwidth.

Large compactions can temporarily increase latency if not managed carefully.

Modern databases devote significant engineering effort to scheduling compactions efficiently.

Range Queries: B-Tree Wins

Suppose an application needs:

SELECT \*  
FROM Orders  
WHERE CreatedAt BETWEEN ...

B-Trees excel.

Data already exists in sorted order.

The database finds the first matching key and scans forward sequentially.

LSM Trees can support the same query, but data may reside across multiple SSTables created at different times. The engine often performs additional merging during reads, making large range scans more expensive.

This is one reason relational databases remain popular for transactional business applications.

Write Throughput: LSM Tree Wins

Now imagine an IoT platform ingesting telemetry from five million devices.

Every second brings another flood of writes.

Random updates become the bottleneck.

Sequential appends become the advantage.

LSM Trees handle these workloads exceptionally well because they transform random writes into sequential disk operations.

Applications involving:

Logging
Metrics
Time-series data
Event ingestion
Social activity feeds

often benefit from LSM-based storage engines.

Space Amplification vs Write Amplification

Neither data structure is free.

B-Trees primarily pay the cost during updates.

LSM Trees pay the cost later through compaction.

Engineers describe these trade-offs using three concepts:

Read amplification
Write amplification
Space amplification

Read amplification measures how many disk reads are required for one logical read.

Write amplification measures how much physical writing occurs for one logical write.

Space amplification measures how much additional storage is consumed beyond the actual dataset.

B-Trees generally have lower read amplification.

LSM Trees generally achieve higher write throughput but incur greater write amplification because data is rewritten during compaction.

Understanding these trade-offs is far more useful than memorizing which database uses which structure.

Which Databases Use Which?

B-Tree Based

PostgreSQL
MySQL (InnoDB)
SQLite
Oracle Database
SQL Server

These databases prioritize predictable reads, indexing, and transactional workloads.

LSM Tree Based

Cassandra
RocksDB
LevelDB
ScyllaDB
HBase

These systems prioritize sustained write performance and horizontal scalability.

Choosing the Right Storage Engine

A common system design mistake is choosing databases based only on popularity.

Instead, start with workload characteristics.

Ask questions like:

Is the system read-heavy or write-heavy?
Are range scans common?
Is data mostly append-only?
Are updates frequent?
Is write latency more important than read latency?

The answers often determine whether a B-Tree or LSM Tree is the better foundation.

Decision Tree

What Interviewers Expect

In system design interviews, you rarely need to explain the implementation details of B-Trees or LSM Trees.

Instead, interviewers want to know whether you understand the trade-offs.

A strong answer might sound like:

"For this write-heavy event ingestion system, I'd prefer an LSM Tree–based database because sequential writes provide much higher ingestion throughput. If the workload were dominated by transactional queries and range scans, I'd lean toward a B-Tree–based relational database instead."

That explanation demonstrates architectural reasoning rather than memorization.

Learn Database Internals Through System Design

Storage engines are only one piece of modern database architecture.

To make good architectural decisions, you also need to understand indexing, caching, replication, partitioning, consistency models, load balancing, and distributed communication. These concepts work together, and interviewers typically expect you to reason about them as a complete system rather than as individual topics.

If you're building your foundation, Grokking System Design Fundamentals explains the core building blocks of scalable systems, including databases, caches, messaging systems, replication, and storage architecture. It's an excellent starting point before tackling more complex system design problems.

Once you're comfortable with the fundamentals, Grokking the System Design Interview shows how these concepts come together in real-world designs such as Instagram, Twitter, Uber, URL shorteners, and other large-scale systems. Rather than memorizing architectures, you'll learn how to evaluate trade-offs and justify design decisions—the exact skill interviewers are looking for.

One of the biggest advantages of understanding B-Trees and LSM Trees is that database selection stops becoming guesswork. Instead of choosing PostgreSQL, MySQL, Cassandra, or RocksDB based on popularity, you'll understand the workload characteristics that make each storage engine a good architectural fit.

Final Thoughts

B-Trees and LSM Trees represent two different philosophies for managing persistent data.

B-Trees keep information continuously organized, making reads and range queries exceptionally efficient. LSM Trees defer organization, optimizing for fast sequential writes and reorganizing data later through compaction.

Neither approach is universally superior.

The best choice depends entirely on workload characteristics.

Read-heavy transactional systems often benefit from B-Trees.

Write-intensive event-driven systems often benefit from LSM Trees.

As distributed systems continue growing in scale, understanding these storage engines becomes increasingly valuable. They are not merely implementation details hidden inside databases—they influence latency, throughput, storage efficiency, and ultimately the architecture of the systems we build.

Databases

What our users say

Matzuk

Algorithms can be daunting, but they're less so with the right guide. This course - https://www.designgurus.io/course/grokking-the-coding-interview, is a great starting point. It covers typical problems you might encounter in interviews.

Arijeet

Just completed the “Grokking the system design interview”. It's amazing and super informative. Have come across very few courses that are as good as this!

AHMET HANIF

Whoever put this together, you folks are life savers. Thank you :)

Recommended Course

Grokking the System Design Interview

178,110+ students

4.7

The #1 system design course for FAANG interviews, built by ex-FAANG hiring managers.

View Course

Scaling SQL Databases: 8 Challenges of Horizontally Scaling SQL Databases

Arslan Ahmad

Nov 30th, 2023

Database Indexes Are Not Free: The Read, Write, and Storage Trade-offs You Need to Know

Arslan Ahmad

Jun 17th, 2026

PostgreSQL vs. MongoDB vs. DynamoDB

Arslan Ahmad

Feb 20th, 2026

Grokking the Fundamentals of Database Replication for System Design Interviews

Arslan Ahmad

Jul 21st, 2025

One-Stop Portal For Tech Interviews.

LSM Trees vs. B-Trees: How Modern Databases Handle Reads and Writes

Why Database Storage Matters

The Traditional Approach: B-Trees

Why B-Trees Are So Good at Reads

The Cost of Updating B-Trees

Enter the LSM Tree

How an LSM Tree Works

Step 1: Write to Memory

Step 2: Write Ahead Log

Step 3: Flush to Disk

Step 4: Compaction

Why LSM Trees Are Extremely Fast for Writes

Reads Become More Complicated

The Hidden Cost of Compaction

Range Queries: B-Tree Wins

Write Throughput: LSM Tree Wins

Space Amplification vs Write Amplification

Which Databases Use Which?

B-Tree Based

LSM Tree Based

Choosing the Right Storage Engine

What Interviewers Expect

Learn Database Internals Through System Design

Final Thoughts

What our users say

Recommended Course

Join our Newsletter

Read More