Read replicas vs sharding: when to scale reads vs partition data?
A single database can take you far, but sooner or later you face a choice that decides your next order of magnitude: do you scale reads with replicas, or partition data with shards? This guide gives you a practical decision flow that works in real systems and in the system design interview. You will learn how each technique works, when to apply it, and how to combine both for durable scale without surprises.
Introduction
Read replicas create additional copies of the same dataset to serve more read traffic. Sharding splits the dataset into independent partitions so that reads and writes are spread across many servers. Both improve throughput, but they solve very different bottlenecks. Thinking of replicas as more eyes on the same book and shards as many books on different shelves will keep your design sharp and simple.
Why It Matters
Most products start with a single primary database. Latency creeps up when reads pile on, or the app stalls when writes spike. Choosing replicas or shards early can remove the bottleneck with minimal change. Choosing poorly can lock you into complex code, data migrations, and emergency reshard projects. Interviewers want to see that you can identify the real limiter, pick the lightest tool, and plan the next step if the system grows again. That is scalable architecture thinking.
How It Works, Step by Step
Read replicas
- All writes go to a single primary.
- The primary streams changes to one or more replicas through native replication.
- The app sends read queries to replicas using a router or a library.
- Replication is usually asynchronous, which means replicas can lag behind the primary.
- If a request must read its own latest write, route that read to the primary or use a session stickiness rule (see the routing sketch after this list).
- You monitor replica lag, error rates, and failover paths. Some stacks support semi-sync or fully synchronous replication at a higher latency cost.
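To make the routing rules concrete, here is a minimal Python sketch of a read/write splitter with a session stickiness window. The `RoutingPool` class, the two-second window, and the plain-string connection handles are illustrative assumptions, not any specific driver's API.

```python
import time

STICKINESS_WINDOW_SECONDS = 2.0  # assumed upper bound on replica lag

class RoutingPool:
    """Sends writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._last_write = {}  # session_id -> time of that session's last write

    def connection_for_write(self, session_id):
        # All writes go to the single primary.
        self._last_write[session_id] = time.monotonic()
        return self.primary

    def connection_for_read(self, session_id):
        # If this session wrote recently, its reads may race replication
        # lag, so pin them to the primary for a short window.
        wrote_at = self._last_write.get(session_id)
        if wrote_at is not None and time.monotonic() - wrote_at < STICKINESS_WINDOW_SECONDS:
            return self.primary
        # Otherwise rotate through replicas (simple round-robin).
        self.replicas.append(self.replicas.pop(0))
        return self.replicas[-1]

pool = RoutingPool("primary-dsn", ["replica-1-dsn", "replica-2-dsn"])
pool.connection_for_write("session-a")
print(pool.connection_for_read("session-a"))  # recent write -> primary
print(pool.connection_for_read("session-b"))  # no recent write -> a replica
```

In production this logic usually lives in a driver, proxy, or framework layer rather than hand-rolled application code, but the routing decision is the same.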
Sharding
- Choose a partition key such as user id, tenant id, or content id.
- Map each key to a shard with a range rule, a hash function, or a directory map.
- Each shard has its own primary and often its own replica set.
- Your service routes every read and write to the correct shard using the same partition function (a minimal router is sketched after this list).
- Cross-shard queries require scatter-gather, precomputed aggregates, or a search index.
- When shards get imbalanced, you reshard or add shards using consistent hashing or a controlled migration.
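As a concrete example of the partition function, here is a minimal hash-based router in Python. The shard list, the DSN strings, and the `shard_for` name are assumptions for illustration. Note that a plain `hash % n` scheme like this makes adding shards expensive, which is why consistent hashing matters; a ring sketch appears in the pitfalls section below.

```python
import hashlib

# Illustrative shard endpoints; a real system would load these from config
# or a directory service so shards can move without a code change.
SHARDS = [
    "postgres://shard0.db.internal/app",
    "postgres://shard1.db.internal/app",
    "postgres://shard2.db.internal/app",
    "postgres://shard3.db.internal/app",
]

def shard_for(partition_key: str) -> str:
    # Hash the key so adjacent ids (user 1001, 1002, ...) spread out
    # instead of clustering on one shard.
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every read and write for a key must go through the same function,
# otherwise data is written to one shard and read from another.
print(shard_for("user:42"))
```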
Decision flow
- Measure where time is spent. If the single primary has headroom for writes but is drowning in reads, add replicas first.
- If the primary write path is saturated, or if the data cannot be served by one primary anymore, plan sharding.
- If both reads and writes are large, start with sharding and give each shard its own replicas.
- Always add a cache in front of the database first if the data is hot and can tolerate brief staleness. The sketch after this list restates the whole flow as a checklist.
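The same flow can be written down as a short checklist function. This sketch simply restates the bullets above; the boolean inputs and the function name are hypothetical, and real decisions should rest on measured latency and throughput numbers rather than flags.

```python
def next_scaling_step(primary_has_write_headroom: bool,
                      reads_are_the_bottleneck: bool,
                      fits_on_one_primary: bool,
                      hot_reads_tolerate_staleness: bool) -> str:
    # Cheapest lever first: a cache removes read load without
    # touching the database topology.
    if hot_reads_tolerate_staleness:
        return "add a cache in front of the database"
    # Reads are the limiter and the primary can still absorb writes.
    if primary_has_write_headroom and reads_are_the_bottleneck:
        return "add read replicas"
    # The write path or dataset size has outgrown a single primary.
    if not primary_has_write_headroom or not fits_on_one_primary:
        return "shard, and give each shard its own replicas"
    return "stay on one primary and keep measuring"

print(next_scaling_step(True, True, True, False))  # -> add read replicas
```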
Real World Example
Think about an Instagram-style feed. New posts and likes are frequent, but the ratio of reads to writes is very high. You can keep a single write primary and scale reads with replicas behind a cache for hot timelines. Now consider a chat service with many concurrent rooms. Writes are continuous and evenly spread across users. A single primary becomes the bottleneck. Partition messages by room id so each shard handles its own read and write load. For a retail catalog like Amazon, you might shard by product id to distribute writes and still keep fast reads, while search and recommendation layers index across shards.
Common Pitfalls and Trade-offs
Read replicas
- Replication lag can cause stale reads and user confusion. Use primary reads for post-write checks or add read-after-write consistency for critical paths.
- A single write primary remains a hard limit. Replicas do not improve write throughput.
- Hot rows produce high read load even with many replicas. Add a cache and consider denormalized views.
Sharding
- A poor partition key creates hotspots. Time-based keys, celebrity users, or monotonic ids often pile work on a few shards. Prefer a hash of user id or a composite key that spreads load.
- Cross shard joins and transactions are expensive. Keep entities that must be read together in the same shard.
- Resharding is operationally heavy. Plan for growth with consistent hashing or a directory that supports live moves; a minimal ring sketch follows this list.
- Global secondary indexes across shards add complexity. Consider per-shard indexes plus a search system for global queries.
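To show why consistent hashing eases resharding, here is a minimal ring sketch in Python. The shard names and the virtual-node count are illustrative assumptions. Adding a shard to the ring moves only the keys that land on its positions, instead of remapping nearly every key the way a naive `hash % n` scheme does.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Map any string to a position on a large fixed hash ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=64):
        # Each shard owns many virtual nodes so load spreads evenly.
        self._positions = []   # sorted ring positions
        self._owner = {}       # ring position -> shard name
        for shard in shards:
            for i in range(vnodes):
                pos = ring_position(f"{shard}#vnode{i}")
                self._positions.append(pos)
                self._owner[pos] = shard
        self._positions.sort()

    def shard_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the first virtual
        # node, wrapping around at the end of the ring.
        idx = bisect.bisect(self._positions, ring_position(key)) % len(self._positions)
        return self._owner[self._positions[idx]]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))
```

Directory-based schemes trade this math for a lookup table, which is what makes controlled live moves possible.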
Cost and operations
- Replicas are simple to add and retire but can multiply storage and compute costs quickly.
- Shards reduce the pressure on any single box but add deployment, routing, and observability overhead.
- Both need clear failure modes. Replicas need tested failover. Shards need clean isolation so one shard failure does not cascade.
Interview Tip
A common prompt: your primary database sits at 80 percent CPU during peak traffic, p95 read latency is spiking, and write volume is stable. What do you do first? A strong answer is to move read traffic to replicas fronted by a cache, keep critical read-after-write paths on the primary, and name the next step if writes grow, which is to shard by the access key that aligns with your main queries.
Key Takeaways
- Replicas scale reads on a single dataset. Shards scale reads and writes by splitting the dataset.
- Start with replicas when the primary has write headroom and read queries dominate.
- Start with sharding when the write path or dataset size no longer fits on one primary.
- Many mature systems use shards with per-shard replicas for higher durability and throughput.
- Caching sits in front of both and often delivers the best return for hot data.
Comparison Table
| Aspect | Read Replicas | Sharding | Caching |
|---|---|---|---|
| Primary Goal | Scale reads on one dataset | Scale reads and writes by partitioning data | Serve hot data without touching the database |
| Effect on Writes | No improvement | Major improvement | No direct change |
| Consistency | Often eventual on replicas | Strong within a shard, weak across shards | Stale until invalidated |
| Operational Complexity | Lower and easier to maintain | Higher due to routing and resharding | Moderate due to invalidation policies |
| Best Fit | Read-heavy workloads with simple queries | Write-heavy or large datasets | Frequently accessed data |
| Query Patterns | Point reads, aggregates | Shard-local reads and writes | Repeated reads of the same keys |
| Failure Impact | Replica loss reduces capacity | Shard failure isolates the blast radius | Cache miss increases database load |
| Implementation Difficulty | Easier and faster to set up | Requires planning and careful partitioning | Medium difficulty depending on strategy |
FAQs
Q1. Do replicas help with write throughput?
No. Replicas only offload reads. All writes still go to the primary unless you are using a multi-primary scheme, which is a different design.
Q2. Can I use both replicas and sharding?
Yes. Many teams shard to split the dataset and then add replicas to each shard for durability and read scale.
Q3. How many replicas should I add before sharding?
Use data. If the primary has write headroom and you can meet read latency with two or three replicas, keep it simple. If reads are fine but the primary write path or storage is the limiter, plan sharding.
Q4. What is replication lag and how do I handle it?
Lag is the delay between a write on the primary and visibility on replicas. For user flows that must see their own writes, read from the primary or use session stickiness. For general reads, eventual consistency is often acceptable.
Q5. How do I pick a shard key?
Pick a key that aligns with your dominant access path and spreads load evenly. User id, tenant id, or content id are common. Avoid time-based keys that cluster writes in a few shards.
Q6. Should I just add a cache instead?
Often yes. A cache in front of the database can remove a large share of read traffic and delay the need for replicas or shards. But it does not fix a saturated write path.
Further Learning
If you want a deep practice path that covers these choices end to end, explore the hands on syllabus in Grokking Scalable Systems for Interviews. For a structured interview playbook with patterns, trade offs, and realistic prompts, study the lessons in Grokking the System Design Interview.