Grokking System Design Fundamentals

Replication Methods

Replication keeps copies of the same data on multiple database nodes to improve availability, redundancy, and load distribution. There are several replication methods, each with its own advantages and challenges.

1. Single-leader replication

In single-leader (a.k.a. primary-backup) replication, one node handles all writes while one or more followers replicate its state, either synchronously or asynchronously. Followers may also serve read traffic, depending on configuration.

Example

MySQL primary-replica: one primary accepts writes; replicas pull binlog events from it and replay them in order.
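
To make the log-shipping idea concrete, here is a minimal in-memory sketch. It is not how MySQL implements replication; the `Leader` and `Follower` classes and the `max_entries` parameter are hypothetical names used only to show a follower replaying an ordered log and lagging behind in the meantime.

```python
class Leader:
    """Accepts all writes and records them in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Follower:
    """Pulls log entries it has not seen yet and replays them in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # index of the next log entry to apply

    def pull(self, leader, max_entries=None):
        entries = leader.log[self.applied:]
        if max_entries is not None:
            entries = entries[:max_entries]
        for key, value in entries:
            self.data[key] = value
        self.applied += len(entries)

    def lag(self, leader):
        """Number of leader log entries not yet applied here."""
        return len(leader.log) - self.applied


leader = Leader()
follower = Follower()
leader.write("x", 1)
leader.write("x", 2)

follower.pull(leader, max_entries=1)  # follower is now one entry behind
stale_value = follower.data["x"]      # a read here would be stale
follower.pull(leader)                 # catch up with the rest of the log
```

Because every write passes through one log, followers that replay it in order converge to the leader's state; the gap between `pull` calls is where stale reads come from.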

Pros

  • Simple conflict-free writes (only one writer)
  • Strong ordering: writes are totally ordered at the leader
  • Easy to reason about failover (promote a replica)

Cons

  • Leader is a single point of write failure
  • Writes can be a bottleneck at high scale
  • Replicas may lag under heavy write load (staleness)

2. Multi-leader replication

Here multiple nodes can accept writes; they asynchronously propagate changes to each other, resolving conflicts via timestamps or application logic.

Example

CouchDB bidirectional sync: each node can accept updates and exchange changesets over HTTP; conflicting revisions are flagged so that they can be resolved manually or by application logic.
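
As an illustration of timestamp-based conflict resolution, the following sketch implements a simple last-write-wins merge between two leaders. This is not CouchDB's mechanism; `LeaderNode`, `local_write`, and `merge_from` are hypothetical names, and the timestamps are passed in explicitly as an assumption so the example stays deterministic.

```python
class LeaderNode:
    """A leader that accepts local writes and merges changes from peers."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, node_name, value)

    def local_write(self, key, value, timestamp):
        self.store[key] = (timestamp, self.name, value)

    def merge_from(self, other):
        """Last-write-wins: keep the entry with the higher timestamp.

        The node name breaks ties so that all leaders pick the same winner.
        """
        for key, incoming in other.store.items():
            current = self.store.get(key)
            if current is None or incoming[:2] > current[:2]:
                self.store[key] = incoming


us = LeaderNode("us")
eu = LeaderNode("eu")

# Both leaders accept a write to the same key before syncing.
us.local_write("title", "Draft A", timestamp=10)
eu.local_write("title", "Draft B", timestamp=12)

# Bidirectional sync: each side merges the other's changes.
us.merge_from(eu)
eu.merge_from(us)
```

Both nodes converge on "Draft B", but "Draft A" is silently discarded. That data loss is the usual argument for application-level resolution when concurrent writes must be preserved.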

Pros

  • Higher write availability (any leader can serve writes)
  • Write locality for geo-distributed setups (clients write to a nearby leader)
  • Graceful partial failures (other leaders stay writable)

Cons

  • Conflict detection and resolution add complexity
  • Possibility of divergent histories if replicas miss sync
  • Increased metadata overhead (vector clocks, change logs)

3. Leaderless (quorum-based) replication

There is no designated leader. Clients send each read or write to several replicas and rely on read/write quorums to ensure consistency.

Example

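Cassandra/Dynamo: each write is sent to N replicas and succeeds once W of them acknowledge it; each read waits for responses from R replicas. With R + W > N, every read quorum overlaps every write quorum, so a read contacts at least one replica that holds the latest acknowledged write. This overlap alone does not guarantee linearizability: concurrent writes, sloppy quorums, and partially failed writes can still produce stale or conflicting results.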
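
The quorum overlap can be sketched with a small in-memory model. This is not Cassandra's or Dynamo's actual protocol; `Replica`, `quorum_write`, and `quorum_read` are hypothetical helpers, and version numbers are supplied by the caller as an assumption.

```python
class Replica:
    def __init__(self):
        self.store = {}  # key -> (version, value)
        self.up = True

    def put(self, key, version, value):
        """Apply the write if reachable; return True as an acknowledgment."""
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True


def quorum_write(replicas, key, version, value, w):
    """Send the write to all N replicas; succeed if at least W acknowledge."""
    acks = sum(1 for rep in replicas if rep.put(key, version, value))
    return acks >= w


def quorum_read(replicas, key, r):
    """Collect R responses and return the value with the highest version."""
    responses = [rep.store.get(key) for rep in replicas if rep.up][:r]
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    found = [resp for resp in responses if resp is not None]
    if not found:
        return None
    version, value = max(found, key=lambda pair: pair[0])
    return value


# N = 3, W = 2, R = 2, so R + W > N.
replicas = [Replica() for _ in range(3)]
quorum_write(replicas, "k", 1, "old", w=2)

replicas[2].up = False                      # one replica misses the next write
ok = quorum_write(replicas, "k", 2, "new", w=2)

replicas[2].up = True                       # it returns with stale data
replicas[0].up = False                      # and a different replica fails
value = quorum_read(replicas, "k", r=2)     # reads replicas 1 and 2
```

The read quorum includes the stale replica, yet it still returns "new" because it also includes at least one replica that acknowledged the latest write.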

Pros

  • No single point of failure or bottleneck
  • Tunable consistency vs latency via quorum sizes
  • Tolerates node failures and network partitions as long as a quorum remains reachable

Cons

  • Operational complexity of tuning R, W, and N
  • Potential for stale reads if R and W are misconfigured (e.g., R + W ≤ N)
  • Harder to guarantee global ordering of writes

4. Chain replication

Nodes are arranged in a fixed chain. Writes flow from head → … → tail; reads are served from the tail, so they see all preceding writes.

Example

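The Chain Replication paper (van Renesse and Schneider, OSDI 2004): the head receives a write and forwards it down the chain; once the tail has applied the write, it is committed and the tail sends the reply to the client.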
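
The head-to-tail write path can be sketched as a linked list of nodes. This is a simplified model that ignores failures and reconfiguration; `ChainNode`, `build_chain`, and the `path` argument are hypothetical names used only for illustration.

```python
class ChainNode:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.next = None  # successor in the chain; None for the tail

    def write(self, key, value, path):
        """Apply the write locally, then forward it to the successor."""
        self.data[key] = value
        path.append(self.name)  # record the hop for illustration
        if self.next is not None:
            self.next.write(key, value, path)


def build_chain(names):
    nodes = [ChainNode(name) for name in names]
    for current, successor in zip(nodes, nodes[1:]):
        current.next = successor
    return nodes


nodes = build_chain(["head", "middle", "tail"])
head, tail = nodes[0], nodes[-1]

path = []
head.write("x", 42, path)   # writes always enter at the head
tail_value = tail.data["x"] # reads are served by the tail
```

A write is visible at the tail only after every earlier node has applied it, which is why tail reads never observe a write that some replica is missing. The hop count in `path` also shows why write latency grows with chain length.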

Pros

  • Strong consistency: tail has fully ordered state
  • High throughput: pipelining along chain
  • Simple failover: if one link fails, re-chain the neighbors

Cons

  • Chain length adds write latency (O(chain length) hops)
  • Any link failure stalls the chain until reconfiguration
  • Read load is concentrated on the tail in the basic protocol (extensions such as CRAQ allow reads from other nodes)

5. Read-replica replication

A variation of single-leader (primary-backup) replication where the leader handles all writes and one or more replicas serve only read traffic. Replicas continuously pull or receive a stream of write updates from the leader but never accept writes themselves.

Difference between Single-leader and Read-replica

  • Single-leader

    • You can optionally direct reads to replicas, but it isn’t the primary goal.
    • Often used for failover, backups, or analytics off the primary.
  • Read-replica

    • Designed to offload all heavy read workloads from the primary.
    • You can add dozens (or hundreds) of replicas to handle spikes in read traffic.

Example

PostgreSQL streaming replication

  • The primary database writes changes to its WAL (Write-Ahead Log).
  • Standby servers connect over a streaming protocol, replay WAL records in near real time, and expose the data as read-only replicas.
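
On the application side, a read-replica setup usually needs some way to split traffic. The sketch below is one naive way to do it, assuming a hypothetical `Router` class and placeholder node names; it is not part of PostgreSQL. It sends `SELECT` statements to replicas in round-robin order and everything else to the primary.

```python
import itertools


class Router:
    """Routes read statements to replicas and all other statements to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql):
        # Naive classification by first keyword; an assumption for illustration.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT":
            return next(self._replica_cycle)
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
queries = [
    "SELECT * FROM users",
    "INSERT INTO users VALUES (1)",
    "select 1",
    "UPDATE users SET name = 'a'",
]
targets = [router.route(q) for q in queries]
```

A real router would need more care: statements such as `SELECT ... FOR UPDATE`, and reads that must observe the caller's own recent writes, generally still belong on the primary.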

Pros

  • Read scalability: Offloads read queries from the primary to many replicas.
  • Isolation: Reads served by replicas do not compete with writes on the primary.
  • Geographic distribution: You can place read replicas close to users to reduce latency.
  • Failover readiness: Replicas can be promoted to primary if the leader fails.

Cons

  • Stale reads: Replicas can lag behind the primary (replication delay), so reads may return outdated data.
  • No write distribution: All writes still funnel through the single leader, so it remains a write bottleneck.
  • Operational overhead: Monitoring lag and handling replica drift/failover adds complexity.

6. Snapshot replication

Rather than continuously shipping every change, snapshot replication takes a full copy of the source dataset at a specific point in time and pushes that snapshot to one or more targets on a scheduled basis.

Example

SQL Server snapshot replication

  1. At publication time, the server generates a bulk “snapshot” of tables and schema.
  2. The snapshot is delivered to subscribers (often via file share or network transfer).
  3. Subscribers apply the entire snapshot, replacing their local data.
  4. The process repeats at configured intervals (e.g., nightly).
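
The four steps above can be sketched with an in-memory model. This is not SQL Server's implementation; `Publisher`, `Subscriber`, `take_snapshot`, and `apply_snapshot` are hypothetical names, and tables are modeled as plain Python lists.

```python
import copy


class Publisher:
    def __init__(self):
        self.tables = {}  # table name -> list of row dicts

    def take_snapshot(self):
        """Return a full point-in-time copy of every table."""
        return copy.deepcopy(self.tables)


class Subscriber:
    def __init__(self):
        self.tables = {}

    def apply_snapshot(self, snapshot):
        """Replace all local data with the snapshot contents."""
        self.tables = copy.deepcopy(snapshot)


publisher = Publisher()
publisher.tables["products"] = [{"id": 1, "price": 10}]

subscriber = Subscriber()
subscriber.apply_snapshot(publisher.take_snapshot())

# A change on the publisher is invisible until the next snapshot runs.
publisher.tables["products"][0]["price"] = 12
price_before_next_snapshot = subscriber.tables["products"][0]["price"]

# The next scheduled run copies everything again, changed or not.
subscriber.apply_snapshot(publisher.take_snapshot())
```

The deep copies stand in for the bulk transfer: every run moves the whole dataset, which keeps the logic simple but makes the cost proportional to data size instead of change size.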

Pros

  • Simplicity: No need to track individual changes or maintain a continuous log-shipping pipeline.
  • Consistency: Each snapshot represents a consistent view of the data at one moment.
  • Good for static or slow-changing data: Ideal when real-time updates aren’t required.

Cons

  • Heavy IO and network load: Repeatedly copying entire datasets can consume significant resources.
  • Latency: Changes appear on subscribers only after the next snapshot, so there are no near-real-time updates.
  • Not incremental: Even small changes trigger full snapshot deliveries unless differential techniques are layered on.

7. Hybrid replication

Explanation

  • Combines different replication methods to meet specific requirements.
  • For example, using multi-leader replication between two data centers and read-replica replication within each data center.

Pros

  • Flexibility to tailor replication strategy to specific needs.
  • Can optimize for both performance and data consistency.

Cons

  • Increased complexity in configuration and management.
  • Potential for conflicting replication behaviors if not properly coordinated.

Summary

Replication trades off write complexity, availability, and consistency.

  • Single-leader is the easiest option but is limited by having only one writer.
  • Multi-leader boosts write locality at the cost of conflict resolution.
  • Leaderless (quorum) removes single points of failure but needs careful quorum tuning.
  • Chain gives strong ordering and pipelining, yet ties latency to chain length.
  • Read-replica replication fits when you need to scale out reads under a single-writer model, but be mindful of replica lag and the write bottleneck at the leader.
  • Snapshot replication is a straightforward way to distribute a point-in-time copy of data on a schedule, best suited for static or slowly changing datasets where resource cost and latency between snapshots are acceptable.
