Paxos vs Raft vs ZAB: Choosing a Consensus Protocol
Consensus is how a distributed system agrees on one value or one sequence of values despite failures. Paxos, Raft, and ZAB are the three most studied and widely deployed consensus families. All three implement state machine replication with majority quorum, but they differ in mental model, complexity, and the way they handle leader election, log replication, recovery, and membership change.
If you are preparing for a system design interview, knowing when to choose Paxos, Raft, or ZAB will help you reason about correctness, latency, and operational simplicity for a scalable architecture.
Why It Matters
Any system that keeps shared configuration, service discovery data, or critical metadata needs a single source of truth. Control planes for container orchestration, metadata services for filesystems and databases, and coordination services for microservices commonly sit on top of consensus. In interviews, the choice of protocol affects how you explain write latency, read consistency, failover speed, and cross region deployment. A good answer shows you can balance strong consistency and availability while keeping the design operable by real teams.
How It Works, Step by Step
Below is a compact way to think about consensus and how each protocol instantiates it.
General flow for state machine replication
- Elect a leader that proposes an ordered stream of log entries.
- Replicate each entry to a majority of nodes.
- Commit when a majority acknowledges the same entry index and term or epoch (the commit step is sketched below).
- Apply committed entries to the state machine in order.
- Recover by reconciling logs and snapshots after crashes or restarts.
- Change cluster membership without violating safety.
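To make the commit step concrete, here is a minimal Go sketch of how a leader might advance its commit index once a majority of nodes store an entry. The names matchIndex and advanceCommitIndex follow common Raft and Multi-Paxos usage but are illustrative assumptions, not tied to any particular library.

```go
// Sketch of the commit step in state machine replication: the leader advances
// its commit index to the highest log index stored on a majority of nodes.
package main

import (
	"fmt"
	"sort"
)

// advanceCommitIndex returns the highest index replicated on a majority.
// matchIndex[i] is the last log index known to be stored on node i
// (the leader includes its own last index in the slice).
func advanceCommitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// With n nodes, the value at position n/2 (0-based, descending order)
	// is stored on at least a majority (n/2 + 1 nodes).
	return sorted[len(sorted)/2]
}

func main() {
	// 5-node cluster: the leader and two followers have entry 7, two nodes lag.
	matchIndex := []int{7, 7, 7, 5, 4}
	fmt.Println("commit index:", advanceCommitIndex(matchIndex)) // 7
}
```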
Paxos in practice (Multi-Paxos)
- Roles include proposers, acceptors, and learners.
- Basic Paxos uses two message rounds per decision: a prepare with a proposal number to gather promises and any previously accepted values, then an accept with a value that respects those promises (the proposer side is sketched after this list).
- Multi-Paxos optimizes repeated decisions by keeping a stable leader so most entries can skip a fresh prepare round.
- Commit happens once a majority of acceptors accept the same proposal number and value.
- Membership change and log compaction are not standardized, so production systems add engineering around the core proof.
- Strengths include long-studied correctness and flexibility. The drawback is conceptual complexity and more effort to implement cleanly.
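The two-round flow can be sketched from the proposer's point of view. The Go sketch below runs one Basic Paxos decision against in-memory acceptors; the type and field names are illustrative assumptions, and real deployments add networking, retries with higher proposal numbers, and learners.

```go
// Minimal sketch of one Basic Paxos decision (proposer + in-memory acceptors).
package main

import "fmt"

type promise struct {
	ok            bool
	acceptedN     int    // highest proposal number this acceptor has accepted
	acceptedValue string // value accepted under acceptedN ("" if none)
}

type acceptor struct {
	promisedN, acceptedN int
	acceptedValue        string
}

// Phase 1: promise not to accept proposals numbered at or below n-1.
func (a *acceptor) prepare(n int) promise {
	if n <= a.promisedN {
		return promise{ok: false}
	}
	a.promisedN = n
	return promise{ok: true, acceptedN: a.acceptedN, acceptedValue: a.acceptedValue}
}

// Phase 2: accept value v under proposal n unless a higher prepare arrived.
func (a *acceptor) accept(n int, v string) bool {
	if n < a.promisedN {
		return false
	}
	a.promisedN, a.acceptedN, a.acceptedValue = n, n, v
	return true
}

// propose runs both phases against all acceptors and reports the chosen value.
func propose(acceptors []*acceptor, n int, v string) (string, bool) {
	majority := len(acceptors)/2 + 1
	promises := []promise{}
	for _, a := range acceptors {
		if p := a.prepare(n); p.ok {
			promises = append(promises, p)
		}
	}
	if len(promises) < majority {
		return "", false
	}
	// Respect the promises: reuse the highest-numbered previously accepted value.
	highest := -1
	for _, p := range promises {
		if p.acceptedValue != "" && p.acceptedN > highest {
			highest, v = p.acceptedN, p.acceptedValue
		}
	}
	acks := 0
	for _, a := range acceptors {
		if a.accept(n, v) {
			acks++
		}
	}
	return v, acks >= majority
}

func main() {
	cluster := []*acceptor{{}, {}, {}}
	val, ok := propose(cluster, 1, "config-v2")
	fmt.Println(val, ok) // config-v2 true
}
```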
Raft
- Nodes start as followers. After a randomized timeout a follower becomes a candidate, starts an election, and wins leadership by majority vote.
- The leader appends entries to its log and replicates them to followers.
- An entry is committed once it is stored on a majority and belongs to the leader's current term; entries from earlier terms become committed indirectly when a current-term entry commits.
- Raft enforces log matching via index and term, which simplifies reasoning about conflicts (the follower-side check is sketched after this list).
- Snapshots truncate old log prefixes for compaction. Membership change uses joint consensus so that the old and new configurations overlap safely.
- Strengths include clarity for implementers, widely available libraries, and operability. The drawback can be slightly more control plane traffic in some cases due to the explicit leader focus, though in practice it matches Multi-Paxos throughput.
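The log matching property is easiest to see from the follower's side of AppendEntries. The sketch below uses assumed struct names and simplifies conflict handling by always truncating after the matching prefix, which is only safe if RPCs arrive in order; it is an illustration of the consistency check, not a full Raft implementation.

```go
// Sketch of Raft's log matching check on the follower side: entries are accepted
// only if the follower's log contains an entry at prevLogIndex with prevLogTerm.
package main

import "fmt"

type entry struct {
	Term int
	Cmd  string
}

type follower struct {
	log []entry // log[0] is a sentinel entry at index 0, term 0
}

func (f *follower) appendEntries(prevLogIndex, prevLogTerm int, entries []entry) bool {
	// Consistency check: leader and follower must agree on the entry
	// immediately before the new ones.
	if prevLogIndex >= len(f.log) || f.log[prevLogIndex].Term != prevLogTerm {
		return false // leader retries with an earlier prevLogIndex
	}
	// Drop any conflicting suffix, then append the leader's entries
	// (simplified: assumes RPCs are not reordered).
	f.log = append(f.log[:prevLogIndex+1], entries...)
	return true
}

func main() {
	f := &follower{log: []entry{{0, ""}, {1, "x=1"}, {2, "x=2"}}}
	// Leader in term 3 replicates a new entry after index 2 (term 2): match, accepted.
	fmt.Println(f.appendEntries(2, 2, []entry{{3, "x=3"}})) // true
	// A stale leader claims index 3 holds a term-2 entry: mismatch, rejected.
	fmt.Println(f.appendEntries(3, 2, []entry{{2, "x=9"}})) // false
}
```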
ZAB (ZooKeeper Atomic Broadcast)
- Designed for the ZooKeeper coordination service. It focuses on total order broadcast for znode updates and watch events.
- ZAB has two modes. Recovery mode synchronizes the cluster to a consistent log prefix after a crash or leader change. Broadcast mode then replicates new proposals from a single leader.
- A proposal commits once the leader sees acknowledgments from a majority; ordering is preserved by a zxid that encodes an epoch and a counter (sketched after this list).
- Membership is tied to ZooKeeper epochs with well-defined recovery semantics.
- Strengths include strict ordering and a mature operational story for coordination workloads. The drawback is specialization for coordination patterns rather than general database replication.
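In ZooKeeper the zxid is a 64-bit identifier whose high 32 bits carry the leader epoch and whose low 32 bits carry a per-epoch counter, so comparing zxids as plain integers orders proposals by epoch first and then by position within the epoch. A minimal Go sketch of that encoding:

```go
// Sketch of how ZAB orders proposals with a zxid (epoch in the high 32 bits,
// per-epoch counter in the low 32 bits).
package main

import "fmt"

func makeZxid(epoch, counter uint32) uint64 {
	return uint64(epoch)<<32 | uint64(counter)
}

func main() {
	a := makeZxid(4, 1000) // proposal 1000 in epoch 4
	b := makeZxid(5, 1)    // first proposal after a new leader is elected
	fmt.Println(a < b)     // true: any proposal in a later epoch orders after epoch 4
	fmt.Printf("epoch=%d counter=%d\n", b>>32, b&0xFFFFFFFF)
}
```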
Real World Example
Consider the platform team at a company like Instagram or Netflix. They run a large number of microservices and need a reliable control plane. A common approach is to run Kubernetes with etcd, which uses Raft and gives clear failure semantics and simple membership changes for day-to-day operations.
Now consider a team building a coordination service for locks, leader election for application components, and watch based configuration. Adopting ZooKeeper gives you ZAB out of the box with strong ordering and mature clients.
If a team at a company with global data centers is building a custom metadata store and wants a battle-tested protocol with many academic proofs, they may pick Multi-Paxos through existing libraries. Google systems such as Chubby and Spanner classically rely on Paxos. The main lesson for interviews is that the best protocol choice often follows the ecosystem and operational tooling your platform already trusts.
Common Pitfalls and Trade-offs
- Assuming reads are always linearizable: In Raft you need leader reads with a lease check or a read index to avoid stale reads from followers (see the read-index sketch after this list). Paxos-based systems need a similar guard or fence. ZAB clients often issue a sync before reading if strict read freshness is required.
- Unsafe membership change: Adding or removing nodes without a joint configuration can violate safety. Raft codifies a safe path. With Paxos you must implement a safe plan yourself. ZAB handles this through epochs and recovery.
- Cross-region latency surprises: Majority quorum across regions raises write latency. A common design is a five-node cluster spread across at least three regions, with the leader in the region that has the lowest average round trip to the others.
- Log growth and slow recovery: Snapshotting strategy matters. Frequent snapshots shorten recovery but add runtime cost. Infrequent snapshots speed up steady state but slow down catch-up.
- Over-engineering the control plane: Do not reinvent an entire consensus stack if your problem is solved by off-the-shelf systems like etcd or ZooKeeper, which are already tuned for operational safety.
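For the first pitfall, the read-index approach used by Raft-based stores such as etcd can be outlined as follows. This is an illustrative sketch with assumed names, not the actual etcd code path: the leader captures its commit index, confirms it is still leader with a heartbeat round, and serves the read only after that index has been applied.

```go
// Sketch of a read-index style linearizable read on the leader.
package main

import (
	"errors"
	"fmt"
)

type leader struct {
	commitIndex   int
	appliedIndex  int
	confirmQuorum func() bool // heartbeat round: true if a majority still follows us
	kv            map[string]string
}

func (l *leader) linearizableRead(key string) (string, error) {
	readIndex := l.commitIndex
	if !l.confirmQuorum() {
		return "", errors.New("lost leadership, retry on the new leader")
	}
	// A real system would wait for the apply loop; here we just check the condition.
	if l.appliedIndex < readIndex {
		return "", errors.New("state machine still catching up, retry")
	}
	return l.kv[key], nil
}

func main() {
	l := &leader{
		commitIndex:   42,
		appliedIndex:  42,
		confirmQuorum: func() bool { return true },
		kv:            map[string]string{"feature-flag": "on"},
	}
	fmt.Println(l.linearizableRead("feature-flag")) // on <nil>
}
```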
Interview Tip
If asked to choose a protocol, anchor your answer on the workload and operations. For a control plane that needs a simple mental model and fast iteration by the platform team, pick Raft through a mature store like etcd. For a pure coordination service with strong ordering of small metadata and watch semantics, pick ZooKeeper with ZAB. For highly customized replication in a large-scale database or a global lock service, pick Multi-Paxos, citing mature proofs and production heritage.
Key Takeaways
- All three require a majority quorum for safety and provide a total order of committed updates.
- Raft optimizes for understandability and operational clarity, making it a strong default for control planes.
- Paxos optimizes for generality and has deep theory with many variants, making it a fit for custom replicated databases and global lock services.
- ZAB optimizes for coordination workloads and event ordering inside ZooKeeper.
- Your choice should reflect ecosystem libraries, operational maturity, and the latency budget across regions.
Comparison Table
| Protocol | Leader Model | Commit Rule | Membership Change | Snapshot & Compaction | Operational Complexity | Typical Platforms | Best Fit |
|---|---|---|---|---|---|---|---|
| Paxos (Multi-Paxos) | Practical leader for throughput; original form can be leaderless | Majority accepts same proposal number and value | No single standard; must be engineered per system | Engineered per system | Higher; many subtle edge cases | Chubby, Spanner, custom databases | Custom replication with strong proofs |
| Raft | Single leader with randomized elections | Entry stored on majority; leader sees match in current term | Joint consensus built in | Built-in snapshots and compaction | Lower; clear spec and libraries | etcd, Consul, Kubernetes control planes | Service control planes and configuration stores |
| ZAB | Single leader with recovery and broadcast modes | Majority acknowledgment with total order via zxid | Epoch-based recovery and rejoin | Managed internally by ZooKeeper | Low (within ZooKeeper) | ZooKeeper, Kafka (legacy controller) | Coordination and watch-heavy metadata systems |
FAQs
Q1. Which consensus protocol is easiest to learn and implement for a system design interview?
Raft. The role transitions (follower to candidate to leader), log matching on index and term, and built-in joint consensus make it easier to explain and to reason about correctness.
Q2. Does Paxos have better throughput than Raft?
In practice Multi-Paxos and Raft have similar message complexity and comparable throughput when a stable leader is present. Differences come more from implementation quality, batching, and network layout than from the core protocol.
Q3. When should I choose ZAB over Raft?
Choose ZAB when you want a coordination service with strict event ordering and mature client patterns like watchers. If you plan to run ZooKeeper for locks, leader election for app components, and small configuration values, ZAB is the native choice.
Q4. Are reads always linearizable in these systems?
Not by default. In Raft use leader reads with a lease or a read index. In Paxos based systems use a fence or a check tied to the current quorum. In ZAB a client can issue a sync to ensure it sees the latest committed state before reading.
Q5. How do these protocols behave across three regions?
All use majority quorum. With five nodes placed across three regions, a write commits once any three nodes have the entry. Put the leader in the region with the lowest average round trip time to others to reduce commit latency.
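A quick back-of-the-envelope sketch of this layout, with made-up round trip times, shows why leader placement dominates commit latency; the function name and RTT values are illustrative assumptions.

```go
// Commit latency estimate: the leader counts itself, so a write commits after
// acks from quorum-1 followers, i.e. roughly the (quorum-1)-th smallest RTT.
package main

import (
	"fmt"
	"sort"
)

func commitLatencyMs(rttToFollowersMs []int, clusterSize int) int {
	quorum := clusterSize/2 + 1
	sorted := append([]int(nil), rttToFollowersMs...)
	sort.Ints(sorted)
	return sorted[quorum-2] // 0-based index of the (quorum-1)-th fastest follower ack
}

func main() {
	// 5 nodes in 3 regions (2+2+1): one local follower at 2 ms,
	// two followers in a nearby region at 40 ms, one far follower at 80 ms.
	rtt := []int{2, 40, 40, 80}
	fmt.Println(commitLatencyMs(rtt, 5), "ms") // 40 ms: commit waits for the nearest remote region
}
```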
Q6. Can I avoid a leader entirely?
Classic Paxos can operate without a permanent leader, but throughput is lower. Real systems usually run Multi-Paxos or Raft with a long-lived leader for efficiency. ZAB requires a leader.
Further Learning
To build a rock-solid mental model for consensus and apply it in real interviews, explore the patterns in Grokking the System Design Interview with hands-on case studies that include control planes and metadata stores. For a deeper dive into replication and cross-region trade-offs with labs and scenarios, enroll in Grokking Scalable Systems for Interviews.