Table of Contents

1. The Bully Algorithm

2. The Ring Algorithm

3. Paxos Algorithm (Leader Election via Consensus)

How Paxos Leader Election Works

4. Raft Algorithm

5. ZooKeeper’s Leader Election (Ephemeral Nodes)

How ZooKeeper Leader Election Works

Comparison of Leader Election Algorithms

Conclusion

Arslan Ahmad

5 Best Leader Election Algorithms for System Design

Discover the top 5 leader election algorithms in distributed systems (Bully, Ring, Paxos, Raft, ZooKeeper). Learn how they work and when to use each in system design.

This blog explores the top leader election algorithms used in distributed systems: Bully, Ring, Paxos, Raft, and ZooKeeper.

Imagine a group of servers trying to agree on who’s the “leader” – the one in charge of coordinating tasks. It’s like a team choosing a captain.

In a distributed system (think databases or microservices spread across machines), a leader node typically handles critical coordination: ordering updates, committing changes, and keeping data consistent across replicas.

But what happens when that leader goes down?

The system needs a reliable way to pick a new leader on the fly.

This is where leader election algorithms come in.

Leader election answers the simple but vital question: “Which node is currently in charge?”

The answer can’t be hard-coded or left to chance – it must withstand real-world chaos like crashes, network delays, or partitions.

When a leader fails, the cluster must quickly detect it, agree on a replacement, and keep going without missing a beat (or double-processing tasks).

Sound tricky?

It is – and that’s why understanding these algorithms is key for designing robust systems.

Now, let’s discuss five popular leader election algorithms and see how each tackles the problem in its own way. We’ll start with two classic algorithms and then move to the consensus big leagues.

1. The Bully Algorithm

Ever heard the phrase “the biggest bully wins”?

In the Bully Algorithm, the node with the highest ID always wins the election.

Here’s how it works in a nutshell:

  • Election Trigger: When a node notices that the current leader isn’t responding (usually via a timeout), it starts an election.

  • Challenging Higher Nodes: The initiating node sends an “election” message to all other nodes with higher IDs (higher priority). It’s basically challenging them: “I think the leader is down – anyone bigger than me out there?”

  • Responses: If no higher-ID node responds, congrats – the initiator becomes the leader by default. But if a higher-ID node does respond, that higher node takes over the election process (essentially saying “I’ll handle this, I’m bigger”). The initial node then stands down and waits.

  • Determining the Winner: Eventually, the node with the highest ID among those still alive will announce itself as the new leader by sending a “victory” message to all other nodes. This ensures everyone knows who the new boss is.
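
The steps above can be sketched as a small single-process simulation. This is only an illustration under simplifying assumptions: the function name `bully_election` and the `alive` set are hypothetical stand-ins for real timeouts and network messages, and the hand-off is modeled sequentially even though real nodes run their elections in parallel.

```python
from typing import Iterable, Optional


def bully_election(initiator: int, alive: Iterable[int]) -> Optional[int]:
    """Return the ID of the node that ends up announcing victory.

    `alive` holds the IDs of nodes that currently answer messages
    (an assumed stand-in for real timeouts and network calls).
    """
    alive_set = set(alive)
    if initiator not in alive_set:
        return None  # a crashed node cannot start an election

    current = initiator
    while True:
        # "Election" messages go only to nodes with a higher ID.
        responders = [n for n in alive_set if n > current]
        if not responders:
            # Nobody bigger answered: the current node declares victory.
            return current
        # A higher node answered and takes over the election.
        # Handing off to the lowest responder shows the cascade step by step;
        # in a real run every responder would start its own election.
        current = min(responders)


if __name__ == "__main__":
    # Node 2 notices the old leader is gone; nodes 1, 2, 3 and 5 are alive.
    print(bully_election(2, {1, 2, 3, 5}))
```

Whichever node starts the election, the result is the highest ID among the live nodes, which is the property the algorithm is named for.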

2. The Ring Algorithm

The Ring Algorithm takes a different, more orderly approach (no bullying here!).

Imagine all nodes arranged in a logical ring.

When the leader dies, nodes pass an election token around the ring to decide the new leader:

  • Setup: Nodes are conceptually ordered in a ring (each node knows who the “next” node in the ring is). This doesn’t require a physical ring network – just a logical ordering (often by node ID).

  • Election Pass: A node that detects a leader failure creates an election message containing its own ID and sends it to its immediate neighbor in the ring. As the message circulates through each node in sequence, every node adds its own ID into the message.

  • Winner Determination: When the message comes full circle back to the starter, it now carries a list of all active nodes’ IDs. The highest ID in that list is the winner. The initiating node then sends out a new message announcing the node with the highest ID as the elected leader. Essentially, the ring “votes” and picks the top dog.
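
As a rough sketch of one trip around the ring, the function below collects IDs and picks the highest. The name `ring_election` and the `alive` set are illustrative, and skipping crashed nodes by forwarding to the next live successor is an assumption added here; the description above only says the message goes to the immediate neighbor.

```python
from typing import List, Set


def ring_election(ring: List[int], alive: Set[int], initiator: int) -> int:
    """Circulate one election message around the ring and return the winner.

    `ring` lists node IDs in ring order; `alive` is an assumed stand-in
    for failure detection.
    """
    if initiator not in alive:
        raise ValueError("the initiator must be alive")

    collected = [initiator]          # the message starts with the initiator's ID
    n = len(ring)
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:      # stop once the message comes full circle
        if ring[i] in alive:
            collected.append(ring[i])  # each live node adds its own ID
        # Crashed nodes are skipped (assumption: forward to the next successor).
        i = (i + 1) % n

    # The highest ID in the collected list is announced as leader.
    return max(collected)


if __name__ == "__main__":
    # Ring order 3 -> 7 -> 1 -> 9 -> 4 -> back to 3; node 9 (old leader) is down.
    print(ring_election([3, 7, 1, 9, 4], {3, 7, 1, 4}, initiator=1))
```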

3. Paxos Algorithm (Leader Election via Consensus)

Paxos isn’t just a leader election algorithm – it’s a whole protocol for agreeing on values in a network of unreliable nodes. However, Paxos can incorporate leader election as part of its process.

In fact, an optimized form known as Multi-Paxos uses a stable leader to make decisions more efficient.

How Paxos Leader Election Works

In basic Paxos, any node can propose values, and a majority of the voting nodes (called acceptors) must agree before a value is chosen.

There’s no permanent leader in single-decree Paxos; a leader can be thought of as a coordinator for a given round.

Multi-Paxos, however, designates a leader so that not every decision requires a full two-phase negotiation. The leader election in Multi-Paxos typically happens like this:

  • A node that wants to be leader performs Phase 1 of Paxos (the prepare phase) with a unique proposal number. Essentially, it asks a majority of nodes, “Can I be the coordinator for proposals?”

  • If it gets permission (majority of acceptors respond with no objections or with info about past proposals), that node becomes the leader. It can then skip the expensive prepare phase for each new operation and go straight to proposing values. This speeds things up tremendously.

  • The leader holds a sort of lease or tenure: it remains leader until it fails or until another node claims leadership with a higher proposal number. If the leader crashes or its lease expires, another node can attempt Phase 1 to become the new leader.
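
To make the prepare phase concrete, here is a minimal in-memory sketch. The `Acceptor` class and `try_become_leader` function are hypothetical names for illustration, every acceptor is assumed to be reachable, and the sketch leaves out Phase 2, leases, and recovery of previously accepted values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                           # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None   # (number, value) last accepted

    def prepare(self, n: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1: promise to ignore proposals numbered below n, or refuse."""
        if n > self.promised:
            self.promised = n
            # A promise also reports any value this acceptor already accepted.
            return True, self.accepted
        return False, None


def try_become_leader(n: int, acceptors: List[Acceptor]) -> bool:
    """A would-be leader wins if a majority of acceptors promise to it."""
    promises = sum(1 for a in acceptors if a.prepare(n)[0])
    return promises > len(acceptors) // 2


if __name__ == "__main__":
    cluster = [Acceptor() for _ in range(5)]
    print(try_become_leader(1, cluster))  # first claim on fresh acceptors
    print(try_become_leader(1, cluster))  # same number again is refused
    print(try_become_leader(2, cluster))  # a higher number takes over
```

The last call mirrors the point made above: another node can claim leadership at any time by running Phase 1 with a higher proposal number.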

4. Raft Algorithm

Raft is like Paxos’s friendly younger sibling – it was designed to be easier to understand while solving the same consensus and leader election problems.

Raft divides the problem into parts, one of which is leader election.

Here’s how Raft elects a leader in a cluster of nodes (systems like etcd and Consul use Raft internally):

  • States: Each node in Raft can be a Follower, Candidate, or Leader. Normally everyone starts as a follower. There’s at most one leader per term, and followers trust the leader to tell them what to do (i.e., replicate logs).

  • Election Timeout: Followers expect to get heartbeat messages from a current leader. If a follower doesn’t hear from a leader for a random timeout period (say 150–300ms, randomized per node), it assumes the leader might be down and becomes a Candidate.

  • Voting: The candidate node bumps its term (an epoch number), votes for itself, and asks all other nodes for their vote (via a RequestVote RPC). Other nodes will vote for this candidate if they haven’t already voted for someone else in the current term and if the candidate’s log is at least as up-to-date as their own.

  • Winner: If the candidate gathers a majority of votes, it becomes the new Leader. It then starts sending out heartbeats to all other nodes to assert its leadership and prevent new elections.

  • If No Winner: It’s possible in a round of voting that no candidate gets a majority (e.g., a split vote). In that case, after a timeout, a new election term begins, and a different node may time out first and become a candidate. Randomizing the timeouts makes it unlikely for the split to happen repeatedly, so eventually someone wins.
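
The voting rules above can be sketched in a few lines. This is a simplified illustration, not a full Raft implementation: the `Follower` class, its field names, and the helper functions are assumptions made for this example, and networking, heartbeats, and log replication are left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


def election_timeout_ms(rng: random.Random, low: int = 150, high: int = 300) -> int:
    """Pick a randomized election timeout so nodes rarely time out together."""
    return rng.randint(low, high)


@dataclass
class Follower:
    current_term: int = 0
    voted_for: Optional[int] = None
    last_log_term: int = 0
    last_log_index: int = 0

    def handle_request_vote(self, term: int, candidate_id: int,
                            cand_last_log_term: int,
                            cand_last_log_index: int) -> bool:
        """Decide whether to grant a vote to the requesting candidate."""
        if term < self.current_term:
            return False                      # stale candidate
        if term > self.current_term:
            self.current_term = term          # newer term: forget the old vote
            self.voted_for = None
        # The candidate's log must be at least as up-to-date as ours:
        # compare the last entry's term first, then its index.
        log_ok = (cand_last_log_term, cand_last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def wins_election(votes_received: int, cluster_size: int) -> bool:
    """A candidate needs a strict majority, counting its own vote."""
    return votes_received > cluster_size // 2


if __name__ == "__main__":
    follower = Follower()
    print(follower.handle_request_vote(1, candidate_id=7,
                                       cand_last_log_term=0,
                                       cand_last_log_index=0))
    print(wins_election(votes_received=3, cluster_size=5))
```

Because a follower grants at most one vote per term and a win needs a strict majority, two candidates cannot both win the same term.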

5. ZooKeeper’s Leader Election (Ephemeral Nodes)

ZooKeeper is a distributed coordination service that exposes a simple filesystem-like API, letting distributed processes coordinate through znodes (ZooKeeper’s data nodes).

It’s often used for things like configuration, naming, and yes, leader election.

How ZooKeeper Leader Election Works

One of ZooKeeper’s recipes for leader election uses ephemeral sequential znodes:

  • Each participant in the election creates an ephemeral sequential node under a designated path (e.g., /election). “Ephemeral” means the node will vanish if the creator’s session ends (e.g., if the process dies), and “sequential” means ZooKeeper appends a unique, increasing sequence number to the node’s name.

  • For example, five nodes might create /election/node_0000000001, /election/node_0000000002, ... up to node_0000000005, in the order their create requests were processed. ZooKeeper assigns these sequence numbers automatically in increasing order, zero-padded to 10 digits.

  • All nodes then check the list of children in /election. The one with the smallest sequence number is the leader (it was the earliest participant still alive).

  • Each node sets a watch on the node that is just before it in sequence. For instance, node_0000000005 watches node_0000000004, etc. This way, if the leader (smallest node) dies and its znode disappears, the next smallest node’s watch triggers. It sees that its predecessor is gone, re-checks the children, and finds that it is now the leader. The same step repeats down the line as later leaders fail, and because each node watches only its predecessor, a failure wakes up one node rather than the whole group.

  • The new leader can then perform whatever coordinator duties are needed, and the system can even create a new ephemeral node to signify the new leader.
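
The decision each participant makes after listing the children of /election can be sketched without any ZooKeeper client at all. The function names below are hypothetical, and the sketch assumes znode names of the form node_<sequence>; creating the znodes and setting the watch would go through a real client library, which is not shown here.

```python
from typing import List, Optional, Tuple


def sequence_of(child: str) -> int:
    """Extract the sequence number ZooKeeper appended to the znode name."""
    # Assumes names like "node_0000000003": the suffix after the last "_".
    return int(child.rsplit("_", 1)[1])


def leader_and_watch_target(children: List[str],
                            me: str) -> Tuple[str, Optional[str]]:
    """Return (current leader, znode this participant should watch).

    The watch target is None when this participant is itself the leader.
    """
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(me)
    leader = ordered[0]                       # smallest sequence number wins
    predecessor = ordered[position - 1] if position > 0 else None
    return leader, predecessor


if __name__ == "__main__":
    # Children can come back in any order, so they are sorted by sequence.
    children = ["node_0000000003", "node_0000000001", "node_0000000002"]
    print(leader_and_watch_target(children, "node_0000000003"))
```

When a watch fires, a participant would list the children again and rerun the same function, since its predecessor may have vanished without being the leader.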

Comparison of Leader Election Algorithms

Let’s summarize the key differences among these approaches:

| Algorithm | How It Works | Pros | Cons |
|-----------|--------------|------|------|
| Bully | Highest-ID node wins by challenging others | Simple to implement | High message overhead |
| Ring | Token circulates ring, highest ID wins | Efficient message use | Needs recovery for broken rings |
| Paxos | Leader elected via consensus round | Strong consistency | Complex protocol |
| Raft | Randomized timeouts + voting | Easier than Paxos, widely used | Needs quorum to function |
| ZooKeeper | Lowest ephemeral znode wins | Fast and battle-tested | Requires ZooKeeper service |

Table: Overview of five leader election approaches, comparing their core ideas, strengths, and limitations.

Conclusion

Leader election is the secret sauce that keeps distributed systems coordinated.

Whether it’s the bully who asserts itself, a ring passing a token, or a complex dance of consensus protocols, each algorithm ensures that at any given time, one node is confidently calling the shots.

In practice, simpler algorithms like Bully or Ring might be used in small-scale systems or academic examples, while Paxos, Raft, and ZooKeeper’s methods underpin real-world databases, distributed caches, and large-scale services we use every day.

No leader lasts forever – but with a good election algorithm, your system will hardly miss a beat when it’s time to “vote” for a new one.

Copyright © 2025 Design Gurus, LLC. All rights reserved.