System Design Fundamentals: Eventual vs Strong Consistency


On This Page
The Mechanics of Distributed Data
The Problem of Network Latency
Understanding Strong Consistency
The Synchronous Write Process
The Tradeoffs of Strict Accuracy
Understanding Eventual Consistency
The Asynchronous Write Process
The Risk of Stale Reads
Balancing the System With Quorums
Resolving Data Conflicts Automatically
Conflict Resolution Strategies
The Impact of Network Partitions
Conclusion
Software platforms handling thousands of operations per second eventually hit a hardware ceiling.
A single central database server can only process so much traffic before its CPU saturates and its memory fills.
Engineers work around this structural limit by distributing the database workload across a network of multiple interconnected machines.
Splitting a single database across multiple physical machines creates an entirely new engineering challenge. Data saved on the first machine must be perfectly duplicated onto all the other machines in the cluster.
Because digital data cannot travel instantly across physical cables, there is always a brief window where the connected machines hold completely different information.
Understanding how software architecture handles this exact window of mismatched data is critical for system design.
The Mechanics of Distributed Data
When engineers build a distributed system, they connect independent computers together. Each computer within this active network is called a node.
These nodes work together to form one single logical database cluster.
The primary goal of this architecture is to ensure data remains completely safe even if physical hardware is destroyed.
To achieve this safety, the cluster must copy every piece of incoming data to multiple nodes. This automated background process is known as replication.
Replication guarantees that if a specific node breaks, the saved data survives on the remaining operational nodes.
The Problem of Network Latency
While replication provides necessary safety, it introduces a severe synchronization bottleneck. Data must actively travel between nodes through physical fiber optic cables.
This physical travel time creates an easily measurable delay called network latency.
Because of network latency, it is impossible for all nodes to receive updates at the exact same microsecond.
This unavoidable delay creates a brief period where the cluster is fractured. The receiving node that processed the initial data holds the absolute newest version.
The other nodes still hold the older version of the data.
The rules defining how the system responds to read requests during this fractured period are called consistency models.
Understanding Strong Consistency
Strong consistency is a strict set of architectural rules prioritizing absolute data accuracy. This model guarantees that any read request will always return the result of the most recent write operation.
If a piece of data is successfully updated, no node is allowed to return the older version.
This model makes the distributed cluster behave exactly like a single unified machine.
Developers building applications on top of the database never have to worry about receiving outdated information.
To enforce this strict uniformity, the database uses a mechanism called synchronous replication. This tight coordination requires a highly specific write process.
The Synchronous Write Process
When a client application sends a new data payload, the designated primary node receives the request.
The primary node applies a strict internal lock to that specific database record.
This digital lock actively blocks any other operations from reading or modifying the record. The primary node then saves the data locally.
Instead of confirming success to the client application, the primary node pauses its workflow. It forwards the exact same data payload across the network to all secondary replica nodes.
The primary node then blocks, waiting for confirmation responses from the other connected machines.
Every single secondary node must receive the data and successfully save it locally.
Once saved, each secondary node sends a success acknowledgment message back to the primary node.
Only after the primary node collects a confirmation from every machine does it unlock the data record.
Finally, the primary node sends a definitive success message back to the client application. Because the primary node forced every machine to update before finishing the operation, the entire cluster is perfectly synchronized.
Any subsequent read request will safely access the newly updated data.
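The synchronous flow above can be sketched in a few lines. This is a minimal, single-process illustration rather than a real database: the `Replica` class and its `save` method are hypothetical stand-ins, and one process-wide lock stands in for a per-record lock.

```python
import threading

class Replica:
    """Hypothetical secondary node: saves locally and acknowledges."""
    def __init__(self):
        self.store = {}

    def save(self, key, value):
        self.store[key] = value
        return True  # success acknowledgment back to the primary

class SyncPrimary:
    """Primary node using synchronous replication: it only confirms a
    write after every replica has acknowledged its local save."""
    def __init__(self, replicas):
        self.store = {}
        self.replicas = replicas
        self.lock = threading.Lock()  # stand-in for the per-record lock

    def write(self, key, value):
        with self.lock:                    # block other operations on the record
            self.store[key] = value        # save locally first
            for replica in self.replicas:  # forward to every secondary...
                if not replica.save(key, value):
                    raise RuntimeError("missing acknowledgment; write rejected")
            return "ok"                    # ...and only then acknowledge the client

replicas = [Replica(), Replica()]
primary = SyncPrimary(replicas)
primary.write("balance", 100)
# by the time the client sees "ok", every node holds the new value
```

Note how the loop over replicas sits *inside* the write path: the client's latency grows with every node added, which is exactly the penalty discussed next.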
The Tradeoffs of Strict Accuracy
The massive engineering benefit of strong consistency is perfect data reliability. However, this perfection comes with severe backend performance penalties.
The most significant penalty is incredibly high latency. The total time required to process a single write operation increases drastically because the primary node must wait for multiple network trips.
If the cluster contains ten database nodes, the primary node must wait for every one of the other nine nodes to finish its local save.
This mandatory waiting period severely limits how many write operations the database can process per second.
Furthermore, strong consistency heavily reduces overall system availability.
If a single secondary node loses power, it cannot send a confirmation message.
The primary node will keep waiting indefinitely for a response that will never arrive.
To strictly protect data accuracy, the database will completely reject new write requests until the broken node is repaired.
Understanding Eventual Consistency
Eventual consistency provides a completely different architectural approach to cluster synchronization.
This model guarantees that if no new data updates occur, all nodes will eventually hold the identical data state. It explicitly does not force the cluster to be perfectly synchronized at every given millisecond.
This model actively prioritizes massive processing speed and continuous system uptime. It intentionally allows different nodes to temporarily hold mismatched versions of the data.
To achieve this extreme processing speed, eventual consistency uses a mechanism called asynchronous replication.
The primary node processes incoming data without worrying about the immediate status of the secondary nodes.
The Asynchronous Write Process
When a client application sends new data, the primary node accepts the payload immediately.
The primary node instantly saves the data to its own local storage drive. It does not lock the database record globally across the cluster.
The very millisecond the local save is successfully completed, the primary node sends a success message back to the client.
The client application is instantly freed to move on to its next computational task.
The primary node bypasses the synchronous network waiting period entirely.
Behind the scenes, the primary node simply begins forwarding the new data to the secondary nodes. This is an invisible automated background process.
The secondary nodes will eventually receive the data and update their own local storage drives completely independently.
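The asynchronous path can be sketched with a background thread and an in-memory queue standing in for the real replication log. All names here are illustrative, and the replicas are plain dictionaries to keep the example small.

```python
import queue
import threading
import time

class AsyncPrimary:
    """Primary node using asynchronous replication: acknowledge after the
    local save, then ship the update to replicas in the background."""
    def __init__(self, replicas):
        self.store = {}
        self.replicas = replicas
        self.backlog = queue.Queue()  # stand-in for the replication log
        threading.Thread(target=self._replicate, daemon=True).start()

    def write(self, key, value):
        self.store[key] = value         # local save only
        self.backlog.put((key, value))  # queue the background update
        return "ok"                     # client is released immediately

    def _replicate(self):
        while True:                     # invisible background process
            key, value = self.backlog.get()
            for replica in self.replicas:
                replica[key] = value

replicas = [{}, {}]
primary = AsyncPrimary(replicas)
primary.write("balance", 100)
# a read against a replica in this exact instant may miss the update:
# that is the stale read described below
time.sleep(0.1)  # once the backlog drains, every replica converges
```

The write path never touches the network, which is why it is fast; the cost is the window between the acknowledgment and the background catch-up.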
The Risk of Stale Reads
The biggest architectural advantage of asynchronous replication is lightning fast performance.
Write operations finish almost instantly because the primary node only interacts with its own hard drive.
The system can process enormous volumes of incoming data without getting bogged down by slow network communications.
This model also provides incredible system availability.
If half of the secondary nodes suddenly crash, the primary node continues to accept new writes normally.
The primary node simply queues the background updates and waits for the broken nodes to reboot.
The necessary operational cost of this speed is the occurrence of stale reads.
A stale read happens when a client queries a secondary node before the background update has fully arrived.
The secondary node checks its local storage and naturally returns the older version of the data.
For many large scale logging systems, serving a stale read is an acceptable architectural compromise.
The data is only stale for a brief window of time. Once the background synchronization completes, the database heals itself automatically.
Balancing the System With Quorums
System design engineers are not forced to choose strictly between perfect accuracy and complete asynchronous freedom.
Modern distributed databases use a highly tunable mathematical concept called a quorum to control the consistency level.
A quorum is the absolute minimum number of nodes that must acknowledge an operation before the database considers it successful.
The write quorum defines exactly how many nodes must successfully save new data before confirming success to the application.
The read quorum defines how many different nodes the system must actively query during a read request.
When querying multiple nodes, the system compares the data versions and always returns the absolutely newest one.
Engineers ensure that the sum of the read quorum and the write quorum is strictly greater than the total number of nodes in the cluster.
If a database cluster has five total nodes, an engineer might set the write quorum to three and the read quorum to three. Because six is strictly greater than five, an overlap is mathematically guaranteed.
At least one node in the read quorum is guaranteed to hold the newest data from the write quorum. This overlap provides strong consistency while operating significantly faster than pure synchronous replication, because the primary node only waits for three nodes instead of all five.
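The overlap rule is simple enough to express directly. The function name below is made up for illustration; the inequality R + W > N is the standard quorum condition.

```python
def quorum_is_strong(n_nodes, write_quorum, read_quorum):
    """True when the read set must overlap the write set: R + W > N."""
    return read_quorum + write_quorum > n_nodes

# the five-node example from the text: W = 3, R = 3
print(quorum_is_strong(5, 3, 3))  # True: 3 + 3 = 6 > 5, overlap guaranteed
print(quorum_is_strong(5, 2, 3))  # False: a read can miss every updated node
```

Lowering either quorum trades accuracy for speed: W = 1, R = 1 is effectively the asynchronous model, while W = N reproduces fully synchronous replication.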
Resolving Data Conflicts Automatically
Eventual consistency introduces a severe technical complication regarding simultaneous data updates.
Because nodes operate independently without locking data globally, two different client applications might attempt to modify the exact same record simultaneously. They might send completely conflicting updates to two different nodes at the exact same millisecond.
Because the receiving nodes are not coordinating synchronously, they both accept the writes locally.
The database cluster now fundamentally holds two completely different versions of the truth. When these nodes attempt to synchronize in the background, a data conflict officially occurs.
Conflict Resolution Strategies
The database must possess an automated mathematical rule to decide which piece of data to keep and which to destroy.
The most widely utilized conflict resolution strategy is called Last Write Wins.
When a client writes data to a receiving node, the database automatically attaches a highly precise digital timestamp to the payload.
When two nodes exchange their background updates and discover a collision, they strictly compare the attached timestamps. The database software strictly keeps the data payload featuring the most recent timestamp. The older data is permanently deleted from the entire cluster.
While highly efficient, it relies entirely on the absolute accuracy of server hardware clocks.
In a distributed physical network, it is physically impossible to keep hardware clocks perfectly synchronized. A slight clock difference can cause the database to accidentally delete the genuinely newer data.
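A minimal Last Write Wins resolver, assuming each version is carried as a `(timestamp, value)` pair; the tie-breaking rule here is arbitrary and purely illustrative.

```python
def last_write_wins(version_a, version_b):
    """Keep the payload with the newer timestamp; the loser is discarded.
    Each version is a (timestamp, value) pair; ties favor version_a."""
    return version_a if version_a[0] >= version_b[0] else version_b

v1 = (1700000000.004, "shipped")    # written on node A
v2 = (1700000000.001, "cancelled")  # written on node B moments earlier
print(last_write_wins(v1, v2)[1])   # keeps "shipped"; "cancelled" is lost
```

The clock-skew hazard is visible in the data: if node B's clock ran a few milliseconds fast, its genuinely older write would carry the larger timestamp and silently overwrite the newer one.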
To safely avoid relying on flawed hardware clocks, advanced database systems use vector clocks.
A vector clock is a purely logical counting mechanism. Instead of recording the physical time, the database attaches a mathematical array of integer counters directly to the data record.
Every time a node successfully updates the record, it automatically increments its specific counter by one. When nodes synchronize, the database engine compares these mathematical counters rather than hardware timestamps.
By comparing these counters, the system can determine whether one version causally precedes the other, or whether the two writes were truly concurrent and must be merged or surfaced to the application for resolution.
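A vector-clock comparison can be sketched as follows. The `compare` function and its return labels are illustrative, not any particular database's API; the counters are kept in a dictionary mapping node name to count.

```python
def compare(vc_a, vc_b):
    """Compare two vector clocks (dicts mapping node -> counter).
    Returns "a_newer", "b_newer", "equal", or "conflict" when neither
    clock dominates the other (a truly concurrent pair of writes)."""
    nodes = set(vc_a) | set(vc_b)
    a_dominates = all(vc_a.get(n, 0) >= vc_b.get(n, 0) for n in nodes)
    b_dominates = all(vc_b.get(n, 0) >= vc_a.get(n, 0) for n in nodes)
    if a_dominates and b_dominates:
        return "equal"
    if a_dominates:
        return "a_newer"
    if b_dominates:
        return "b_newer"
    return "conflict"  # concurrent updates: no causal order exists

print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # a_newer
print(compare({"A": 2, "B": 0}, {"A": 1, "B": 1}))  # conflict
```

The second call is the case hardware timestamps cannot express: each side saw an update the other missed, so the writes are concurrent and a deterministic "newest" simply does not exist.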
The Impact of Network Partitions
To fully master these consistency models, engineers must carefully examine how systems handle catastrophic hardware failures.
The most difficult physical failure scenario is a network partition.
A partition occurs when routing equipment fails, completely severing the communication cables between different groups of nodes.
During a network partition, all the isolated nodes might still be powered on and processing local traffic perfectly. However, they are physically incapable of exchanging synchronization messages with the nodes on the other side. This absolute physical separation forces the database architecture into a mandatory technical dilemma.
The system must automatically choose whether to maintain data accuracy or maintain operational availability, the dilemma formalized by the CAP theorem.
If configured for strong consistency, a network partition is an operational disaster. Because isolated nodes cannot verify that they hold identical data, they cannot safely process new writes.
To strictly maintain internal mathematical accuracy, the strongly consistent system immediately sacrifices its availability.
The database will completely reject all new write operations from client applications. The entire platform will effectively freeze until infrastructure engineers correctly restore network communication.
If configured for eventual consistency, the database handles a partition with complete operational fluidity. The isolated nodes simply ignore the broken connections and continue accepting new traffic independently. They process read and write operations locally without attempting to communicate with the other side.
This ensures maximum system availability during severe physical network outages.
The deliberate tradeoff is that the isolated nodes will accumulate vast amounts of severely conflicting data.
Once the internal network is repaired, the nodes will automatically merge the divergent datasets using background processes.
Conclusion
Understanding the deep internal mechanics of data synchronization is a mandatory skill for modern backend development.
- Distributing a database across multiple server nodes prevents single points of hardware failure.
- Network latency causes an unavoidable synchronization delay when copying data between physical machines.
- Strong consistency uses synchronous replication to guarantee that reads return the most recent write.
- Synchronous replication increases write latency and reduces overall database availability.
- Eventual consistency uses asynchronous replication to provide high throughput and high uptime.
- Asynchronous replication introduces the operational risk of clients reading temporarily outdated records.
- Quorums allow system engineers to tune the exact balance between latency and accuracy.
- Conflict resolution strategies automatically handle simultaneous updates using precise timestamps or vector clocks.