Grokking System Design Fundamentals
Ask Author
Back to course home

0% completed

Vote For New Content
Trade-offs in CAP Theorem
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Now let's discuss the heart of CAP: the trade-offs. Why can’t a distributed system have Consistency, Availability, and Partition tolerance all at once? The simple answer: when a network partition happens, the system has to make a tough choice:

  • If the system decides to remain consistent (C), it cannot allow conflicting updates on different sides of the partition. So at least one side of the partition must stop accepting operations (to avoid divergence). That means some part of the system becomes unavailable (not serving requests). This is a CP choice – consistency over availability during partitions.

  • If the system decides to remain available (A), it must keep serving requests on all sides of the partition. That means each side might proceed independently, leading to inconsistencies (different parts have different data until they can sync up later). This is an AP choice – availability over consistency.

Because a real distributed system cannot prevent partitions, it cannot have both C and A in a partition. That’s the essence of CAP. In normal operation (no partitions), you might get both C and A, but CAP is really about the behavior under failure conditions. In other words, because a distributed system must be partition tolerant, the only choices are deciding between availability and consistency..

CP, AP, and CA Systems (with Examples)

Depending on which trade-off a system makes, we categorize it as CP, AP, or CA under CAP theorem:

  • CP (Consistency + Partition tolerance, no guarantee of Availability): These systems choose consistency over availability when a failure happens. If a network partition occurs, a CP system will refuse to accept some requests or answers on one side in order to keep the data consistent. Example: Apache ZooKeeper is a CP system – it will shut down or refuse requests if it loses a quorum of nodes, ensuring no two clients see different data. In other words, if ZooKeeper cannot guarantee correct, consistent behavior, it will not respond at all (sacrificing availability). MongoDB (in its default replicated setup) is another example often considered CP. If the primary node in a MongoDB replica set goes down, the system halts writes until a new primary is elected, pausing availability briefly to maintain a single up-to-date source of truth. These systems are suitable for use cases where data integrity is paramount – for example, financial applications prefer CP, as seen in banking: it’s better for the system to refuse an operation than to allow inconsistent money transfers.

  • AP (Availability + Partition tolerance, no immediate Consistency guarantee): These systems choose availability over strict consistency. During a partition, they allow all nodes to continue operating, even if that means some clients see out-of-date data. The system will reconcile differences later (thus usually providing eventual consistency). Examples: Apache Cassandra is a classic AP datastore – it’s designed to always accept reads/writes on any node, even during network issues, and resolves conflicts asynchronously, so it “delivers availability and partition tolerance but can’t deliver consistency all the time.”. Similarly, Amazon’s DynamoDB (based on the Dynamo paper) is AP. In fact, Amazon’s Dynamo was explicitly designed for an “always-on” shopping cart service where the requirement was that the system must accept reads/writes even amid failures; to achieve this, Dynamo sacrifices consistency under certain failure scenarios. DNS (the domain name system) is another everyday example of AP: it’s virtually always available globally, but updates (like changing a domain’s IP) take time to propagate, meaning not everyone sees the change immediately. AP systems are great for use cases like caches, shopping carts, or social media feeds – where it’s more important that the system is up than absolutely in sync at every moment.

  • CA (Consistency + Availability, no Partition tolerance): These systems provide consistency and availability as long as the system is whole, but they are not partition-tolerant. In practice this means the system will fail or become totally unavailable if a partition occurs, so it can only run in scenarios where partitions are negligible (e.g., within a single server or local cluster with reliable network). A common example: a single-node relational database can be viewed as CA – it is consistent (thanks to ACID properties) and available (when the node is up), but if that one node fails, or if you consider a partition between the app and the db, the system doesn’t gracefully tolerate it (it just fails). Some relational DB clusters that use synchronous replication can also be seen as CA until a network split happens – at which point they usually stop functioning rather than allow divergence. Partition tolerance plays a “less important role” in these systems, but in a true distributed environment you can’t really avoid P. It’s important to note that pure CA systems are theoretical in distributed context, as for all practical purposes a fully CA distributed database cannot exist because you can’t avoid network faults. Thus, CA is usually only achievable in a single-site or tightly coupled system.

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible