On this page
What This Blog Will Cover
Why Distributed Transactions Are Hard
Two-Phase Commit
The Saga Pattern
Comparing the Two
How to Choose
In the Interview and in Practice
Key Takeaways
Saga Pattern vs. Two-Phase Commit: Distributed Transactions Explained


What This Blog Will Cover
- Why distributed transactions are hard
- How two-phase commit works
- How the saga pattern works
- The trade-offs of each
- How to choose between them
In a single database, a transaction is a solved problem.
When you need several changes to happen together, you wrap them in a transaction, and the database guarantees that either all of them succeed or none of them do. This all-or-nothing property is one of the most reliable tools in software, and engineers rely on it without a second thought.
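As a minimal sketch of that all-or-nothing behavior, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the amounts are assumptions made up for illustration; they are not part of any system described in this article.

```python
import sqlite3

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Apply both updates in one transaction: both persist or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failing call, the credit to `bob` runs first and is then rolled back when the debit violates the constraint, so no partial transfer is ever left behind.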
The moment an operation crosses the boundary of a single database, this reliability disappears.
In a system built from multiple services, each with its own database, a single business operation often touches several of them at once. Placing an order might charge a payment, reduce inventory, and create a shipment, with each step living in a different service.
There is no single database to wrap these in one transaction, so the all-or-nothing guarantee no longer comes for free.
This is the distributed transaction problem: how to keep data correct across multiple services when a single operation spans all of them. It is one of the genuinely hard problems in system design, and it is a frequent source of confusion, because candidates and engineers often assume a transaction can simply stretch across services the way it stretches across tables. It cannot.
There are two main approaches to this problem, and they follow opposite philosophies.
Two-phase commit tries to preserve the strict all-or-nothing guarantee across services through coordination.
The Saga pattern gives up that strict guarantee in favor of a more resilient, eventually consistent approach. This article explains how each works, what each costs, and how to choose between them.
Why Distributed Transactions Are Hard
To understand the two patterns, it helps to see precisely why crossing service boundaries breaks the simple transaction model.
A transaction in a single database works because one system controls all the data and can coordinate the changes atomically. It can hold locks, decide to commit or roll back everything at once, and guarantee that no one sees a half-finished state. This control is what makes the all-or-nothing guarantee possible.
Across multiple services, no single system has that control. Each service owns its own data and can only commit or roll back its own changes.
There is no shared authority that can atomically commit the work in all of them together.
If the payment service commits but the inventory service then fails, the system is left in an inconsistent state, with a charge and no reserved stock, and there is no automatic mechanism to undo the charge.
This is the core difficulty. Keeping multiple independent services consistent during a single operation requires some way to coordinate them, and both patterns are answers to that coordination problem, arrived at from opposite directions.
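The failure described above can be sketched with two in-memory stand-ins for the services. The class names, the `place_order_naively` function, and the order values are hypothetical; real services would sit behind network calls and their own databases.

```python
class PaymentService:
    """Stand-in for a service that owns payment data."""

    def __init__(self):
        self.charges = []

    def charge(self, order_id, amount):
        # Committed locally; no other service can roll this back.
        self.charges.append((order_id, amount))


class InventoryService:
    """Stand-in for a service that owns stock data."""

    def __init__(self, stock):
        self.stock = stock
        self.reserved = {}

    def reserve(self, order_id, qty):
        if qty > self.stock:
            raise RuntimeError("insufficient stock")
        self.stock -= qty
        self.reserved[order_id] = qty


def place_order_naively(payments, inventory, order_id, amount, qty):
    """Call each service in turn with no cross-service coordination."""
    payments.charge(order_id, amount)
    inventory.reserve(order_id, qty)


payments = PaymentService()
inventory = InventoryService(stock=1)
try:
    place_order_naively(payments, inventory, "order-1", amount=50, qty=2)
    outcome = "ok"
except RuntimeError:
    outcome = "failed"

# The order failed, yet the charge is still recorded and nothing is reserved.
inconsistent = bool(payments.charges) and not inventory.reserved
```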
Two-Phase Commit
Two-phase commit, often abbreviated 2PC, attempts to preserve the strict all-or-nothing guarantee across services by coordinating them through a central coordinator. As the name suggests, it works in two phases.
In the first phase, the prepare phase, the coordinator asks every participating service whether it is able to commit its part of the transaction. Each service does the work needed to be ready, such as acquiring locks and validating the operation, and then replies that it is prepared, promising it can commit if asked. It does not commit yet, but it guarantees it is able to.
In the second phase, the commit phase, the coordinator looks at the replies.
If every service replied that it is prepared, the coordinator tells them all to commit, and they do.
If any service failed to prepare, the coordinator tells them all to abort, and they roll back. This is what produces the all-or-nothing result, since the actual commit only happens after every service has promised it can.
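The two phases can be sketched as follows. This is a simplified, in-process illustration: the `Participant` class, its `can_prepare` flag, and the state strings are assumptions, and a real implementation would also need durable logging, timeouts, and network calls.

```python
class Participant:
    """Hypothetical participant that votes in phase one and obeys in phase two."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: do the work needed to be ready, then vote.
        if not self.can_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant voted yes."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if all(votes):
        for p in participants:  # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:  # phase 2: abort everywhere
        p.abort()
    return "aborted"


happy = [Participant("payment"), Participant("inventory"), Participant("shipping")]
result_ok = two_phase_commit(happy)

unhappy = [Participant("payment"), Participant("inventory", can_prepare=False)]
result_fail = two_phase_commit(unhappy)
```

In the second run, the payment participant had already prepared successfully, but it still aborts because the inventory participant voted no.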
The great strength of two-phase commit is that it preserves strong consistency.
When it succeeds, every service has committed, and when it fails, every service has rolled back, so the system is never left in a half-finished state. For operations that absolutely require this strict atomicity, 2PC delivers it.
The weaknesses are significant, however.
The most serious is blocking. Between the prepare and commit phases, the participating services have done their work and are holding locks, waiting for the coordinator's decision.
If the coordinator fails during this window, the services are stuck holding those locks, unable to commit or roll back, until the coordinator recovers. This blocking can freeze parts of the system and is the central weakness of the pattern.
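To make the blocking window concrete, here is a small sketch of a participant that takes a lock when it prepares and releases it only when the coordinator's decision arrives. The `LockingParticipant` class, the transaction ids, and the `sku-42` key are hypothetical.

```python
class LockingParticipant:
    """Hypothetical participant that locks a key from prepare until the decision."""

    def __init__(self):
        self.locks = {}  # key -> id of the transaction holding the lock

    def prepare(self, txn_id, key):
        if key in self.locks:
            return False  # another prepared transaction still holds the lock
        self.locks[key] = txn_id
        return True

    def finish(self, txn_id, key):
        """Apply the coordinator's commit or abort decision and release the lock."""
        if self.locks.get(key) == txn_id:
            del self.locks[key]


inventory = LockingParticipant()

first_vote = inventory.prepare("txn-1", "sku-42")   # prepared, lock held

# Assume the coordinator crashes here: no decision reaches the participant,
# so it can neither commit nor roll back on its own.
second_vote = inventory.prepare("txn-2", "sku-42")  # blocked by txn-1's lock

# Only after the coordinator recovers and delivers its decision is the lock freed.
inventory.finish("txn-1", "sku-42")
third_vote = inventory.prepare("txn-3", "sku-42")
```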
Two-phase commit also reduces availability and adds latency, because every participant must be reachable and must complete both phases before the operation finishes. The coordinator is also a single point of failure for every transaction that is in flight. These costs make 2PC poorly suited to large, highly available distributed systems, even though it provides the strongest guarantee.
Two-phase commit fits situations where strict atomicity is genuinely required, the number of participants is small, and the services are reliable and closely controlled. It is more common within tightly coupled systems than across the loosely coupled services of a large modern architecture.
The Saga Pattern
The Saga pattern takes the opposite approach.
Instead of trying to commit everything atomically, it breaks the distributed transaction into a sequence of local transactions, each owned by a single service, and accepts eventual consistency rather than strict atomicity.
The mechanism works as a series of steps. Each service performs its own local transaction and then triggers the next step. So the order example becomes a sequence: the payment service charges the customer and commits, then the inventory service reserves stock and commits, then the shipping service creates a shipment and commits. Each step is a normal local transaction in a single service, which is something every service can do reliably.
The crucial element is what happens when a step fails partway through. Because earlier steps have already committed, they cannot simply be rolled back.
Instead, each step has a compensating action, an operation that undoes its effect.
If the shipping step fails after payment and inventory have already committed, the saga runs the compensating actions in reverse order, first releasing the reserved inventory and then refunding the payment. This brings the system back to a consistent state, not by rolling back, but by actively undoing the completed work.
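The compensation flow can be sketched as a small runner that remembers which steps completed and undoes them in reverse order on failure. The `run_saga` function, the step names, and the shared `state` dictionary are assumptions for illustration; in a real system each action and compensation would be a call to a separate service.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo already-committed steps, most recent first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


state = {"charged": False, "reserved": False}


def charge():
    state["charged"] = True


def refund():
    state["charged"] = False


def reserve_stock():
    state["reserved"] = True


def release_stock():
    state["reserved"] = False


def create_shipment():
    raise RuntimeError("carrier unavailable")  # simulated failure


def cancel_shipment():
    pass


ok, log = run_saga([
    ("payment", charge, refund),
    ("inventory", reserve_stock, release_stock),
    ("shipping", create_shipment, cancel_shipment),
])
```

Note that between `done:payment` and `compensated:payment` the charge really existed, which is the intermediate state a saga accepts.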
There are two common ways to coordinate a saga.
In choreography, each service listens for events and triggers the next step on its own, with no central coordinator, which keeps services decoupled but can make the overall flow hard to follow.
In orchestration, a central orchestrator directs the sequence, telling each service when to act and managing the compensations, which makes the flow explicit and easier to manage at the cost of a central coordinator. Both are valid, and the choice depends on how much central control is desired.
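For contrast with an orchestrator, a choreographed saga might look like the sketch below, where each handler reacts to an event and publishes the next one. The in-process `EventBus`, the event names, and the `max_qty` stock rule are all hypothetical; a production system would typically use a message broker instead.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-process bus standing in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.history = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


def build_order_flow(max_qty):
    """Wire up services that only know about events, not about each other."""
    bus = EventBus()

    def payment_on_order_placed(order):
        bus.publish("PaymentCharged", order)

    def inventory_on_payment_charged(order):
        if order["qty"] > max_qty:
            bus.publish("StockReservationFailed", order)
        else:
            bus.publish("StockReserved", order)

    def payment_on_reservation_failed(order):
        # Compensating action: the payment service undoes its own step.
        bus.publish("PaymentRefunded", order)

    bus.subscribe("OrderPlaced", payment_on_order_placed)
    bus.subscribe("PaymentCharged", inventory_on_payment_charged)
    bus.subscribe("StockReservationFailed", payment_on_reservation_failed)
    return bus


success_bus = build_order_flow(max_qty=5)
success_bus.publish("OrderPlaced", {"order_id": "o1", "qty": 2})

failure_bus = build_order_flow(max_qty=5)
failure_bus.publish("OrderPlaced", {"order_id": "o2", "qty": 9})
```

No single function here describes the whole flow, which illustrates both the decoupling and the difficulty of following a choreographed saga end to end.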
The strengths of the saga pattern are resilience and availability. Because each step is a local transaction that commits independently, there is no blocking and no holding of locks across services, and no single point where the whole operation freezes.
Services stay loosely coupled, and the system remains available even when parts are slow. This is why the saga pattern is the more common choice for large, loosely coupled architectures.
The weaknesses come from giving up strict atomicity.
The system passes through intermediate states where some steps have committed and others have not, so it is only eventually consistent, and other parts of the system may briefly see this partial state. Writing the compensating actions adds real complexity, since every step needs a correct way to undo itself, and some actions are hard to truly undo, such as an email that has already been sent.
The pattern requires careful design to handle failures of the compensations themselves and to ensure steps are idempotent so they can be safely retried.
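One common way to make a step safe to retry is an idempotency key, sketched below. The `IdempotentPayments` class and the key format are assumptions; the idea is only that a repeated request with the same key returns the stored result instead of repeating the effect.

```python
class IdempotentPayments:
    """Hypothetical payment step that ignores duplicate requests."""

    def __init__(self):
        self.processed = {}  # idempotency key -> stored receipt
        self.total_charged = 0

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.processed:
            # Retry of a request already handled: return the earlier result.
            return self.processed[idempotency_key]
        self.total_charged += amount
        receipt = {"key": idempotency_key, "amount": amount}
        self.processed[idempotency_key] = receipt
        return receipt


service = IdempotentPayments()
first = service.charge("order-1:charge", 50)
retry = service.charge("order-1:charge", 50)  # e.g. resent after a timeout
```

The same approach applies to compensating actions, so that a refund retried after a timeout is not issued twice.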
The saga pattern fits the loosely coupled services of modern distributed systems, where availability and resilience matter more than strict atomicity, and where the data can tolerate brief intermediate inconsistency. It is the approach most large systems reach for when an operation spans services.
Comparing the Two
The two patterns sit at opposite ends of the consistency-versus-availability spectrum, and the table below summarizes how they compare.
| Dimension | Two-Phase Commit | Saga Pattern |
|---|---|---|
| Consistency | Strong, strict atomicity | Eventual |
| How it undoes work | Rolls back before committing | Compensating actions after committing |
| Blocking | Blocks holding locks between phases | No blocking, each step commits independently |
| Availability | Lower, all participants must be reachable | Higher, services stay loosely coupled |
| Coordinator | Central coordinator, dangerous if it fails | Orchestrator or event-driven choreography |
| Intermediate states | None visible, all-or-nothing | Partial states visible during the saga |
| Complexity | Conceptually simple but operationally fragile | More resilient but complex compensations |
| Best fit | Few reliable services needing strict atomicity | Loosely coupled services needing availability |
The table makes the core trade-off clear.
Two-phase commit preserves strong consistency at the cost of blocking and reduced availability, while the saga pattern preserves availability and resilience at the cost of accepting eventual consistency and the complexity of compensating actions.
A helpful way to see it is that the two patterns differ in when they ensure correctness. Two-phase commit ensures correctness before committing anything, by getting everyone to agree first, which is why it blocks.
The saga pattern commits each step immediately and ensures correctness afterward, by compensating if something fails, which is why it does not block but passes through inconsistent states.
How to Choose
The choice between the two comes down to what the system needs and what kind of architecture it has.
The first question is about consistency.
Does the operation require strict, immediate atomicity where no intermediate state can ever be visible?
If yes, two-phase commit provides that guarantee, and it may be the right choice despite its costs.
If the operation can tolerate brief intermediate inconsistency and eventual consistency is acceptable, the saga pattern is viable and usually preferable.
The second question is about the architecture.
In a large system of loosely coupled services that must stay highly available, the blocking and coordination of two-phase commit are usually unacceptable, and the saga pattern fits naturally. In a smaller, tightly controlled system with a few reliable participants, two-phase commit may be workable.
The third consideration is availability and scale.
Two-phase commit reduces availability and does not scale well, because it requires every participant to be reachable and to complete both phases, with a coordinator whose failure can freeze the system.
For systems where availability is paramount, this alone often rules it out in favor of sagas.
In practice, most large modern distributed systems choose the saga pattern. The loosely coupled, highly available nature of these systems makes the blocking of two-phase commit a poor fit, and the eventual consistency of sagas is an acceptable trade for the resilience they provide.
Two-phase commit remains relevant in more constrained settings where strict atomicity is non-negotiable and the participants are few and reliable.
A common refinement worth knowing is that even when sagas are used, individual steps still rely on local transactions within each service, so the two ideas are not entirely opposed.
The saga coordinates local transactions across services, while each local transaction still provides atomicity within its own service.
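As a sketch of that point, the inventory step of a saga and its compensation can each be an ordinary local transaction. The example uses Python's built-in `sqlite3` module; the `stock` and `reservations` tables and the `reserve` and `release` functions are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))")
db.execute("CREATE TABLE reservations (order_id TEXT PRIMARY KEY, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('sku-42', 3)")
db.commit()


def reserve(order_id, sku, qty):
    """Saga step: record the reservation and reduce stock atomically."""
    try:
        with db:  # one local transaction inside the inventory service
            db.execute(
                "INSERT INTO reservations VALUES (?, ?, ?)", (order_id, sku, qty)
            )
            db.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        return True
    except sqlite3.IntegrityError:
        return False  # both writes were rolled back together


def release(order_id):
    """Compensating action: restore stock and delete the reservation atomically."""
    with db:
        row = db.execute(
            "SELECT sku, qty FROM reservations WHERE order_id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return  # nothing to undo, so a repeated call is harmless
        db.execute("UPDATE stock SET qty = qty + ? WHERE sku = ?", (row[1], row[0]))
        db.execute("DELETE FROM reservations WHERE order_id = ?", (order_id,))


reserved = reserve("o1", "sku-42", 2)
rejected = reserve("o2", "sku-42", 5)  # exceeds remaining stock
stock_after_reserve = db.execute("SELECT qty FROM stock").fetchone()[0]
release("o1")
stock_after_release = db.execute("SELECT qty FROM stock").fetchone()[0]
open_reservations = db.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
```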
In the Interview and in Practice
This topic appears in system design interviews whenever an operation spans multiple services, and handling it well signals real distributed systems understanding.
When a design involves a business operation that touches several services, do not say you will simply use a transaction, since that reveals a misunderstanding of how transactions work across services.
Instead, explain that you would use the saga pattern, breaking the operation into local transactions with compensating actions, and accept eventual consistency for the resilience and availability it provides.
Mention two-phase commit as the strongly consistent alternative, noting its blocking and availability costs, and explain why the saga fits a loosely coupled architecture better. This demonstrates that you understand both the problem and the real options.
In practice, the same reasoning applies.
The choice depends on whether the operation needs strict atomicity or can tolerate eventual consistency, and on whether the architecture is tightly controlled or loosely coupled.
Most teams building large distributed systems reach for sagas, accepting the complexity of compensations in exchange for availability, while reserving two-phase commit for the narrower cases that truly require strict atomicity.
Key Takeaways
- Distributed transactions are hard because no single system controls all the data across services, so the all-or-nothing guarantee of a single database does not extend across them.
- Two-phase commit preserves strict atomicity through a coordinator and a prepare-then-commit sequence, ensuring all services commit or all roll back.
- Two-phase commit blocks and reduces availability, since participants hold locks waiting for the coordinator, whose failure can freeze the system, making it a poor fit for large systems.
- The saga pattern breaks the operation into local transactions with compensating actions that undo completed work when a later step fails, accepting eventual consistency.
- The saga pattern is resilient and available because each step commits independently with no blocking, though it passes through visible intermediate states and requires complex compensations.
- The core trade-off is when correctness is ensured, with two-phase commit ensuring it before committing and the saga ensuring it afterward through compensation.
- Most large systems choose sagas for loosely coupled, highly available architectures, while two-phase commit suits the narrower cases needing strict atomicity among few reliable services.
When a single operation must span multiple services, the simple transaction you rely on inside one database no longer works, and you must choose how to coordinate correctness across boundaries.
Two-phase commit clings to strict atomicity by making every service agree before anyone commits, which guarantees consistency but blocks and undermines availability.
The saga pattern lets each service commit on its own and undoes the work with compensating actions if something fails, trading strict atomicity for the resilience that large distributed systems depend on.
Choose based on whether your operation truly needs immediate all-or-nothing correctness or can tolerate eventual consistency, and you will land on the pattern that fits your system rather than the one that merely sounds safest.