Saga vs Two‑Phase Commit (2PC): Which for distributed transactions?
Distributed transactions are hard because they mix two opposing forces: strong consistency across services and high availability under failures. Two mainstream approaches try to tame this: Two-Phase Commit (2PC), which aims for atomic all-or-nothing commits, and Sagas, which aim for progress with compensating actions. If you build microservices, you’ll face this trade-off. If you’re preparing for a system design interview, you’ll be asked to explain it clearly and apply it under constraints like latency budgets, failure modes, and data invariants.
Why It Matters
Modern systems coordinate work across independent services: orders, payments, inventory, shipments, wallets, loyalty points. You need a way to keep data consistent even when networks glitch, processes crash, or a downstream system is slow. 2PC and Sagas appear everywhere in design docs and interviews because they encode two different philosophies: lock and agree vs move and compensate. Picking the wrong one leads to timeouts, user-visible anomalies, or release pain. Knowing when to trade strict atomicity for availability is a core signal of seniority in a system design interview.
How It Works Step by Step
Two-Phase Commit in practice
2PC coordinates a single atomic commit across multiple participants (databases or services) using a coordinator.
- Prepare phase
  - The coordinator asks each participant to prepare.
  - Each participant writes its changes to a local log (often a redo/commit log), acquires locks, and replies “ready” if it can commit. The changes are not visible yet.
- Commit phase
  - If all participants say “ready,” the coordinator issues commit.
  - Each participant commits and releases locks. If any participant says “abort” or times out, the coordinator issues abort to all.
- Failure handling
  - Participants that have prepared are blocked until they learn the final decision.
  - The coordinator is a critical component; if it crashes, prepared participants may hold locks and stall until recovery.
What you get: strong atomicity across resources. What you pay: coordinator complexity, potential lock contention, and availability risks during failures or partitions.
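The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and the sketch leaves out timeouts, durable logs, and coordinator crash recovery.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class Participant:
    """Hypothetical in-memory participant; a real one would persist its log."""
    name: str
    can_commit: bool = True
    state: State = State.INIT
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: record intent and vote. Locks would be acquired here.
        if not self.can_commit:
            self.log.append("vote-abort")
            return False
        self.log.append("prepared")
        self.state = State.PREPARED
        return True

    def commit(self) -> None:
        self.log.append("commit")
        self.state = State.COMMITTED

    def abort(self) -> None:
        self.log.append("abort")
        self.state = State.ABORTED


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if every participant committed, False if all aborted."""
    # Phase 1: collect votes; stop asking once any participant votes no.
    all_ready = all(p.prepare() for p in participants)
    # Phase 2: broadcast the single global decision to everyone.
    for p in participants:
        if all_ready:
            p.commit()
        else:
            p.abort()
    return all_ready
```

Calling `two_phase_commit([Participant("order"), Participant("payment", can_commit=False)])` returns `False` and leaves both participants in `State.ABORTED`. The blocking problem described above lives in the gap between the two phases: a participant in `PREPARED` cannot decide on its own.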
Saga with orchestration
A Saga splits a large transaction into a sequence of local transactions with compensating actions for rollback.
- Model the steps
  - Define T1, T2, …, Tn as local ACID operations within each service.
  - Define compensations C1, C2, …, Cn to undo side effects (refund payment, restock inventory).
- Orchestrator drives the flow
  - An orchestrator service executes T1 → T2 → … → Tn via commands (often over a message bus).
  - If step Tk fails, the orchestrator triggers compensations for the steps that already completed, in reverse order: Ck-1, …, C1. The failed step itself rolled back locally, so it needs no compensation.
- Operational concerns
  - Make steps idempotent and retry-safe.
  - Store Saga state so an orchestrator crash can resume or compensate.
  - Use timeouts and dead-letter queues for stuck steps.
What you get: high availability and a natural fit for microservices. What you pay: eventual consistency and the discipline to design correct compensations.
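The orchestration flow above can be sketched as a small loop. This is an illustrative in-process version under simplifying assumptions: `Step` and `run_saga` are hypothetical names, steps are plain callables rather than commands on a message bus, and the returned history list stands in for persisted Saga state.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction Ti
    compensate: Callable[[], None]  # compensating action Ci


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns a history of what happened, standing in for persisted Saga state.
    """
    history: list[str] = []
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            history.append(f"failed:{step.name}")
            # The failed step did not commit, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
                history.append(f"compensated:{done.name}")
            return history
        completed.append(step)
        history.append(f"done:{step.name}")
    return history
```

A real orchestrator would write each history entry to durable storage before moving on, so that a restart can resume from the last recorded step, and it would retry a failing compensation rather than assume it succeeds.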
Saga with choreography
You can remove the orchestrator and let services react to events.
- Event chain
  - Service A completes its local step and emits an event.
  - Service B listens, performs its step, emits the next event, and so on.
- Compensation
  - On failure, emit compensating events to undo prior steps.
  - Use the outbox pattern to reliably emit events from the same local transaction that changed the database.
Choreography removes a central brain but increases coupling via event flows. It shines in simple pipelines; orchestration simplifies complex branching and error handling.
Real-World Example
Consider placing an order:
- Services: Order, Inventory, Payment, Shipping.
- Invariant: Never ship if payment fails; never charge if inventory cannot reserve.
With 2PC
- The coordinator asks Order, Inventory, and Payment to prepare.
- Each service locks rows and logs intent.
- All reply “ready,” coordinator issues commit. Rows unlock.
- If Payment cannot prepare, coordinator issues abort and every participant rolls back.
- Benefits: atomicity and minimal anomalies.
- Risks: if the coordinator crashes after some prepares, locks hold until recovery; throughput may drop under high contention.
With a Saga (orchestration)
- T1: Order creates a pending order.
- T2: Inventory reserves items. If it fails, trigger C1: cancel order.
- T3: Payment charges the card. If it fails, trigger C2 then C1: release inventory, cancel order.
- T4: Shipping schedules shipment. If it fails, refund payment and release inventory, then cancel order.
- Benefits: services remain available; no global locks; steps can be long-running.
- Risks: eventual consistency windows (an order might show as “processing” before payment clears); compensations must handle real-world irreversibility (you cannot un-ship a box).
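The failure paths in the Saga list above follow one rule: undo the steps that already completed, newest first. A small sketch makes that rule checkable. The `ORDER_SAGA` table and the `compensations_for` helper are hypothetical names; the step and compensation labels come from the example.

```python
# Saga steps from the order example, paired with their compensations.
# Shipping is the last step, so no later failure can force it to be undone.
ORDER_SAGA = [
    ("T1: create pending order", "cancel order"),
    ("T2: reserve inventory", "release inventory"),
    ("T3: charge card", "refund payment"),
    ("T4: schedule shipment", None),
]


def compensations_for(failed_step: int) -> list[str]:
    """Return the compensations to run, in order, when a 1-based step fails."""
    if not 1 <= failed_step <= len(ORDER_SAGA):
        raise ValueError(f"unknown step: {failed_step}")
    completed = ORDER_SAGA[: failed_step - 1]
    return [undo for _, undo in reversed(completed) if undo is not None]
```

For a failure at T4, `compensations_for(4)` returns `["refund payment", "release inventory", "cancel order"]`, matching the order given in the list.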
Common Pitfalls or Trade-offs
- 2PC pitfalls
  - Blocking during coordinator failure or partitions; participants hold locks and degrade throughput.
  - Hot rows and lock contention under high concurrency increase latency.
  - Heterogeneous tech friction: spanning multiple datastores, message brokers, and cloud services is often impractical.
  - Not ideal for long-running tasks (minutes to hours) because locks cannot be held that long.
- Saga pitfalls
  - Compensation complexity: some actions cannot be undone perfectly (emails sent, third-party side effects). Design “forward fixes” and user-visible states.
  - Invariants and isolation: Sagas are not fully ACID across services; reads mid-Saga may observe intermediate states.
  - Idempotency and deduplication: retries can double-charge or double-reserve if endpoints are not idempotent.
  - Observability: tracing multi-step Sagas needs correlation IDs, per-step status, and dead-letter handling.
- Performance trade-off
  - 2PC: low anomaly risk, but higher tail latencies under contention and failure.
  - Saga: higher throughput and availability, but requires careful UX and reconciliation for eventual consistency.
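The idempotency pitfall is the easiest one to show in code. The sketch below assumes a hypothetical `PaymentService` whose `charge` method takes a caller-supplied idempotency key; in a real service the key record and the charge would be written in the same local transaction rather than held in memory.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Hypothetical payment endpoint that deduplicates by idempotency key."""
    charges: dict[str, int] = field(default_factory=dict)  # key -> amount in cents
    total_charged: int = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        previous = self.charges.get(idempotency_key)
        if previous is not None:
            if previous != amount_cents:
                raise ValueError("idempotency key reused with a different amount")
            # Retry of a request that was already applied: no new charge.
            return "duplicate"
        self.charges[idempotency_key] = amount_cents
        self.total_charged += amount_cents
        return "charged"
```

One reasonable choice is to derive the key from the Saga instance and step, for example `"order-42:T3"`, so that every retry of the same step carries the same key.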
Interview Tip
Start with requirements and constraints before naming patterns.
- If the system must enforce a hard invariant with no anomalies (e.g., moving money between two balances within the same database), prefer a single datastore transaction or a tightly controlled 2PC in homogeneous environments.
- If the workflow is long-running, spans heterogeneous services, and can tolerate eventual consistency with compensations, prefer a Saga.
- State the consistency budget: what anomalies are acceptable, maximum latency per step, and user experience for rollbacks. Then pick orchestration or choreography and describe idempotency, retries, and observability.
Key Takeaways
- 2PC provides atomic commits across participants with strong guarantees but risks blocking and lock contention under failures.
- Sagas split work into local transactions with compensations, favoring availability over strict atomicity.
- Orchestration simplifies complex error handling; choreography reduces central coordination but increases event coupling.
- Choose based on invariants, latency budgets, heterogeneity of systems, and whether steps are long-running.
- Invest in idempotency, retries, correlation IDs, and the outbox pattern to make Sagas robust.
Table of Comparison
| Dimension | Two-Phase Commit (2PC) | Saga |
|---|---|---|
| Consistency model | Strong atomic commit across participants | Eventual consistency with compensations |
| Isolation | High during prepare/lock; can block | No cross-service isolation; intermediate states visible |
| Availability under partition | Lower; prepared participants may block | Higher; steps succeed independently, compensations on failure |
| Latency | Higher tail latency under contention and failures | Typically lower; each step is independent and retryable |
| Throughput | Limited by locks and coordinator | Scales with services; parallelizable steps possible |
| Failure handling | Coordinator recovery is critical; participants wait | Compensations reverse prior steps; retries and dead-letters |
| Rollback mechanism | Atomic abort via coordinator | Semantic undo via compensating actions |
| Long-running tasks | Poor fit | Natural fit |
| Tech heterogeneity | Works best in homogeneous environments (same DB/transaction manager) | Designed for polyglot microservices and external APIs |
| Operational complexity | Coordinator, logs, recovery | Idempotency, outbox, orchestration/choreography, tracing |
| UX impact | Fewer user-visible inconsistencies | Requires clear user states (pending, reversing, refunded) |
| Typical use cases | Single org databases, strict invariants, short transactions | E-commerce workflows, bookings, payments with refunds, shipping |
FAQs
Q1. Is 2PC ACID?
Not on its own. 2PC is an atomic commit protocol: it gives you atomicity across participants, and durability comes from each participant’s log. Isolation depends on the locks participants hold from prepare until the final decision. Availability can suffer if the coordinator fails.
Q2. Do Sagas guarantee atomicity?
No, not in the all-or-nothing sense. A Saga guarantees that either every step completes or the completed steps are compensated, so the outcome is eventually consistent rather than strictly atomic. You must design for visible intermediate states and semantic rollbacks.
Q3. When should I use 2PC in microservices?
Rarely across heterogeneous services. Consider 2PC when all participants share compatible transaction managers and transactions are short. Otherwise, prefer a Saga or consolidate to a single database boundary.
Q4. Orchestration vs choreography for Sagas?
Orchestration centralizes flow and error handling, great for complex branching and compensations. Choreography removes a central coordinator, suits linear pipelines, but can create tangled event dependencies.
Q5. How do I prevent double charges or double reservations in a Saga?
Make every step idempotent, use unique operation keys, implement the outbox pattern for reliable events, and add deduplication at consumers. Include retries with exponential backoff and a dead-letter queue.
Q6. What about financial transfers where strict invariants matter?
Keep both balances in the same database transaction if possible. If that’s not possible, consider strong consistency domains (e.g., a ledger service) and limit cross-domain work to asynchronous postings with reconciliation.
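For the single-database case in Q6, a plain local transaction is enough. The sketch below uses Python's built-in `sqlite3` module and assumes a hypothetical `accounts(id, balance)` table with integer balances; the `transfer` function is illustrative, not a complete ledger.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two balances in one local transaction."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commits on success, rolls back if an exception escapes
        debited = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debited.rowcount != 1:
            raise ValueError("unknown source account or insufficient funds")
        credited = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if credited.rowcount != 1:
            # Raising here rolls back the debit as well.
            raise ValueError("unknown destination account")
```

Because the debit and credit share one transaction, no reader can observe money that has left one balance without arriving in the other, which is the invariant a Saga cannot give you across services.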
Practical design checklist
- Define the business invariants that cannot be violated.
- Decide acceptable consistency windows and user states.
- If choosing Saga, document T1…Tn and C1…Cn, plus idempotency and outbox.
- Add trace IDs and per-step metrics; design for retries, timeouts, and DLQs.
- If choosing 2PC, plan for coordinator recovery, lock time budgets, and participant homogeneity.
Choosing quickly in interviews
- Strict, short, homogeneous, zero anomaly tolerance → 2PC or single-DB transaction.
- Long-running, heterogeneous, high availability, anomaly-tolerant with compensations → Saga (orchestration if complex, choreography if linear).
- Borderline cases → consider hybrid: keep a strongly consistent core ledger or inventory count and wrap the rest in a Saga with compensations.
Further Learning
Ready to go deeper and practice with real interview-style scenarios?
- Build a strong foundation in distributed systems fundamentals, transactions, and consistency with Grokking System Design Fundamentals.
- Drill end-to-end workflows like orders, payments, and booking flows in scalable architectures with Grokking Scalable Systems for Interviews.
- For hands-on interview strategy, framing, and trade-off narratives, explore Grokking the System Design Interview.