On this page
The Dual-Write Problem
Why Retries Don't Solve the Problem
Why Not Use Distributed Transactions?
The Core Idea Behind the Transactional Outbox Pattern
The Outbox Table
Step-by-Step Flow
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Why This Works
Benefits of the Transactional Outbox Pattern
Trade-Offs
When Should You Use It?
Outbox vs Event Sourcing
Outbox vs Change Data Capture (CDC)
How Interviewers Expect You to Explain It
Common Interview Questions
Learning Distributed Systems the Right Way
Final Thoughts
Transactional Outbox Pattern: How to Solve the Dual-Write Problem


Imagine a customer places an order on your e-commerce platform.
Your Order Service writes the order into the database.
Immediately afterward, it publishes an OrderCreated event to Kafka so that Inventory, Payments, Notifications, and Shipping services can react.
Everything looks straightforward.
Until one day it isn't.
The database transaction succeeds.
The Kafka broker becomes temporarily unavailable.
The event never reaches downstream services.
The customer receives an order confirmation.
Inventory is never reserved.
Shipping never starts.
Payments never process.
Nothing crashed.
No exception appeared in production logs.
Yet the system is now inconsistent.
This is one of the most common reliability problems in distributed systems.
It is known as the Dual-Write Problem, and nearly every event-driven architecture eventually encounters it. The Transactional Outbox Pattern has emerged as one of the industry's most practical solutions because it replaces an impractical distributed transaction with a reliable local database transaction combined with asynchronous delivery.
In this guide, we'll understand why dual writes fail, how the Transactional Outbox Pattern works, when to use it, its trade-offs, and how to explain it confidently during a system design interview.
The Dual-Write Problem
Many business operations require two independent actions.
For example:
- Save an order
- Publish an event
or
- Update a user profile
- Invalidate Redis cache
or
- Store a payment
- Notify downstream fraud systems
Developers naturally write code that looks like this:
Save Order to Database
↓
Publish OrderCreated Event
The problem is simple.
These two writes happen against two completely different systems.
The database knows nothing about Kafka.
Kafka knows nothing about the database.
Neither can participate in the other's transaction.
That creates four possible outcomes.
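| Database write | Event publish | Result |
|---|---|---|
| ✅ | ✅ | Consistent (operation succeeded) |
| ❌ | ❌ | Consistent (operation failed cleanly) |
| ✅ | ❌ | Inconsistent (data saved, event lost) |
| ❌ | ✅ | Inconsistent (event published for data that does not exist) |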
The last two outcomes are dangerous because the business operation is only partially completed. One system believes the operation succeeded while another has no knowledge of it. These failures are often silent and difficult to detect until users notice missing data or downstream services stop reacting as expected.
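To make the third row of the table concrete, here is a minimal sketch of a naive dual write. It uses an in-memory SQLite database, and `FakeBroker` is a hypothetical stand-in for Kafka rather than a real client, so the names and behavior are assumptions for illustration only.

```python
import sqlite3


class BrokerUnavailable(Exception):
    """Raised by the stand-in broker to simulate an outage."""


class FakeBroker:
    """Hypothetical in-memory stand-in for Kafka; not a real client."""

    def __init__(self, available=True):
        self.available = available
        self.messages = []

    def publish(self, topic, payload):
        if not self.available:
            raise BrokerUnavailable("broker is down")
        self.messages.append((topic, payload))


def place_order_naive(conn, broker, order_id):
    # Write 1: the database transaction commits on its own.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
    # Write 2: a separate system; nothing rolls back write 1 if this fails.
    try:
        broker.publish("orders", f"OrderCreated:{order_id}")
    except BrokerUnavailable:
        pass  # swallowed or merely logged: the event is silently lost


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
broker = FakeBroker(available=False)
place_order_naive(conn, broker, "order-1")

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orders, len(broker.messages))  # 1 0
```

The order row exists, but no event was ever recorded anywhere, which is exactly the "database ✅, event ❌" outcome.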
Why Retries Don't Solve the Problem
Many engineers initially think retries eliminate the issue.
They don't.
Imagine this sequence.
Write Order
↓
Success
↓
Publish Event
↓
Network Timeout
Did Kafka receive the message?
Maybe.
Did it fail?
Maybe.
If you retry, you might publish the event twice.
If you don't retry, you may never publish it.
Now you've replaced one consistency problem with another.
Distributed systems rarely fail in clean, deterministic ways. They fail with timeouts, partial acknowledgments, transient network partitions, and temporary broker outages. Simply wrapping the publish operation in retry logic does not provide atomicity between the database write and the event publication.
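The ambiguity can be illustrated with a small simulation. `FlakyBroker` below is a hypothetical broker that stores the message but loses the first acknowledgment; it is an assumption made for this sketch, not how any particular client behaves.

```python
class FlakyBroker:
    """Hypothetical broker whose first ack is lost after the message is stored."""

    def __init__(self):
        self.messages = []
        self.calls = 0

    def publish(self, payload):
        self.calls += 1
        self.messages.append(payload)  # the broker did store the message
        if self.calls == 1:
            raise TimeoutError("ack lost in transit")


def publish_with_retry(broker, payload, attempts=3):
    for _ in range(attempts):
        try:
            broker.publish(payload)
            return True
        except TimeoutError:
            continue  # the producer cannot tell whether the message landed
    return False


broker = FlakyBroker()
delivered = publish_with_retry(broker, "OrderCreated:order-1")
print(delivered, len(broker.messages))  # True 2
```

From the producer's point of view the retry succeeded, yet the broker now holds the same event twice. Retrying traded a possible lost event for a possible duplicate, and it still did nothing to tie the publish to the database commit.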
Why Not Use Distributed Transactions?
A common question during interviews is:
Why not simply use one transaction across the database and Kafka?
In theory, protocols like Two-Phase Commit (2PC) can coordinate multiple systems.
In practice, they introduce significant drawbacks:
- Higher latency
- Reduced availability
- Tight coupling
- Complex failure recovery
- Limited support across modern cloud infrastructure (Kafka, for example, cannot enlist in an XA transaction with an external database)
Most microservice architectures deliberately avoid distributed transactions because they reduce scalability and make independent services harder to evolve. Instead, engineers typically prefer patterns that embrace eventual consistency while still guaranteeing correctness.
The Core Idea Behind the Transactional Outbox Pattern
The Transactional Outbox Pattern changes the architecture completely.
Instead of writing to two systems, the application writes only to one.
Both the business data and the event are written to the same database within a single transaction.
Database Transaction
Orders Table
+
Outbox Table
↓
Commit
Because both writes occur in the same database transaction, they are atomic.
Either:
- both rows exist
or
- neither exists.
Databases have solved transactional consistency for decades. The outbox pattern leverages those guarantees rather than attempting to coordinate multiple distributed systems.
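A minimal sketch of that single transaction, using Python's built-in `sqlite3` module with an in-memory database. The table layouts and the `place_order` helper are assumptions chosen for brevity; a production service would use its own schema and database driver.

```python
import json
import sqlite3


def place_order(conn, order_id, event_id):
    """Insert the order and its OrderCreated event in one local transaction."""
    payload = json.dumps({"order_id": order_id})
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (event_id, "OrderCreated", payload),
        )


def count(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, event_type TEXT, payload TEXT)")

place_order(conn, "order-1", "evt-1")  # both rows are committed together

try:
    # Reusing evt-1 makes the outbox insert fail after the orders insert ran.
    place_order(conn, "order-2", "evt-1")
except sqlite3.IntegrityError:
    pass  # the whole transaction rolled back, including the order-2 row

print(count(conn, "orders"), count(conn, "outbox"))  # 1 1
```

The second call shows the "neither exists" half of the guarantee: when the event row cannot be written, the order row is rolled back with it.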
The Outbox Table
The Outbox table acts as a durable queue stored inside the application's database.
A typical schema might contain:
Outbox
EventID
AggregateID
EventType
Payload
CreatedAt
Status
Every successful business transaction inserts an event into this table alongside the primary business record.
Unlike Kafka, RabbitMQ, or another message broker, the outbox is part of the same transactional database. As long as both inserts share one transaction, the business record cannot exist without its corresponding event being recorded.
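One possible way to express that schema is sketched below as SQLite DDL run from Python. The column types, the `'PENDING'` default, and the index are assumptions for illustration; the right choices depend on your database and access patterns.

```python
import sqlite3

OUTBOX_DDL = """
CREATE TABLE outbox (
    event_id     TEXT PRIMARY KEY,                -- unique ID, usable for de-duplication
    aggregate_id TEXT NOT NULL,                   -- e.g. the order ID the event is about
    event_type   TEXT NOT NULL,                   -- e.g. 'OrderCreated'
    payload      TEXT NOT NULL,                   -- serialized event body (JSON here)
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status       TEXT NOT NULL DEFAULT 'PENDING'  -- assumed values: 'PENDING', 'PROCESSED'
);
-- Assumed index so a relay can find unpublished rows cheaply.
CREATE INDEX idx_outbox_pending ON outbox (status, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(OUTBOX_DDL)
conn.execute(
    "INSERT INTO outbox (event_id, aggregate_id, event_type, payload) VALUES (?, ?, ?, ?)",
    ("evt-1", "order-1", "OrderCreated", '{"order_id": "order-1"}'),
)
status, created_at = conn.execute("SELECT status, created_at FROM outbox").fetchone()
print(status)  # PENDING
```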
Step-by-Step Flow
Suppose a customer places an order.
Step 1
Client sends:
POST /orders
Step 2
Application starts a database transaction.
Step 3
Insert into Orders.
Step 4
Insert an OrderCreated event into the Outbox table.
Step 5
Commit the transaction.
At this point:
- Order exists.
- Event exists.
No communication with Kafka has happened yet.
Step 6
A separate relay process periodically polls the Outbox table for events that have not been published yet.
Step 7
The relay publishes the event to Kafka.
Step 8
Once Kafka acknowledges receipt, the relay marks the outbox record as processed or removes it.
This asynchronous relay transforms the original dual write into two independent operations connected by durable storage. If publishing fails, the relay retries later without risking loss of the event because it is already safely stored in the database.
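Steps 6 to 8 can be sketched as a single polling pass. `InMemoryBroker` is a hypothetical stand-in for a Kafka producer, and the `seq` column and status values are assumptions made for this example; a real relay would run `relay_once` in a loop or on a schedule.

```python
import sqlite3


class InMemoryBroker:
    """Hypothetical stand-in for a Kafka producer."""

    def __init__(self):
        self.available = True
        self.messages = []

    def publish(self, event_id, payload):
        if not self.available:
            raise ConnectionError("broker unavailable")
        self.messages.append((event_id, payload))


def relay_once(conn, broker, batch_size=100):
    """Publish pending outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE status = 'PENDING' ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for event_id, payload in rows:
        try:
            broker.publish(event_id, payload)
        except ConnectionError:
            break  # leave the row PENDING; the next poll retries it
        # Mark the row only after the broker accepted the event.
        with conn:
            conn.execute("UPDATE outbox SET status = 'PROCESSED' WHERE event_id = ?", (event_id,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, event_id TEXT UNIQUE, "
    "payload TEXT, status TEXT DEFAULT 'PENDING')"
)
with conn:
    conn.execute("INSERT INTO outbox (event_id, payload) VALUES ('evt-1', 'OrderCreated:order-1')")

broker = InMemoryBroker()
broker.available = False
first = relay_once(conn, broker)   # outage: nothing is sent, nothing is lost
broker.available = True
second = relay_once(conn, broker)  # recovery: the stored event goes out
print(first, second, broker.messages)  # 0 1 [('evt-1', 'OrderCreated:order-1')]
```

Because the row is marked only after the publish succeeds, a crash between the two steps causes the event to be published again on the next pass. That is the source of the at-least-once behavior.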
Why This Works
The pattern provides an important guarantee:
If the database transaction commits successfully, the event cannot disappear.
It may be delayed.
It may require retries.
But it is already stored safely inside the Outbox table.
Eventually, the relay will publish it.
This moves the system from an "all-or-nothing right now" model to an "atomic persistence with guaranteed eventual delivery" model. That trade-off is acceptable for many event-driven systems where downstream consumers can tolerate slight delays.
Benefits of the Transactional Outbox Pattern
The pattern offers several practical advantages:
- Eliminates inconsistent dual writes
- Uses standard ACID transactions
- Avoids distributed transactions
- Supports retries without losing events
- Works well with Kafka, RabbitMQ, EventBridge, and similar brokers
- Fits naturally into microservice architectures
It is also widely supported by CDC tools such as Debezium, which can automatically detect changes in the Outbox table and publish events without requiring a custom polling service.
Trade-Offs
Like every architectural pattern, the Transactional Outbox Pattern introduces new responsibilities.
Teams must manage the growth of the Outbox table, ensure relay processes are reliable, handle retries, and design downstream consumers to be idempotent because delivery is typically at least once, meaning duplicate events are possible. Performance overhead from additional writes is usually modest but should still be considered for extremely high-throughput systems.
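One common way to make a consumer idempotent is to record each handled event ID in the same transaction as the side effect. The sketch below assumes a hypothetical Inventory consumer with `processed_events` and `reservations` tables; the names are illustrative only.

```python
import sqlite3


def handle_event(conn, event_id, order_id):
    """Reserve inventory once per event_id; return True if the event was applied."""
    try:
        with conn:
            # The primary key rejects a repeated event_id, which rolls back
            # the whole block, including the reservation.
            conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            conn.execute("INSERT INTO reservations (order_id) VALUES (?)", (order_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already handled, safe to ignore


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE reservations (order_id TEXT)")

# Deliver the same event twice, as an at-least-once relay might.
results = [handle_event(conn, "evt-1", "order-1") for _ in range(2)]
reserved = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
print(results, reserved)  # [True, False] 1
```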
When Should You Use It?
Use the Transactional Outbox Pattern when:
- A database update must reliably trigger an event.
- You're building event-driven microservices.
- You need reliable messaging without distributed transactions.
- Eventual consistency is acceptable.
Avoid it when:
- Every downstream action must complete synchronously before responding.
- Your storage layer does not support transactions.
- The added operational complexity outweighs the reliability benefit.
Outbox vs Event Sourcing
These patterns are related but solve different problems.
In Event Sourcing, events are the primary source of truth and application state is reconstructed from them.
With the Transactional Outbox Pattern, business tables remain the source of truth, while events are generated alongside them for reliable integration with other systems. Event sourcing changes the entire persistence model, whereas the outbox pattern simply strengthens event publication reliability.
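For contrast, here is a tiny sketch of what "state is reconstructed from events" means in Event Sourcing. The event names and the `apply` function are hypothetical; the point is only that the current state is derived by replaying the event log rather than read from a business table.

```python
from functools import reduce


def apply(state, event):
    """Return the next state after applying one event (pure function)."""
    kind, data = event
    if kind == "OrderCreated":
        return {"status": "created", "items": list(data["items"])}
    if kind == "ItemAdded":
        return {**state, "items": state["items"] + [data["item"]]}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown events leave the state unchanged


events = [
    ("OrderCreated", {"items": ["book"]}),
    ("ItemAdded", {"item": "pen"}),
    ("OrderShipped", {}),
]

state = reduce(apply, events, {})
print(state)  # {'status': 'shipped', 'items': ['book', 'pen']}
```

With the outbox pattern there is no such replay: the Orders table already holds the current state, and the outbox rows exist only to be delivered.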
Outbox vs Change Data Capture (CDC)
Some teams combine the Outbox Pattern with CDC.
Instead of writing a polling service, tools like Debezium monitor database changes and automatically publish Outbox events to Kafka.
The application still writes only to the database.
CDC becomes the reliable relay.
This significantly reduces custom infrastructure while preserving the same consistency guarantees.
How Interviewers Expect You to Explain It
A strong system design answer usually follows this sequence:
- Explain the dual-write problem.
- Show why retries are insufficient.
- Explain why distributed transactions are avoided.
- Introduce the Outbox table.
- Describe the asynchronous relay.
- Discuss retries and idempotency.
- Mention eventual consistency and trade-offs.
This demonstrates not only knowledge of the pattern but also an understanding of the engineering decisions behind it.
Common Interview Questions
Why not publish directly to Kafka?
Because the database commit and Kafka publish cannot be made atomic without distributed transactions.
Why is idempotency important?
The relay may retry publishing after failures, so consumers must safely process duplicate events.
Can CDC replace polling?
Yes. CDC tools can publish Outbox events automatically by observing database changes.
Does the Outbox Pattern guarantee exactly-once delivery?
No. It guarantees reliable persistence of events. Delivery is usually at least once, so consumers should be idempotent.
Learning Distributed Systems the Right Way
Patterns like Transactional Outbox, Saga, CQRS, Event Sourcing, Circuit Breakers, and Rate Limiting rarely appear in isolation. Modern system design interviews evaluate whether you understand how these patterns complement each other to build reliable distributed systems.
To build that foundation, DesignGurus.io offers structured learning paths:
- Grokking System Design Fundamentals: https://www.designgurus.io/course/grokking-system-design-fundamentals
- Grokking the System Design Interview: https://www.designgurus.io/course/grokking-the-system-design-interview
- Microservices Design Patterns for System Design Interviews: https://www.designgurus.io/course/microservices-design-patterns-for-system-design-interviews
- Distributed Systems for Practitioners: https://www.designgurus.io/course/distributed-systems-for-practitioners
Studying these concepts together helps you understand not only how systems scale, but also how they remain correct when failures inevitably occur.
Final Thoughts
The Dual-Write Problem is not a Kafka problem, a database problem, or a networking problem.
It is an architectural correctness problem.
The Transactional Outbox Pattern solves it by changing where consistency is enforced. Instead of attempting an impractical transaction across multiple systems, it relies on the one place that already guarantees atomicity: the database. A separate relay process then ensures every committed business change eventually reaches downstream services.
That simple shift has made the Transactional Outbox Pattern one of the most widely adopted reliability patterns in modern event-driven architectures, and one of the most valuable concepts to understand for system design interviews and production engineering alike.