Exactly-Once Delivery: Why It's Harder Than You Think (And How Kafka Solves It)


On This Page
The Three Delivery Guarantees
At-Most-Once
At-Least-Once
Exactly-Once
How These Guarantees Map to Real Systems
Why Exactly-Once Is Genuinely Hard
How Kafka Achieves Exactly-Once
Idempotent Producers: Deduplication at the Broker
Transactions: Atomic Writes Across Partitions
How a Transaction Actually Executes
Transactional Consumers: Read Committed Isolation
Where Kafka's Guarantees End
A Concrete Example: Payment Processing
The Performance Trade-Off
How This Shows Up in System Design Interviews
Common Mistakes
Conclusion: Key Takeaways
What This Blog Covers
- The three delivery guarantees explained
- Why exactly-once is theoretically impossible (and practically achievable)
- How Kafka implements exactly-once with idempotent producers and transactions
- The boundary where Kafka's guarantees end
- How to discuss this in system design interviews
You are building a payment processing system.
A customer clicks "pay." Your service publishes a message to a queue: "charge $50 to card ending in 4242."
The message broker acknowledges receipt. Done.
Except what happens if the network drops between the broker acknowledging the write and your service receiving that acknowledgment?
Your service does not know if the message was saved or lost.
So it retries.
Now the broker has two copies of the same message.
The customer gets charged $100 instead of $50.
This is the fundamental problem of message delivery in distributed systems.
The network is unreliable.
Processes crash at the worst possible moment.
And the question "was this message delivered?" does not have a simple yes-or-no answer when the asking and the answering happen on different machines separated by an unreliable network.
There are three delivery guarantees a messaging system can provide: at-most-once (messages may be lost, never duplicated), at-least-once (messages may be duplicated, never lost), and exactly-once (messages are neither lost nor duplicated).
Most systems default to at-least-once because losing messages is usually worse than duplicating them. But for payment processing, inventory management, financial ledgers, and any system where a duplicate has real consequences, at-least-once is not good enough.
This guide explains why exactly-once delivery is genuinely hard, how Kafka achieves it, where Kafka's guarantees end, and how to reason about this in system design interviews.
The Three Delivery Guarantees
At-Most-Once
The producer sends a message and does not wait for confirmation.
If the message is lost in transit, it is gone.
No retries.
The advantage is speed: the producer never blocks waiting for acknowledgment.
The disadvantage is data loss.
This is appropriate for non-critical data like metrics, analytics events, or log entries where losing a few data points is acceptable.
At-Least-Once
The producer sends a message and waits for the broker to acknowledge it.
If the acknowledgment does not arrive (network failure, broker crash, timeout), the producer retries. This guarantees the message is not lost, but the retry might create a duplicate if the original write actually succeeded and only the acknowledgment was lost.
At-least-once is the default for most messaging systems, including Kafka before version 0.11. It is safe for systems where duplicates can be handled downstream (through deduplication or idempotent consumers). It is not safe for systems where a duplicate message causes a duplicate real-world action, like a double charge or a double inventory deduction.
The irony is that at-least-once is the most commonly used guarantee, yet it is also the most dangerous if your consumers are not designed for it.
Most production incidents involving message processing are not caused by lost messages. They are caused by duplicate messages that trigger duplicate side effects.
A duplicate "send welcome email" is annoying.
A duplicate "debit $500 from account" is a legal and financial problem.
Exactly-Once
Each message is delivered exactly one time.
No loss.
No duplicates.
This is what every system wants, but it is the hardest to achieve because it requires coordination between the producer, the broker, and the consumer in the presence of failures at any point.
How These Guarantees Map to Real Systems
Understanding which guarantee you need starts with understanding the cost of failure for your specific use case.
At-most-once fits metrics collection, application logging, and clickstream analytics where losing a small percentage of events is tolerable and speed matters more than completeness.
At-least-once fits email notifications, search index updates, and cache invalidation events where processing a message twice is harmless because the consumer is naturally idempotent (sending the same email twice is annoying but not catastrophic, and updating a search index with the same document twice produces the same result).
Exactly-once fits payment processing, inventory management, financial ledger entries, and any system where a duplicate action has material real-world consequences that cannot be easily reversed.
For understanding how Kafka compares to traditional message queues and why the delivery model matters, Kafka vs Message Queue: Why You Are Probably Using the Wrong One covers the architectural differences.
Why Exactly-Once Is Genuinely Hard
The difficulty of exactly-once delivery comes from the Two Generals' Problem: two parties communicating over an unreliable channel cannot reach absolute certainty that a message was received.
No matter how many acknowledgments you send, the last acknowledgment can always be lost, leaving one side uncertain.
Consider the simplest case: a producer sends a message to a broker.
Scenario 1: The producer sends the message. The broker writes it. The broker sends an acknowledgment. The acknowledgment is lost. The producer thinks the message was not delivered and retries. The broker now has two copies.
Scenario 2: The producer sends the message. The broker crashes before writing it. The producer times out and retries. The retry succeeds. No duplicate. But the producer had no way to know whether Scenario 1 or Scenario 2 happened. The timeout looks the same from the producer's perspective.
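The ambiguity in these two scenarios can be shown in a few lines. This is a toy simulation, not Kafka code: the broker writes the message, the acknowledgment is lost, and the producer's only safe option (retry) creates a duplicate.

```python
# Simulation of Scenario 1: the write succeeds, the ack is lost, and the
# at-least-once retry duplicates the message. Names here are illustrative.

class Broker:
    def __init__(self):
        self.log = []

    def write(self, message, ack_lost=False):
        self.log.append(message)            # the write itself succeeds
        return None if ack_lost else "ack"  # but the ack may be lost in transit

def send_at_least_once(broker, message, ack_lost_once=False):
    ack = broker.write(message, ack_lost=ack_lost_once)
    if ack is None:
        # The producer cannot distinguish "write lost" from "ack lost",
        # so it must retry to avoid losing the message.
        ack = broker.write(message)
    return ack

broker = Broker()
send_at_least_once(broker, "charge $50", ack_lost_once=True)
print(broker.log)  # ['charge $50', 'charge $50'] -- two copies of one charge
```

Note that if the write had actually been lost (Scenario 2), the same retry would have been correct. The producer's code is identical in both cases; only the broker's state differs.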
This is why the common wisdom is "exactly-once delivery is impossible in distributed systems."
And strictly speaking, it is true. You cannot guarantee that a single network send results in exactly one delivery.
But you can make the system behave as if each message was delivered exactly once, by making duplicate deliveries harmless. This is what Kafka does.
How Kafka Achieves Exactly-Once
Kafka's exactly-once implementation, introduced in version 0.11 and refined through subsequent releases, relies on three mechanisms working together: idempotent producers, transactions, and transactional consumers.
Idempotent Producers: Deduplication at the Broker
When idempotent production is enabled, Kafka assigns each producer instance a unique Producer ID (PID).
Every message the producer sends includes the PID and a sequence number that increments with each message, per partition.
The broker tracks the last sequence number it received from each producer for each partition. When a new message arrives, the broker checks: is this sequence number exactly one more than the last one I saw?
If yes, the message is new and is written.
If no (because it is a retry of an already-written message), the broker acknowledges the write without actually writing a duplicate.
This makes retries completely safe.
The producer can retry as many times as it wants.
The broker silently deduplicates based on the PID and sequence number.
From the producer's perspective, the message is "sent exactly once."
From the broker's perspective, the message is "written exactly once."
Since Kafka 3.0, idempotent production is enabled by default for all producers (enable.idempotence=true).
The performance overhead is minimal: just a few extra numeric fields per message batch.
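The broker-side deduplication logic can be modeled in a few lines. This is a minimal sketch of the idea (PID plus per-partition sequence number), not Kafka's actual implementation:

```python
# Toy model of broker-side deduplication using a producer ID (PID) and
# per-partition sequence numbers, as idempotent producers do conceptually.

class DedupBroker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # (pid, partition) -> highest sequence number written

    def write(self, pid, partition, seq, message):
        key = (pid, partition)
        expected = self.last_seq.get(key, -1) + 1
        if seq == expected:
            self.log.append(message)         # new message: append and advance
            self.last_seq[key] = seq
            return "ack"
        if seq <= self.last_seq.get(key, -1):
            return "ack"                     # retry of a written message: ack, skip
        raise RuntimeError("sequence gap: an earlier message was lost")

broker = DedupBroker()
broker.write(pid=7, partition=0, seq=0, message="charge $50")
broker.write(pid=7, partition=0, seq=0, message="charge $50")  # retry, deduplicated
print(broker.log)  # ['charge $50'] -- one copy despite the retry
```

The retry is acknowledged but never written twice, which is exactly why producer retries become safe.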
Transactions: Atomic Writes Across Partitions
Idempotent producers solve the duplicate problem for a single partition. But many real-world operations need to write to multiple partitions or topics atomically.
An order processing pipeline might need to write the order event to the "orders" topic and the inventory update to the "inventory" topic as a single atomic unit.
If the order event is written but the inventory update fails, the system is in an inconsistent state.
Kafka's transactional API solves this.
A producer can begin a transaction, send messages to multiple topics and partitions, and then commit or abort the transaction.
Either all messages in the transaction are visible to consumers, or none of them are.
The mechanism works through a transaction coordinator running on one of the Kafka brokers. The producer registers a stable transactional.id that survives restarts.
The coordinator tracks the transaction state and ensures that:
Only one producer instance with a given transactional ID can be active at any time (fencing out zombies).
All messages in a committed transaction are visible to consumers. All messages in an aborted transaction are invisible to consumers.
This is critical for stream processing, where the pattern is read from an input topic, process, and write to an output topic.
The transaction wraps both the output writes and the consumer offset commit into a single atomic operation.
If the process crashes mid-transaction, the uncommitted messages and offsets are discarded, and the next instance reprocesses from the last committed offset.
How a Transaction Actually Executes
Here is the step-by-step flow of a Kafka transaction:
Step 1: Begin
The producer calls beginTransaction(). This records the transaction start locally but does not contact the coordinator yet.
Step 2: Produce
The producer sends messages to one or more topics and partitions. Each message is tagged with the transactional ID. The first produce request in a transaction registers the target partition with the transaction coordinator.
Step 3: Commit offsets (for read-process-write)
If this is a consume-transform-produce pipeline, the producer sends the consumer offsets for the input partitions to the transaction coordinator. This ties the "I have consumed up to offset X" to the same transaction as "I have produced these output messages."
Step 4: Commit or abort
The producer calls commitTransaction() or abortTransaction().
The coordinator writes a commit marker to all partitions involved in the transaction.
Consumers with read_committed isolation only see messages up to the last commit marker.
If the producer crashes between Step 2 and Step 4, the transaction coordinator eventually times out the transaction and aborts it.
The uncommitted messages become invisible to read_committed consumers.
No partial results.
No inconsistency.
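The effect of commit markers on visibility can be sketched with a toy log. Transaction IDs, payloads, and method names below are illustrative; the point is that a read_committed consumer only returns entries whose transaction has a commit marker:

```python
# Toy model of transaction markers and consumer isolation levels.
# A read_committed read filters out in-progress and aborted transactions.

class TxLog:
    def __init__(self):
        self.entries = []  # (txn_id, payload) data records in log order
        self.markers = {}  # txn_id -> "COMMIT" or "ABORT" control markers

    def produce(self, txn_id, payload):
        self.entries.append((txn_id, payload))

    def end_txn(self, txn_id, outcome):
        self.markers[txn_id] = outcome  # coordinator writes the control marker

    def read(self, isolation="read_committed"):
        if isolation == "read_uncommitted":
            return [p for _, p in self.entries]
        return [p for t, p in self.entries if self.markers.get(t) == "COMMIT"]

log = TxLog()
log.produce("txn-1", "order ord_123")
log.produce("txn-1", "inventory -1 sku_9")
log.end_txn("txn-1", "COMMIT")
log.produce("txn-2", "order ord_124")   # producer crashes mid-transaction...
log.end_txn("txn-2", "ABORT")           # ...coordinator times it out and aborts

print(log.read())                       # committed messages only
print(log.read("read_uncommitted"))     # includes the aborted write
```

Both messages of the committed transaction appear atomically; the aborted write never reaches a read_committed consumer.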
For Kafka Streams applications, this entire cycle is handled automatically. Setting processing.guarantee=exactly_once_v2 enables transactional processing across the read-process-write pipeline.
The Streams runtime manages transactions, offset commits, and fault recovery internally.
For understanding how message brokers handle these internals, How Message Brokers Actually Work (Kafka vs RabbitMQ Internals Explained) covers the architecture.
Transactional Consumers: Read Committed Isolation
On the consumer side, Kafka provides an isolation level setting. When set to read_committed, consumers only see messages from committed transactions.
Messages from in-progress or aborted transactions are invisible.
This is the consumer's half of the exactly-once equation.
The producer ensures messages are written exactly once (via idempotency) and atomically (via transactions).
The consumer ensures it only reads committed results (via read_committed isolation).
Together, they provide end-to-end exactly-once semantics within the Kafka ecosystem.
Where Kafka's Guarantees End
This is the part most blog posts skip, and it is the most important part for practical engineering.
Kafka's exactly-once semantics apply within Kafka.
The guarantee covers: producer to broker (no duplicate writes), broker to consumer (no duplicate reads of committed data), and the atomic read-process-write cycle within Kafka Streams.
Kafka's guarantee does not cover what happens after the consumer processes the message.
If your consumer reads a message from Kafka and makes an HTTP call to a payment gateway, Kafka cannot guarantee that the payment happens exactly once.
The payment gateway is outside Kafka's transactional boundary.
This is where idempotent consumers become essential.
If your consumer writes to an external database, you need the database operation to be idempotent.
Common strategies include using a unique message ID as a deduplication key in the database (if you have already processed message ID abc123, skip it), using database upserts instead of inserts (so processing the same message twice produces the same result), and wrapping the consumer logic and offset commit in a single database transaction (if the database write fails, the offset is not committed, and the message is reprocessed).
The practical takeaway: Kafka gives you exactly-once within its ecosystem. For end-to-end exactly-once across external systems, you need to combine Kafka's guarantees with idempotent consumer design.
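The deduplication-key strategy can be sketched with a local database. This example uses SQLite for illustration; the table and column names are made up, but the pattern (a primary key on the message ID makes a redelivered message a no-op) carries over to any relational store:

```python
# Idempotent database write using the message ID as a deduplication key.
# The PRIMARY KEY constraint turns a duplicate insert into a silent no-op.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY, amount REAL)")

def handle(message_id, amount):
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (message_id, amount) VALUES (?, ?)",
        (message_id, amount),
    )
    return cur.rowcount == 1  # True only the first time this ID is seen

first = handle("abc123", 50.0)
second = handle("abc123", 50.0)  # redelivery: constraint makes it a no-op
count = conn.execute("SELECT COUNT(*) FROM processed").fetchone()[0]
print(first, second, count)      # True False 1
```

The return value tells the consumer whether to perform the side effect (first delivery) or skip it (redelivery).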
A Concrete Example: Payment Processing
Consider a payment consumer that reads charge events from Kafka and calls a payment gateway.
Here is how to make it end-to-end exactly-once:
The consumer reads a message: {order_id: "ord_123", amount: 50.00, idempotency_key: "pay_abc"}.
Before calling the payment gateway, it checks its local database: "have I already processed pay_abc?"
If yes, it skips the call and commits the offset. If no, it calls the payment gateway with the idempotency key.
The payment gateway (Stripe, for example) uses the idempotency key to ensure that even if the call is made twice, the charge happens only once.
After the payment succeeds, the consumer writes a record to its database marking pay_abc as processed, and then commits the Kafka offset.
If the consumer crashes after the payment succeeds but before committing the offset, Kafka redelivers the message.
The consumer checks its database, sees that pay_abc was already processed, skips the payment call, and commits the offset.
No double charge.
No lost payment.
This pattern (check-before-process, idempotent external call, record-then-commit) is the standard approach for extending Kafka's exactly-once guarantee to external systems.
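The full check-before-process, record-then-commit cycle can be sketched end to end. The gateway, database, and crash point below are all simulated stand-ins; in production the idempotency key would also be forwarded to the real gateway as described above:

```python
# End-to-end sketch of the check-before-process / record-then-commit pattern,
# including a simulated crash between the payment and the offset commit.

processed = set()        # stands in for the consumer's deduplication table
gateway_charges = []     # stands in for the payment gateway's ledger
committed_offset = [-1]  # stands in for the committed Kafka consumer offset

def consume(offset, event, crash_before_commit=False):
    key = event["idempotency_key"]
    if key in processed:              # 1. check-before-process
        committed_offset[0] = offset  #    already done: just commit the offset
        return
    gateway_charges.append((key, event["amount"]))  # 2. idempotent external call
    processed.add(key)                # 3. record the key as processed...
    if crash_before_commit:
        return                        #    ...crash before committing the offset
    committed_offset[0] = offset      # 4. ...then commit the offset

event = {"order_id": "ord_123", "amount": 50.00, "idempotency_key": "pay_abc"}
consume(0, event, crash_before_commit=True)  # crash after the payment succeeds
consume(0, event)                            # Kafka redelivers the same message
print(len(gateway_charges), committed_offset[0])  # 1 0 -- one charge, offset committed
```

The redelivery finds the recorded key, skips the charge, and only commits the offset, which is precisely the "no double charge, no lost payment" outcome.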
For understanding whether Kafka is the right choice for your system in the first place, Do You Need to Know Kafka for System Design Interviews? covers when Kafka is the right tool and when simpler alternatives work.
The Performance Trade-Off
Exactly-once semantics are not free.
Each mechanism adds overhead.
Idempotent producers add minimal overhead. The sequence number tracking requires a small amount of extra memory on the broker and a few extra bytes per message batch.
In benchmarks, the throughput difference between an idempotent and non-idempotent producer is typically less than 3%.
Transactions add more overhead. Each transaction requires coordination with the transaction coordinator, which adds latency to commits.
Small, frequent transactions (one message per transaction) have proportionally higher overhead than large batches (hundreds of messages per transaction).
The transaction commit itself takes a few milliseconds.
Transactional consumers (read_committed) add read latency because the consumer must wait for transactions to be committed before reading messages.
During the commit window, messages are buffered.
For most applications where exactly-once matters (payments, financial systems, inventory), the throughput reduction is acceptable because correctness is more valuable than raw speed.
For high-throughput, low-latency systems where duplicates can be handled downstream (analytics pipelines, logging), at-least-once with consumer-side deduplication is often a better trade-off.
For understanding when Kafka is the right tool versus alternatives, System Design Interview: Stop Using Kafka for Everything (And What to Say Instead) covers the decision framework.
How This Shows Up in System Design Interviews
Message delivery semantics come up whenever an interviewer asks about data consistency in event-driven systems, payment processing, or any system where "process this exactly once" is a requirement.
Here is how to present it:
"For the payment processing pipeline, I need exactly-once semantics because a duplicate message means a double charge. I would use Kafka with idempotent producers enabled so that retries do not create duplicate messages in the topic. For the consume-process-produce cycle between the order validation and payment execution services, I would use Kafka transactions to atomically commit the output messages and the consumer offsets. On the consumer that calls the payment gateway, since that is outside Kafka's transactional boundary, I would make the consumer idempotent by storing a deduplication key (the order ID) in the payment database. If the consumer processes the same message twice, the database upsert is a no-op."
That answer shows understanding of where Kafka's guarantees apply, where they end, and how to bridge the gap with idempotent consumers.
Common Mistakes
- Assuming exactly-once means end-to-end: Kafka's exactly-once semantics cover producer-to-broker and the Kafka Streams read-process-write cycle. They do not cover what your consumer does with the message after reading it. If your consumer calls an external API, you need idempotent consumer design on top of Kafka's guarantees.
- Using exactly-once when at-least-once is sufficient: If your consumer is naturally idempotent (database upserts, last-write-wins updates), at-least-once with acks=all gives you the correctness you need with less overhead. Do not pay the transaction cost if deduplication is cheap downstream.
- Forgetting to set read_committed on consumers: If your producer uses transactions but your consumer uses the default read_uncommitted isolation level, the consumer sees uncommitted and aborted messages. The exactly-once guarantee is broken on the read side.
- Using a different transactional ID on restart: The transactional.id must be stable across restarts for the same logical producer. If you generate a random ID on each start, Kafka cannot fence out zombie producers or resume in-progress transactions. Use a deterministic ID derived from the application instance and input partitions.
- Not handling the external boundary: This is the most common and most costly production bug. Kafka delivers the message exactly once, the consumer processes it, calls an external service, and the external call times out. The consumer retries, Kafka redelivers, and the external action happens twice. Always design the external call to be idempotent.
Conclusion: Key Takeaways
- Three delivery guarantees exist: at-most-once, at-least-once, and exactly-once. Most systems default to at-least-once. Exactly-once is needed when duplicates cause real-world harm (double charges, double inventory deductions).
- Exactly-once delivery is theoretically impossible but practically achievable. You cannot guarantee a single network send results in exactly one delivery. But you can make duplicates invisible through idempotent writes and transactional atomicity.
- Kafka achieves exactly-once through three mechanisms: idempotent producers (PID + sequence number deduplication), transactions (atomic writes across partitions), and transactional consumers (read_committed isolation).
- Kafka's guarantee ends at its boundary. For external systems (databases, payment gateways, APIs), you must design idempotent consumers that handle duplicate processing gracefully.
- The performance trade-off is real but usually acceptable. Idempotent producers add minimal overhead. Transactions add commit latency. For systems where correctness matters more than raw throughput, the cost is worth it.
- In interviews, distinguish between Kafka-internal and end-to-end exactly-once. Showing that you understand where the guarantee ends and how to extend it with idempotent consumers is the senior-level insight.