Exactly-Once Delivery: Why It's Harder Than You Think (And How Kafka Solves It)


On This Page
The Three Delivery Guarantees
At-Most-Once
At-Least-Once
Exactly-Once
How These Guarantees Map to Real Systems
Why Exactly-Once Is Genuinely Hard
How Kafka Achieves Exactly-Once
Idempotent Producers: Deduplication at the Broker
Transactions: Atomic Writes Across Partitions
How a Transaction Actually Executes
Transactional Consumers: Read Committed Isolation
Where Kafka's Guarantees End
A Concrete Example: Payment Processing
The Performance Trade-Off
How This Shows Up in System Design Interviews
Common Mistakes
Conclusion: Key Takeaways
What This Blog Covers
- The three delivery guarantees explained
- Why exactly-once is theoretically impossible (and practically achievable)
- How Kafka implements exactly-once with idempotent producers and transactions
- The boundary where Kafka's guarantees end
- How to discuss this in system design interviews
You are building a payment processing system.
A customer clicks "pay." Your service publishes a message to a queue: "charge $50 to card ending in 4242."
The message broker acknowledges receipt. Done.
Except what happens if the network drops between the broker acknowledging the write and your service receiving that acknowledgment?
Your service does not know if the message was saved or lost.
So it retries.
Now the broker has two copies of the same message.
The customer gets charged $100 instead of $50.
This is the fundamental problem of message delivery in distributed systems.
The network is unreliable.
Processes crash at the worst possible moment.
And the question "was this message delivered?" does not have a simple yes-or-no answer when the asking and the answering happen on different machines separated by an unreliable network.
There are three delivery guarantees a messaging system can provide: at-most-once (messages may be lost, never duplicated), at-least-once (messages may be duplicated, never lost), and exactly-once (messages are neither lost nor duplicated).
Most systems default to at-least-once because losing messages is usually worse than duplicating them. But for payment processing, inventory management, financial ledgers, and any system where a duplicate has real consequences, at-least-once is not good enough.
This guide explains why exactly-once delivery is genuinely hard, how Kafka achieves it, where Kafka's guarantees end, and how to reason about this in system design interviews.
The Three Delivery Guarantees
At-Most-Once
The producer sends a message and does not wait for confirmation.
If the message is lost in transit, it is gone.
No retries.
The advantage is speed: the producer never blocks waiting for acknowledgment.
The disadvantage is data loss.
This is appropriate for non-critical data like metrics, analytics events, or log entries where losing a few data points is acceptable.
At-Least-Once
The producer sends a message and waits for the broker to acknowledge it.
If the acknowledgment does not arrive (network failure, broker crash, timeout), the producer retries. This guarantees the message is not lost, but the retry might create a duplicate if the original write actually succeeded and only the acknowledgment was lost.
At-least-once is the default for most messaging systems, including Kafka before version 0.11. It is safe for systems where duplicates can be handled downstream (through deduplication or idempotent consumers). It is not safe for systems where a duplicate message causes a duplicate real-world action, like a double charge or a double inventory deduction.
The irony is that at-least-once is the most commonly used guarantee, yet it is also the most dangerous if your consumers are not designed for it.
Most production incidents involving message processing are not caused by lost messages. They are caused by duplicate messages that trigger duplicate side effects.
A duplicate "send welcome email" is annoying.
A duplicate "debit $500 from account" is a legal and financial problem.
Exactly-Once
Each message is delivered exactly one time.
No loss.
No duplicates.
This is what every system wants, but it is the hardest to achieve because it requires coordination between the producer, the broker, and the consumer in the presence of failures at any point.
How These Guarantees Map to Real Systems
Understanding which guarantee you need starts with understanding the cost of failure for your specific use case.
At-most-once fits metrics collection, application logging, and clickstream analytics where losing a small percentage of events is tolerable and speed matters more than completeness.
At-least-once fits email notifications, search index updates, and cache invalidation events where processing a message twice is harmless because the consumer is naturally idempotent (sending the same email twice is annoying but not catastrophic, and updating a search index with the same document twice produces the same result).
Exactly-once fits payment processing, inventory management, financial ledger entries, and any system where a duplicate action has material real-world consequences that cannot be easily reversed.
For understanding how Kafka compares to traditional message queues and why the delivery model matters, Kafka vs Message Queue: Why You Are Probably Using the Wrong One covers the architectural differences.
Why Exactly-Once Is Genuinely Hard
The difficulty of exactly-once delivery comes from the Two Generals' Problem: two parties communicating over an unreliable channel cannot reach absolute certainty that a message was received.
No matter how many acknowledgments you send, the last acknowledgment can always be lost, leaving one side uncertain.
Consider the simplest case: a producer sends a message to a broker.
Scenario 1: The producer sends the message. The broker writes it. The broker sends an acknowledgment. The acknowledgment is lost. The producer thinks the message was not delivered and retries. The broker now has two copies.
Scenario 2: The producer sends the message. The broker crashes before writing it. The producer times out and retries. The retry succeeds. No duplicate. But the producer had no way to know whether Scenario 1 or Scenario 2 happened. The timeout looks the same from the producer's perspective.
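The ambiguity in these two scenarios can be shown in a few lines. This is a toy simulation, not Kafka code: the broker writes the message, the acknowledgment is lost, and the producer's only safe option (retry) creates a duplicate.

```python
# Simulation of Scenario 1: the write succeeds, the ack is lost, and the
# at-least-once retry duplicates the message. Names here are illustrative.

class Broker:
    def __init__(self):
        self.log = []

    def write(self, message, ack_lost=False):
        self.log.append(message)            # the write itself succeeds
        return None if ack_lost else "ack"  # but the ack may be lost in transit

def send_at_least_once(broker, message, ack_lost_once=False):
    ack = broker.write(message, ack_lost=ack_lost_once)
    if ack is None:
        # The producer cannot distinguish "write lost" from "ack lost",
        # so it must retry to avoid losing the message.
        ack = broker.write(message)
    return ack

broker = Broker()
send_at_least_once(broker, "charge $50", ack_lost_once=True)
print(broker.log)  # ['charge $50', 'charge $50'] -- two copies of one charge
```

Note that if the write had actually been lost (Scenario 2), the same retry would have been correct. The producer's code is identical in both cases; only the broker's state differs.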
This is why the common wisdom is "exactly-once delivery is impossible in distributed systems."
And strictly speaking, it is true. You cannot guarantee that a single network send results in exactly one delivery.
But you can make the system behave as if each message was delivered exactly once, by making duplicate deliveries harmless. This is what Kafka does.
How Kafka Achieves Exactly-Once
Kafka's exactly-once implementation, introduced in version 0.11 and refined through subsequent releases, relies on three mechanisms working together: idempotent producers, transactions, and transactional consumers.
Idempotent Producers: Deduplication at the Broker
When idempotent production is enabled, Kafka assigns each producer instance a unique Producer ID (PID).
Every message the producer sends includes the PID and a sequence number that increments with each message, per partition.
The broker tracks the last sequence number it received from each producer for each partition. When a new message arrives, the broker checks: is this sequence number exactly one more than the last one I saw?
If yes, the message is new and is written.
If no (because it is a retry of an already-written message), the broker acknowledges the write without actually writing a duplicate.
This makes retries completely safe.
The producer can retry as many times as it wants.
The broker silently deduplicates based on the PID and sequence number.
From the producer's perspective, the message is "sent exactly once."
From the broker's perspective, the message is "written exactly once."
Since Kafka 3.0, idempotent production is enabled by default for all producers (enable.idempotence=true).
The performance overhead is minimal: just a few extra numeric fields per message batch.
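The broker-side deduplication logic can be modeled in a few lines. This is a minimal sketch of the idea (PID plus per-partition sequence number), not Kafka's actual implementation:

```python
# Toy model of broker-side deduplication using a producer ID (PID) and
# per-partition sequence numbers, as idempotent producers do conceptually.

class DedupBroker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # (pid, partition) -> highest sequence number written

    def write(self, pid, partition, seq, message):
        key = (pid, partition)
        expected = self.last_seq.get(key, -1) + 1
        if seq == expected:
            self.log.append(message)         # new message: append and advance
            self.last_seq[key] = seq
            return "ack"
        if seq <= self.last_seq.get(key, -1):
            return "ack"                     # retry of a written message: ack, skip
        raise RuntimeError("sequence gap: an earlier message was lost")

broker = DedupBroker()
broker.write(pid=7, partition=0, seq=0, message="charge $50")
broker.write(pid=7, partition=0, seq=0, message="charge $50")  # retry, deduplicated
print(broker.log)  # ['charge $50'] -- one copy despite the retry
```

The retry is acknowledged but never written twice, which is exactly why producer retries become safe.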
Transactions: Atomic Writes Across Partitions
Idempotent producers solve the duplicate problem for a single partition. But many real-world operations need to write to multiple partitions or topics atomically.
An order processing pipeline might need to write the order event to the "orders" topic and the inventory update to the "inventory" topic as a single atomic unit.
If the order event is written but the inventory update fails, the system is in an inconsistent state.
Kafka's transactional API solves this.
A producer can begin a transaction, send messages to multiple topics and partitions, and then commit or abort the transaction.
Either all messages in the transaction are visible to consumers, or none of them are.
The mechanism works through a transaction coordinator running on one of the Kafka brokers. The producer registers a stable transactional.id that survives restarts.
The coordinator tracks the transaction state and ensures that:
Only one producer instance with a given transactional ID can be active at any time (fencing out zombies).
All messages in a committed transaction are visible to consumers. All messages in an aborted transaction are invisible to consumers.
This is critical for stream processing, where the pattern is read from an input topic, process, and write to an output topic.
The transaction wraps both the output writes and the consumer offset commit into a single atomic operation.
If the process crashes mid-transaction, the uncommitted messages and offsets are discarded, and the next instance reprocesses from the last committed offset.
How a Transaction Actually Executes
Here is the step-by-step flow of a Kafka transaction:
Step 1: Begin
The producer calls beginTransaction(). This records the transaction start locally but does not contact the coordinator yet.
Step 2: Produce
The producer sends messages to one or more topics and partitions. Each message is tagged with the transactional ID. The first produce request in a transaction registers the target partition with the transaction coordinator.
Step 3: Commit offsets (for read-process-write)
If this is a consume-transform-produce pipeline, the producer sends the consumer offsets for the input partitions to the transaction coordinator. This ties the "I have consumed up to offset X" to the same transaction as "I have produced these output messages."
Step 4: Commit or abort
The producer calls commitTransaction() or abortTransaction().
The coordinator writes a commit marker to all partitions involved in the transaction.
Consumers with read_committed isolation only see messages up to the last commit marker.
If the producer crashes between Step 2 and Step 4, the transaction coordinator eventually times out the transaction and aborts it.
The uncommitted messages become invisible to read_committed consumers.
No partial results.
No inconsistency.
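The effect of commit markers on visibility can be sketched with a toy log. Transaction IDs, payloads, and method names below are illustrative; the point is that a read_committed consumer only returns entries whose transaction has a commit marker:

```python
# Toy model of transaction markers and consumer isolation levels.
# A read_committed read filters out in-progress and aborted transactions.

class TxLog:
    def __init__(self):
        self.entries = []  # (txn_id, payload) data records in log order
        self.markers = {}  # txn_id -> "COMMIT" or "ABORT" control markers

    def produce(self, txn_id, payload):
        self.entries.append((txn_id, payload))

    def end_txn(self, txn_id, outcome):
        self.markers[txn_id] = outcome  # coordinator writes the control marker

    def read(self, isolation="read_committed"):
        if isolation == "read_uncommitted":
            return [p for _, p in self.entries]
        return [p for t, p in self.entries if self.markers.get(t) == "COMMIT"]

log = TxLog()
log.produce("txn-1", "order ord_123")
log.produce("txn-1", "inventory -1 sku_9")
log.end_txn("txn-1", "COMMIT")
log.produce("txn-2", "order ord_124")   # producer crashes mid-transaction...
log.end_txn("txn-2", "ABORT")           # ...coordinator times it out and aborts

print(log.read())                       # committed messages only
print(log.read("read_uncommitted"))     # includes the aborted write
```

Both messages of the committed transaction appear atomically; the aborted write never reaches a read_committed consumer.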
For Kafka Streams applications, this entire cycle is handled automatically. Setting processing.guarantee=exactly_once_v2 enables transactional processing across the read-process-write pipeline.
The Streams runtime manages transactions, offset commits, and fault recovery internally.
For understanding how message brokers handle these internals, How Message Brokers Actually Work (Kafka vs RabbitMQ Internals Explained) covers the architecture.
Transactional Consumers: Read Committed Isolation
On the consumer side, Kafka provides an isolation level setting. When set to read_committed, consumers only see messages from committed transactions.
Messages from in-progress or aborted transactions are invisible.
This is the consumer's half of the exactly-once equation.
The producer ensures messages are written exactly once (via idempotency) and atomically (via transactions).
The consumer ensures it only reads committed results (via read_committed isolation).
Together, they provide end-to-end exactly-once semantics within the Kafka ecosystem.
Where Kafka's Guarantees End
This is the part most blog posts skip, and it is the most important part for practical engineering.
Kafka's exactly-once semantics apply within Kafka.
The guarantee covers: producer to broker (no duplicate writes), broker to consumer (no duplicate reads of committed data), and the atomic read-process-write cycle within Kafka Streams.
Kafka's guarantee does not cover what happens after the consumer processes the message.
If your consumer reads a message from Kafka and makes an HTTP call to a payment gateway, Kafka cannot guarantee that the payment happens exactly once.
The payment gateway is outside Kafka's transactional boundary.
This is where idempotent consumers become essential.
If your consumer writes to an external database, you need the database operation to be idempotent.
Common strategies include using a unique message ID as a deduplication key in the database (if you have already processed message ID abc123, skip it), using database upserts instead of inserts (so processing the same message twice produces the same result), and wrapping the consumer logic and offset commit in a single database transaction (if the database write fails, the offset is not committed, and the message is reprocessed).
The practical takeaway: Kafka gives you exactly-once within its ecosystem. For end-to-end exactly-once across external systems, you need to combine Kafka's guarantees with idempotent consumer design.
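The deduplication-key strategy can be sketched with a local database. This example uses SQLite for illustration; the table and column names are made up, but the pattern (a primary key on the message ID makes a redelivered message a no-op) carries over to any relational store:

```python
# Idempotent database write using the message ID as a deduplication key.
# The PRIMARY KEY constraint turns a duplicate insert into a silent no-op.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY, amount REAL)")

def handle(message_id, amount):
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (message_id, amount) VALUES (?, ?)",
        (message_id, amount),
    )
    return cur.rowcount == 1  # True only the first time this ID is seen

first = handle("abc123", 50.0)
second = handle("abc123", 50.0)  # redelivery: constraint makes it a no-op
count = conn.execute("SELECT COUNT(*) FROM processed").fetchone()[0]
print(first, second, count)      # True False 1
```

The return value tells the consumer whether to perform the side effect (first delivery) or skip it (redelivery).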
A Concrete Example: Payment Processing
Consider a payment consumer that reads charge events from Kafka and calls a payment gateway.
Here is how to make it end-to-end exactly-once:
The consumer reads a message: {order_id: "ord_123", amount: 50.00, idempotency_key: "pay_abc"}.
Before calling the payment gateway, it checks its local database: "have I already processed pay_abc?"
If yes, it skips the call and commits the offset. If no, it calls the payment gateway with the idempotency key.
The payment gateway (Stripe, for example) uses the idempotency key to ensure that even if the call is made twice, the charge happens only once.
After the payment succeeds, the consumer writes a record to its database marking pay_abc as processed, and then commits the Kafka offset.
If the consumer crashes after the payment succeeds but before committing the offset, Kafka redelivers the message.
The consumer checks its database, sees that pay_abc was already processed, skips the payment call, and commits the offset.
No double charge.
No lost payment.
This pattern (check-before-process, idempotent external call, record-then-commit) is the standard approach for extending Kafka's exactly-once guarantee to external systems.
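The full check-before-process, record-then-commit cycle can be sketched end to end. The gateway, database, and crash point below are all simulated stand-ins; in production the idempotency key would also be forwarded to the real gateway as described above:

```python
# End-to-end sketch of the check-before-process / record-then-commit pattern,
# including a simulated crash between the payment and the offset commit.

processed = set()        # stands in for the consumer's deduplication table
gateway_charges = []     # stands in for the payment gateway's ledger
committed_offset = [-1]  # stands in for the committed Kafka consumer offset

def consume(offset, event, crash_before_commit=False):
    key = event["idempotency_key"]
    if key in processed:              # 1. check-before-process
        committed_offset[0] = offset  #    already done: just commit the offset
        return
    gateway_charges.append((key, event["amount"]))  # 2. idempotent external call
    processed.add(key)                # 3. record the key as processed...
    if crash_before_commit:
        return                        #    ...crash before committing the offset
    committed_offset[0] = offset      # 4. ...then commit the offset

event = {"order_id": "ord_123", "amount": 50.00, "idempotency_key": "pay_abc"}
consume(0, event, crash_before_commit=True)  # crash after the payment succeeds
consume(0, event)                            # Kafka redelivers the same message
print(len(gateway_charges), committed_offset[0])  # 1 0 -- one charge, offset committed
```

The redelivery finds the recorded key, skips the charge, and only commits the offset, which is precisely the "no double charge, no lost payment" outcome.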
For understanding whether Kafka is the right choice for your system in the first place, Do You Need to Know Kafka for System Design Interviews? covers when Kafka is the right tool and when simpler alternatives work.
The Performance Trade-Off
Exactly-once semantics are not free.
Each mechanism adds overhead.
Idempotent producers add minimal overhead. The sequence number tracking requires a small amount of extra memory on the broker and a few extra bytes per message batch.
In benchmarks, the throughput difference between an idempotent and non-idempotent producer is typically less than 3%.
Transactions add more overhead. Each transaction requires coordination with the transaction coordinator, which adds latency to commits.
Small, frequent transactions (one message per transaction) have proportionally higher overhead than large batches (hundreds of messages per transaction).
The transaction commit itself takes a few milliseconds.
Transactional consumers (read_committed) add read latency because the consumer must wait for transactions to be committed before reading messages.
During the commit window, messages are buffered.
For most applications where exactly-once matters (payments, financial systems, inventory), the throughput reduction is acceptable because correctness is more valuable than raw speed.
For high-throughput, low-latency systems where duplicates can be handled downstream (analytics pipelines, logging), at-least-once with consumer-side deduplication is often a better trade-off.
For understanding when Kafka is the right tool versus alternatives, System Design Interview: Stop Using Kafka for Everything (And What to Say Instead) covers the decision framework.
How This Shows Up in System Design Interviews
Message delivery semantics come up whenever an interviewer asks about data consistency in event-driven systems, payment processing, or any system where "process this exactly once" is a requirement.
Here is how to present it:
"For the payment processing pipeline, I need exactly-once semantics because a duplicate message means a double charge. I would use Kafka with idempotent producers enabled so that retries do not create duplicate messages in the topic. For the consume-process-produce cycle between the order validation and payment execution services, I would use Kafka transactions to atomically commit the output messages and the consumer offsets. On the consumer that calls the payment gateway, since that is outside Kafka's transactional boundary, I would make the consumer idempotent by storing a deduplication key (the order ID) in the payment database. If the consumer processes the same message twice, the database upsert is a no-op."
That answer shows understanding of where Kafka's guarantees apply, where they end, and how to bridge the gap with idempotent consumers.
Common Mistakes
- Assuming exactly-once means end-to-end: Kafka's exactly-once semantics cover producer-to-broker and the Kafka Streams read-process-write cycle. They do not cover what your consumer does with the message after reading it. If your consumer calls an external API, you need idempotent consumer design on top of Kafka's guarantees.
- Using exactly-once when at-least-once is sufficient: If your consumer is naturally idempotent (database upserts, last-write-wins updates), at-least-once with acks=all gives you the correctness you need with less overhead. Do not pay the transaction cost if deduplication is cheap downstream.
- Forgetting to set read_committed on consumers: If your producer uses transactions but your consumer uses the default read_uncommitted isolation level, the consumer sees uncommitted and aborted messages. The exactly-once guarantee is broken on the read side.
- Using a different transactional ID on restart: The transactional.id must be stable across restarts for the same logical producer. If you generate a random ID on each start, Kafka cannot fence out zombie producers or resume in-progress transactions. Use a deterministic ID derived from the application instance and input partitions.
- Not handling the external boundary: This is the most common and most costly production bug. Kafka delivers the message exactly once, the consumer processes it, calls an external service, and the external call times out. The consumer retries, Kafka redelivers, and the external action happens twice. Always design the external call to be idempotent.
Conclusion: Key Takeaways
- Three delivery guarantees exist: at-most-once, at-least-once, and exactly-once. Most systems default to at-least-once. Exactly-once is needed when duplicates cause real-world harm (double charges, double inventory deductions).
- Exactly-once delivery is theoretically impossible but practically achievable. You cannot guarantee a single network send results in exactly one delivery. But you can make duplicates invisible through idempotent writes and transactional atomicity.
- Kafka achieves exactly-once through three mechanisms: idempotent producers (PID + sequence number deduplication), transactions (atomic writes across partitions), and transactional consumers (read_committed isolation).
- Kafka's guarantee ends at its boundary. For external systems (databases, payment gateways, APIs), you must design idempotent consumers that handle duplicate processing gracefully.
- The performance trade-off is real but usually acceptable. Idempotent producers add minimal overhead. Transactions add commit latency. For systems where correctness matters more than raw throughput, the cost is worth it.
- In interviews, distinguish between Kafka-internal and end-to-end exactly-once. Showing that you understand where the guarantee ends and how to extend it with idempotent consumers is the senior-level insight.