What are practical paths to exactly‑once processing in Kafka?
Exactly-once processing sounds magical in distributed systems: networks drop packets, consumers restart, producers retry, and duplicates sneak in. The good news is that Kafka gives you real building blocks to achieve an exactly-once effect for many streaming topologies. You combine idempotent producers, transactions, read-committed consumers, and careful sink design. Outside Kafka, you can extend the guarantee with patterns that make side effects safe. This guide gives you practical paths you can actually ship.
Why It Matters
Interviewers care because exactly once forces you to reason about correctness, not only throughput. In production, duplicates can double charge a customer or create ghost inventory. At scale, even tiny duplication rates cost real money. In a system design interview the ability to map guarantees to each hop shows you know the difference between delivery and effect, and that you can design a scalable architecture with predictable behavior during retries and failures.
How It Works, Step by Step
Below are the practical paths teams use. Pick the one that matches your topology and sinks.
Path A: Native Kafka transactions for read-process-write
- Configure the producer with `enable.idempotence=true` and a stable `transactional.id`.
- Use a consumer with `isolation.level=read_committed` so it only sees committed records.
- In a loop, read from the input topic, process the record, and produce to the output topic using the transactional producer.
- Call `sendOffsetsToTransaction` to include the consumer offset commit for the same group inside the ongoing transaction.
- Commit the transaction. Either the output records and the offset become visible together or neither does. This yields an exactly-once effect from input topic to output topic (see the sketch after this list).
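A minimal sketch of this loop in Java, assuming placeholder topic names (`orders-in`, `orders-out`), a local broker, string serdes, and error handling reduced to the essentials:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReadProcessWrite {
  public static void main(String[] args) {
    Properties cp = new Properties();
    cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    cp.put(ConsumerConfig.GROUP_ID_CONFIG, "enricher");
    cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // skip aborted records
    cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");       // offsets ride in the txn
    cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    Properties pp = new Properties();
    pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    pp.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "enricher-txn-0"); // keep stable across restarts
    pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
         KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
      consumer.subscribe(List.of("orders-in"));
      producer.initTransactions(); // fences any zombie with the same transactional.id

      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(250));
        if (records.isEmpty()) continue;
        producer.beginTransaction();
        try {
          Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
          for (ConsumerRecord<String, String> rec : records) {
            String enriched = rec.value().toUpperCase(); // stand-in for real processing
            producer.send(new ProducerRecord<>("orders-out", rec.key(), enriched));
            offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                        new OffsetAndMetadata(rec.offset() + 1));
          }
          // Offsets commit atomically with the output records, or not at all.
          producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
          producer.commitTransaction();
        } catch (ProducerFencedException fatal) {
          throw fatal; // another instance took over; let the process die and restart
        } catch (KafkaException retriable) {
          producer.abortTransaction(); // output never becomes visible; input is re-read
        }
      }
    }
  }
}
```

If the process crashes between `beginTransaction` and `commitTransaction`, the coordinator aborts the transaction, the offsets never advance, and a restarted instance replays the same input.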
When to use: ideal for a Kafka-to-Kafka pipeline where your sink is another topic. It also works when a downstream component will consume from that output topic.
Notes for seniors: use a stable `transactional.id` across restarts to enable producer fencing, and keep transactions short to avoid timeouts. The guarantee is per partition, so if you repartition, reason about keys to keep a given key on a single partition.
Path B: Kafka Streams with exactly-once
- Set `processing.guarantee=exactly_once_v2` in your Streams configuration.
- Streams assigns one transactional producer per task and manages commit boundaries.
- State store updates and output topic writes are committed atomically with offsets.
- Consumers reading your output with `read_committed` see each effect once. A configuration sketch follows this list.
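A minimal sketch, assuming a hypothetical `order-counter` application that counts events per key; the topic names, broker address, and commit interval are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class OrderCounter {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-counter");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    // One setting turns on transactions, offset handling, and atomic state commits.
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
    // The commit interval bounds how long each transaction stays open.
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);

    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, String>stream("orders-in")
        .groupByKey()
        .count()                        // state store updates join the transaction
        .toStream()
        .mapValues(Object::toString)
        .to("order-counts");

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    streams.start();
  }
}
```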
When to use: for joins, aggregations, and windowed computations where you want library-managed exactly-once without wiring transactions yourself. Streams hides many footguns.
Notes for seniors: `exactly_once_v2` uses fewer producers and better batching, which lowers coordination overhead. Monitor aborted and ongoing transactions, and set a sane commit interval to bound checkpoint latency.
Path C: Transactional outbox with change data capture
- Your service updates business tables and inserts an outbox row with a unique event id in the same database transaction (see the sketch after this list).
- A CDC tool such as Debezium tails the commit log and publishes the outbox row to Kafka with that unique id as the key.
- Downstream consumers treat the event id as an idempotency key and write idempotently to sinks or compacted topics.
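A minimal sketch of the dual write in Java with JDBC, assuming hypothetical `orders` and `outbox` tables; Debezium and the Kafka side are not shown:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;

public class OutboxWriter {
  // Insert the business row and the outbox row in ONE database transaction,
  // so the event exists if and only if the order does.
  public static void createOrder(Connection conn, String orderId, String payload)
      throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement order = conn.prepareStatement(
             "INSERT INTO orders (id, payload) VALUES (?, ?)");
         PreparedStatement outbox = conn.prepareStatement(
             "INSERT INTO outbox (event_id, aggregate_id, type, payload) VALUES (?, ?, ?, ?)")) {
      order.setString(1, orderId);
      order.setString(2, payload);
      order.executeUpdate();

      outbox.setString(1, UUID.randomUUID().toString()); // downstream idempotency key
      outbox.setString(2, orderId);
      outbox.setString(3, "OrderCreated");
      outbox.setString(4, payload);
      outbox.executeUpdate();

      conn.commit(); // the single commit Kafka will mirror via CDC
    } catch (SQLException e) {
      conn.rollback(); // neither row survives; no phantom event
      throw e;
    }
  }
}
```

The event id is generated once, inside the transaction, so however many times CDC or a consumer retries, every downstream hop sees the same idempotency key.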
When to use: when the source of truth is a database and you want an exactly-once effect from that database into Kafka and beyond. The database commit becomes the single source of truth, and Kafka mirrors it without gaps or duplicates.
Notes for seniors: use a compacted outbox topic keyed by the event id to tolerate occasional upstream retries. Keep the outbox narrow, carrying only what downstream needs, and archive outbox rows after CDC has published them.
Path D: Idempotent sinks and deduplication at the edge
- Accept that Kafka delivery will be at least once to the consumer.
- Make the write to the external system idempotent by key: use a unique constraint, an upsert, or a conditional write based on a version or sequence (see the sketch after this list).
- Optionally store a processed set in Redis or a database to drop duplicates early.
- If the sink is a side effect, such as sending an email, include an idempotency key so a retry is a no-op.
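A minimal sketch of an idempotent sink write, assuming PostgreSQL and a hypothetical `payments` table with a unique constraint on `event_id`; a redelivered event hits the conflict clause and becomes a no-op:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class IdempotentSink {
  // Returns true if this event was applied, false if it was a duplicate.
  public static boolean applyPayment(Connection conn, String eventId,
                                     String orderId, long amountCents) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO payments (event_id, order_id, amount_cents) VALUES (?, ?, ?) "
            + "ON CONFLICT (event_id) DO NOTHING")) { // PostgreSQL syntax; other stores vary
      ps.setString(1, eventId);
      ps.setString(2, orderId);
      ps.setLong(3, amountCents);
      return ps.executeUpdate() == 1; // 0 rows means a duplicate was silently dropped
    }
  }
}
```

The same shape absorbs at-least-once delivery anywhere in the pipeline: replays land on the constraint instead of corrupting the sink.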
When to use: whenever you cannot use Kafka transactions end to end because the sink lives outside Kafka. Payments, search indexes, and third-party APIs typically take this path.
Notes for seniors: idempotent sinks localize retry complexity and tend to scale better than distributed two-phase commit. Ensure the id or version is derived deterministically from the event, not from a random generator in the consumer.
Path E: Kafka Connect with exactly-once where supported
- For source connectors that pull from databases via CDC, exactly-once into Kafka is practical because the connector groups records by upstream transaction and uses idempotent production.
- For sink connectors, true exactly-once into external systems depends on the sink's capability. Many achieve an exactly-once effect with upserts or idempotent writes rather than strict two-phase commit.
When to use: when you prefer managed ingestion or egress. Verify the connector's guarantee matrix and plan for idempotent operations in the sink.
Real-World Example
Think of an Amazon-style checkout pipeline. The Order service writes an order row and an outbox row in one database transaction. CDC publishes the outbox event to Kafka. A Streams topology enriches it with inventory and fraud checks using `processing.guarantee=exactly_once_v2`. The payment service consumes the enriched event and calls the gateway using an idempotency key derived from the order id. The ledger writes to a compacted topic keyed by transaction id. If any consumer or producer restarts, retries may happen, but every ledger entry and payment attempt lands exactly once in effect.
Common Pitfalls and Trade-offs
Confusing delivery with effect: exactly-once delivery is not the same as exactly-once effect. Kafka provides an exactly-once effect from input topic to output topic with transactions; once you leave Kafka, you need an idempotent or transactional sink.
Offsets committed outside the transaction: if you commit consumer offsets separately, you can create a gap where offsets advance but the output is never written. Always include offsets in the same transaction using `sendOffsetsToTransaction`, as in the Path A sketch.
Unstable transactional id: if `transactional.id` changes on each deploy, the new producer cannot fence its predecessor, so a zombie instance can keep writing while the new one starts a fresh sequence that breaks idempotence. Derive the id deterministically from the app id and the task or partition, as sketched below.
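A minimal sketch, with hypothetical names, of deriving the id from stable inputs rather than a random suffix:

```java
public class TxnIds {
  // Same app id + same partition => same transactional.id on every deploy,
  // so the replacement producer fences its predecessor instead of coexisting with it.
  static String transactionalId(String appId, int partition) {
    return appId + "-p" + partition; // e.g. "order-enricher-p3"
  }
}
```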
Long transactions and timeouts: very large batches can hit the transaction timeout and cause aborts. Keep transactions short, and tune linger, batch size, and commit interval to balance throughput and latency.
Hot partitions and key skew: exactly-once sequencing is per partition. If one key dominates a partition, you can see slower progress and longer commit times. Consider better partitioning functions or key bucketing.
External sink constraints: relational sinks need real unique keys or version checks, search indexes need idempotent upserts, and third-party APIs need idempotency tokens. Design these details early or you will chase ghosts in production.
Interview Tip
If asked to design a pipeline that consumes from a topic, enriches data, publishes to another topic, and also updates a database with an exactly-once effect, present a two-leg answer. Leg one uses a transactional producer with `read_committed` on the Kafka path so input offsets and output writes commit atomically. Leg two uses an outbox in the database so the update to the business row and the outbox row commit together, then a CDC source publishes the outbox to Kafka. Close by stating that the database write uses a natural unique key so retries do not double write.
Key Takeaways
- Kafka gives an exactly-once effect within Kafka using idempotent producers, transactions, and read-committed consumers
- Kafka Streams makes this simpler with a single config that manages tasks, state stores, and atomic commits
- Outside Kafka, use a transactional outbox with CDC or idempotent sink writes to achieve an exactly-once effect
- A stable transactional id, short transactions, and per-partition reasoning are the keys to reliable operation
- Always distinguish delivery from effect when you explain guarantees in a system design interview
Comparison Table
| Approach | Where You Get Exactly Once | Best Suited For | Main Caveats |
|---|---|---|---|
| Native Kafka Transactions | From input topic to output topic inside Kafka | Read-process-write pipelines within Kafka | Transaction tuning, stable transactional ID, per-partition scope |
| Kafka Streams Exactly Once | State store updates plus output topics | Joins, aggregations, windowed processing | Commit interval tuning, monitoring aborted transactions |
| Transactional Outbox with CDC | From database commit into Kafka and beyond | Services with a database source of truth | Outbox maintenance, CDC deployment, schema evolution |
| Idempotent Sink Writes with Dedup | At the external system effect | Payments, search index, third-party APIs | Designing unique keys, dedup store sizing, eventual consistency |
| Kafka Connect (Where Supported) | Source connectors into Kafka, some sinks via upsert | Managed data movement | Guarantee depends on connector and sink capability |
FAQs
Q1. Does Kafka guarantee exactly once delivery?
Kafka guarantees an exactly-once effect between topics when you use transactions with idempotent production and read-committed consumption. Delivery to consumers is still replayable after failures, which is why read committed matters.
Q2. What is the difference between idempotent producer and transactions?
Idempotent production prevents duplicates for a single partition from a single producer session. Transactions add atomic commits across multiple partitions and offset commits, so read-process-write becomes one unit. A small configuration sketch follows.
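A minimal sketch of the two producer configurations side by side, with a hypothetical transactional id:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerModes {
  // Idempotence alone: the broker dedupes retries per partition within one producer session.
  static Properties idempotentOnly() {
    Properties p = new Properties();
    p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    return p;
  }

  // Transactions: atomic writes across partitions plus offset commits.
  // Setting a transactional.id implies idempotence and enables zombie fencing.
  static Properties transactional() {
    Properties p = new Properties();
    p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "checkout-writer-1");
    return p;
  }
}
```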
Q3. Do I need read committed on consumers?
Yes. If consumers read uncommitted data they can see records that later get aborted and will reprocess or miscompute. Read committed aligns what readers see with what producers commit.
Q4. Can I get exactly once when writing to an external database?
Yes, by designing the sink write to be idempotent or by using a transactional outbox with CDC. Without that, you only have at-least-once at the database.
Q5. How does this scale across many partitions?
Each partition has an independent sequence. Transactions can span many partitions, but coordination overhead grows with more partitions. Keep transactions short and distribute keys to avoid hotspots.
Q6. What is the performance cost?
Transactions add coordination and commit latency. With good batching and commit intervals the overhead is often acceptable for business critical correctness.
Further Learning
Level up your streaming designs with the hands-on coverage in Grokking Scalable Systems for Interviews. For a complete foundation in guarantees, partitions, and state, study the basics inside Grokking System Design Fundamentals. If you want end-to-end interview practice that ties these pieces together, check the case studies in Grokking the System Design Interview.