Message Queue vs Stream Log: How Do You Choose?

Message queues and stream logs both carry events from producers to consumers, yet they solve different shapes of problems. A message queue hands each message to one consumer inside a group and removes it once acknowledged. A stream log keeps an ordered, append-only record for a defined retention period and allows many consumers to read the same events at their own pace.

Choosing between them is a design decision that affects latency, cost, failure handling, analytics, and migration paths later. This guide gives you selection criteria you can use with confidence in a system design interview and in real distributed systems.

Why It Matters

Pick a message queue when the core goal is work distribution. Think payments to be settled, images to be resized, orders to be shipped. Each job must be processed once by some worker. You want dead letter routing, backpressure, and visibility into unacked messages. Pick a stream log when the core goal is shared truth and replay. Think click stream analytics, audit trails, feed generation, and fraud models. You want long retention, ordered partitions, consumer offsets, and reprocessing.

Make the wrong choice and you lock yourself into brittle delivery semantics, limited observability, and expensive migrations. Interviewers look for the reasoning, not the brand name. They want to hear how your choice supports scalable architecture, recovery, and future feature growth.

How It Works Step by Step

Message queue flow

  • Producer publishes a message to a queue or topic.

  • Broker stores the message in an internal data structure optimized for dequeue.

  • A consumer group subscribes. The broker delivers each message to one consumer instance in the group.

  • The consumer processes and then acknowledges. The broker deletes the message when acked.

  • On failure or timeout, the broker redelivers. Many systems support exponential retry, dead letter queues, and message TTL.

  • Ordering is typically best effort within a single queue but can be broken by parallelism and retries.

  • Delivery semantics are usually at least once. Exactly once requires idempotent handlers or transactional outbox.
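The queue flow above can be sketched as a toy in-memory broker. All names here (`MiniQueue`, `backoff_with_jitter`) are hypothetical illustrations, not any real broker's API; real systems add persistence, visibility timeouts, and actual sleeps between retries.

```python
import random

class MiniQueue:
    """Toy in-memory queue: delivers each message to one consumer,
    deletes on ack, redelivers on nack, dead-letters after max retries."""

    def __init__(self, max_retries=3):
        self.pending = []        # (msg_id, body, attempt) awaiting delivery
        self.unacked = {}        # msg_id -> (body, attempt), delivered but not acked
        self.dead_letter = []    # messages that exhausted their retries
        self.max_retries = max_retries
        self._next_id = 0

    def publish(self, body):
        self.pending.append((self._next_id, body, 0))
        self._next_id += 1

    def deliver(self):
        """Hand the next message to a consumer; it stays unacked until ack or nack."""
        if not self.pending:
            return None
        msg_id, body, attempt = self.pending.pop(0)
        self.unacked[msg_id] = (body, attempt)
        return msg_id, body

    def ack(self, msg_id):
        del self.unacked[msg_id]          # delete on ack: the message is gone

    def nack(self, msg_id):
        """Processing failed: requeue for retry, or dead-letter when exhausted."""
        body, attempt = self.unacked.pop(msg_id)
        if attempt + 1 >= self.max_retries:
            self.dead_letter.append(body)
        else:
            self.pending.append((msg_id, body, attempt + 1))

def backoff_with_jitter(attempt, base=0.1, cap=30.0):
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A consumer would call `deliver`, process, then `ack` on success or `nack` on failure, sleeping `backoff_with_jitter(attempt)` before retrying. The jitter spreads retries out so a burst of failures does not hammer the broker in lockstep.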

Stream log flow

  • Producer appends records to a partitioned log. Each record gets a monotonically increasing offset.

  • Broker retains data for a time window or size bound. Nothing is removed by ack.

  • Consumers track their own offsets per partition. A consumer group can scale horizontally by sharing partitions.

  • Reprocessing is easy. Reset offsets and reread any time.

  • Ordering is guaranteed within a partition. Cross partition order is not defined.

  • Delivery semantics are at least once by default. Exactly once requires idempotent producers and transactional reads with writes or external dedupe.

  • Advanced features include compaction to keep latest value per key, tiered storage to lower cost, and stream processing with stateful windows.
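The log flow can be sketched the same way. `MiniLog` is a hypothetical toy, not a real client API, but it shows the three ideas that matter: key-based partitioning preserves per-key order, each consumer group owns its offsets, and replay is just an offset reset.

```python
class MiniLog:
    """Toy partitioned append-only log: records are never deleted on read,
    and each consumer group tracks its own offset per partition."""

    def __init__(self, partitions=2):
        self.partitions = [[] for _ in range(partitions)]
        self.offsets = {}  # (group, partition) -> next offset to read

    def append(self, key, value):
        # Hashing the key sends every record for that key to one partition,
        # which is what preserves per-key order.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1    # (partition, offset)

    def poll(self, group, partition):
        """Read the next record for this group, advancing its offset."""
        off = self.offsets.get((group, partition), 0)
        if off >= len(self.partitions[partition]):
            return None
        self.offsets[(group, partition)] = off + 1
        return self.partitions[partition][off]

    def reset(self, group, partition, offset=0):
        """Reprocessing: move the group's offset back and read again."""
        self.offsets[(group, partition)] = offset
```

Note that a `poll` by one group never affects another group's position, which is why many teams can read the same history independently.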

Selection criteria you can apply quickly

  • Primary goal

    • Work distribution for jobs or tasks: use a message queue.
    • Shared history for analytics or materialized views: use a stream log.
  • Retention

    • Short lived, deleted after ack: use a message queue.
    • Long lived with replay: use a stream log.
  • Consumers

    • One consumer per message inside a group: use a message queue.
    • Many teams reading the same data independently: use a stream log.
  • Ordering

    • Soft ordering and simple retries: prefer a message queue.
    • Per key order and partition aware processing: prefer a stream log.
  • Backpressure

    • Built in visibility and dead letter handling: favor a message queue.
    • Rate control and consumer lag metrics: favor a stream log.
  • Cost profile

    • Cheaper for transient jobs: use a message queue.
    • Cheaper for write once read many patterns: use a stream log with tiered storage.
  • Future proofing

    • If you expect many new downstream consumers, start with a stream log and fan out to queues for worker pools when needed.
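The criteria above can be distilled into a rough heuristic. This is a deliberately crude sketch for interview reasoning, not a substitute for real capacity, ordering, and cost analysis.

```python
def choose_transport(goal, needs_replay, many_independent_readers):
    """Rough heuristic from the selection criteria: replay or multiple
    independent readers push toward a stream log; pure job dispatch
    pushes toward a message queue."""
    if needs_replay or many_independent_readers or goal == "shared history":
        return "stream log"
    if goal == "work distribution":
        return "message queue"
    # Future-proof default: a log can always fan out to queues later.
    return "stream log"
```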

Real World Example

Retail order pipeline

Orders arrive at an API. Immediate tasks like payment capture, inventory reservation, and email notifications are jobs. A message queue feeds worker pools that perform these tasks exactly once from a business point of view. If a worker crashes, the message is retried or sent to a dead letter queue for later investigation. Short retention keeps costs low and the operational surface clear. Downstream, the same order events are also needed by analytics, recommendation, fraud detection, and a customer service timeline. A stream log stores these events for many hours or days.

Analytics systems can join order events with click stream data. The fraud team can reprocess a full day when they adjust a model. New consumers can be added without touching the producers.

Platforms at scale often combine both. Producers write once to a central stream log. That log feeds online materialization and also pushes a subset into message queues that drive task oriented worker pools.
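The log-to-queue fan-out in that hybrid pattern is conceptually simple: read every record from the central log and route only the subset each worker pool needs onto its task queue. The sketch below uses plain lists and hypothetical event-type names to show the routing step.

```python
def fan_out(log_records, route):
    """Route a stream of log records into per-purpose task queues.
    `route` maps an event type to a queue name; events with no route
    stay in the log only (analytics can still read them there)."""
    queues = {}
    for event in log_records:
        queue_name = route.get(event["type"])
        if queue_name is not None:
            queues.setdefault(queue_name, []).append(event)
    return queues
```

Because the log is the source of truth, a bug in this routing layer is recoverable: fix it, reset offsets, and re-run the fan-out.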

Common Pitfalls and Trade-offs

  • Assuming delete on ack equals exactly once. At least once delivery can still duplicate. Design idempotent handlers or use a transactional outbox.

  • Overestimating ordering guarantees. Message queues rarely guarantee strict order with parallel consumers. Stream logs keep order only within a partition. Choose partition keys carefully to balance load and maintain per key order.

  • Using a queue to power analytics. Once a message is deleted, replay is gone. Analytics needs replay. Use a stream log for that.

  • Ignoring poison messages. A malformed message can block a queue. Use dead letter routing and alerting. In stream logs a bad record does not block others, but consumers must handle parse errors.

  • Unbounded partitions. In stream systems, a skewed key can create a hot partition. Use hashing keys or a composite key to spread load while preserving order where needed.

  • Cost blind retention. Long retention without compaction or tiered storage can explode costs. Set retention per topic and use compaction for latest value semantics.
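The first pitfall, duplicate deliveries under at least once semantics, is usually handled with a dedupe key. This sketch keeps seen IDs in memory for illustration; in production the seen set lives in a durable store, ideally written in the same transaction as the side effect.

```python
class IdempotentHandler:
    """Wrap a side-effecting handler so redelivered messages are skipped.
    The message ID acts as a dedupe key: process once, remember forever
    (or for at least the broker's maximum redelivery window)."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()        # in production: a durable, unique-keyed store

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return False         # duplicate delivery: skip the side effect
        self.handler(payload)    # real work happens exactly once per msg_id
        self.seen.add(msg_id)
        return True
```

With this wrapper, a payment worker can safely receive the same "charge order 7" message twice; the second delivery is acknowledged without charging again.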

Interview Tip

When asked to choose, start from product goals. Say the goal in one sentence, map it to retention and consumer patterns, then decide. For example, if the interviewer asks for a click stream pipeline with audit grade history and multiple downstream teams, say you will choose a partitioned stream log with at least once semantics, consumer groups per team, and a compacted topic for user profile changes.

If they pivot to image processing or thumbnail generation, switch to a message queue with worker autoscaling, retries, and a dead letter queue. This shows principled thinking.

Key Takeaways

  • Message queue is best for work distribution with delete on ack and simple retries.

  • Stream log is best for durable history, replay, and many independent consumers.

  • Ordering exists per queue or per partition, not across the whole system.

  • Delivery semantics are usually at least once. Plan for idempotency or transactions.

  • For large platforms, write once to a central log and fan out to task queues where needed.

Comparison Table

Criterion | Message Queue | Stream Log | When to Pick
Primary use | Job distribution to workers | Shared ordered history with replay | Jobs → queue; analytics or many readers → log
Retention | Short lived, deleted on ack | Time or size based retention | Need replay → log
Delivery model | One consumer in a group per message | All subscribed groups read the same record | Multiple teams → log
Ordering | Best effort | Per partition order | Per key order → log with key-based partitioning
Backpressure | Retries and dead letter routing | Consumer lag metrics and rate control | Poison messages common → queue with DLQ
Reprocessing | Hard once deleted | Reset offsets and reread | Audit or rebuild → log
Typical latency | Low for single delivery | Low append and read, varies with fan out | Ultra low single delivery → queue
Cost profile | Cheaper for transient workloads | Cheaper for write-once-read-many with tiered storage | Long history → log
Examples | Image resize workers, email senders | Click stream, audit trail, feed fan out | Hybrid setup for large platforms

FAQs

Q1. What is the core difference between a message queue and a stream log?

A message queue delivers each message to one consumer in a group and deletes it after ack. A stream log keeps an ordered record for a set time so many consumers can read and replay independently.

Q2. Which one gives stronger ordering guarantees?

Stream logs guarantee order within a partition. Message queues usually provide best effort order and parallel consumers can change relative order.

Q3. How do I get exactly once processing?

Plan for idempotency and transactional patterns. With queues use a transactional outbox or dedupe keys. With logs use idempotent producers, consumer offset transactions, and external dedupe where needed.

Q4. Can I build both job processing and analytics from a single pipeline?

Yes. A common pattern is write once to a central stream log, then create derived topics or connectors that feed message queues for worker pools. This keeps history while enabling low latency task execution.

Q5. When should I avoid a stream log?

If you only need transient work distribution with strict cost control and no replay needs, a queue is simpler and cheaper. Stream systems shine when many consumers and long retention are required.

Further Learning

To master these trade offs with hands on patterns and interview ready frameworks, explore the course Grokking the System Design Interview for a complete decision process and practice prompts. If you want deeper coverage of pipelines, partitions, and consumer group design, enroll in Grokking Scalable Systems for Interviews and build intuition for real production scale.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2025 Design Gurus, LLC. All rights reserved.