How do you maintain event ordering across partitions?
Ordering across partitions means delivering or applying events in a predictable sequence even when data is sharded for scale. On a single-machine log this is easy, since there is one append order.
In a distributed system with many partitions, the network can reorder messages, retries can duplicate them, and different machines stamp events with unsynchronized clocks. The result is that naive sorting by arrival time breaks. In a system design interview you will be asked to keep correct, user-observable order without killing throughput. This guide shows practical patterns that real teams use and how to reason about the trade-offs.
Why It Matters
Ordering is a visible correctness property. Users expect a chat app to show messages in the order they were sent, a payments ledger to reflect debits before credits, and a feed to avoid older actions jumping above newer ones. Product features like audit logs, search indexing, and analytics pipelines also rely on consistent order.
The challenge is to meet ordering goals while sustaining high write volume, low tail latency, and elastic scaling. Getting this wrong leads to user confusion, double processing, or corrupted aggregates. In interviews, candidates who first classify the exact ordering requirement and then map it to the right primitive stand out.
How It Works, Step by Step
Step 1: Classify the ordering requirement.
- Per-key order: keep order for a single entity such as a user, account, or conversation.
- Global order: keep a single total order across all events in the system.
- Causal order: respect happens-before relationships without forcing a single global sequence.

This classification determines complexity and cost.
Step 2: Choose the ordering primitive.
- Route by key to one partition: hash a stable key so all related events land on the same partition. Brokers such as Kafka guarantee order within a partition, which gives per-key order at high throughput (a minimal routing sketch follows this list).
- Global sequence service: a dedicated sequencer assigns a monotonically increasing number to every event before it is written to any partition, and consumers sort by this sequence. This provides total order at the cost of a new bottleneck.
- Timestamp service with commit wait: a service such as TrueTime, paired with multi-shard transactions, can assign globally ordered commit timestamps. This trades latency for strong external consistency.
- Causal metadata: attach logical or vector clock metadata and enforce happens-before without a total order. This scales well when strict global order is unnecessary.
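The routing rule is small enough to show directly. This is a hedged sketch, not any specific broker's partitioner; the point is only that a stable hash of a stable key always yields the same partition:

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; in practice this is the topic's partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a stable key (user ID, conversation ID) to one partition.

    A stable hash (not Python's per-process randomized hash()) guarantees
    the same key always routes to the same partition, which is exactly
    what preserves per-key order.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# All events for conversation "conv-42" land on the same partition, so the
# broker's per-partition append order gives per-conversation order for free.
assert partition_for("conv-42") == partition_for("conv-42")
```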
Step 3: Design the event envelope. Add fields that make reordering robust:
- Event ID: globally unique, used for deduplication.
- Event time: what the producer believes is the occurrence time.
- Sequence number: a per-key monotonic counter, or a global sequence.
- Source partition and offset: so consumers can checkpoint progress.
- Previous sequence pointer (optional): allows gap detection for per-key streams.
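As a concrete shape for that envelope, here is a hedged sketch as a Python dataclass; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventEnvelope:
    event_id: str                        # globally unique, for deduplication
    event_time_ms: int                   # producer's view of occurrence time
    sequence: int                        # per-key monotonic counter or global sequence
    key: str                             # routing key, e.g. a conversation ID
    source_partition: int                # partition the event was appended to
    offset: int                          # position in that partition, for checkpoints
    prev_sequence: Optional[int] = None  # enables gap detection per key
    payload: bytes = b""
```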
Step 4: Producer rules. Producers must use stable keys. For per-key order, a user or conversation ID works well. If retries occur, reuse the same Event ID and Sequence number so a duplicate never appears newer than the original send. Consider a small local store that persists the last emitted sequence per key.
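A minimal sketch of these rules, assuming an in-memory sequence table (a real producer would persist last_seq so sequences survive restarts):

```python
import uuid


class PerKeyProducer:
    """Assigns a monotonic sequence per key and reuses IDs on retry."""

    def __init__(self, send_fn):
        self.send_fn = send_fn             # stand-in for a broker produce call
        self.last_seq: dict[str, int] = {}

    def send(self, key: str, payload: bytes, retry_of=None):
        if retry_of is not None:
            # A retry reuses the original event_id and sequence, so the
            # duplicate can never look "newer" than the original send.
            event_id, seq = retry_of
        else:
            seq = self.last_seq.get(key, 0) + 1
            self.last_seq[key] = seq
            event_id = str(uuid.uuid4())
        self.send_fn(key=key, event_id=event_id, sequence=seq, payload=payload)
        return event_id, seq
```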
Step 5: Broker and partitioning strategy. Pick a hash function that spreads keys and supports virtual nodes to reduce hot spots. Keep keys sticky during consumer group rebalancing. If you must reshard, use a dual-write or mirror step so that order within a key is preserved while keys migrate.
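Virtual nodes are usually realized with consistent hashing. The sketch below is illustrative and ignores replication and weighting; each physical partition appears at many points on the ring, which smooths out hot spots:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes to spread keys evenly."""

    def __init__(self, partitions: list[str], vnodes: int = 64):
        self.ring: list[tuple[int, str]] = []
        for p in partitions:
            for v in range(vnodes):  # each partition owns many ring points
                self.ring.append((self._hash(f"{p}#{v}"), p))
        self.ring.sort()

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def partition_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]


ring = ConsistentHashRing([f"p{i}" for i in range(12)])
assert ring.partition_for("conv-42") == ring.partition_for("conv-42")  # sticky
```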
Step 6: Consumer-side merge. When reading many partitions to build a timeline, maintain a bounded priority queue. Sort by Sequence number if you have a global sequencer; otherwise sort by Event time with a watermark to tolerate small disorder. A watermark is a threshold time such that anything earlier is safe to emit, which avoids unbounded buffering when late events trickle in.
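One way to structure that merge is sketched below; allowed_lateness_ms and the repair hook are assumptions for illustration, not a specific framework's API:

```python
import heapq


class WatermarkMerger:
    """Buffers events from many partitions and emits them in event-time
    order once they fall behind the watermark. allowed_lateness_ms bounds
    both disorder and memory."""

    def __init__(self, allowed_lateness_ms: int):
        self.allowed_lateness_ms = allowed_lateness_ms
        self.max_event_time = 0
        self.heap: list[tuple[int, str]] = []  # (event_time_ms, event_id)

    def add(self, event_time_ms: int, event_id: str) -> None:
        self.max_event_time = max(self.max_event_time, event_time_ms)
        watermark = self.max_event_time - self.allowed_lateness_ms
        if event_time_ms < watermark:
            self.handle_late(event_time_ms, event_id)  # too late to merge
        else:
            heapq.heappush(self.heap, (event_time_ms, event_id))

    def drain_ready(self):
        """Emit everything at or before the watermark, in order."""
        watermark = self.max_event_time - self.allowed_lateness_ms
        while self.heap and self.heap[0][0] <= watermark:
            yield heapq.heappop(self.heap)

    def handle_late(self, event_time_ms: int, event_id: str) -> None:
        pass  # e.g. send to a dead-letter topic for offline repair
```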
Step 7: Outbox and database-first pattern. For transactional systems, use a transactional outbox table in the same database as your writes. Each commit appends an event row with an auto-incrementing sequence. A change data capture job reads this outbox and publishes to the log. The outbox gives a durable per-entity order that mirrors commit order.
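A toy outbox in SQLite shows the core idea; the schema and table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- mirrors commit order
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
INSERT INTO accounts VALUES ('acct-1', 100);
""")

# The state change and the event row commit atomically, so the outbox
# sequence reflects the database's own commit order for this entity.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'acct-1'")
    conn.execute(
        "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
        ("acct-1", "debited", '{"amount": 30}'),
    )

# A change data capture job would tail the outbox and publish rows in seq order.
for row in conn.execute("SELECT seq, aggregate_id, event_type FROM outbox ORDER BY seq"):
    print(row)
```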
Step 8: Failure handling and idempotence. Real networks duplicate and reorder messages, so consumers should be idempotent. Keep a per-key high-water mark and discard events with a Sequence number at or below that mark. If you rely on Event-time sorting, add deduplication by Event ID. For exactly-once-like outcomes, pair idempotent producers with idempotent consumers and transactional commits within the broker or database.
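The high-water-mark check is only a few lines. In this sketch the seen_ids set grows without bound; a production system would bound it with a TTL or a probabilistic filter:

```python
class IdempotentConsumer:
    """Drops duplicates and stale replays using a per-key high-water mark."""

    def __init__(self, apply_fn):
        self.apply_fn = apply_fn
        self.high_water: dict[str, int] = {}  # key -> highest applied sequence
        self.seen_ids: set[str] = set()       # event-ID dedup for time-sorted paths

    def handle(self, key: str, sequence: int, event_id: str, payload) -> bool:
        if sequence <= self.high_water.get(key, 0):
            return False  # duplicate or stale replay; safe to drop
        if event_id in self.seen_ids:
            return False
        self.apply_fn(key, payload)  # apply first, then advance the mark
        self.high_water[key] = sequence
        self.seen_ids.add(event_id)
        return True
```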
Step 9: Testing for order under stress. Build property-based tests that inject clock skew, packet delay, and partition rebalancing. Assert per-key monotonicity and the correctness of the final materialized views. Run backfill and replay simulations to validate that reorder buffers and watermarks do not explode during catch-up.
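A self-contained sketch of such a test follows; the chaos model (bounded local reorder plus duplicates) is deliberately simple, and a real suite would also vary clock skew and rebalancing:

```python
import random


def deliver_with_chaos(events, max_shift=3, dup_rate=0.2, seed=7):
    """Simulate a hostile network: bounded local reorder plus duplicates."""
    rng = random.Random(seed)
    jittered = sorted(enumerate(events), key=lambda p: p[0] + rng.uniform(0, max_shift))
    wire = [e for _, e in jittered]
    wire += [e for e in wire if rng.random() < dup_rate]  # duplicate some events
    return wire


def replay(events):
    """Reference consumer: dedup by (key, seq), then materialize per key."""
    seen, out = set(), {}
    for key, seq in events:
        if (key, seq) not in seen:
            seen.add((key, seq))
            out.setdefault(key, []).append(seq)
    return {k: sorted(v) for k, v in out.items()}


def test_chaos_replay_matches_ideal():
    ideal = [("user-1", s) for s in range(1, 101)]
    result = replay(deliver_with_chaos(ideal))
    # The materialized view must equal an ideal single-machine replay:
    # every sequence present exactly once, in order.
    assert result["user-1"] == list(range(1, 101))


test_chaos_replay_matches_ideal()
```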
Real World Example
Consider a global chat service that stores messages in a log with many partitions. The user expectation is clear: messages from the same conversation must appear in send order, while messages across different conversations do not need a single global order. The design recipe:
- Partitioning: use the conversation ID as the key so all messages for a chat land on one partition. This guarantees per-conversation order.
- Event envelope: each message carries an auto-incrementing Sequence number generated by the chat service for that conversation, plus an Event ID for deduplication.
- Fan-out: the notifications service subscribes to many partitions and merges streams only for users who belong to many conversations. Since only per-conversation order is required, we never attempt a global order.
- Edge cases: during resharding, conversation keys are moved with a brief drain and replay so no cross-partition interleaving occurs for that key. During retries, the producer resends with the same Event ID and Sequence number, so consumers never see a newer duplicate.
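Tying the recipe together, here is a hypothetical send path; next_seq, partition_for, and append are stand-ins for the sequence store, the hash router, and the broker client:

```python
def send_chat_message(conversation_id: str, text: str,
                      next_seq, partition_for, append):
    """Route by conversation ID and stamp a per-conversation sequence."""
    seq = next_seq(conversation_id)             # monotonic per conversation
    partition = partition_for(conversation_id)  # same key -> same partition
    append(partition, {
        "event_id": f"{conversation_id}:{seq}",  # deterministic, stable on retry
        "sequence": seq,
        "payload": text,
    })


# Toy wiring for illustration only.
log = []
send_chat_message(
    "conv-42", "hello",
    next_seq=lambda cid: 1,
    partition_for=lambda cid: hash(cid) % 12,
    append=lambda p, msg: log.append((p, msg)),
)
print(log)
```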
Common Pitfalls and Trade-offs
- Relying on wall clocks. Different machines have different clocks, so using Event time without a watermark permits misordered outputs.
- Ignoring key design. If the hashing key is not aligned with the ordering requirement, related events split across partitions and you lose the only cheap ordering guarantee you had.
- Unbounded reorder buffers. If late events can be arbitrarily late, the consumer priority queue grows without limit. Bound lateness with a watermark and move truly late events to a repair path.
- Single-sequencer bottleneck. A naive global sequence service becomes the hottest dependency and inflates tail latency. Use sharded sequences with ranges, and make sure the downstream consumer can merge by sequence across shards (see the sketch after this list).
- Resharding without sticky keys. Moving keys between partitions mid-stream can interleave reads from the old and new partitions. Use dual reads with a cutover point, or mirror and drain.
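On the sharded-sequencer point, interleaved ranges are one simple scheme, sketched below. Numbers are globally unique and monotonic per shard; note that merging by number treats the numbering itself as the total order, rather than true wall-clock order:

```python
import itertools


class ShardedSequencer:
    """Each shard hands out numbers from an interleaved range, so no single
    sequencer sits on every request's critical path."""

    def __init__(self, shard_id: int, num_shards: int):
        # Shard 0 issues 0, n, 2n, ...; shard 1 issues 1, n+1, 2n+1, ...
        self.counter = itertools.count(start=shard_id, step=num_shards)

    def next(self) -> int:
        return next(self.counter)


s0, s1 = ShardedSequencer(0, 2), ShardedSequencer(1, 2)
print([s0.next(), s1.next(), s0.next(), s1.next()])  # [0, 1, 2, 3]
```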
Interview Tip
Start by asking what level of ordering is required. Per key order is often enough and far cheaper than global order. Propose routing by key to one partition and show how you will handle retries with idempotence. If a global timeline is truly required, present a global sequencer and explain the latency and availability impact. Close with how you will test with clock skew and resharding.
Key Takeaways
- Ordering has levels. Per-key order is common, causal order is lighter than global order, and global order is the most expensive.
- The cheapest strong guarantee is to route a key to one partition and rely on per-partition order.
- Global order needs a sequencer or a timestamp service and usually adds latency.
- Consumer merge with watermarks is a practical way to tame small disorder.
- Idempotence and explicit sequence numbers convert at-least-once delivery into a correct final state.
Comparison Table
| Approach | What it guarantees | Typical pattern | Latency impact | Throughput impact | Hotspot risk | Operational complexity | Works best for |
|---|---|---|---|---|---|---|---|
| Route by key to one partition | Strict per key order | Consistent hashing with sticky keys | Low | High | Medium if a few keys are very hot | Low | Chats, user timelines, account ledgers |
| Global sequence service | Total order across all events | Central sequencer or sharded sequence ranges | Medium to high | Medium | High at the sequencer | High | Audit logs, replayable analytics, strict serialization |
| Timestamp service with commit wait | Global external consistency by commit time | TrueTime like API with two phase commit | High | Medium | Medium | High | Financial systems, cross region transactions |
| Causal metadata | Preserves happens before | Logical or vector clocks in headers | Low | High | Low | Medium | Feeds, collaboration, social actions |
| Consumer merge with watermarks | Deterministic view with bounded disorder | Priority queue sorted by event field plus watermark | Low to medium | High | Low | Medium | Stream joins, multi partition timelines |
| Outbox with change data capture | Commit order per entity | Transactional outbox table plus capture process | Low to medium | High | Low | Medium | Service integration, reliable event sourcing |
FAQs
Q1. What is the simplest way to ensure ordering across partitions?
If your requirement is per-key order, the simplest and fastest solution is to route all events with the same key to one partition and rely on the broker guarantee that a partition is an ordered append log.
Q2. Do I need global order for a user feed or notification system?
Usually no. A per-user or per-conversation order is enough. Build a consumer merge that sorts by the event-time field with a small watermark to handle late arrivals from different partitions.
Q3. How do watermarks control out of order events?
A watermark is a moving time boundary. The consumer buffers events and only emits items earlier than the watermark. This bounds memory and gives predictable behavior with small disorder while still allowing late events to be handled on a repair path.
Q4. Can I combine exactly once processing with strong ordering?
Yes, in practice: pair idempotent producers, idempotent consumers, and transactional commits in the broker or database. The combination yields a correct final state with per-key order, or total order when paired with a global sequencer.
Q5. What happens to ordering during resharding or consumer rebalancing?
Keys should remain sticky. Migrate keys with a mirror-and-drain sequence so the same key does not produce interleaved reads from two partitions. Consumers should checkpoint per-partition offsets so recovery continues from a consistent point.
Q6. How can I test ordering guarantees at scale?
Inject clock skew, random delay, duplication, and forced retries in a staging cluster. Rebuild materialized views from scratch to validate that sequence numbers and watermarks produce the same final result as an ideal single machine replay.
Further Learning
Strengthen your foundation in partitioning, consistency, and event logs with Grokking System Design Fundamentals. Practice full interview-scale designs that balance ordering, latency, and throughput in Grokking the System Design Interview. For end-to-end case studies with streams and large-scale partitions, enroll in Grokking Scalable Systems for Interviews.