Designing systems using event-driven architecture principles
Event-driven architecture (EDA) is a design pattern where system components communicate by producing and consuming events—notifications of state changes—rather than making direct synchronous calls to each other. When a customer places an order, the order service publishes an OrderPlaced event; the payment service, inventory service, notification service, and analytics service each consume that event independently, without the order service knowing they exist. This decoupling is EDA's core strength and its core complexity. In system design interviews, interviewers test whether you understand when EDA is the right choice—not just what it is. The best system designers know that most service communication should remain synchronous, and EDA should be reserved for specific scenarios where its benefits outweigh its significant complexity.
Key Takeaways
- EDA decouples producers from consumers: the service that publishes an event does not know—or care—which services consume it. This enables independent deployment, scaling, and evolution of services.
- Three core EDA patterns appear in interviews: pub/sub (one-to-many broadcast), event sourcing (storing events as the source of truth), and CQRS (separating read and write models). Know when to apply each.
- Kafka is the default event streaming platform for system design interviews in 2026. Mention it with specifics: topics, partitions, consumer groups, offset management, and retention policies.
- The saga pattern handles distributed transactions in EDA—coordinating multi-service operations through compensating transactions when a step fails.
- EDA is not always the answer. If a service needs to call another service and wait for the result, use a synchronous API call. If you have a fixed set of known integrations, direct calls are simpler. Reserve EDA for scenarios requiring true decoupling, fan-out to unknown consumers, event replay, or real-time stream processing.
Event-Driven vs Request-Response: The Core Trade-Off
| Dimension | Request-Response (Synchronous) | Event-Driven (Asynchronous) |
|---|---|---|
| Coupling | Tight—caller knows the callee | Loose—producer does not know consumers |
| Latency | Immediate response | No immediate response; eventual processing |
| Failure handling | Caller handles errors directly | Events are retried from the broker; consumer failures are isolated |
| Scaling | Both services must scale together | Producer and consumers scale independently |
| Complexity | Low—simple request/response flow | High—event ordering, idempotency, eventual consistency |
| Debugging | Linear request trace | Distributed event flow across multiple consumers |
| Best for | "Do this and tell me the result" | "Something happened—react as you see fit" |
Interview insight: Do not default to EDA for all communication. The order service charging a payment card needs a synchronous response—"Did the charge succeed?" That is a direct API call. The order service notifying the analytics service that an order was placed is a fire-and-forget event. Mix both patterns in the same system based on the specific interaction.
The Three Core EDA Patterns
1. Publish/Subscribe (Pub/Sub)
How it works: A producer publishes an event to a topic. Multiple consumers subscribe to the topic and each receives a copy of the event. The producer does not know how many consumers exist or what they do with the event.
Example: When a user uploads a photo, the upload service publishes a PhotoUploaded event. The thumbnail service generates thumbnails. The content moderation service scans for policy violations. The notification service alerts followers. The analytics service logs the upload. All four consumers operate independently—adding a fifth consumer requires zero changes to the upload service.
When to use: Fan-out scenarios where one event triggers multiple independent reactions. Systems where new consumers are frequently added. Integration platforms where external developers subscribe to events (Shopify publishes order events; thousands of merchant apps consume them).
Implementation: Kafka topics with consumer groups. Each consumer group receives all messages independently. Within a consumer group, messages are distributed across instances for parallel processing.
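The fan-out behavior can be sketched with a minimal in-memory event bus (in production, Kafka topics and consumer groups play this role; the class and handler names here are illustrative, not a real Kafka API):

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal in-memory pub/sub bus: every subscriber gets its own copy of each event."""
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer does not know who (or how many) consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
processed = []

# Four independent consumers of the same PhotoUploaded event
bus.subscribe("photo-uploaded", lambda e: processed.append(("thumbnail", e["photo_id"])))
bus.subscribe("photo-uploaded", lambda e: processed.append(("moderation", e["photo_id"])))
bus.subscribe("photo-uploaded", lambda e: processed.append(("notify", e["photo_id"])))
bus.subscribe("photo-uploaded", lambda e: processed.append(("analytics", e["photo_id"])))

bus.publish("photo-uploaded", {"photo_id": 42})
# All four consumers saw the event; adding a fifth changes nothing in the producer.
```

Note that the publisher's code never changes as subscribers are added, which is exactly the decoupling property the pattern exists to provide.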
2. Event Sourcing
How it works: Instead of storing only the current state of an entity (the traditional CRUD model), event sourcing stores the full sequence of events that produced the current state. The current state is derived by replaying the event log.
Example: A banking account stores every transaction as an event: AccountOpened, DepositMade(500), WithdrawalMade(200), DepositMade(300). The current balance (600) is computed by replaying these events. The event log is immutable—no data is ever deleted or overwritten.
Benefits: Complete audit trail of every change. Ability to rebuild the current state at any point in time. Ability to replay events through new business logic—a fraud detection system can retroactively analyze historical transactions through updated rules. Investment banks replay months of trades through updated risk models.
Trade-offs: Event storage grows indefinitely (requires snapshotting for performance). Rebuilding state from millions of events is slow without periodic snapshots. Increased complexity compared to simple CRUD. Not appropriate for systems where the event history has no business value.
When to use: Financial systems requiring audit trails. Systems where temporal queries matter ("What was the state at time T?"). Scenarios where replaying history through new logic has business value.
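The banking example above can be expressed as a replay over the event log. This is a sketch: event names mirror the text, and a production system would add periodic snapshots so it does not replay millions of events from the beginning:

```python
# Event sourcing sketch: current state is derived by replaying an immutable event log.
events = [
    ("AccountOpened", 0),
    ("DepositMade", 500),
    ("WithdrawalMade", 200),
    ("DepositMade", 300),
]

def replay(event_log):
    """Rebuild the current balance by folding over the full event history."""
    balance = 0
    for event_type, amount in event_log:
        if event_type == "DepositMade":
            balance += amount
        elif event_type == "WithdrawalMade":
            balance -= amount
    return balance

print(replay(events))        # 600 -- the balance from the text
print(replay(events[:3]))    # 300 -- a temporal query: "what was the balance at time T?"
```

Replaying a prefix of the log is what makes temporal queries trivial; replaying the whole log through new logic is what makes retroactive analysis possible.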
3. CQRS (Command Query Responsibility Segregation)
How it works: Separate the write model (commands that modify state) from the read model (queries that return state). Write operations go to a write-optimized store. Events propagate changes to a read-optimized store (often denormalized for fast queries).
Example: In an e-commerce platform, the order service writes to a normalized PostgreSQL database (optimized for transactional integrity). An event stream propagates order data to an Elasticsearch index (optimized for search) and a Redis cache (optimized for dashboard queries). Each read model is tailored to its specific access pattern.
Benefits: Independent scaling of reads and writes. Each store is optimized for its access pattern. Read models can be rebuilt from the event stream if corrupted.
Trade-offs: Eventual consistency between write and read models. Increased operational complexity (multiple stores to maintain). Not justified for simple CRUD applications with balanced read/write ratios.
When to use: Systems with dramatically different read and write patterns (1000:1 read-to-write ratio). Systems requiring multiple query patterns over the same data. High-scale systems where read and write loads must scale independently.
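A toy projection makes the split concrete. In this sketch the dicts stand in for PostgreSQL (write side) and Redis/Elasticsearch (read side), and the projection runs synchronously; in a real CQRS system the event would flow through Kafka, making the read model eventually consistent:

```python
# CQRS sketch: commands mutate the write model; events project into a
# denormalized read model optimized for queries. All names are illustrative.
write_store = {}   # normalized system of record (stands in for PostgreSQL)
read_store = {}    # denormalized per-customer view (stands in for Redis or Elasticsearch)

def place_order(order_id, customer, items):
    """Command: write to the normalized store, then emit an event to the projector."""
    write_store[order_id] = {"customer": customer, "items": items}
    project_order_placed({"order_id": order_id, "customer": customer, "items": items})

def project_order_placed(event):
    """Event handler: maintain a per-customer summary the dashboard can read cheaply."""
    summary = read_store.setdefault(event["customer"], {"orders": 0, "item_count": 0})
    summary["orders"] += 1
    summary["item_count"] += len(event["items"])

place_order("o-1", "alice", ["book", "pen"])
place_order("o-2", "alice", ["mug"])
# The query side reads a precomputed summary without touching the write model:
print(read_store["alice"])  # {'orders': 2, 'item_count': 3}
```

If the read store were corrupted, it could be rebuilt by replaying the event stream through `project_order_placed`, which is the rebuild property mentioned above.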
Kafka: The Interview-Standard Event Platform
Apache Kafka is the default event streaming platform for system design interviews. Know these specifics.
Topics and partitions: Events are published to topics. Each topic is divided into partitions for parallelism. Messages within a partition are ordered; across partitions, ordering is not guaranteed. Choose the partition key carefully—user_id ensures all events for one user are ordered.
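The per-key ordering guarantee follows from deterministic key hashing. Kafka's actual default partitioner uses murmur2; this sketch uses MD5 purely to show the same property with the standard library:

```python
# Key-based partitioning sketch (Kafka's real partitioner uses murmur2;
# any deterministic hash demonstrates the same per-key ordering property).
import hashlib

NUM_PARTITIONS = 6  # illustrative partition count

def partition_for(key: str) -> int:
    """Map a key to a partition deterministically."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every event keyed by the same user_id lands on the same partition,
# so Kafka's per-partition ordering becomes per-user ordering.
assert all(partition_for("user-123") == partition_for("user-123") for _ in range(10))
```

This is also why changing the partition count on a live topic is disruptive: the key-to-partition mapping shifts, and per-key ordering breaks across the boundary.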
Consumer groups: Each consumer group receives all messages from a topic. Partitions are distributed across consumers within a group. Adding consumers to a group increases parallelism (up to the number of partitions).
Retention and replay: Kafka retains messages for a configurable period (default 7 days, but many production systems use 30 days or indefinite). Consumers can replay from any offset, enabling reprocessing of historical events through updated logic.
Delivery semantics: At-most-once (messages may be lost), at-least-once (messages may be duplicated—requires idempotent consumers), exactly-once (highest guarantee, highest overhead, supported by Kafka transactions).
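At-least-once delivery means a consumer must tolerate duplicates. A common approach is deduplication by event ID, sketched here with an in-memory set (a real consumer would persist the processed IDs, e.g. in a database table, atomically with the state change):

```python
# Idempotent consumer sketch: at-least-once delivery can redeliver events,
# so the consumer records processed event IDs and makes reprocessing a no-op.
processed_ids = set()    # in production: a durable store, not process memory
balance = {"amount": 0}

def handle_payment_event(event):
    if event["event_id"] in processed_ids:
        return  # duplicate delivery: safely ignore
    processed_ids.add(event["event_id"])
    balance["amount"] += event["amount"]

handle_payment_event({"event_id": "evt-1", "amount": 100})
handle_payment_event({"event_id": "evt-1", "amount": 100})  # redelivered duplicate
print(balance["amount"])  # 100, not 200
```

Without the ID check, the redelivered event would double-charge; with it, processing the same event twice produces the same result, which is the definition of idempotency interviewers look for.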
Interview application: "I would use a Kafka topic called 'order-events' with 50 partitions, keyed by order_id for ordering guarantees within an order. Three consumer groups subscribe independently: payment-processor, inventory-updater, and notification-sender. Each consumer group processes events at its own pace. If the notification service goes down for 30 minutes, it resumes from its last committed offset when it recovers—no events are lost."
Kafka vs alternatives: RabbitMQ for traditional message queuing (point-to-point, complex routing). AWS SQS for managed queuing without operational overhead. AWS SNS for simple fan-out. Kafka for high-throughput event streaming with replay capability. "I chose Kafka over SQS because we need event replay for our analytics pipeline to reprocess historical data when we update our models."
The Saga Pattern: Distributed Transactions in EDA
In a monolith, placing an order is a single database transaction: deduct inventory, charge payment, create shipment record—all or nothing. In EDA with independent services, this atomicity is lost. The saga pattern provides an alternative.
How it works: A saga is a sequence of local transactions. Each service performs its operation and publishes an event. If a step fails, compensating transactions undo the previous steps.
Example flow:
1. Order service publishes OrderPlaced.
2. Payment service processes the payment and publishes PaymentSucceeded.
3. Inventory service reserves stock and publishes StockReserved.
4. Shipping service creates the shipment and publishes ShipmentCreated.
If payment fails: Payment service publishes PaymentFailed. Order service consumes it and executes a compensating transaction: cancel the order and publish OrderCancelled.
If shipping fails after payment succeeded: Shipping service publishes ShippingFailed. Payment service consumes it and issues a refund (compensating transaction). Order service cancels the order.
Interview application: "For the e-commerce order flow, I would use a choreography-based saga. Each service reacts to events and publishes its own. If any step fails, compensating events trigger rollbacks in preceding services. I would ensure all consumers are idempotent—processing the same event twice produces the same result—because at-least-once delivery means duplicates are possible."
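The saga's two failure paths can be sketched as a single flow that records which events fire. This is a simplification: each step would really be a separate service reacting to events over a broker, and the event names follow the text above:

```python
# Choreography saga sketch: each step is a local transaction; on failure,
# compensating events undo the work already done, in reverse.
def run_order_saga(payment_ok=True, shipping_ok=True):
    log = ["OrderPlaced"]

    if not payment_ok:
        log.append("PaymentFailed")
        log.append("OrderCancelled")     # compensation: order service cancels
        return log
    log.append("PaymentSucceeded")
    log.append("StockReserved")

    if not shipping_ok:
        log.append("ShippingFailed")
        log.append("PaymentRefunded")    # compensation: payment service refunds
        log.append("OrderCancelled")     # compensation: order service cancels
        return log
    log.append("ShipmentCreated")
    return log

print(run_order_saga())                   # happy path: all four events fire
print(run_order_saga(shipping_ok=False))  # compensations fire after the failure
```

Notice that there is no global transaction anywhere: consistency is restored by explicitly undoing completed steps, which is why every compensating action must itself be safe to retry.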
The Outbox Pattern: Reliable Event Publishing
A common failure mode: a service commits a database transaction but crashes before publishing the corresponding event, leaving the system in an inconsistent state.
Solution: Write the business data and the event to the same database in a single ACID transaction. The event is written to an "outbox" table. A separate process (CDC via Debezium, or a polling worker) reads from the outbox table and publishes events to Kafka. Once published, the outbox record is marked as processed.
Interview application: "To guarantee that every order is both persisted and published as an event, I would use the outbox pattern. The order service writes the order record and the OrderPlaced event to the same PostgreSQL database in a single transaction. Debezium captures changes to the outbox table via CDC and publishes them to Kafka. This eliminates the dual-write problem."
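The pattern's core mechanic, one ACID transaction covering both the business row and the outbox row, can be sketched with SQLite. The relay here is a simple poller; Debezium-style CDC serves the same role with lower latency, and the table schema is illustrative:

```python
# Outbox pattern sketch: the order row and its event are written in ONE
# transaction, so there is no window where one exists without the other.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id, total):
    with conn:  # single ACID transaction: both inserts commit or neither does
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        conn.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                     ("OrderPlaced", json.dumps({"order_id": order_id, "total": total})))

def relay_outbox(publish):
    """Poller sketch: publish unpublished events, then mark them processed."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # stands in for a Kafka produce call
        conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    conn.commit()

place_order("o-1", 99.0)
published = []
relay_outbox(lambda t, p: published.append((t, p)))
print(published)  # [('OrderPlaced', {'order_id': 'o-1', 'total': 99.0})]
```

If the service crashes after `place_order` commits but before the relay runs, the event is still sitting in the outbox and gets published on the next poll, which is precisely how the dual-write problem is eliminated.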
When NOT to Use Event-Driven Architecture
This is the section that earns the highest interview points. Knowing when to avoid EDA demonstrates mature judgment.
Do not use EDA when:
- A service needs a synchronous response (payment processing that needs an immediate success/failure).
- You have a fixed set of known integrations (three internal services consuming order data—just call their APIs).
- The system is a simple CRUD application with no fan-out, no replay, and no real-time processing requirements.
- The team lacks the operational maturity to manage Kafka, handle event ordering, and debug distributed event flows.
Interview phrasing: "I would not use EDA for the payment-to-order confirmation path because the user is waiting for a response. I would use a synchronous gRPC call. However, I would use EDA for post-order processing—notifying the warehouse, updating analytics, sending confirmation emails—because these are fire-and-forget operations where decoupling and independent scaling provide real value."
For structured practice applying EDA patterns across complete system design problems, Grokking the System Design Interview covers event-driven design as a core architectural pattern.
For advanced EDA patterns including event sourcing at production scale, distributed sagas, and stream processing architectures, Grokking the Advanced System Design Interview provides the depth required for L6+ interviews. The system design interview guide maps how EDA discussions fit into the overall interview framework.
Frequently Asked Questions
What is event-driven architecture in system design?
A design pattern where components communicate by producing and consuming events rather than direct synchronous calls. A producer publishes an event ("order placed"), and consumers react independently (process payment, update inventory, send notification). This decouples services, enabling independent scaling and deployment.
When should I use event-driven architecture in an interview?
When you need fan-out to multiple independent consumers, event replay for analytics or reprocessing, real-time stream processing, audit trails, or integration with unknown external systems. Do not use EDA when a synchronous response is needed or when direct API calls between known services are simpler.
What is the difference between pub/sub and event sourcing?
Pub/sub is a communication pattern—one producer broadcasts events to multiple consumers. Event sourcing is a storage pattern—storing the full sequence of events as the source of truth rather than just the current state. You can use pub/sub without event sourcing and event sourcing without pub/sub, though they often appear together.
How does the saga pattern work?
A saga coordinates multi-service transactions through a sequence of local transactions and compensating transactions. Each service performs its operation and publishes an event. If any step fails, compensating events trigger rollbacks in preceding services. This replaces distributed ACID transactions in event-driven microservices.
Why is Kafka the default for system design interviews?
Kafka provides high-throughput event streaming (millions of messages/second), durable message retention with replay capability, partitioned topics for parallelism, and consumer groups for independent consumption. These features map directly to interview scenarios: fan-out, replay, ordering, and independent scaling.
What is the outbox pattern?
A solution for reliable event publishing. The service writes business data and the event to the same database in one ACID transaction. A separate process (CDC or polling) reads from the outbox table and publishes to Kafka. This prevents the dual-write problem where a database commit succeeds but event publishing fails.
What is CQRS and when should I use it?
Command Query Responsibility Segregation separates write operations (commands) from read operations (queries) into different models and stores. Use it when read and write patterns differ dramatically (1000:1 read-to-write ratio) or when multiple query patterns require different data representations.
How do I handle event ordering in a distributed system?
Kafka guarantees ordering within a partition. Choose a partition key that groups related events—user_id ensures all events for one user are processed in order. Across partitions, ordering is not guaranteed. If global ordering is required, use a single partition (at the cost of throughput).
What are the main challenges of event-driven architecture?
Increased debugging complexity (distributed event flows vs linear request traces), eventual consistency between services, event ordering guarantees, idempotency requirements (at-least-once delivery means duplicates), schema evolution for event formats, and operational overhead of managing a message broker like Kafka.
Should I always choose EDA over synchronous communication?
No. Most service communication should be synchronous API calls. EDA adds significant complexity. Reserve it for scenarios where its benefits—decoupling, fan-out, replay, independent scaling—clearly outweigh the costs. Interviewers reward candidates who know when not to use a pattern.
TL;DR
Event-driven architecture decouples services through asynchronous event communication—producers publish events, consumers react independently. Three core patterns: pub/sub (fan-out to multiple consumers), event sourcing (events as the source of truth with full audit trail and replay), and CQRS (separate read and write models). Kafka is the interview-standard platform—know topics, partitions, consumer groups, offsets, and delivery semantics. The saga pattern handles distributed transactions through compensating events when steps fail. The outbox pattern ensures reliable event publishing by writing business data and events in a single database transaction. The highest-scoring interview insight: EDA is not always the answer. Use synchronous calls when a service needs an immediate response. Use EDA when you need fan-out, replay, real-time processing, or true service decoupling. Interviewers reward the judgment to choose the right pattern for each interaction, not the reflexive application of EDA everywhere.