How to design a real-time analytics system for high-volume data streams

A real-time analytics system ingests high-volume data streams—user clicks, transactions, sensor readings, log events—processes them within seconds of generation, computes aggregations and metrics, and delivers results to dashboards, alerts, and downstream services. Unlike batch systems that process data hours after collection, real-time systems provide insights as events occur: detecting fraud within milliseconds, tracking live user behavior on a dashboard, and triggering alerts when error rates spike. In system design interviews, this problem tests your understanding of stream processing architectures, message brokers, windowing strategies, exactly-once semantics, and the trade-off between processing latency and computational cost. LinkedIn processes over 7 trillion messages per day through Kafka. Uber computes real-time surge pricing from millions of location events per second. The architecture behind these systems follows a predictable pattern that every senior engineer should know.

Key Takeaways

  • The real-time analytics pipeline has four layers: ingestion (Kafka), processing (Flink/Spark Streaming), storage (time-series or OLAP database), and serving (dashboards and APIs). Each layer is decoupled, scaled independently, and designed for different latency characteristics.
  • Kafka is the industry-standard ingestion layer in 2026. It handles millions of events per second with durable, replayable storage. Kafka 4.0 with KRaft eliminates the ZooKeeper dependency, simplifying cluster management.
  • Apache Flink is the standard stream processor for stateful computations—supporting exactly-once semantics, event-time windowing, and complex aggregations. Confluent has shifted its strategic focus to Flink as the stream processing standard for new deployments.
  • Windowing is how stream processors group unbounded data into finite chunks for aggregation: tumbling (fixed, non-overlapping), sliding (overlapping), and session (activity-based gaps). Choosing the right window type determines the accuracy and timeliness of your analytics.
  • Late-arriving data is the hardest operational challenge. Events arrive out of order due to network delays, mobile connectivity gaps, and timezone differences. Watermarks define how long the system waits for late data before closing a window and emitting results.

Step 1: Requirements and Scope

Functional requirements:

  • Ingest events from multiple sources (web clickstream, mobile app events, server logs, IoT sensors) at 100,000+ events per second.
  • Compute real-time aggregations: count, sum, average, percentile, distinct count over configurable time windows (1 minute, 5 minutes, 1 hour).
  • Power a live dashboard with sub-5-second data freshness.
  • Trigger alerts when metrics breach thresholds (error rate > 1%, latency p99 > 500ms).
  • Store processed results for historical querying (last 30 days at minute granularity, last 1 year at hourly granularity).

Non-functional requirements:

  • Latency: Events are queryable within 5 seconds of generation (end-to-end).
  • Throughput: Sustain 100K events/second with bursts to 500K.
  • Durability: No event loss; every event is processed at least once.
  • Accuracy: Aggregations must be correct (exactly-once semantics for counters).
  • Availability: 99.99% uptime for the ingestion and serving layers.
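These requirements can be sanity-checked with a back-of-envelope calculation. A minimal sketch, assuming ~1 KB per event (a figure not given in the requirements), 7-day Kafka retention, and 3x replication:

```python
# Back-of-envelope capacity estimate for the ingestion layer.
# Assumption: ~1 KB average event size (not stated in the requirements).

AVG_EVENT_BYTES = 1_000
SUSTAINED_RATE = 100_000          # events/sec (requirement)
BURST_RATE = 500_000              # events/sec (requirement)
RETENTION_SECONDS = 7 * 24 * 3600 # 7-day Kafka retention
REPLICATION = 3                   # typical Kafka replication factor

sustained_mb_s = SUSTAINED_RATE * AVG_EVENT_BYTES / 1e6   # MB/s ingress
burst_mb_s = BURST_RATE * AVG_EVENT_BYTES / 1e6
retained_tb = sustained_mb_s * RETENTION_SECONDS / 1e6 * REPLICATION

print(f"sustained ingress: {sustained_mb_s:.0f} MB/s")   # 100 MB/s
print(f"burst ingress:     {burst_mb_s:.0f} MB/s")       # 500 MB/s
print(f"retained on disk:  {retained_tb:.0f} TB")        # ~181 TB with replication
```

The takeaway for the interview: at these rates the bottleneck is rarely raw network bandwidth; it is the retained storage and the stateful processing downstream.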

Interview tip: Clarify with the interviewer: "Are we designing the entire pipeline from ingestion to dashboard, or focusing on the stream processing layer?" Also ask about acceptable latency—5 seconds versus 100 milliseconds changes the architecture dramatically.

Step 2: Architecture — The Four-Layer Pipeline

Layer 1: Ingestion (Kafka)

Kafka is the entry point. Producers (web servers, mobile apps, IoT devices) publish events to Kafka topics. Kafka provides durable, ordered, replayable message storage.

Configuration for high-volume analytics:

Topics partitioned by event source or user_id for parallelism (50–100 partitions for 100K events/sec). Retention of 7 days enables replay if the processing layer needs to recompute results after a bug fix or schema change. Kafka 4.0 with KRaft eliminates ZooKeeper, reducing operational complexity.
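The partition-count guidance above follows from a simple sizing rule. A sketch, assuming each partition (and hence each downstream Flink task) comfortably sustains roughly 1,000–2,500 events/second of stateful aggregation — the per-partition figure is an assumption that should be validated with a load test:

```python
import math

# Rough partition sizing: one consumer task per partition, so partition
# count bounds processing parallelism. per_partition_rate is an assumed
# sustainable throughput per task, not a Kafka constant.
def partitions_needed(peak_events_per_sec: int, per_partition_rate: int) -> int:
    return math.ceil(peak_events_per_sec / per_partition_rate)

print(partitions_needed(100_000, 2_000))  # 50
print(partitions_needed(100_000, 1_000))  # 100
print(partitions_needed(500_000, 2_500))  # 200 (the burst scenario in Step 6)
```

Over-partitioning slightly is cheaper than re-partitioning later, since changing partition count breaks key-to-partition ordering guarantees for existing keys.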

Why Kafka, not a traditional message queue: Kafka retains messages after consumption, enabling replay. Multiple consumer groups read the same topic independently—the processing layer, the archival pipeline, and the real-time alerting system all consume from the same Kafka topic without interference. RabbitMQ and SQS delete messages after delivery, making replay impossible.

Layer 2: Processing (Flink)

Flink reads from Kafka, computes stateful aggregations (counts, sums, percentiles), and writes results to the storage layer.

Why Flink over alternatives:

Flink provides exactly-once state semantics—critical for accurate counters. A "page views in the last 5 minutes" counter that double-counts due to processing duplicates produces incorrect analytics. Flink supports event-time processing—aggregating events based on when they occurred, not when they arrived at the processor. This handles out-of-order events correctly. Flink's checkpointing mechanism periodically snapshots the processing state to durable storage, enabling recovery without data loss on failure.
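The event-time point is worth making concrete. A toy simulation (timestamps invented for illustration, not Flink API) shows that bucketing a late event by its arrival time miscounts it into the wrong window, while bucketing by event time does not:

```python
from collections import Counter

# Each event is (event_time_sec, arrival_time_sec); 60s tumbling windows.
# The third event is delayed in transit and arrives after its window "ended".
events = [
    (5, 6), (30, 31), (59, 75),
    (70, 76), (80, 81),
]

def window_start(ts: int, size: int = 60) -> int:
    return ts - ts % size

by_event_time = Counter(window_start(ev) for ev, _ in events)
by_arrival_time = Counter(window_start(arr) for _, arr in events)

print(dict(by_event_time))    # {0: 3, 60: 2} -- correct counts
print(dict(by_arrival_time))  # {0: 2, 60: 3} -- late event counted in wrong window
```

Event-time processing reproduces what actually happened; processing-time reproduces what the processor happened to see, which drifts under load and network delay.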

Alternatives: Spark Structured Streaming (micro-batch model, higher latency but simpler), ksqlDB (SQL-based streaming on Kafka, being replaced by Flink for new deployments), and Kafka Streams (lightweight library, no separate cluster required, limited for complex stateful processing).

Layer 3: Storage

Processed results need two storage systems optimized for different query patterns.

Real-time serving store (hot path): Redis or Apache Druid for sub-second dashboard queries. Stores the last 1–24 hours of minute-granularity aggregations. Optimized for fast reads on recent, pre-aggregated data.

Analytical store (warm/cold path): ClickHouse, Apache Druid, or TimescaleDB for historical queries and ad-hoc analysis. Stores 30 days to 1 year of data at minute or hourly granularity. Optimized for columnar scans and complex analytical queries.

| Store | Latency | Retention | Query Pattern | Use Case |
| --- | --- | --- | --- | --- |
| Redis | <1ms | 1–24 hours | Key-value lookups | Live dashboard counters |
| Apache Druid | 50–200ms | 30 days–1 year | OLAP, slice-and-dice | Interactive analytics |
| ClickHouse | 50–500ms | 1 year+ | Columnar analytics | Ad-hoc historical queries |
| TimescaleDB | 10–100ms | 30 days–1 year | Time-series queries | Metric trend analysis |

Layer 4: Serving (Dashboard and Alerts)

Dashboard delivery: WebSocket connections push updated metrics to the dashboard every 1–5 seconds. The dashboard service reads from Redis for live counters and from Druid/ClickHouse for trend charts.

Alert engine: A separate Flink job or dedicated alerting service monitors metric streams and triggers alerts when thresholds are breached. "If error_rate > 1% for 3 consecutive minutes, fire a PagerDuty alert."
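The "N consecutive minutes" condition in that rule is what prevents flapping alerts on a single noisy data point. A minimal sketch of the evaluation logic (the threshold and streak length mirror the example rule; the function name is illustrative):

```python
# Fire only when the threshold is breached for `consecutive` minutes in a row,
# matching the example rule: error_rate > 1% for 3 consecutive minutes.
def should_alert(per_minute_values, threshold=0.01, consecutive=3):
    streak = 0
    for value in per_minute_values:
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False

print(should_alert([0.005, 0.02, 0.015, 0.012]))  # True: three breaches in a row
print(should_alert([0.02, 0.005, 0.02, 0.02]))    # False: the streak was broken
```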

Step 3: Windowing — Aggregating Unbounded Streams

Stream data is unbounded—it never stops. Windowing groups events into finite time intervals for aggregation.

Tumbling Windows

Fixed-size, non-overlapping windows. Every event belongs to exactly one window.

Example: "Count page views per 1-minute window." Window 1: [10:00:00–10:01:00), Window 2: [10:01:00–10:02:00). Each window produces one output when it closes.

Best for: Dashboard metrics refreshing at fixed intervals.

Sliding Windows

Fixed-size, overlapping windows that advance by a configurable slide interval.

Example: "Count page views in a 5-minute window, updating every 1 minute." At 10:05, the window covers [10:00–10:05). At 10:06, it covers [10:01–10:06). The 4 minutes of overlap produce smoother trends.

Best for: Moving averages, trend lines.
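A sliding window means each event belongs to size/slide overlapping windows (5, for a 5-minute window sliding every minute). A sketch of the window-assignment rule, using seconds and slide-aligned window starts:

```python
# Return every [start, end) sliding window containing timestamp ts.
# Window starts are aligned to multiples of `slide`; with size=300, slide=60,
# each event falls into 300/60 = 5 overlapping windows.
def sliding_windows(ts: int, size: int = 300, slide: int = 60):
    last_start = (ts // slide) * slide        # latest window start <= ts
    first_start = last_start - size + slide   # earliest window still covering ts
    return [(s, s + size) for s in range(first_start, last_start + 1, slide)]

print(sliding_windows(320))
# [(60, 360), (120, 420), (180, 480), (240, 540), (300, 600)]
```

This multiplicity is also the cost of sliding windows: every event updates several windows' state, so a 5x overlap roughly means 5x the aggregation work versus tumbling windows.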

Session Windows

Dynamic-size windows defined by activity gaps. A new window starts when an event arrives after a specified inactivity period.

Example: "Group user clicks into sessions with a 30-minute inactivity timeout." If a user clicks at 10:00, 10:05, 10:10, then nothing until 10:50, the system creates one session [10:00–10:10] and starts a new session at 10:50.

Best for: User engagement analytics, session-based metrics.
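The sessionization example above can be sketched as a gap-based grouping over sorted timestamps (a simplification of what a stream processor does incrementally with per-key state):

```python
# Group sorted timestamps into sessions split by inactivity gaps >= `gap` seconds.
def sessionize(timestamps, gap=30 * 60):
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] >= gap:
            sessions.append(current)   # gap exceeded: close the session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

# Clicks at 10:00, 10:05, 10:10, then 10:50 (as seconds since 10:00),
# with a 30-minute (1800s) timeout:
print(sessionize([0, 300, 600, 3000], gap=1800))  # [[0, 300, 600], [3000]]
```

Note that session windows are the hardest to implement in a true stream: a window's end is only known in retrospect, once the gap has elapsed, which is one reason they interact closely with watermarks (next step).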

Step 4: Handling Late-Arriving Data

Events arrive out of order due to network delays, mobile reconnection, and cross-timezone delivery. A click that occurred at 10:01:30 might arrive at the processor at 10:02:15—after the 10:01 tumbling window has already closed.

Watermarks: A watermark is a timestamp that tells the processor "no events older than this timestamp are expected to arrive." When the watermark advances past a window's end time, the window closes and results are emitted. Watermarks are derived from the event timestamps in the stream—typically the maximum event timestamp observed so far minus a bound on expected out-of-orderness (e.g., 30 seconds).

Allowed lateness: Configure a grace period after a window closes. Events arriving within the grace period update the already-emitted result. Events arriving after the grace period are dropped or routed to a dead letter queue for separate processing.

Interview application: "I would set watermarks with 30 seconds of allowed lateness. This means a 1-minute tumbling window closing at 10:01:00 will accept late events until 10:01:30. Events arriving after the grace period are written to a dead letter topic for reconciliation. The trade-off: longer allowed lateness increases accuracy but delays result emission."
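That interview answer can be traced through a minimal simulation (not Flink API; here the watermark is the max event timestamp seen so far minus the 30-second bound, so the grace period is folded into the watermark itself):

```python
# Watermark sketch for one 1-minute tumbling window [0, 60).
# The window closes when watermark = max_event_time - ALLOWED_LATENESS >= 60;
# events for it that arrive after that go to a dead letter list.

ALLOWED_LATENESS = 30
WINDOW_END = 60

max_event_time = 0
window_count = 0
window_closed = False
dead_letter = []

# Event timestamps in arrival order; 55 and 20 arrive badly out of order.
for event_time in [10, 40, 58, 95, 55, 130, 20]:
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - ALLOWED_LATENESS
    if not window_closed and watermark >= WINDOW_END:
        window_closed = True                 # emit the result for [0, 60)
    if event_time < WINDOW_END:
        if window_closed:
            dead_letter.append(event_time)   # past grace period: reconcile later
        else:
            window_count += 1

print(window_count, dead_letter)  # 3 [55, 20]
```

Events 55 and 20 were more than 30 seconds behind the stream's frontier when they arrived, so they miss the window; raising ALLOWED_LATENESS would capture them at the cost of delaying every window's result.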

Step 5: Exactly-Once Semantics

For counters and financial aggregations, processing an event twice produces incorrect results. Exactly-once semantics ensure each event affects the output exactly once, even in the presence of failures and retries.

Flink's approach: Flink uses distributed snapshots (Chandy-Lamport algorithm) to periodically checkpoint operator state and Kafka consumer offsets atomically. On failure, Flink restores from the latest checkpoint and replays events from Kafka—without duplicating the effect on state.

End-to-end exactly-once: Requires exactly-once from source (Kafka) through processing (Flink) to sink (database). Flink's Kafka connector supports transactional writes to Kafka sinks. For database sinks, use upsert operations (insert-or-update) with idempotent keys to handle duplicate writes gracefully.
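The idempotent-upsert pattern for database sinks is simple enough to show directly. A sketch with an in-memory dict standing in for the database (names are illustrative): keying each write by (metric, window_start) makes a replayed write overwrite rather than double-count.

```python
# Idempotent sink sketch: upsert keyed by (metric, window_start), so a
# duplicate delivery after a Flink restart-and-replay is harmless.
store: dict[tuple[str, int], int] = {}

def upsert(metric: str, window_start: int, value: int) -> None:
    store[(metric, window_start)] = value   # insert-or-update, never increment

upsert("page_views", 600, 42)
upsert("page_views", 600, 42)   # duplicate write after a retry
print(store[("page_views", 600)])  # 42, not 84
```

In SQL this corresponds to an insert-or-update statement on a unique (metric, window_start) key, e.g. PostgreSQL's INSERT ... ON CONFLICT DO UPDATE. The key design decision is writing final window results (overwrite-safe) rather than increments (replay-unsafe).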

Step 6: Scaling and Backpressure

Scaling ingestion: Add Kafka partitions and brokers. Each partition is consumed by one Flink task, so partition count determines processing parallelism.

Scaling processing: Increase Flink parallelism (more task slots). Flink redistributes state across tasks automatically on rescaling. Each task processes a subset of Kafka partitions.

Backpressure handling: When the processing layer is slower than the ingestion rate, events accumulate in Kafka. Kafka's retention policy provides buffering. Flink's backpressure mechanism slows consumption from Kafka when downstream operators are saturated, preventing out-of-memory crashes. Monitor consumer lag (the gap between the latest Kafka offset and the consumer's current offset) as the primary health indicator.

Interview application: "At 100K events/second with 100 Kafka partitions, each Flink task processes 1,000 events/second—well within capacity. If traffic spikes to 500K, I would increase partitions to 200 and Flink parallelism accordingly. Kafka absorbs the burst during the scaling delay. I would monitor consumer lag with Prometheus and alert when lag exceeds 10,000 messages."
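Consumer lag, the health metric in that answer, is just the per-partition gap between the broker's latest offset and the consumer group's committed offset. A sketch with illustrative offset numbers:

```python
# Consumer lag = latest broker offset minus committed consumer offset,
# summed across partitions. Offsets below are illustrative.
def total_lag(latest_offsets: dict[int, int], committed: dict[int, int]) -> int:
    return sum(latest_offsets[p] - committed.get(p, 0) for p in latest_offsets)

latest = {0: 12_500, 1: 11_900, 2: 13_100}   # from the brokers
committed = {0: 10_000, 1: 11_800, 2: 13_100}  # from the consumer group

lag = total_lag(latest, committed)
print(lag)              # 2600
print(lag > 10_000)     # False: below the alert threshold from the answer above
```

A steadily growing lag means the processing layer is falling behind sustained ingest (scale out); a spike that drains on its own is a burst that Kafka's buffering absorbed as designed.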

For structured practice on real-time analytics and other data-intensive system design problems, Grokking the System Design Interview covers streaming architecture as a core design pattern. For advanced stream processing patterns including exactly-once semantics, multi-region streaming, and production-scale Flink deployments, Grokking the Advanced System Design Interview builds the depth required for L6+ interviews. The system design interview guide provides the broader framework for approaching any system design problem.

Frequently Asked Questions

What is the difference between real-time and batch analytics?

Batch analytics processes data in scheduled intervals (hourly, daily), delivering insights hours after generation. Real-time analytics processes events as they arrive, delivering insights within seconds. Batch is cheaper and simpler; real-time enables immediate decision-making (fraud detection, live dashboards, operational alerts).

Why is Kafka the standard for real-time analytics ingestion?

Kafka provides durable, ordered, replayable message storage. Multiple consumers read the same topic independently. Messages are retained after consumption, enabling replay for reprocessing. Kafka handles millions of events per second. LinkedIn processes 7 trillion+ messages daily through Kafka.

What is the difference between tumbling, sliding, and session windows?

Tumbling windows are fixed-size, non-overlapping (every event in exactly one window). Sliding windows are fixed-size, overlapping (smoother trends). Session windows are dynamic-size, defined by inactivity gaps (user engagement analytics). Choose based on the metric type and desired update frequency.

How do you handle late-arriving data in stream processing?

Use watermarks to define how long the system waits for late events before closing a window. Set an allowed lateness grace period for updates to already-emitted results. Events arriving after the grace period are dropped or routed to a dead letter queue. Longer allowed lateness increases accuracy but delays results.

What is exactly-once semantics and why does it matter?

Exactly-once ensures each event affects the output exactly once, even during failures and retries. Without it, counters double-count and financial aggregations become incorrect. Flink achieves this through distributed snapshots that atomically checkpoint operator state and Kafka offsets.

Should I use Flink or Spark Structured Streaming?

Flink for true real-time processing with sub-second latency, exactly-once state semantics, and event-time windowing. Spark Structured Streaming for micro-batch processing where latency of 1–10 seconds is acceptable and you already use Spark for batch jobs. Flink is the industry standard for new streaming deployments in 2026.

What database should I use for real-time analytics results?

Redis for live dashboard counters (sub-ms reads, 1–24 hour retention). Apache Druid or ClickHouse for interactive analytical queries (50–500ms, 30 days–1 year retention). TimescaleDB for time-series trend queries. Use multiple stores optimized for different query patterns.

What is backpressure in stream processing?

Backpressure occurs when the processing layer is slower than the ingestion rate. Events accumulate in Kafka (buffering). Flink slows consumption to prevent out-of-memory crashes. Monitor consumer lag as the primary health indicator. Scale processing parallelism when lag consistently grows.

How do I scale a real-time analytics pipeline?

Scale ingestion by adding Kafka partitions and brokers. Scale processing by increasing Flink parallelism (more task slots). Scale storage by adding read replicas or sharding the analytical database. Kafka partition count determines processing parallelism—each partition maps to one Flink task.

What is the Lambda architecture and should I use it?

Lambda runs parallel batch and stream layers, merging results in a serving layer. It provides both accuracy (batch) and timeliness (stream) but doubles operational complexity. Kappa architecture simplifies by processing everything as streams, using Kafka replay for reprocessing instead of a separate batch layer. In 2026, Kappa with Kafka + Flink is preferred for most new systems.

TL;DR

A real-time analytics system follows a four-layer pipeline: ingestion (Kafka—durable, replayable, millions of events/second), processing (Flink—exactly-once semantics, event-time windowing, stateful aggregations), storage (Redis for live counters, Druid/ClickHouse for historical analytics), and serving (WebSocket dashboards, alert engines). Windowing groups unbounded streams into finite intervals: tumbling (fixed, non-overlapping), sliding (overlapping for trends), and session (activity-gap-based). Late-arriving data is handled with watermarks and allowed lateness grace periods. Exactly-once semantics prevent double-counting through Flink's distributed snapshot checkpointing. Scale by increasing Kafka partitions (determines processing parallelism) and Flink task slots. Monitor consumer lag as the primary health metric. Kappa architecture (Kafka + Flink, single streaming layer) is preferred over Lambda (dual batch + stream layers) for new systems in 2026. LinkedIn processes 7T+ messages/day through Kafka; Uber computes real-time surge pricing from millions of location events per second—this architecture handles both.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.