How to design a real-time analytics system for high-volume data streams

A real-time analytics system ingests high-volume data streams—user clicks, transactions, sensor readings, log events—processes them within seconds of generation, computes aggregations and metrics, and delivers results to dashboards, alerts, and downstream services. Unlike batch systems that process data hours after collection, real-time systems provide insights as events occur: detecting fraud within milliseconds, tracking live user behavior on a dashboard, and triggering alerts when error rates spike. In system design interviews, this problem tests your understanding of stream processing architectures, message brokers, windowing strategies, exactly-once semantics, and the trade-off between processing latency and computational cost. LinkedIn processes over 7 trillion messages per day through Kafka. Uber computes real-time surge pricing from millions of location events per second. The architecture behind these systems follows a predictable pattern that every senior engineer should know.

Key Takeaways

  • The real-time analytics pipeline has four layers: ingestion (Kafka), processing (Flink/Spark Streaming), storage (time-series or OLAP database), and serving (dashboards and APIs). Each layer is decoupled, scaled independently, and designed for different latency characteristics.
  • Kafka is the industry-standard ingestion layer in 2026. It handles millions of events per second with durable, replayable storage. Kafka 4.0 with KRaft eliminates the ZooKeeper dependency, simplifying cluster management.
  • Apache Flink is the standard stream processor for stateful computations—supporting exactly-once semantics, event-time windowing, and complex aggregations. Confluent has shifted its strategic focus to Flink as the stream processing standard for new deployments.
  • Windowing is how stream processors group unbounded data into finite chunks for aggregation: tumbling (fixed, non-overlapping), sliding (overlapping), and session (activity-based gaps). Choosing the right window type determines the accuracy and timeliness of your analytics.
  • Late-arriving data is the hardest operational challenge. Events arrive out of order due to network delays, mobile connectivity gaps, and timezone differences. Watermarks define how long the system waits for late data before closing a window and emitting results.

Step 1: Requirements and Scope

Functional requirements:

  • Ingest events from multiple sources (web clickstream, mobile app events, server logs, IoT sensors) at 100,000+ events per second.
  • Compute real-time aggregations: count, sum, average, percentile, distinct count over configurable time windows (1 minute, 5 minutes, 1 hour).
  • Power a live dashboard with sub-5-second data freshness.
  • Trigger alerts when metrics breach thresholds (error rate > 1%, latency p99 > 500ms).
  • Store processed results for historical querying (last 30 days at minute granularity, last 1 year at hourly granularity).

Non-functional requirements:

  • Latency: Events are queryable within 5 seconds of generation (end-to-end).
  • Throughput: Sustain 100K events/second with bursts to 500K.
  • Durability: No event loss; every event is processed at least once.
  • Accuracy: Aggregations must be correct (exactly-once semantics for counters).
  • Availability: 99.99% uptime for the ingestion and serving layers.
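These requirements can be sanity-checked with a back-of-envelope calculation. A minimal sketch, assuming ~1 KB per event (a figure not given in the requirements), 7-day Kafka retention, and 3x replication:

```python
# Back-of-envelope capacity estimate for the ingestion layer.
# Assumption: ~1 KB average event size (not stated in the requirements).

AVG_EVENT_BYTES = 1_000
SUSTAINED_RATE = 100_000          # events/sec (requirement)
BURST_RATE = 500_000              # events/sec (requirement)
RETENTION_SECONDS = 7 * 24 * 3600 # 7-day Kafka retention
REPLICATION = 3                   # typical Kafka replication factor

sustained_mb_s = SUSTAINED_RATE * AVG_EVENT_BYTES / 1e6   # MB/s ingress
burst_mb_s = BURST_RATE * AVG_EVENT_BYTES / 1e6
retained_tb = sustained_mb_s * RETENTION_SECONDS / 1e6 * REPLICATION

print(f"sustained ingress: {sustained_mb_s:.0f} MB/s")   # 100 MB/s
print(f"burst ingress:     {burst_mb_s:.0f} MB/s")       # 500 MB/s
print(f"retained on disk:  {retained_tb:.0f} TB")        # ~181 TB with replication
```

The takeaway for the interview: at these rates the bottleneck is rarely raw network bandwidth; it is the retained storage and the stateful processing downstream.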

Interview tip: Clarify with the interviewer: "Are we designing the entire pipeline from ingestion to dashboard, or focusing on the stream processing layer?" Also ask about acceptable latency—5 seconds versus 100 milliseconds changes the architecture dramatically.

Step 2: Architecture — The Four-Layer Pipeline

Layer 1: Ingestion (Kafka)

Kafka is the entry point. Producers (web servers, mobile apps, IoT devices) publish events to Kafka topics. Kafka provides durable, ordered, replayable message storage.

Configuration for high-volume analytics:

Topics partitioned by event source or user_id for parallelism (50–100 partitions for 100K events/sec). Retention of 7 days enables replay if the processing layer needs to recompute results after a bug fix or schema change. Kafka 4.0 with KRaft eliminates ZooKeeper, reducing operational complexity.
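The partition-count guidance above follows from a simple sizing rule. A sketch, assuming each partition (and hence each downstream Flink task) comfortably sustains roughly 1,000–2,500 events/second of stateful aggregation — the per-partition figure is an assumption that should be validated with a load test:

```python
import math

# Rough partition sizing: one consumer task per partition, so partition
# count bounds processing parallelism. per_partition_rate is an assumed
# sustainable throughput per task, not a Kafka constant.
def partitions_needed(peak_events_per_sec: int, per_partition_rate: int) -> int:
    return math.ceil(peak_events_per_sec / per_partition_rate)

print(partitions_needed(100_000, 2_000))  # 50
print(partitions_needed(100_000, 1_000))  # 100
print(partitions_needed(500_000, 2_500))  # 200 (the burst scenario in Step 6)
```

Over-partitioning slightly is cheaper than re-partitioning later, since changing partition count breaks key-to-partition ordering guarantees for existing keys.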

Why Kafka, not a traditional message queue: Kafka retains messages after consumption, enabling replay. Multiple consumer groups read the same topic independently—the processing layer, the archival pipeline, and the real-time alerting system all consume from the same Kafka topic without interference. RabbitMQ and SQS delete messages after delivery, making replay impossible.

Layer 2: Processing (Flink)

Flink reads from Kafka, computes stateful aggregations (counts, sums, percentiles), and writes results to the storage layer.

Why Flink over alternatives:

Flink provides exactly-once state semantics—critical for accurate counters. A "page views in the last 5 minutes" counter that double-counts due to processing duplicates produces incorrect analytics. Flink supports event-time processing—aggregating events based on when they occurred, not when they arrived at the processor. This handles out-of-order events correctly. Flink's checkpointing mechanism periodically snapshots the processing state to durable storage, enabling recovery without data loss on failure.
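The event-time point is worth making concrete. A toy simulation (timestamps invented for illustration, not Flink API) shows that bucketing a late event by its arrival time miscounts it into the wrong window, while bucketing by event time does not:

```python
from collections import Counter

# Each event is (event_time_sec, arrival_time_sec); 60s tumbling windows.
# The third event is delayed in transit and arrives after its window "ended".
events = [
    (5, 6), (30, 31), (59, 75),
    (70, 76), (80, 81),
]

def window_start(ts: int, size: int = 60) -> int:
    return ts - ts % size

by_event_time = Counter(window_start(ev) for ev, _ in events)
by_arrival_time = Counter(window_start(arr) for _, arr in events)

print(dict(by_event_time))    # {0: 3, 60: 2} -- correct counts
print(dict(by_arrival_time))  # {0: 2, 60: 3} -- late event counted in wrong window
```

Event-time processing reproduces what actually happened; processing-time reproduces what the processor happened to see, which drifts under load and network delay.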

Alternatives: Spark Structured Streaming (micro-batch model, higher latency but simpler), ksqlDB (SQL-based streaming on Kafka, being replaced by Flink for new deployments), and Kafka Streams (lightweight library, no separate cluster required, limited for complex stateful processing).

Layer 3: Storage

Processed results need two storage systems optimized for different query patterns.

Real-time serving store (hot path): Redis or Apache Druid for sub-second dashboard queries. Stores the last 1–24 hours of minute-granularity aggregations. Optimized for fast reads on recent, pre-aggregated data.

Analytical store (warm/cold path): ClickHouse, Apache Druid, or TimescaleDB for historical queries and ad-hoc analysis. Stores 30 days to 1 year of data at minute or hourly granularity. Optimized for columnar scans and complex analytical queries.

| Store | Latency | Retention | Query Pattern | Use Case |
| --- | --- | --- | --- | --- |
| Redis | <1ms | 1–24 hours | Key-value lookups | Live dashboard counters |
| Apache Druid | 50–200ms | 30 days–1 year | OLAP, slice-and-dice | Interactive analytics |
| ClickHouse | 50–500ms | 1 year+ | Columnar analytics | Ad-hoc historical queries |
| TimescaleDB | 10–100ms | 30 days–1 year | Time-series queries | Metric trend analysis |

Layer 4: Serving (Dashboard and Alerts)

Dashboard delivery: WebSocket connections push updated metrics to the dashboard every 1–5 seconds. The dashboard service reads from Redis for live counters and from Druid/ClickHouse for trend charts.

Alert engine: A separate Flink job or dedicated alerting service monitors metric streams and triggers alerts when thresholds are breached. "If error_rate > 1% for 3 consecutive minutes, fire a PagerDuty alert."
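The "N consecutive minutes" condition in that rule is what prevents flapping alerts on a single noisy data point. A minimal sketch of the evaluation logic (the threshold and streak length mirror the example rule; the function name is illustrative):

```python
# Fire only when the threshold is breached for `consecutive` minutes in a row,
# matching the example rule: error_rate > 1% for 3 consecutive minutes.
def should_alert(per_minute_values, threshold=0.01, consecutive=3):
    streak = 0
    for value in per_minute_values:
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False

print(should_alert([0.005, 0.02, 0.015, 0.012]))  # True: three breaches in a row
print(should_alert([0.02, 0.005, 0.02, 0.02]))    # False: the streak was broken
```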

Step 3: Windowing — Aggregating Unbounded Streams

Stream data is unbounded—it never stops. Windowing groups events into finite time intervals for aggregation.

Tumbling Windows

Fixed-size, non-overlapping windows. Every event belongs to exactly one window.

Example: "Count page views per 1-minute window." Window 1: [10:00:00–10:01:00), Window 2: [10:01:00–10:02:00). Each window produces one output when it closes.

Best for: Dashboard metrics refreshing at fixed intervals.

Sliding Windows

Fixed-size, overlapping windows that advance by a configurable slide interval.

Example: "Count page views in a 5-minute window, updating every 1 minute." At 10:05, the window covers [10:00–10:05). At 10:06, it covers [10:01–10:06). The 4 minutes of overlap produce smoother trends.

Best for: Moving averages, trend lines.
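A sliding window means each event belongs to size/slide overlapping windows (5, for a 5-minute window sliding every minute). A sketch of the window-assignment rule, using seconds and slide-aligned window starts:

```python
# Return every [start, end) sliding window containing timestamp ts.
# Window starts are aligned to multiples of `slide`; with size=300, slide=60,
# each event falls into 300/60 = 5 overlapping windows.
def sliding_windows(ts: int, size: int = 300, slide: int = 60):
    last_start = (ts // slide) * slide        # latest window start <= ts
    first_start = last_start - size + slide   # earliest window still covering ts
    return [(s, s + size) for s in range(first_start, last_start + 1, slide)]

print(sliding_windows(320))
# [(60, 360), (120, 420), (180, 480), (240, 540), (300, 600)]
```

This multiplicity is also the cost of sliding windows: every event updates several windows' state, so a 5x overlap roughly means 5x the aggregation work versus tumbling windows.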

Session Windows

Dynamic-size windows defined by activity gaps. A new window starts when an event arrives after a specified inactivity period.

Example: "Group user clicks into sessions with a 30-minute inactivity timeout." If a user clicks at 10:00, 10:05, 10:10, then nothing until 10:50, the system creates one session [10:00–10:10] and starts a new session at 10:50.

Best for: User engagement analytics, session-based metrics.
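The sessionization example above can be sketched as a gap-based grouping over sorted timestamps (a simplification of what a stream processor does incrementally with per-key state):

```python
# Group sorted timestamps into sessions split by inactivity gaps >= `gap` seconds.
def sessionize(timestamps, gap=30 * 60):
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] >= gap:
            sessions.append(current)   # gap exceeded: close the session
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

# Clicks at 10:00, 10:05, 10:10, then 10:50 (as seconds since 10:00),
# with a 30-minute (1800s) timeout:
print(sessionize([0, 300, 600, 3000], gap=1800))  # [[0, 300, 600], [3000]]
```

Note that session windows are the hardest to implement in a true stream: a window's end is only known in retrospect, once the gap has elapsed, which is one reason they interact closely with watermarks (next step).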

Step 4: Handling Late-Arriving Data

Events arrive out of order due to network delays, mobile reconnection, and cross-timezone delivery. A click that occurred at 10:01:30 might arrive at the processor at 10:02:15—after the 10:01 tumbling window has already closed.

Watermarks: A watermark is a timestamp that tells the processor "no events older than this timestamp are expected to arrive." When the watermark advances past a window's end time, the window closes and results are emitted. Watermarks are derived from the event timestamps in the stream—typically the maximum event timestamp observed so far minus a bound on expected out-of-orderness (e.g., 30 seconds).

Allowed lateness: Configure a grace period after a window closes. Events arriving within the grace period update the already-emitted result. Events arriving after the grace period are dropped or routed to a dead letter queue for separate processing.

Interview application: "I would set watermarks with 30 seconds of allowed lateness. This means a 1-minute tumbling window closing at 10:01:00 will accept late events until 10:01:30. Events arriving after the grace period are written to a dead letter topic for reconciliation. The trade-off: longer allowed lateness increases accuracy but delays result emission."
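That interview answer can be traced through a minimal simulation (not Flink API; here the watermark is the max event timestamp seen so far minus the 30-second bound, so the grace period is folded into the watermark itself):

```python
# Watermark sketch for one 1-minute tumbling window [0, 60).
# The window closes when watermark = max_event_time - ALLOWED_LATENESS >= 60;
# events for it that arrive after that go to a dead letter list.

ALLOWED_LATENESS = 30
WINDOW_END = 60

max_event_time = 0
window_count = 0
window_closed = False
dead_letter = []

# Event timestamps in arrival order; 55 and 20 arrive badly out of order.
for event_time in [10, 40, 58, 95, 55, 130, 20]:
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - ALLOWED_LATENESS
    if not window_closed and watermark >= WINDOW_END:
        window_closed = True                 # emit the result for [0, 60)
    if event_time < WINDOW_END:
        if window_closed:
            dead_letter.append(event_time)   # past grace period: reconcile later
        else:
            window_count += 1

print(window_count, dead_letter)  # 3 [55, 20]
```

Events 55 and 20 were more than 30 seconds behind the stream's frontier when they arrived, so they miss the window; raising ALLOWED_LATENESS would capture them at the cost of delaying every window's result.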

Step 5: Exactly-Once Semantics

For counters and financial aggregations, processing an event twice produces incorrect results. Exactly-once semantics ensure each event affects the output exactly once, even in the presence of failures and retries.

Flink's approach: Flink uses distributed snapshots (Chandy-Lamport algorithm) to periodically checkpoint operator state and Kafka consumer offsets atomically. On failure, Flink restores from the latest checkpoint and replays events from Kafka—without duplicating the effect on state.

End-to-end exactly-once: Requires exactly-once from source (Kafka) through processing (Flink) to sink (database). Flink's Kafka connector supports transactional writes to Kafka sinks. For database sinks, use upsert operations (insert-or-update) with idempotent keys to handle duplicate writes gracefully.
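The idempotent-upsert pattern for database sinks is simple enough to show directly. A sketch with an in-memory dict standing in for the database (names are illustrative): keying each write by (metric, window_start) makes a replayed write overwrite rather than double-count.

```python
# Idempotent sink sketch: upsert keyed by (metric, window_start), so a
# duplicate delivery after a Flink restart-and-replay is harmless.
store: dict[tuple[str, int], int] = {}

def upsert(metric: str, window_start: int, value: int) -> None:
    store[(metric, window_start)] = value   # insert-or-update, never increment

upsert("page_views", 600, 42)
upsert("page_views", 600, 42)   # duplicate write after a retry
print(store[("page_views", 600)])  # 42, not 84
```

In SQL this corresponds to an insert-or-update statement on a unique (metric, window_start) key, e.g. PostgreSQL's INSERT ... ON CONFLICT DO UPDATE. The key design decision is writing final window results (overwrite-safe) rather than increments (replay-unsafe).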

Step 6: Scaling and Backpressure

Scaling ingestion: Add Kafka partitions and brokers. Each partition is consumed by one Flink task, so partition count determines processing parallelism.

Scaling processing: Increase Flink parallelism (more task slots). Flink redistributes state across tasks automatically on rescaling. Each task processes a subset of Kafka partitions.

Backpressure handling: When the processing layer is slower than the ingestion rate, events accumulate in Kafka. Kafka's retention policy provides buffering. Flink's backpressure mechanism slows consumption from Kafka when downstream operators are saturated, preventing out-of-memory crashes. Monitor consumer lag (the gap between the latest Kafka offset and the consumer's current offset) as the primary health indicator.

Interview application: "At 100K events/second with 100 Kafka partitions, each Flink task processes 1,000 events/second—well within capacity. If traffic spikes to 500K, I would increase partitions to 200 and Flink parallelism accordingly. Kafka absorbs the burst during the scaling delay. I would monitor consumer lag with Prometheus and alert when lag exceeds 10,000 messages."
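Consumer lag, the health metric in that answer, is just the per-partition gap between the broker's latest offset and the consumer group's committed offset. A sketch with illustrative offset numbers:

```python
# Consumer lag = latest broker offset minus committed consumer offset,
# summed across partitions. Offsets below are illustrative.
def total_lag(latest_offsets: dict[int, int], committed: dict[int, int]) -> int:
    return sum(latest_offsets[p] - committed.get(p, 0) for p in latest_offsets)

latest = {0: 12_500, 1: 11_900, 2: 13_100}   # from the brokers
committed = {0: 10_000, 1: 11_800, 2: 13_100}  # from the consumer group

lag = total_lag(latest, committed)
print(lag)              # 2600
print(lag > 10_000)     # False: below the alert threshold from the answer above
```

A steadily growing lag means the processing layer is falling behind sustained ingest (scale out); a spike that drains on its own is a burst that Kafka's buffering absorbed as designed.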

For structured practice on real-time analytics and other data-intensive system design problems, Grokking the System Design Interview covers streaming architecture as a core design pattern. For advanced stream processing patterns including exactly-once semantics, multi-region streaming, and production-scale Flink deployments, Grokking the Advanced System Design Interview builds the depth required for L6+ interviews. The system design interview guide provides the broader framework for approaching any system design problem.

Frequently Asked Questions

What is the difference between real-time and batch analytics?

Batch analytics processes data in scheduled intervals (hourly, daily), delivering insights hours after generation. Real-time analytics processes events as they arrive, delivering insights within seconds. Batch is cheaper and simpler; real-time enables immediate decision-making (fraud detection, live dashboards, operational alerts).

Why is Kafka the standard for real-time analytics ingestion?

Kafka provides durable, ordered, replayable message storage. Multiple consumers read the same topic independently. Messages are retained after consumption, enabling replay for reprocessing. Kafka handles millions of events per second. LinkedIn processes 7 trillion+ messages daily through Kafka.

What is the difference between tumbling, sliding, and session windows?

Tumbling windows are fixed-size, non-overlapping (every event in exactly one window). Sliding windows are fixed-size, overlapping (smoother trends). Session windows are dynamic-size, defined by inactivity gaps (user engagement analytics). Choose based on the metric type and desired update frequency.

How do you handle late-arriving data in stream processing?

Use watermarks to define how long the system waits for late events before closing a window. Set an allowed lateness grace period for updates to already-emitted results. Events arriving after the grace period are dropped or routed to a dead letter queue. Longer allowed lateness increases accuracy but delays results.

What is exactly-once semantics and why does it matter?

Exactly-once ensures each event affects the output exactly once, even during failures and retries. Without it, counters double-count and financial aggregations become incorrect. Flink achieves this through distributed snapshots that atomically checkpoint operator state and Kafka offsets.

Should I use Flink or Spark Structured Streaming?

Flink for true real-time processing with sub-second latency, exactly-once state semantics, and event-time windowing. Spark Structured Streaming for micro-batch processing where latency of 1–10 seconds is acceptable and you already use Spark for batch jobs. Flink is the industry standard for new streaming deployments in 2026.

What database should I use for real-time analytics results?

Redis for live dashboard counters (sub-ms reads, 1–24 hour retention). Apache Druid or ClickHouse for interactive analytical queries (50–500ms, 30 days–1 year retention). TimescaleDB for time-series trend queries. Use multiple stores optimized for different query patterns.

What is backpressure in stream processing?

Backpressure occurs when the processing layer is slower than the ingestion rate. Events accumulate in Kafka (buffering). Flink slows consumption to prevent out-of-memory crashes. Monitor consumer lag as the primary health indicator. Scale processing parallelism when lag consistently grows.

How do I scale a real-time analytics pipeline?

Scale ingestion by adding Kafka partitions and brokers. Scale processing by increasing Flink parallelism (more task slots). Scale storage by adding read replicas or sharding the analytical database. Kafka partition count determines processing parallelism—each partition maps to one Flink task.

What is the Lambda architecture and should I use it?

Lambda runs parallel batch and stream layers, merging results in a serving layer. It provides both accuracy (batch) and timeliness (stream) but doubles operational complexity. Kappa architecture simplifies by processing everything as streams, using Kafka replay for reprocessing instead of a separate batch layer. In 2026, Kappa with Kafka + Flink is preferred for most new systems.

TL;DR

A real-time analytics system follows a four-layer pipeline: ingestion (Kafka—durable, replayable, millions of events/second), processing (Flink—exactly-once semantics, event-time windowing, stateful aggregations), storage (Redis for live counters, Druid/ClickHouse for historical analytics), and serving (WebSocket dashboards, alert engines). Windowing groups unbounded streams into finite intervals: tumbling (fixed, non-overlapping), sliding (overlapping for trends), and session (activity-gap-based). Late-arriving data is handled with watermarks and allowed lateness grace periods. Exactly-once semantics prevent double-counting through Flink's distributed snapshot checkpointing. Scale by increasing Kafka partitions (determines processing parallelism) and Flink task slots. Monitor consumer lag as the primary health metric. Kappa architecture (Kafka + Flink, single streaming layer) is preferred over Lambda (dual batch + stream layers) for new systems in 2026. LinkedIn processes 7T+ messages/day through Kafka; Uber computes real-time surge pricing from millions of location events per second—this architecture handles both.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.