
How to Design a Real-Time Chat Application (WhatsApp/Slack)

This article explains how to maintain a live connection (WebSockets vs long polling), ensure every message gets delivered (with acknowledgments and retries), support group chats, and scale the system to millions of users with minimal latency.
Real-time messaging apps are part of our daily lives.
Whether it’s coordinating with colleagues on Slack or texting friends on WhatsApp, we expect our messages to be delivered instantaneously and reliably.
But under the hood, supporting millions of concurrent users with minimal delay is a complex challenge.
How do these apps maintain a steady connection for live updates?
What if a user is offline or there’s a group chat with hundreds of members?
In this article, we’ll break down these challenges in simple terms and design a chat system that meets them.
Persistent Connections: WebSockets vs Long Polling
To deliver messages in real time, a chat app needs a persistent connection between the client (the user’s device) and the server.
Modern systems use WebSockets for this purpose.
A WebSocket is like an open channel that stays alive, allowing the server to push new messages to the client instantly without repeated requests.
WhatsApp uses persistent WebSocket connections from each user’s device to its servers for instant message delivery.
If WebSockets aren’t available (due to older browsers or network restrictions), the app can fall back to HTTP long polling.
With long polling, the client sends a request that the server holds open, responding as soon as new data is available (or the request times out); the client then immediately sends the next request.
This approach works for near-real-time updates but is less efficient than WebSockets.
Mobile apps also use push notifications to signal new messages when the user is offline.
WebSockets remain the preferred method, with long polling and similar techniques only as fallbacks.
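To make the long-polling fallback concrete, here is a minimal in-memory sketch (no real HTTP; the class names are illustrative, not any actual library's API). The client's `poll()` call blocks until a message arrives or the timeout elapses, mirroring how a long-poll request is held open by the server:

```python
import threading
from collections import deque

class LongPollServer:
    """Toy stand-in for a chat server's long-poll endpoint."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()

    def push(self, message: str) -> None:
        # Server side: a new message arrives; wake any waiting poller.
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()

    def poll(self, timeout: float = 30.0) -> list:
        # Client side: the "request" stays open until data is available
        # or the timeout fires, then everything queued is returned.
        with self._cond:
            if not self._messages:
                self._cond.wait(timeout)
            batch = list(self._messages)
            self._messages.clear()
            return batch

server = LongPollServer()
server.push("hello")
print(server.poll(timeout=0.1))  # ['hello']
```

A real WebSocket connection would replace the repeated `poll()` calls with a single socket the server pushes into, which is exactly the efficiency gap described above.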
Ensuring Reliable Message Delivery (Acknowledgments & Retries)
Sending a message is only half the story – we also need to guarantee it reaches its destination and let the sender know it did.
Chat apps achieve this with delivery receipts and acknowledgments (acks) at each stage:
- Sent (Server Ack): When you send a message, the server receives it, saves it (e.g. in a database), and immediately sends an acknowledgment back to your app. Your app now marks the message as “sent” because the server has it.
- Delivered (Client Ack): The server then delivers the message to the recipient’s device. If the recipient is online, their device receives the message in real time and sends an ack back to the server, confirming delivery. Your app then marks the message as “delivered”.
- Read (Read Receipt): When the recipient opens the chat and reads the message, their app notifies the server. The server notifies your app that the message was read.
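The three stages above form an ordered progression, and because acks can arrive out of order over the network, a message's status should only ever move forward. A minimal sketch (the enum names are illustrative):

```python
from enum import IntEnum

class MessageStatus(IntEnum):
    """Delivery stages in order; IntEnum gives monotonic comparison."""
    PENDING = 0
    SENT = 1       # server ack: stored server-side
    DELIVERED = 2  # client ack: on the recipient's device
    READ = 3       # read receipt

def apply_ack(current: MessageStatus, ack: MessageStatus) -> MessageStatus:
    # Acks may arrive out of order; never move the status backwards
    # (a late "delivered" ack must not undo "read").
    return max(current, ack)

status = MessageStatus.PENDING
status = apply_ack(status, MessageStatus.SENT)
status = apply_ack(status, MessageStatus.READ)
status = apply_ack(status, MessageStatus.DELIVERED)  # late ack, ignored
print(status.name)  # READ
```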
These acknowledgments ensure every message is tracked.
If an ack isn’t received at some step (e.g. the recipient was offline and hasn’t gotten the message yet), the system retries sending.
Messages for an offline user are stored in an offline queue or database until that user comes online.
Once they reconnect, the server delivers all pending messages to them. This combination of persistent storage and acknowledgments ensures that messages won’t get lost, even during outages or offline periods.
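The offline-queue behavior can be sketched in a few lines. This is a simplified in-memory model (class and method names are assumptions, not a real API): online users get messages pushed immediately, while messages for offline users wait in a per-user queue and are flushed on reconnect.

```python
from collections import defaultdict, deque

class DeliveryService:
    """Toy delivery layer with an offline queue per user."""

    def __init__(self):
        self.online = {}                    # user_id -> push callback (the "socket")
        self.pending = defaultdict(deque)   # user_id -> messages queued while offline

    def connect(self, user_id, push_callback):
        self.online[user_id] = push_callback
        # Flush everything that accumulated while the user was offline.
        while self.pending[user_id]:
            push_callback(self.pending[user_id].popleft())

    def disconnect(self, user_id):
        self.online.pop(user_id, None)

    def send(self, user_id, message):
        if user_id in self.online:
            self.online[user_id](message)   # a real system also waits for the client ack
        else:
            self.pending[user_id].append(message)

device = []                                 # stands in for the recipient's app
svc = DeliveryService()
svc.send("bob", "hi")                       # bob is offline -> queued
svc.connect("bob", device.append)           # reconnect -> pending messages flushed
print(device)                               # ['hi']
```

In production the pending queue would live in durable storage (a database), not memory, so messages survive server restarts too.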
Supporting Group Chats
In a group chat, the server needs to deliver each message to all members of the group.
The server typically fans out a copy of the message to every online member immediately, while storing copies for offline members so they can receive them when they come online.
For small groups, the app might show who has read the message, but in very large groups tracking every read receipt is impractical.
Many platforms aggregate or disable per-user read indicators in large groups – the main goal is simply to ensure the message reaches everyone.
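Group fan-out combines naturally with the offline queue from the previous section. A self-contained sketch (names are illustrative): one inbound message becomes one copy per member, with offline members' copies parked until they reconnect.

```python
from collections import defaultdict, deque

class GroupChat:
    """Toy group fan-out with per-user offline queues."""

    def __init__(self, members):
        self.members = set(members)
        self.online = set()
        self.inbox = defaultdict(deque)    # delivered messages, per user
        self.pending = defaultdict(deque)  # queued while the user is offline

    def set_online(self, user, is_online=True):
        if is_online:
            self.online.add(user)
            # Reconnect: move everything queued while offline into the inbox.
            while self.pending[user]:
                self.inbox[user].append(self.pending[user].popleft())
        else:
            self.online.discard(user)

    def post(self, sender, text):
        # Fan out one copy to every member except the sender.
        for member in self.members - {sender}:
            target = self.inbox if member in self.online else self.pending
            target[member].append((sender, text))

group = GroupChat({"alice", "bob", "carol"})
group.set_online("alice")
group.post("carol", "hi all")   # alice gets it now; bob's copy is queued
```

Note the cost: a message to a group of N members means N deliveries, which is why very large groups also make per-user read receipts expensive to track.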
Scaling to Millions of Users with Low Latency
Finally, how do we scale this chat system to support millions of concurrent users while keeping latency low?
The solution is to scale out horizontally and optimize each layer of the architecture. Key strategies include:
- Horizontal Scaling & Load Balancing: Run many chat server instances in parallel and use a load balancer to distribute user connections among them. We often use sticky sessions (or consistent hashing) so each user stays on the same server during a session. As the user base grows, we add more servers behind the load balancer. Servers also communicate among themselves (or via a message bus) so that a user on one server can reach a user on another without issue.
- Database Sharding & Caching: We partition data (shard by conversation or user ID) so each database shard handles only a fraction of the traffic, and we use in-memory caches for hot data (like recent messages or online status) to reduce database reads.
- Geo-Distributed Servers: To minimize latency worldwide, we deploy servers in multiple geographic regions. Users connect to the nearest server, reducing message travel time, while the system ensures cross-region messages still get delivered quickly.
- High Availability & Fault Tolerance: Every critical component (servers, databases, etc.) is replicated. Key data is stored with multiple copies (across different machines or data centers). If one server or database goes down, a backup seamlessly takes over, so the service stays online and no messages are lost.
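The consistent hashing mentioned under load balancing deserves a sketch, since it is what lets us add or remove chat servers without reshuffling every user's connection. A minimal ring (server names like "chat-1" are made up for illustration):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping users to chat servers.
    Adding or removing a server only remaps roughly 1/N of the users,
    instead of nearly all of them as with naive hash-mod-N routing."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, user_id: str) -> str:
        # Route to the first ring point clockwise from the user's hash.
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["chat-1", "chat-2", "chat-3"])
print(ring.server_for("user-42"))  # same server on every call -> sticky routing
```

Because the mapping is deterministic, any component (load balancer or peer server) can compute which server holds a given user's connection without a lookup table.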
These are classic challenges in system design interviews.
To explore these concepts further and strengthen your system design skills, check out Grokking System Design Fundamentals or Grokking the System Design Interview.
FAQs
Q1: Why use WebSockets for chat instead of HTTP long polling?
WebSockets allow the server to send data to the client as soon as new information is available, over a single always-open connection. This makes them ideal for real-time chat since messages appear almost instantly without the client constantly requesting updates. Long polling also achieves near-instant updates but with more overhead – the client has to repeatedly ask the server for new messages. Thus, WebSockets are preferred for efficient, low-latency communication, with long polling as a backup when WebSockets aren’t feasible.
Q2: How do chat apps handle offline users to ensure no message is lost?
If a user is offline, the chat server stores their incoming messages. When the user comes back online, the server delivers all pending messages to their device. No messages get lost this way. Plus, delivery acknowledgments tell the server which messages were delivered, so it will keep retrying any that remain unacknowledged.
Q3: How can a chat application scale to millions of concurrent users without lag?
The application must scale horizontally. That means using multiple servers and load balancing rather than relying on one big server. User connections are spread across many servers, and data (like messages and user info) is sharded across databases to distribute the load. Caching frequently accessed data helps reduce database work. By horizontally adding capacity and optimizing each layer (servers and databases), the chat app can handle millions of users in real time without noticeable lag.