[!NOTE] A chat system like WhatsApp handles roughly 100 billion messages per day with real-time delivery, offline support, and end-to-end encryption. It is one of the most popular system design interview questions because it covers WebSockets, message queuing, presence tracking, and persistence — a comprehensive system design challenge.
Step 1: Requirements
| Feature | Requirement |
|---|---|
| 1:1 messaging | Real-time, with offline delivery |
| Group chat | Up to 500 members |
| Presence (online/offline) | Near-real-time status |
| Read receipts | Delivered + Read indicators |
| Push notifications | For offline users |
| Scale | 50M DAU, 100B messages/day |
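Before designing anything, it helps to turn the scale row into concrete numbers. A quick back-of-envelope calculation (the 2x peak factor and the 100-byte average message size are assumptions, not part of the requirements):

```python
# Back-of-envelope math for the stated scale.
messages_per_day = 100_000_000_000   # 100B messages/day (from requirements)
seconds_per_day = 86_400

avg_qps = messages_per_day / seconds_per_day
peak_qps = avg_qps * 2               # assumption: peak is ~2x average

avg_msg_bytes = 100                  # assumption: average message size
storage_per_day_tb = messages_per_day * avg_msg_bytes / 1e12

print(f"avg write QPS:  {avg_qps:,.0f}")              # ~1.16M writes/sec
print(f"peak write QPS: {peak_qps:,.0f}")             # ~2.3M writes/sec
print(f"storage/day:    {storage_per_day_tb:.0f} TB") # ~10 TB/day
```

Over a million writes per second is far beyond a single relational database, which motivates the partitioned, write-optimized store chosen later.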
Step 2: High-Level Design (v1)
```
[User A] ──WebSocket──→ ┌──────────────┐
                        │ Chat Server  │
[User B] ──WebSocket──→ │  (Stateful)  │
                        └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │   Database   │
                        │(msg storage) │
                        └──────────────┘
```
This works at small scale, but a single server cannot hold millions of concurrent WebSocket connections, since each connection ties up memory and a file descriptor. We need to scale out.
Step 3: Scaled Architecture (v2)
```
           ┌─────────────────────┐
           │  Service Discovery  │
           │  (which server has  │
           │     which user?)    │
           └──────────┬──────────┘
                      │
[User A] ──WS──→ [Chat Server 1]
                      │
        [Message Queue / Redis Pub/Sub]
                      │
[User B] ──WS──→ [Chat Server 2]
                      │
             ┌────────┴─────────┐
             ▼                  ▼
      ┌─────────────┐    ┌─────────────┐
      │   Message   │    │    Push     │
      │    Store    │    │   Service   │
      │ (Cassandra) │    │ (FCM/APNs)  │
      └─────────────┘    └─────────────┘
```
Message Flow (1:1)
- User A sends a message via WebSocket to Chat Server 1.
- Chat Server 1 persists the message to the message store (status: SENT).
- Chat Server 1 looks up which server User B is connected to (via service discovery / Redis).
- If User B is online: route message via Redis pub/sub to Chat Server 2 → deliver via WebSocket → update status to DELIVERED.
- If User B is offline: send a push notification via FCM/APNs. When User B comes online, fetch undelivered messages from the message store.
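The flow above can be sketched end to end. This is a minimal in-memory model: `message_store`, `user_to_server`, `server_queues`, and `push_outbox` are illustrative stand-ins for Cassandra, service discovery, Redis pub/sub, and FCM/APNs, not real APIs:

```python
import uuid

message_store = {}    # message_id -> record      (Cassandra stand-in)
user_to_server = {}   # user_id -> chat server id (service-discovery stand-in)
server_queues = {}    # server_id -> routed msgs  (Redis pub/sub stand-in)
push_outbox = []      # queued notifications      (FCM/APNs stand-in)

def send_message(sender_id, recipient_id, content):
    """Persist first, then route to the recipient's server or fall back to push."""
    msg = {"id": str(uuid.uuid4()), "from": sender_id, "to": recipient_id,
           "content": content, "status": "SENT"}
    message_store[msg["id"]] = msg                 # 1. persist (status: SENT)

    server = user_to_server.get(recipient_id)      # 2. service-discovery lookup
    if server is not None:                         # 3a. online: route via pub/sub
        server_queues.setdefault(server, []).append(msg)
        msg["status"] = "DELIVERED"
    else:                                          # 3b. offline: push notification
        push_outbox.append({"to": recipient_id, "preview": content[:40]})
    return msg

# User B is connected to chat-server-2; User C is offline.
user_to_server["user_b"] = "chat-server-2"
m1 = send_message("user_a", "user_b", "Hello")
m2 = send_message("user_a", "user_c", "Are you there?")
print(m1["status"], m2["status"])  # DELIVERED SENT
```

Note that the message is persisted before any routing attempt: if delivery fails mid-flight, the stored SENT copy is what the reconnect path replays.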
Message Flow (Group Chat)
For a group with 500 members:
- Fan-out on write: When User A sends a message to a group, write a copy to each member's inbox. Expensive for large groups, but reads are O(1).
- Fan-out on read: Store the message once; when each member reads, they query the group's message feed. Cheaper writes, but reads are more expensive.
A common hybrid approach is to use fan-out on write below a group-size threshold (e.g., a few hundred members) and fan-out on read above it.
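A toy comparison of the two strategies, using simple in-memory inboxes and feeds (all names here are illustrative):

```python
from collections import defaultdict

inboxes = defaultdict(list)     # fan-out on write: one inbox copy per member
group_feed = defaultdict(list)  # fan-out on read: one copy per group

def send_fanout_on_write(group_members, msg):
    # O(N) writes, but each member's read is a single inbox lookup.
    for member in group_members:
        inboxes[member].append(msg)

def send_fanout_on_read(group_id, msg):
    # O(1) write; each member queries the shared feed when opening the chat.
    group_feed[group_id].append(msg)

members = [f"user_{i}" for i in range(500)]
send_fanout_on_write(members, "hi all")      # 500 stored copies
send_fanout_on_read("group_1", "hi all")     # 1 stored copy

print(sum(len(v) for v in inboxes.values()))     # 500
print(sum(len(v) for v in group_feed.values()))  # 1
```

The 500x difference in stored copies per message is exactly the write amplification the hybrid threshold is trying to cap.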
Step 4: Data Model
Messages table (Cassandra — partitioned by conversation_id):
| conversation_id (partition key) | message_id (clustering key) | sender_id | content | status |
|---|---|---|---|---|
| conv_ab_123 | snowflake_001 | user_a | "Hello" | DELIVERED |
| conv_ab_123 | snowflake_002 | user_b | "Hi!" | READ |
Sorting: Snowflake-style IDs embed a timestamp in their high bits, so sorting by message_id sorts by time automatically.
Why Cassandra? Chat messages are write-heavy (100B writes/day), partitioned by conversation (each conversation is a natural partition), and queried in time order (Snowflake IDs sort naturally). Cassandra excels at all three.
Step 5: Presence System
Showing "online" / "last seen" for each user:
- Heartbeat approach: Each connected client sends a heartbeat every 30 seconds. If no heartbeat arrives for 60 seconds, mark the user as offline.
- Store presence in Redis: `presence:{user_id} → {server_id, last_heartbeat}` with a TTL of 60s.
- When a friend opens a chat, check Redis for that user's presence status.

Optimization for groups: Don't broadcast individual presence updates to all 500 members. Instead, lazy-load presence when a user opens the group info screen.
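A sketch of the heartbeat-plus-TTL scheme. The `_presence` dict stands in for Redis (`heartbeat` plays the role of `SETEX presence:{user_id} 60 {server_id}`); timestamps are passed explicitly so the expiry behavior is easy to follow:

```python
import time

HEARTBEAT_INTERVAL = 30   # client sends a heartbeat every 30s
PRESENCE_TTL = 60         # offline after two missed heartbeats

_presence = {}            # user_id -> (server_id, expiry); Redis stand-in

def heartbeat(user_id, server_id, now=None):
    """Equivalent to SETEX: store the value with a fresh 60s expiry."""
    now = time.time() if now is None else now
    _presence[user_id] = (server_id, now + PRESENCE_TTL)

def presence(user_id, now=None):
    """Return the user's server id if online, else None (key expired)."""
    now = time.time() if now is None else now
    entry = _presence.get(user_id)
    if entry is None or now > entry[1]:
        return None       # no heartbeat within the TTL window -> offline
    return entry[0]

heartbeat("user_a", "chat-server-1", now=1000.0)
print(presence("user_a", now=1030.0))   # chat-server-1 (heartbeat 30s ago)
print(presence("user_a", now=1070.0))   # None (last heartbeat 70s ago)
```

With real Redis, the TTL handles expiry for you; there is no cleanup job to run, and a crashed server's users simply age out as offline.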
Step 6: Read Receipts
Message lifecycle:
SENT → DELIVERED → READ
1. User A sends the message (status: SENT).
2. The server stores it and delivers to User B (status: DELIVERED; User A is notified).
3. User B opens the chat (status: READ; User A is notified).
Read receipts are sent back as lightweight WebSocket events. For group chats, aggregate: "Read by 15 of 20 members."
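The group aggregation can be as simple as a set intersection (hypothetical helper, not a specific API):

```python
def receipt_summary(member_ids, read_by):
    """read_by: set of member ids that sent a READ receipt for this message."""
    readers = read_by & set(member_ids)   # ignore receipts from ex-members
    return f"Read by {len(readers)} of {len(member_ids)} members"

members = [f"user_{i}" for i in range(20)]
reads = {f"user_{i}" for i in range(15)}
print(receipt_summary(members, reads))  # Read by 15 of 20 members
```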
Message Delivery State Machine
Message Lifecycle:
```
[Client A sends]
       ↓
SENT       (stored on server, ACK sent to Client A)
       ↓
DELIVERED  (pushed to Client B's device, ACK sent back)
       ↓
READ       (Client B opens conversation, read receipt sent)
```

Server states per message:

```
status: PENDING → SENT → DELIVERED → READ
```

Each transition triggers a push to Client A (via WebSocket if online).

Offline handling:

- If Client B is offline, the message stays in SENT.
- When Client B reconnects, the server pushes all SENT messages.
- Client B ACKs each one, and the status moves to DELIVERED.
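The offline-handling steps above, as a runnable sketch (the in-memory `message_store` and callback names are illustrative):

```python
# Reconnect flow: push every still-SENT message, advance to DELIVERED
# only after the client ACKs each one.
message_store = {
    "m1": {"to": "user_b", "status": "SENT"},
    "m2": {"to": "user_b", "status": "DELIVERED"},
    "m3": {"to": "user_b", "status": "SENT"},
}

def pending_for(user_id):
    return [mid for mid, m in message_store.items()
            if m["to"] == user_id and m["status"] == "SENT"]

def on_reconnect(user_id, send_fn):
    """Push every undelivered message over the fresh WebSocket."""
    for mid in pending_for(user_id):
        send_fn(mid)

def on_ack(message_id):
    # State advances only on client ACK, so a dropped push is retried
    # on the next reconnect instead of being lost.
    message_store[message_id]["status"] = "DELIVERED"

delivered = []
on_reconnect("user_b", delivered.append)   # server pushes m1 and m3
for mid in delivered:
    on_ack(mid)                            # client ACKs each

print(sorted(delivered))              # ['m1', 'm3']
print(message_store["m1"]["status"])  # DELIVERED
```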
Cassandra Schema for Messages
```sql
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,   -- time-sortable (Snowflake-style ordering)
    sender_id       UUID,
    content         TEXT,
    content_type    TEXT,       -- 'text', 'image', 'video'
    created_at      TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Partition key: conversation_id
--   → All messages in a conversation live on the same node
--   → Efficient range queries: "get messages after ID X"

-- Query: get the latest 50 messages in a conversation
SELECT * FROM messages
WHERE conversation_id = ?
ORDER BY message_id DESC
LIMIT 50;
```
Group Chat: Fan-Out Strategy
| Group Size | Strategy | How It Works |
|---|---|---|
| Small (2-100 members) | Fan-out on write | Server pushes message to each member's inbox immediately. Fast reads. |
| Large (100-10K members) | Fan-out on read | Message stored once in group's channel. Members pull on open. Saves writes. |
| Broadcast (10K+) | Pub/Sub channel | Members subscribe. Only online users receive in real-time. Others pull on reconnect. |
WhatsApp's approach: Groups are limited to 1,024 members. They use fan-out on write for all groups, keeping the architecture simple. The cap avoids the complexity of large-group fan-out.
End-to-End Encryption (E2EE)
WhatsApp and Signal use the Signal Protocol for E2EE. Key insight: the server never sees plaintext messages. It only stores encrypted blobs. This means the server cannot search, moderate, or read message content. The tradeoff: server-side features like search are impossible without client-side indexing.
Message flow with E2EE:
1. Client A encrypts the message with Client B's public key.
2. Encrypted message → server (stores the encrypted blob).
3. Server pushes the encrypted blob to Client B.
4. Client B decrypts with their private key.

Server sees: "aGVsbG8gd29ybGQ=" (opaque ciphertext). Server cannot: search, moderate, recommend, or read content.
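A sketch of the server-as-blind-relay property. The XOR cipher below is a deliberately toy stand-in for the Signal Protocol, used only to show that the server stores and forwards ciphertext it cannot read:

```python
import base64

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # XOR each byte with a repeating key — a toy stand-in, NOT real crypto.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

toy_decrypt = toy_encrypt  # XOR is its own inverse

shared_key = b"session-key"   # in Signal, derived client-side (X3DH + ratchet)
plaintext = b"hello world"

blob = toy_encrypt(plaintext, shared_key)
assert blob != plaintext      # the server never receives readable text

# Server side: stores and forwards only an opaque base64 blob.
stored_on_server = base64.b64encode(blob).decode()

# Client B: decrypts locally with the shared session key.
print(toy_decrypt(blob, shared_key).decode())  # hello world
```

Because the server holds only `stored_on_server`, features like server-side search or moderation would require the clients to do the work (e.g., client-side indexing), exactly the tradeoff noted above.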
Common Mistakes
- ❌ Using a relational database for messages — SQL databases struggle with the write volume and partition requirements of chat. Use Cassandra, ScyllaDB, or similar.
- ❌ Storing messages only in memory — users expect to see message history when they reinstall the app. Always persist to durable storage.
- ❌ Broadcasting presence to all contacts — if a user has 1,000 contacts, going online sends 1,000 updates. Use lazy presence loading.
- ❌ Not handling message ordering — use Snowflake/TIMEUUID IDs for consistent ordering across distributed servers.
- ❌ No offline message queue — if a user is offline, messages must be queued and delivered on reconnect. Don't drop them.
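On the ordering point: a minimal Snowflake-style generator packs a millisecond timestamp into the ID's high bits, so IDs sort by creation time across servers. This is a sketch; the 41/10/12 bit layout and Twitter's epoch are conventional choices, not requirements:

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # custom epoch (Twitter's, as an example)

class SnowflakeGen:
    """64-bit IDs: 41 timestamp bits | 10 worker bits | 12 sequence bits."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.seq = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 12-bit sequence wraps
                if self.seq == 0:                   # exhausted this millisecond
                    while now_ms <= self.last_ms:   # spin until the next one
                        now_ms = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now_ms
            return ((now_ms - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeGen(worker_id=7)
ids = [gen.next_id() for _ in range(1000)]
print(ids == sorted(ids))  # True: IDs sort by generation time
```

Because the timestamp occupies the high bits, these IDs double as the Cassandra clustering key: sorting by message_id is sorting by time, with the worker and sequence bits breaking ties within a millisecond.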
[!TIP] Key Takeaways:
• WebSocket for real-time delivery; push notifications for offline users.
• Redis pub/sub to route messages between chat servers.
• Cassandra for message storage: write-optimized, partitioned by conversation, time-sorted.
• Fan-out on write for small groups, fan-out on read for large groups.
• Message state machine: SENT → DELIVERED → READ. Each transition triggers a push update.
• E2EE means the server only stores encrypted blobs — cannot search or moderate.
• Presence via heartbeats + Redis TTL. Lazy-load for groups.