System Design: The Complete Guide

10. Message Queues & Event Streaming

Asynchronous processing with RabbitMQ and Kafka.

Feb 22, 2026

The Problem with Synchronous Processing

Some operations are inherently slow: PDF generation, video transcoding, email fanout to millions of users, payment processing with third-party gateways. If you perform this work inside the HTTP request path, the user stares at a spinner for 30 seconds, and every slow request ties up a server thread or connection for that whole time, so capacity collapses under load.

Real-World Example: YouTube Video Uploads

When you upload a video to YouTube, you get a success message almost immediately. But in the background, YouTube's pipeline is transcoding your video into dozens of variants: resolutions from 144p through 4K, encoded with codecs such as H.264, VP9, and AV1. This takes minutes, or even hours for long videos. The upload API just enqueues the transcoding job and returns; the heavy lifting happens asynchronously via message queues.

The Async Pattern

  1. API validates the request and enqueues a job/event to the message queue.
  2. API returns HTTP 202 Accepted immediately to the user.
  3. Background workers consume and process jobs at their own pace.
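
The three steps above can be sketched in a few lines of Python. This is a minimal in-process illustration, not a production setup: the standard-library `queue.Queue` stands in for a real broker such as RabbitMQ or SQS, and `handle_upload`, `worker`, and the `results` dictionary are hypothetical names chosen for the example.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for a real message broker (assumption)
results = {}           # job_id -> status; a real system would use a database

def handle_upload(request):
    """Hypothetical API handler: validate, enqueue, return 202 right away."""
    if "video_url" not in request:
        return 400, {"error": "video_url is required"}
    job_id = str(uuid.uuid4())
    results[job_id] = "queued"
    jobs.put({"job_id": job_id, "video_url": request["video_url"]})
    return 202, {"job_id": job_id, "status": "queued"}

def worker():
    """Background worker: pulls jobs at its own pace."""
    while True:
        job = jobs.get()
        # The slow work (transcoding, PDF rendering, ...) would happen here.
        results[job["job_id"]] = "done"
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

status, body = handle_upload({"video_url": "https://example.com/v.mp4"})
jobs.join()  # demo only: wait for the worker so the final status can be read
final_status = results[body["job_id"]]
```

The handler returns as soon as the job is enqueued; the client would typically poll a status endpoint with the returned `job_id` or receive a notification when the work completes.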

[!NOTE] Queues act like shock absorbers. They protect your live API from traffic spikes by buffering work and letting workers process at a sustainable rate. During Black Friday, Amazon enqueues order processing events rather than trying to handle everything synchronously.

Traditional Queues vs Event Streams

Message Queues (RabbitMQ, AWS SQS, ActiveMQ)

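  • Model: Each message is delivered to a single consumer (workers compete for messages) and is deleted once that consumer acknowledges it. If the acknowledgment is lost, the message may be redelivered.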
  • Retention: Messages are removed after acknowledgment. Failed jobs retry or move to a Dead Letter Queue (DLQ).
  • Best for: Background jobs, email/SMS notifications, retry logic, rate smoothing.

Stripe's Example: When you make a payment, Stripe's webhook system uses message queues to reliably deliver payment events to merchant servers. If a merchant's server is down, the queue retries with exponential backoff for up to 72 hours.

Event Streaming (Apache Kafka, AWS Kinesis, Pulsar)

  • Model: Durable, append-only event log. Messages persist and multiple consumers can read independently at their own offsets.
  • Retention: Events remain for a configurable duration (days, months, or forever). Consumers track their own position.
  • Best for: Analytics pipelines, Change Data Capture (CDC), event-driven architectures, real-time stream processing.

LinkedIn (Kafka's Birthplace): LinkedIn created Apache Kafka to handle their massive data pipeline. Today, LinkedIn processes over 7 trillion messages per day through Kafka, tracking user activity, search queries, and recommendations in real time across hundreds of consumers.
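
To make the difference from a traditional queue concrete, here is a toy sketch of an append-only log with per-consumer offsets. It is not Kafka's API; `EventLog`, `poll`, and `commit` are hypothetical names, and a real stream adds partitions, persistence, and retention policies.

```python
class EventLog:
    """Toy append-only log: events are never removed when they are read."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of the next event to read

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def poll(self, consumer, max_events=10):
        """Return the next batch for this consumer without advancing it."""
        start = self._offsets.get(consumer, 0)
        return self._events[start:start + max_events]

    def commit(self, consumer, count):
        """Advance the consumer's offset after it has processed `count` events."""
        self._offsets[consumer] = self._offsets.get(consumer, 0) + count


log = EventLog()
for name in ["page_view", "search", "click"]:
    log.append(name)

# Two consumers read the same events independently, at their own pace.
analytics_batch = log.poll("analytics", max_events=2)
log.commit("analytics", len(analytics_batch))
recommendations_batch = log.poll("recommendations")
```

Because reading does not delete anything, a new consumer can be added later and replay the log from offset 0, which a delete-on-acknowledge queue cannot offer.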

Pub/Sub Pattern

In the Publish/Subscribe pattern, publishers emit events without knowing who will consume them. Multiple subscribers can independently listen to the same topic and process events for their own purposes.

Example: When a new user signs up, a single "UserCreated" event might trigger:

  • The Email Service sends a welcome email.
  • The Analytics Service records the signup.
  • The Recommendation Service initializes suggestions.
  • The CRM Service creates a customer record.

No service knows about the others. This extreme decoupling is why event-driven architectures scale so well organizationally.
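
The signup example can be sketched with a tiny in-memory broker. This is an illustration under simplifying assumptions: `Broker` is a hypothetical class, and delivery here is synchronous and in-process, whereas a real broker delivers asynchronously over the network.

```python
from collections import defaultdict

class Broker:
    """Toy pub/sub broker: publishers only know the topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)  # number of subscribers notified


actions = []  # records what each hypothetical service did
broker = Broker()
broker.subscribe("UserCreated", lambda e: actions.append(f"email:{e['user_id']}"))
broker.subscribe("UserCreated", lambda e: actions.append(f"analytics:{e['user_id']}"))
broker.subscribe("UserCreated", lambda e: actions.append(f"recommendations:{e['user_id']}"))
broker.subscribe("UserCreated", lambda e: actions.append(f"crm:{e['user_id']}"))

delivered = broker.publish("UserCreated", {"user_id": 42})
```

Adding a fifth service means adding one more `subscribe` call; the code that publishes `UserCreated` does not change.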

Delivery Semantics and Idempotency

  • At-most-once: Messages may be lost, but never duplicated. Acceptable for non-critical analytics.
  • At-least-once: No message loss, but duplicates are possible. Most common in production.
  • Exactly-once: Extremely hard to achieve end to end. Usually approximated as at-least-once delivery combined with idempotent consumers and the transactional outbox pattern.

Critical design principle: Always make consumers idempotent—processing the same message twice should produce the same final state. Use unique event IDs and database constraints to detect and skip duplicates.
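
One way to apply this principle is to record each event ID in a table with a uniqueness constraint, in the same transaction as the side effect. The sketch below uses the standard-library `sqlite3` module; the table names, the `handle_payment` function, and the event shape are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('alice', 0)")
conn.commit()

def handle_payment(event):
    """Apply the event at most once; return False if it was a duplicate."""
    try:
        # `with conn` commits both statements together, or rolls both back.
        with conn:
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (event["amount"], event["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a repeated event_id: skip the duplicate.
        return False


event = {"event_id": "evt_1", "account": "alice", "amount": 50}
first = handle_payment(event)
second = handle_payment(event)  # simulated redelivery of the same message
balance = conn.execute(
    "SELECT amount FROM balances WHERE account = 'alice'"
).fetchone()[0]
```

Keeping the duplicate check and the balance update in one transaction matters: if they were committed separately, a crash between them could either lose the update or apply it twice.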

Operational Must-Haves

  • Timeouts + retries with exponential backoff—avoid retry storms that amplify failures.
  • Dead Letter Queues (DLQ)—capture poison messages that consistently fail, for manual investigation.
  • Backpressure and rate limiting—when consumers lag, slow down producers rather than crashing consumers.
  • Observability—correlation IDs across services, consumer lag metrics, processing time dashboards. Uber uses custom tooling to monitor millions of Kafka consumer offsets in real-time.
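
The first two items can be combined in a small retry helper. This is a sketch under stated assumptions: `backoff_delay` and `process_with_retries` are hypothetical names, the dead letter queue is a plain list, and the random jitter is an addition not mentioned above (it is a common way to keep many clients from retrying at the same instant).

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: a random delay between 0 and
    min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def process_with_retries(message, handler, dead_letters,
                         max_attempts=5, sleep=time.sleep):
    """Try the handler up to max_attempts times, then dead-letter the message."""
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letters.append(message)  # poison message: park it for investigation
    return False


def always_fails(message):
    raise RuntimeError("downstream unavailable")

dlq = []
# A no-op sleep keeps the demo fast; omit it to wait between attempts.
ok = process_with_retries({"id": 1}, always_fails, dlq,
                          max_attempts=3, sleep=lambda seconds: None)
```

After the call, `ok` is `False` and the message sits in `dlq` instead of being retried forever or blocking the messages behind it.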

[!TIP] Interview tip: Whenever you see "send notifications to users" or "process this in the background" in a system design question, mention a message queue early. It shows you understand async processing, which is fundamental to most production systems at scale.
