System Design: The Complete Guide

24. Rate Limiting Algorithms

Five algorithms to control request flow, prevent abuse, and protect your systems from overload.

Mar 5, 2026

[!NOTE] Rate limiting controls how many requests a client can make in a given time window. It protects against DDoS attacks, prevents abuse, controls costs, and ensures fair usage among users. Every major API (GitHub, Stripe, Twitter, AWS) enforces rate limits.

  1. Token Bucket

Imagine a bucket that holds a maximum number of tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected (or queued).

Bucket capacity: 10 tokens
Refill rate: 10 tokens/second

Time 0s:   Bucket = 10/10.  Burst of 10 requests → all allowed. Bucket = 0/10.
Time 0.5s: Bucket = 5/10 (refilled 5). 3 requests → allowed. Bucket = 2/10.
Time 1.0s: Bucket = 10/10 (refilled, capped at max).
  • Pro: Allows short bursts (up to bucket capacity) while enforcing average rate. Simple and memory-efficient.
  • Con: A full bucket lets a client fire an instant burst of up to the bucket's capacity, which can still momentarily overload downstream services.
  • Used by: AWS API Gateway, NGINX.
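The timeline above can be sketched in a few lines of Python. This is a minimal illustration (class and parameter names are mine, not from any particular library), driven by an explicit clock so the refill arithmetic is easy to follow:

```python
class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # burst size (max tokens)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last = 0.0                 # timestamp of the last refill

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_rate=10)
burst = [bucket.allow(0.0) for _ in range(11)]
print(burst.count(True))     # 10 allowed at t=0, the 11th is rejected
later = [bucket.allow(0.5) for _ in range(6)]
print(later.count(True))     # 5 tokens refilled by t=0.5s, so 5 pass
```

Production implementations use a monotonic wall clock instead of a passed-in timestamp; the core refill-then-spend logic is the same.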

  2. Leaky Bucket

Requests enter a queue (the "bucket") and are processed at a fixed, constant rate — like water leaking from a hole at the bottom. If the queue is full, new requests are dropped.

  • Pro: Produces a perfectly smooth, constant output rate — ideal for APIs that must process requests at a predictable pace.
  • Con: Does not allow bursts at all. A sudden spike of legitimate traffic fills the queue, causing delays.
  • Used by: Shopify (for smoothing webhook deliveries).
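A minimal sketch of the queue-and-drain behavior, again with an explicit clock and illustrative names of my choosing:

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket: queue requests, drain at a fixed rate; drop when full."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # max queued requests
        self.leak_rate = leak_rate    # requests drained per second
        self.queue = deque()
        self.last = 0.0

    def offer(self, now):
        # Drain the requests that would have been processed since last check
        leaked = int((now - self.last) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last = now
        if len(self.queue) < self.capacity:
            self.queue.append(now)
            return True    # accepted; will be processed at the leak rate
        return False       # bucket full: request dropped

bucket = LeakyBucket(capacity=5, leak_rate=2)
spike = [bucket.offer(0.0) for _ in range(8)]
print(spike.count(True))    # 5 queued, 3 dropped
print(bucket.offer(1.0))    # True: 2 requests drained after 1 second
```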

  3. Fixed Window Counter

Divide time into fixed windows (e.g., 1-minute windows). Count requests per window. If the count exceeds the limit, reject.

Window: 12:00:00 – 12:00:59 → Limit: 100 requests
Counter: 87 requests so far → Request #88 allowed
Counter: 100 → Request #101 REJECTED (429 Too Many Requests)

New window starts at 12:01:00 → Counter resets to 0
  • Pro: Dead simple. O(1) memory per counter.
  • Con: Boundary spike problem — a user can send 100 requests at 12:00:59 and another 100 at 12:01:00 (200 requests in 2 seconds), doubling the intended rate.
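The boundary spike is easy to demonstrate. A minimal sketch (names are illustrative) showing 200 requests passing within two seconds of each other, straddling a window edge:

```python
class FixedWindowCounter:
    """Fixed window: count requests per window; reset when a new window starts."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window = int(now // self.window)   # e.g. minute index for a 60s window
        if window != self.current_window:
            self.current_window = window   # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

rl = FixedWindowCounter(limit=100, window_seconds=60)
# Boundary spike: 100 requests at t=59s and 100 more at t=60s all pass,
# even though 200 requests arrived within ~1 second of the boundary.
end_of_window = [rl.allow(59.0) for _ in range(100)]
start_of_next = [rl.allow(60.0) for _ in range(100)]
print(end_of_window.count(True), start_of_next.count(True))   # 100 100
```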

  4. Sliding Window Log

Store the timestamp of every request. When a new request arrives, remove all timestamps older than the window, count the remaining, and compare to the limit.

  • Pro: Perfectly precise — no boundary spike problem.
  • Con: Memory-expensive — storing a timestamp for every request adds up fast at high QPS. A user making 10,000 requests/minute stores 10,000 timestamps.
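A minimal sketch of the log-based approach (illustrative names; a real deployment would keep the log in Redis, as shown later in this chapter). Note how the same boundary scenario that fooled the fixed window is now rejected:

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep a timestamp per request, prune expired ones."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()   # one entry per accepted request: O(N) memory

    def allow(self, now):
        # Drop timestamps that have fallen out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

rl = SlidingWindowLog(limit=100, window_seconds=60)
for _ in range(100):
    rl.allow(59.0)
print(rl.allow(60.0))    # False: the 100 requests at t=59s still count
print(rl.allow(119.5))   # True: by now those timestamps have expired
```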

  5. Sliding Window Counter (Hybrid)

Combines fixed window counter with a weighted calculation from the previous window:

Current window (70% elapsed): 35 requests
Previous window: 80 requests

Weighted count = 80 × (1 − 0.7) + 35 = 80 × 0.3 + 35 = 59 requests
Limit: 100 → Request allowed

  • Pro: Good approximation of sliding window without storing individual timestamps. Low memory, no boundary spike.
  • Con: Approximate, not exact.
  • Used by: Cloudflare, majority of production rate limiters.
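The weighted calculation above can be sketched as follows (a minimal illustration with names of my choosing; the final lines reproduce the worked example, with the previous window holding 80 requests and the current one 70% elapsed with 35):

```python
class SlidingWindowCounter:
    """Sliding window counter: weight the previous window's count by overlap."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = 0

    def allow(self, now):
        window = int(now // self.window)
        if window != self.curr_window:
            # Shift windows; a gap of more than one window means an empty previous
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        elapsed_fraction = (now % self.window) / self.window
        # Assume the previous window's requests were evenly spread
        weighted = self.prev_count * (1 - elapsed_fraction) + self.curr_count
        if weighted < self.limit:
            self.curr_count += 1
            return True
        return False

rl = SlidingWindowCounter(limit=100, window_seconds=60)
rl.prev_count, rl.curr_count, rl.curr_window = 80, 35, 1
print(rl.allow(102.0))   # True: 70% into the window, 80 * 0.3 + 35 = 59 < 100
```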

Algorithm Comparison

| Algorithm | Memory | Precision | Allows Bursts? | Complexity |
| --- | --- | --- | --- | --- |
| Token Bucket | O(1) | Good | Yes (up to bucket size) | Low |
| Leaky Bucket | O(queue size) | Exact | No (constant rate) | Low |
| Fixed Window | O(1) | Poor (boundary spike) | Yes (at boundaries) | Trivial |
| Sliding Window Log | O(N) per user | Perfect | No | Medium |
| Sliding Window Counter | O(1) | Good approximation | Limited | Low |

Distributed Rate Limiting with Redis

In a multi-server deployment, rate limit counters must be shared. Redis is the standard choice because of its speed and atomic operations.

-- Redis Lua script for an atomic sliding window counter
-- KEYS[1]: per-client key; ARGV: window (seconds), limit, now (same unit as window)
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests still inside the window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Score is the timestamp; a random suffix keeps members unique
  redis.call('ZADD', key, now, now .. math.random())
  redis.call('EXPIRE', key, window)
  return 1  -- allowed
else
  return 0  -- rejected
end

The Lua script runs atomically on the Redis server, preventing race conditions between concurrent requests.

HTTP Rate Limit Headers

Standard headers to communicate rate limit status to clients:

  • X-RateLimit-Limit: 100 — maximum requests allowed in the window
  • X-RateLimit-Remaining: 47 — requests remaining in the current window
  • X-RateLimit-Reset: 1625014800 — Unix timestamp when the window resets
  • Retry-After: 30 — seconds until the client should retry (on 429 response)
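A well-behaved client reads these headers and backs off instead of hammering the API. A minimal sketch (the `headers` dict and function name are illustrative, not from any real client library):

```python
# Hypothetical rate-limit headers from a 429 response
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1625014800",
    "Retry-After": "30",
}

def backoff_seconds(status_code, headers):
    """Return how long the client should wait before retrying, 0 if no wait."""
    if status_code == 429:
        # Honor Retry-After when present; fall back to a short default
        return int(headers.get("Retry-After", "1"))
    return 0

print(backoff_seconds(429, headers))   # 30
print(backoff_seconds(200, headers))   # 0
```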

Real-World Rate Limits

| Service | Limit | Algorithm |
| --- | --- | --- |
| GitHub API | 5,000 req/hr (authenticated) | Token Bucket |
| Stripe API | 100 req/sec (per key) | Token Bucket |
| Discord | Per-route + global limits | Token Bucket |
| Twitter/X API | 300 req / 15-min window | Fixed Window |

Common Mistakes

  • ❌ Rate limiting only at the application layer — add rate limiting at the API gateway too for DDoS protection.
  • ❌ Using fixed window without understanding the boundary spike — users can double-burst at window boundaries.
  • ❌ Not returning rate limit headers — good API design tells clients their remaining quota.
  • ❌ Non-atomic Redis operations — use Lua scripts or MULTI/EXEC to avoid race conditions.

[!TIP] Key Takeaways:
• Token Bucket: best default — allows bursts, low memory, simple. Used by AWS, NGINX.
• Sliding Window Counter: best precision/memory tradeoff. Used by Cloudflare.
• Fixed Window: simplest but beware boundary spikes.
• Distributed rate limiting: use Redis + Lua for atomic, shared counters.
• Always return X-RateLimit-* headers and 429 Too Many Requests with Retry-After.
