System Design: The Complete Guide

24. Rate Limiting Algorithms

Five algorithms to control request flow, prevent abuse, and protect your systems from overload.

Mar 5, 2026

[!NOTE] Rate limiting controls how many requests a client can make in a given time window. It protects against DDoS attacks, prevents abuse, controls costs, and ensures fair usage among users. Every major API (GitHub, Stripe, Twitter, AWS) enforces rate limits.

  1. Token Bucket

Imagine a bucket that holds a maximum number of tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected (or queued).

Bucket capacity: 10 tokens
Refill rate: 10 tokens/second

Time 0s:   Bucket = 10/10.  Burst of 10 requests → all allowed. Bucket = 0/10.
Time 0.5s: Bucket = 5/10 (refilled 5). 3 requests → allowed. Bucket = 2/10.
Time 1.0s: Bucket = 10/10 (refilled, capped at max).
  • Pro: Allows short bursts (up to bucket capacity) while enforcing average rate. Simple and memory-efficient.
  • Con: A full bucket lets a client fire an instant burst of up to the bucket's capacity, which can still momentarily overload downstream services.
  • Used by: AWS API Gateway, NGINX.
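The timeline above can be sketched in a few lines of Python. This is a minimal illustration (class and parameter names are mine, not from any particular library), driven by an explicit clock so the refill arithmetic is easy to follow:

```python
class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # burst size (max tokens)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.last = 0.0                 # timestamp of the last refill

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_rate=10)
burst = [bucket.allow(0.0) for _ in range(11)]
print(burst.count(True))     # 10 allowed at t=0, the 11th is rejected
later = [bucket.allow(0.5) for _ in range(6)]
print(later.count(True))     # 5 tokens refilled by t=0.5s, so 5 pass
```

Production implementations use a monotonic wall clock instead of a passed-in timestamp; the core refill-then-spend logic is the same.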

  2. Leaky Bucket

Requests enter a queue (the "bucket") and are processed at a fixed, constant rate — like water leaking from a hole at the bottom. If the queue is full, new requests are dropped.

  • Pro: Produces a perfectly smooth, constant output rate — ideal for APIs that must process requests at a predictable pace.
  • Con: Does not allow bursts at all. A sudden spike of legitimate traffic fills the queue, causing delays.
  • Used by: Shopify (for smoothing webhook deliveries).
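A minimal sketch of the queue-and-drain behavior, again with an explicit clock and illustrative names of my choosing:

```python
from collections import deque

class LeakyBucket:
    """Leaky bucket: queue requests, drain at a fixed rate; drop when full."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # max queued requests
        self.leak_rate = leak_rate    # requests drained per second
        self.queue = deque()
        self.last = 0.0

    def offer(self, now):
        # Drain the requests that would have been processed since last check
        leaked = int((now - self.last) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last = now
        if len(self.queue) < self.capacity:
            self.queue.append(now)
            return True    # accepted; will be processed at the leak rate
        return False       # bucket full: request dropped

bucket = LeakyBucket(capacity=5, leak_rate=2)
spike = [bucket.offer(0.0) for _ in range(8)]
print(spike.count(True))    # 5 queued, 3 dropped
print(bucket.offer(1.0))    # True: 2 requests drained after 1 second
```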

  3. Fixed Window Counter

Divide time into fixed windows (e.g., 1-minute windows). Count requests per window. If the count exceeds the limit, reject.

Window: 12:00:00 – 12:00:59 → Limit: 100 requests
Counter: 87 requests so far → Request #88 allowed
Counter: 100 → Request #101 REJECTED (429 Too Many Requests)

New window starts at 12:01:00 → Counter resets to 0
  • Pro: Dead simple. O(1) memory per counter.
  • Con: Boundary spike problem — a user can send 100 requests at 12:00:59 and another 100 at 12:01:00 (200 requests in 2 seconds), doubling the intended rate.
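The boundary spike is easy to demonstrate. A minimal sketch (names are illustrative) showing 200 requests passing within two seconds of each other, straddling a window edge:

```python
class FixedWindowCounter:
    """Fixed window: count requests per window; reset when a new window starts."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window = int(now // self.window)   # e.g. minute index for a 60s window
        if window != self.current_window:
            self.current_window = window   # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

rl = FixedWindowCounter(limit=100, window_seconds=60)
# Boundary spike: 100 requests at t=59s and 100 more at t=60s all pass,
# even though 200 requests arrived within ~1 second of the boundary.
end_of_window = [rl.allow(59.0) for _ in range(100)]
start_of_next = [rl.allow(60.0) for _ in range(100)]
print(end_of_window.count(True), start_of_next.count(True))   # 100 100
```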

  4. Sliding Window Log

Store the timestamp of every request. When a new request arrives, remove all timestamps older than the window, count the remaining, and compare to the limit.

  • Pro: Perfectly precise — no boundary spike problem.
  • Con: Memory-expensive — storing a timestamp for every request adds up fast at high QPS. A user making 10,000 requests/minute stores 10,000 timestamps.
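A minimal sketch of the log-based approach (illustrative names; a real deployment would keep the log in Redis, as shown later in this chapter). Note how the same boundary scenario that fooled the fixed window is now rejected:

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: keep a timestamp per request, prune expired ones."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()   # one entry per accepted request: O(N) memory

    def allow(self, now):
        # Drop timestamps that have fallen out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

rl = SlidingWindowLog(limit=100, window_seconds=60)
for _ in range(100):
    rl.allow(59.0)
print(rl.allow(60.0))    # False: the 100 requests at t=59s still count
print(rl.allow(119.5))   # True: by now those timestamps have expired
```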

  5. Sliding Window Counter (Hybrid)

Combines fixed window counter with a weighted calculation from the previous window:

Current window (70% elapsed): 35 requests
Previous window: 80 requests

Weighted count = 80 × (1 − 0.7) + 35 = 80 × 0.3 + 35 = 59 requests
Limit: 100 → Request allowed

  • Pro: Good approximation of sliding window without storing individual timestamps. Low memory, no boundary spike.
  • Con: Approximate, not exact.
  • Used by: Cloudflare, majority of production rate limiters.
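The weighted calculation above can be sketched as follows (a minimal illustration with names of my choosing; the final lines reproduce the worked example, with the previous window holding 80 requests and the current one 70% elapsed with 35):

```python
class SlidingWindowCounter:
    """Sliding window counter: weight the previous window's count by overlap."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = 0

    def allow(self, now):
        window = int(now // self.window)
        if window != self.curr_window:
            # Shift windows; a gap of more than one window means an empty previous
            self.prev_count = self.curr_count if window == self.curr_window + 1 else 0
            self.curr_count = 0
            self.curr_window = window
        elapsed_fraction = (now % self.window) / self.window
        # Assume the previous window's requests were evenly spread
        weighted = self.prev_count * (1 - elapsed_fraction) + self.curr_count
        if weighted < self.limit:
            self.curr_count += 1
            return True
        return False

rl = SlidingWindowCounter(limit=100, window_seconds=60)
rl.prev_count, rl.curr_count, rl.curr_window = 80, 35, 1
print(rl.allow(102.0))   # True: 70% into the window, 80 * 0.3 + 35 = 59 < 100
```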

Algorithm Comparison

| Algorithm | Memory | Precision | Allows Bursts? | Complexity |
| --- | --- | --- | --- | --- |
| Token Bucket | O(1) | Good | Yes (up to bucket size) | Low |
| Leaky Bucket | O(queue size) | Exact | No (constant rate) | Low |
| Fixed Window | O(1) | Poor (boundary spike) | Yes (at boundaries) | Trivial |
| Sliding Window Log | O(N) per user | Perfect | No | Medium |
| Sliding Window Counter | O(1) | Good approximation | Limited | Low |

Distributed Rate Limiting with Redis

In a multi-server deployment, rate limit counters must be shared. Redis is the standard choice because of its speed and atomic operations.

-- Redis Lua script for an atomic sliding window counter
-- KEYS[1]: per-client key; ARGV: window (seconds), limit, now (same unit as window)
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count requests still inside the window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Score is the timestamp; a random suffix keeps members unique
  redis.call('ZADD', key, now, now .. math.random())
  redis.call('EXPIRE', key, window)
  return 1  -- allowed
else
  return 0  -- rejected
end

The Lua script runs atomically on the Redis server, preventing race conditions between concurrent requests.

HTTP Rate Limit Headers

Standard headers to communicate rate limit status to clients:

  • X-RateLimit-Limit: 100 — maximum requests allowed in the window
  • X-RateLimit-Remaining: 47 — requests remaining in the current window
  • X-RateLimit-Reset: 1625014800 — Unix timestamp when the window resets
  • Retry-After: 30 — seconds until the client should retry (on 429 response)
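A well-behaved client reads these headers and backs off instead of hammering the API. A minimal sketch (the `headers` dict and function name are illustrative, not from any real client library):

```python
# Hypothetical rate-limit headers from a 429 response
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1625014800",
    "Retry-After": "30",
}

def backoff_seconds(status_code, headers):
    """Return how long the client should wait before retrying, 0 if no wait."""
    if status_code == 429:
        # Honor Retry-After when present; fall back to a short default
        return int(headers.get("Retry-After", "1"))
    return 0

print(backoff_seconds(429, headers))   # 30
print(backoff_seconds(200, headers))   # 0
```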

Real-World Rate Limits

| Service | Limit | Algorithm |
| --- | --- | --- |
| GitHub API | 5,000 req/hr (authenticated) | Token Bucket |
| Stripe API | 100 req/sec (per key) | Token Bucket |
| Discord | Per-route + global limits | Token Bucket |
| Twitter/X API | 300 req / 15-min window | Fixed Window |

Common Mistakes

  • ❌ Rate limiting only at the application layer — add rate limiting at the API gateway too for DDoS protection.
  • ❌ Using fixed window without understanding the boundary spike — users can double-burst at window boundaries.
  • ❌ Not returning rate limit headers — good API design tells clients their remaining quota.
  • ❌ Non-atomic Redis operations — use Lua scripts or MULTI/EXEC to avoid race conditions.

[!TIP] Key Takeaways:
• Token Bucket: best default — allows bursts, low memory, simple. Used by AWS, NGINX.
• Sliding Window Counter: best precision/memory tradeoff. Used by Cloudflare.
• Fixed Window: simplest but beware boundary spikes.
• Distributed rate limiting: use Redis + Lua for atomic, shared counters.
• Always return X-RateLimit-* headers and 429 Too Many Requests with Retry-After.
