[!NOTE] Rate limiting controls how many requests a client can make in a given time window. It protects against DDoS attacks, prevents abuse, controls costs, and ensures fair usage among users. Every major API (GitHub, Stripe, Twitter, AWS) enforces rate limits.
- Token Bucket
Imagine a bucket that holds a maximum number of tokens. Tokens are added at a fixed rate (e.g., 10 per second). Each request consumes one token. If the bucket is empty, the request is rejected (or queued). A minimal implementation sketch follows the pros and cons below.
```text
Bucket capacity: 10 tokens
Refill rate: 10 tokens/second

Time 0.0s: Bucket = 10/10. Burst of 10 requests → all allowed. Bucket = 0/10.
Time 0.5s: Bucket = 5/10 (refilled 5). 3 requests → allowed. Bucket = 2/10.
Time 1.0s: Bucket = 7/10 (refilled 5 more).
Time 1.5s: Bucket = 10/10 (refill capped at bucket capacity).
```
- Pro: Allows short bursts (up to bucket capacity) while enforcing average rate. Simple and memory-efficient.
- Con: A burst at the start can exhaust all tokens instantly.
- Used by: AWS API Gateway, NGINX.
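To make the refill arithmetic concrete, here is a minimal single-process sketch in Python. The class and method names are illustrative, and a production limiter would keep one bucket per client and guard it with a lock:

```python
import time

class TokenBucket:
    """Minimal token bucket: `capacity` tokens max, refilled at `refill_rate`/sec."""
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity              # start full, so an initial burst is allowed
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at bucket capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False

limiter = TokenBucket(capacity=10, refill_rate=10)  # the example above
print(all(limiter.allow() for _ in range(10)))      # burst of 10 → True (all allowed)
print(limiter.allow())                              # immediate 11th request → False
```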
- Leaky Bucket
Requests enter a queue (the "bucket") and are processed at a fixed, constant rate — like water leaking from a hole at the bottom. If the queue is full, new requests are dropped. See the sketch after this list.
- Pro: Produces a perfectly smooth, constant output rate — ideal for APIs that must process requests at a predictable pace.
- Con: Does not allow bursts at all. A sudden spike of legitimate traffic fills the queue, causing delays.
- Used by: Shopify (for smoothing webhook deliveries).
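A sketch of the queue-based variant described above. Draining on each call stands in for a worker that processes queued requests at the leak rate; the names and drain-on-check approach are illustrative:

```python
import time
from collections import deque

class LeakyBucket:
    """Bounded queue drained at a fixed `leak_rate` requests/second."""
    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.queue = deque()
        self.last_drain = time.monotonic()

    def allow(self, request) -> bool:
        now = time.monotonic()
        # Remove the requests that would have "leaked" (been processed) by now.
        leaked = int((now - self.last_drain) * self.leak_rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_drain = now
        if len(self.queue) < self.capacity:
            self.queue.append(request)      # accepted: processed at the fixed rate
            return True
        return False                        # queue full: drop the request
```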
- Fixed Window Counter
Divide time into fixed windows (e.g., 1-minute windows) and count requests per window. Once the counter reaches the limit, reject further requests until the window resets. A sketch follows the pros and cons below.
```text
Window: 12:00:00 – 12:00:59 → Limit: 100 requests
Counter: 87 requests so far → Request #88 allowed
Counter: 100 → Request #101 REJECTED (429 Too Many Requests)
New window starts at 12:01:00 → Counter resets to 0
```
- Pro: Dead simple. O(1) memory per counter.
- Con: Boundary spike problem — a user can send 100 requests at 12:00:59 and another 100 at 12:01:00 (200 requests in 2 seconds), doubling the intended rate.
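A minimal in-memory sketch, keyed per client. In practice old window counters would also need to be evicted; that bookkeeping is omitted here:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """One counter per (client, window) pair; counters implicitly reset each window."""
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)

    def allow(self, client_id: str) -> bool:
        window_id = int(time.time() // self.window)   # e.g. which minute we are in
        key = (client_id, window_id)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False                                  # over the limit → respond 429
```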
- Sliding Window Log
Store the timestamp of every request. When a new request arrives, remove all timestamps older than the window, count the remaining, and compare against the limit (sketch below).
- Pro: Perfectly precise — no boundary spike problem.
- Con: Memory-expensive — storing a timestamp for every request adds up fast at high QPS. A user making 10,000 requests/minute stores 10,000 timestamps.
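A sketch that makes the memory cost visible: the deque holds one timestamp per request still inside the window (per-client handling omitted):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Store a timestamp per request; prune everything older than the window."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()                  # timestamps, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()              # expired: fell out of the sliding window
        if len(self.log) < self.limit:
            self.log.append(now)            # one entry *per request* — the drawback
            return True
        return False
```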
- Sliding Window Counter (Hybrid)
Combines the fixed window counter with a weighted contribution from the previous window (sketch after this list):
```text
Previous window: 80 requests
Current window (70% elapsed): 35 requests

Weighted count = 80 × (1 − 0.7) + 35 = 80 × 0.3 + 35 = 59 requests
Limit: 100 → Request allowed
```
- Pro: Good approximation of sliding window without storing individual timestamps. Low memory, no boundary spike.
- Con: Approximate, not exact.
- Used by: Cloudflare and many production rate limiters.
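A sketch of the weighted calculation above, tracking only two counters instead of a log of timestamps (the rollover logic and names are illustrative):

```python
import time

class SlidingWindowCounter:
    """Weight the previous window's count by the fraction of it still 'visible'."""
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_id = int(time.time() // window_seconds)
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.time()
        window_id = int(now // self.window)
        if window_id != self.current_id:
            # Roll over: current becomes previous (zero if whole windows were skipped).
            self.previous_count = (self.current_count
                                   if window_id == self.current_id + 1 else 0)
            self.current_count = 0
            self.current_id = window_id
        elapsed = (now % self.window) / self.window   # fraction of window elapsed
        weighted = self.previous_count * (1 - elapsed) + self.current_count
        if weighted < self.limit:
            self.current_count += 1
            return True
        return False
```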
Algorithm Comparison
| Algorithm | Memory | Precision | Allows Bursts? | Complexity |
|---|---|---|---|---|
| Token Bucket | O(1) | Good | Yes (up to bucket size) | Low |
| Leaky Bucket | O(queue size) | Exact | No (constant rate) | Low |
| Fixed Window | O(1) | Poor (boundary spike) | Yes (at boundaries) | Trivial |
| Sliding Window Log | O(N) per user | Perfect | No | Medium |
| Sliding Window Counter | O(1) | Good approximation | Limited | Low |
Distributed Rate Limiting with Redis
In a multi-server deployment, rate limit counters must be shared. Redis is the standard choice because of its speed and atomic operations.
```lua
-- Redis Lua script for an atomic sliding window log rate limiter.
-- KEYS[1]: per-client sorted set; ARGV: window (seconds), limit, now (Unix seconds).
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Remove entries that have fallen out of the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count the entries still inside the window
local count = redis.call('ZCARD', key)

if count < limit then
  -- Record this request; the random suffix keeps sorted-set members unique
  redis.call('ZADD', key, now, now .. math.random())
  redis.call('EXPIRE', key, window)
  return 1 -- allowed
else
  return 0 -- rejected
end
```
The Lua script runs atomically on the Redis server, preventing race conditions between concurrent requests.
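A sketch of how the script might be invoked from application code, assuming the redis-py client; the key format and helper name are illustrative conventions, not part of any library API:

```python
import time
import redis  # assumption: the redis-py client (pip install redis)

# The same script as above, embedded as a string.
LUA_SCRIPT = """
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
  redis.call('ZADD', key, now, now .. math.random())
  redis.call('EXPIRE', key, window)
  return 1
else
  return 0
end
"""

r = redis.Redis(host="localhost", port=6379)
allow = r.register_script(LUA_SCRIPT)   # registered once, then invoked via EVALSHA

def is_allowed(client_id: str, limit: int = 100, window: int = 60) -> bool:
    # All units are seconds; the "rate:<id>" key name is an illustrative convention.
    now = int(time.time())
    return allow(keys=[f"rate:{client_id}"], args=[window, limit, now]) == 1
```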
HTTP Rate Limit Headers
Standard headers to communicate rate limit status to clients:
- `X-RateLimit-Limit: 100` — maximum requests allowed in the window
- `X-RateLimit-Remaining: 47` — requests remaining in the current window
- `X-RateLimit-Reset: 1625014800` — Unix timestamp when the window resets
- `Retry-After: 30` — seconds until the client should retry (sent with a 429 response)
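Put together, a rejected request might receive a response like this (header values are illustrative):

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1625014800
Retry-After: 30
```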
Real-World Rate Limits
| Service | Limit | Algorithm |
|---|---|---|
| GitHub API | 5,000 req/hr (authenticated) | Token Bucket |
| Stripe API | 100 req/sec (per key) | Token Bucket |
| Discord | Per-route + global limits | Token Bucket |
| Twitter/X API | 300 req / 15-min window | Fixed Window |
Common Mistakes
- ❌ Rate limiting only at the application layer — add rate limiting at the API gateway too for DDoS protection.
- ❌ Using fixed window without understanding the boundary spike — users can double-burst at window boundaries.
- ❌ Not returning rate limit headers — good API design tells clients their remaining quota.
- ❌ Non-atomic Redis operations — use Lua scripts or MULTI/EXEC to avoid race conditions.
[!TIP] Key Takeaways:
• Token Bucket: best default — allows bursts, low memory, simple. Used by AWS, NGINX.
• Sliding Window Counter: best precision/memory tradeoff. Used by Cloudflare.
• Fixed Window: simplest but beware boundary spikes.
• Distributed rate limiting: use Redis + Lua for atomic, shared counters.
• Always return `X-RateLimit-*` headers and `429 Too Many Requests` with `Retry-After`.