[!NOTE] Back-of-the-envelope estimation is the ability to quickly approximate a system's scale using simple arithmetic. It is not about precision; it is about getting within the right order of magnitude so you can make informed architectural decisions. Is it 100 QPS or 100,000 QPS? The answer determines whether you need a single server or a distributed cluster.
Powers of 2: The Foundation
Every estimation starts with knowing your units. Memorize this table:
| Power | Approximate Value | Name | Shorthand |
|---|---|---|---|
| 2^10 | 1 Thousand | Kilobyte (KB) | 1 KB |
| 2^20 | 1 Million | Megabyte (MB) | 1 MB |
| 2^30 | 1 Billion | Gigabyte (GB) | 1 GB |
| 2^40 | 1 Trillion | Terabyte (TB) | 1 TB |
| 2^50 | 1 Quadrillion | Petabyte (PB) | 1 PB |
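If you want to sanity-check the table, here is a minimal Python sketch; the constant names `KB` through `PB` are just illustrative shorthand:

```python
# Powers of 2 from the table above; the values are approximate
# (2**10 = 1,024 is close enough to "one thousand" for estimation).
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30
TB = 2 ** 40
PB = 2 ** 50

print(f"1 PB = {PB:,} bytes")            # 1,125,899,906,842,624
print(f"2 MB photo = {2 * MB:,} bytes")  # 2,097,152
```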
Latency Numbers Every Engineer Should Know
These are approximate latencies for common operations (originally compiled by Jeff Dean at Google). You don't need exact numbers, just the relative order:
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | ~0.5 ns | Fastest possible access |
| L2 cache reference | ~7 ns | 14× slower than L1 |
| Main memory (RAM) reference | ~100 ns | The speed of Redis/Memcached |
| SSD random read | ~150 μs | 1,000× slower than RAM |
| HDD seek | ~10 ms | 100× slower than SSD |
| Send 1 KB over 1 Gbps network | ~10 μs | Within same data center |
| Round trip within same data center | ~0.5 ms | Typical microservice call |
| Round trip: US East → US West | ~40 ms | Cross-region replication |
| Round trip: US → Europe | ~80 ms | Transatlantic |
| Round trip: US → India | ~150 ms | Global CDN territory |
[!TIP] The key insight: RAM is roughly 1,000× faster than SSD, and SSD is roughly 100× faster than HDD. This is why caching (Redis, Memcached) is so effective: a cache hit moves a read from the ~150 μs SSD tier to the ~100 ns RAM tier, a jump of roughly three orders of magnitude.
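To make the caching argument concrete, here is a small sketch that computes effective read latency when a RAM cache sits in front of SSD-backed storage; the hit rates are illustrative assumptions, and the latencies come from the table above:

```python
# Effective read latency with a cache in front of SSD storage (nanoseconds).
RAM_NS = 100        # cache hit: served from memory
SSD_NS = 150_000    # cache miss: served from SSD

def effective_latency_ns(hit_rate: float) -> float:
    return hit_rate * RAM_NS + (1 - hit_rate) * SSD_NS

for hit_rate in (0.0, 0.8, 0.99):
    print(f"hit rate {hit_rate:.0%}: ~{effective_latency_ns(hit_rate):,.0f} ns")
# hit rate 0%: ~150,000 ns   hit rate 80%: ~30,080 ns   hit rate 99%: ~1,599 ns
```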
The Magic Number: 86,400
There are 86,400 seconds in a day (24 × 60 × 60). This is the single most useful number for system design estimation. Round it to ~100,000 for quick math:
- 1 million daily requests ÷ 100,000 ≈ 10 QPS
- 100 million daily requests ÷ 100,000 ≈ 1,000 QPS
- 1 billion daily requests ÷ 100,000 ≈ 10,000 QPS
For peak traffic, multiply average QPS by 2–5× depending on the application. Social media peaks during evenings; e-commerce peaks during sales events.
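As a quick sketch of that mental shortcut (rounding 86,400 seconds/day to 100,000; the helper name is illustrative):

```python
# Daily volume -> rough average QPS, using the ~100,000 seconds/day shortcut.
def daily_to_qps(daily_requests: float) -> float:
    return daily_requests / 100_000

for daily in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{daily:,} requests/day ≈ {daily_to_qps(daily):,.0f} QPS")
```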
The Estimation Formulas
QPS Estimation
QPS = (Daily Active Users × Actions per User) / 86,400
Peak QPS = QPS × Peak Multiplier (typically 2–5×)
Example: Twitter
300M DAU × 5 tweet views/day = 1.5 billion views/day
1,500,000,000 / 86,400 ≈ 17,000 QPS (avg)
Peak: 17,000 × 3 ≈ 50,000 QPS
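The same calculation in code, using the exact 86,400 and a peak multiplier (the helper names and the 3× multiplier are illustrative assumptions):

```python
# Average and peak QPS from DAU and actions per user (Twitter-style numbers).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_qps(dau: float, actions_per_user: float) -> float:
    return dau * actions_per_user / SECONDS_PER_DAY

def peak_qps(dau: float, actions_per_user: float, multiplier: float = 3.0) -> float:
    return avg_qps(dau, actions_per_user) * multiplier

print(f"avg:  ~{avg_qps(300e6, 5):,.0f} QPS")   # ~17,361
print(f"peak: ~{peak_qps(300e6, 5):,.0f} QPS")  # ~52,083
```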
Storage Estimation
Storage = Records/day × Record Size × Retention Period
Example: Instagram Photos
100M photos uploaded/day
Average photo: 2 MB (after compression)
Retention: forever (5 years for estimation)
Storage = 100M × 2 MB × 365 × 5 = 200 TB/day × 1,825 days ≈ 365 PB (~73 PB/year)
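A sketch of the same storage math (decimal units keep the arithmetic simple; the names are illustrative):

```python
# Storage = records/day x record size x retention (Instagram-style numbers).
MB = 10 ** 6  # decimal megabyte keeps back-of-the-envelope math simple

def storage_bytes(records_per_day: float, record_size: float, retention_days: float) -> float:
    return records_per_day * record_size * retention_days

print(f"~{storage_bytes(100e6, 2 * MB, 365 * 5) / 1e15:,.0f} PB over 5 years")  # ~365 PB
print(f"~{storage_bytes(100e6, 2 * MB, 365) / 1e15:,.0f} PB per year")          # ~73 PB
```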
Bandwidth Estimation
Bandwidth = QPS × Average Response Size
Example: Video Streaming (Netflix)
10,000 concurrent streams
Average bitrate: 5 Mbps
Bandwidth = 10,000 × 5 Mbps = 50 Gbps outbound
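As a one-line sketch (the stream count and bitrate are the assumptions above):

```python
# Outbound bandwidth = concurrent streams x average bitrate.
streams = 10_000
bitrate_mbps = 5
print(f"~{streams * bitrate_mbps / 1_000:,.0f} Gbps outbound")  # ~50 Gbps
```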
Memory (Cache) Estimation
The 80/20 rule: typically 20% of your data drives 80% of traffic. Cache the hot 20%:
Cache Memory = Daily requests × Average response size × 0.2
Example: URL Shortener
20,000 QPS × 500 bytes × 86,400 seconds × 0.2 = ~170 GB of cache
Fits comfortably in a single Redis cluster.
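A sketch of the cache-sizing formula with the numbers above (the 80/20 assumption is baked into the 0.2 hot fraction):

```python
# Cache memory = daily requests x average response size x hot fraction (0.2).
SECONDS_PER_DAY = 86_400

def cache_bytes(qps: float, response_size: float, hot_fraction: float = 0.2) -> float:
    return qps * SECONDS_PER_DAY * response_size * hot_fraction

print(f"~{cache_bytes(20_000, 500) / 1e9:,.0f} GB")  # ~173 GB
```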
Server Capacity Planning
Knowing how many servers you need is a common interview follow-up:
| Component | Typical Throughput (per node) | Memory |
|---|---|---|
| Web server (Node.js/Go) | ~10,000–50,000 QPS | 1–4 GB |
| PostgreSQL | ~5,000–10,000 QPS (reads) | Varies |
| MySQL (InnoDB) | ~3,000–8,000 QPS (mixed) | Varies |
| Redis | ~100,000–200,000 QPS | Up to 64 GB |
| Kafka (per partition) | ~10,000–100,000 msgs/sec | — |
| Elasticsearch | ~1,000–10,000 searches/sec | 32–64 GB |
| Cassandra | ~10,000–50,000 writes/sec | 16–32 GB |
Servers needed = Peak QPS / Throughput per server
Example: 50,000 peak QPS for a Node.js API
Each server handles ~15,000 QPS
Need: 50,000 / 15,000 ≈ 4 servers
With 2× safety factor: 8 servers
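In code, matching the worked numbers above (rounding up before applying the safety factor):

```python
# Servers needed = ceil(peak QPS / per-server throughput) x safety factor.
import math

def servers_needed(peak_qps: float, qps_per_server: float, safety_factor: int = 2) -> int:
    return math.ceil(peak_qps / qps_per_server) * safety_factor

print(servers_needed(50_000, 15_000))  # ceil(3.33) = 4 base servers x 2 = 8
```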
Worked Example 1: Design a Chat System
Let's estimate the scale for a WhatsApp-like service:
- DAU: 500 million
- Messages per user per day: 40
- Total messages/day: 500M × 40 = 20 billion
- Message QPS: 20B / 86,400 ≈ 230,000 writes/sec
- Average message size: 200 bytes (text)
- Daily storage: 20B × 200 bytes = 4 TB/day
- 5-year storage: 4 TB × 365 × 5 = ~7.3 PB
Architectural insight: 230K writes/sec → need a database optimized for high write throughput (Cassandra, not PostgreSQL). 7.3 PB → need storage tiering (hot SSD for recent, cold HDD/S3 for old messages).
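Putting the chat-system numbers through the earlier formulas, as a sketch (the inputs mirror the bullets above):

```python
# WhatsApp-style estimate: write QPS, daily storage, and 5-year storage.
SECONDS_PER_DAY = 86_400
dau, messages_per_user, message_bytes = 500e6, 40, 200

messages_per_day = dau * messages_per_user          # 20 billion
write_qps = messages_per_day / SECONDS_PER_DAY      # ~230,000
daily_tb = messages_per_day * message_bytes / 1e12  # ~4 TB/day
five_year_pb = daily_tb * 365 * 5 / 1e3             # ~7.3 PB
print(f"~{write_qps:,.0f} writes/s, ~{daily_tb:.0f} TB/day, ~{five_year_pb:.1f} PB over 5 years")
```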
Worked Example 2: Design a URL Shortener
- New URLs per day: 100 million
- Read:write ratio: 100:1 (URLs are created once, read many times)
- Write QPS: 100M / 86,400 ≈ 1,200 writes/sec
- Read QPS: 1,200 × 100 = 120,000 reads/sec
- URL record size: ~500 bytes (short URL + original URL + metadata)
- Daily storage: 100M × 500 bytes = 50 GB/day
- 10-year storage: 50 GB × 365 × 10 = ~180 TB
- Cache: hot 20% of the ~100M unique URLs read per day (with the 100:1 ratio, most reads hit recently created URLs) ≈ 100M × 500 bytes × 0.2 ≈ 10 GB (easily fits in Redis)
Architectural insight: 120K read QPS → heavy caching with Redis. A single Redis node can handle ~100K–200K QPS, so one node might suffice. 180 TB → needs a partitioned/sharded database (e.g., NoSQL or sharded PostgreSQL).
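The same end-to-end flow as a sketch (inputs mirror the bullets above; the cache line assumes the hot 20% of the ~100M URLs touched per day):

```python
# URL-shortener estimate: write QPS, read QPS, 10-year storage, cache size.
SECONDS_PER_DAY = 86_400
new_urls_per_day, read_write_ratio, record_bytes = 100e6, 100, 500

write_qps = new_urls_per_day / SECONDS_PER_DAY                       # ~1,200
read_qps = write_qps * read_write_ratio                              # ~120,000
storage_10y_tb = new_urls_per_day * record_bytes * 365 * 10 / 1e12   # ~180 TB
cache_gb = new_urls_per_day * record_bytes * 0.2 / 1e9               # hot 20% of daily URLs, ~10 GB
print(f"~{write_qps:,.0f} writes/s, ~{read_qps:,.0f} reads/s, "
      f"~{storage_10y_tb:,.0f} TB over 10y, ~{cache_gb:,.0f} GB cache")
```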
Worked Example 3: Design a Ride-Sharing App
- DAU: 20 million riders, 2 million drivers
- Rides per day: 10 million
- Location updates per driver: every 3 seconds → 2M × (86,400/3) = 57.6 billion location points/day
- Location QPS: 57.6B / 86,400 ≈ 670,000 writes/sec
- Location record: 100 bytes (lat, lng, timestamp, driver_id)
- Daily location storage: 57.6B × 100 bytes = 5.76 TB/day
Architectural insight: 670K writes/sec for ephemeral location data → use in-memory grid (Redis) for real-time tracking, only persist completed ride trajectories. Location data is write-heavy and temporary — perfect use case for a time-series database or in-memory store.
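A sketch of why an in-memory store works here: only the latest position per driver needs to stay hot, which is tiny compared to the full update stream (numbers mirror the bullets above):

```python
# Live tracking only needs the latest point per driver; history can go to cheap storage.
drivers = 2_000_000
location_bytes = 100           # lat, lng, timestamp, driver_id
updates_per_day = 86_400 // 3  # one update every 3 seconds per driver

latest_only = drivers * location_bytes                     # ~200 MB (fits in Redis)
full_history = drivers * updates_per_day * location_bytes  # ~5.76 TB/day
print(f"latest positions: ~{latest_only / 1e6:,.0f} MB")
print(f"full history:     ~{full_history / 1e12:.2f} TB/day")
```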
Quick Reference: Common Interview Estimates
| System | Key Metric | Approximate Value |
|---|---|---|
| Twitter | Timeline read QPS | ~300K–500K |
| Instagram | Photo storage/year | ~70 PB |
| WhatsApp | Message write QPS | ~230K |
| YouTube | Video upload/min | ~500 hours |
| Google Search | Search QPS | ~70K |
| Netflix | Bandwidth peak | ~100+ Gbps |
| Uber | Location update QPS | ~670K |
Common Mistakes
- ❌ Spending too long on exact math — the interviewer wants order-of-magnitude, not a calculator exercise. Round aggressively.
- ❌ Forgetting peak traffic — average QPS is meaningless without considering 3–5× spikes. Systems crash during peaks, not averages.
- ❌ Confusing bits and bytes — network bandwidth is in bits/s, storage is in bytes. 1 Gbps = 125 MB/s.
- ❌ Not connecting estimates to design — the whole point is: "Based on 230K writes/sec, we need Cassandra, not PostgreSQL." If your estimate doesn't inform a decision, it's wasted effort.
- ❌ Ignoring the read:write ratio — this ratio determines whether you invest in read replicas/caching (read-heavy) or write-optimized databases (write-heavy).
- ❌ Forgetting replication overhead — if you replicate data 3×, your actual storage need is 3× whatever you calculated.
[!TIP] Key Takeaways:
• Memorize: 86,400 sec/day, powers of 2, Jeff Dean latency numbers.
• QPS = DAU × actions ÷ 86,400. Multiply by 3–5× for peak.
• Storage = records × size × retention. Remember to multiply by replication factor.
• Cache the hot 20% (80/20 rule).
• Servers needed = Peak QPS ÷ throughput per server × 2 (safety factor).
• Always connect your estimate to an architectural decision — that's the whole point.