[!NOTE] Back-of-the-envelope estimation is the ability to quickly approximate a system's scale using simple arithmetic. It is not about precision; it is about getting within the right order of magnitude so you can make informed architectural decisions. Is it 100 QPS or 100,000 QPS? The answer determines whether you need a single server or a distributed cluster.
Powers of 2: The Foundation
Every estimation starts with knowing your units. Memorize this table:
| Power | Approximate Value | Name | Shorthand |
|---|---|---|---|
| 2^10 | 1 Thousand | Kilobyte (KB) | 1 KB |
| 2^20 | 1 Million | Megabyte (MB) | 1 MB |
| 2^30 | 1 Billion | Gigabyte (GB) | 1 GB |
| 2^40 | 1 Trillion | Terabyte (TB) | 1 TB |
| 2^50 | 1 Quadrillion | Petabyte (PB) | 1 PB |
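If you want to sanity-check the table, here is a minimal Python sketch; the constant names `KB` through `PB` are just illustrative shorthand:

```python
# Powers of 2 from the table above; the values are approximate
# (2**10 = 1,024 is close enough to "one thousand" for estimation).
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30
TB = 2 ** 40
PB = 2 ** 50

print(f"1 PB = {PB:,} bytes")            # 1,125,899,906,842,624
print(f"2 MB photo = {2 * MB:,} bytes")  # 2,097,152
```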
Latency Numbers Every Engineer Should Know
These are approximate latencies for common operations (originally compiled by Jeff Dean at Google). You don't need exact numbers, just the relative order:
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | ~0.5 ns | Fastest possible access |
| L2 cache reference | ~7 ns | 14× slower than L1 |
| Main memory (RAM) reference | ~100 ns | The speed of Redis/Memcached |
| SSD random read | ~150 μs | 1,000× slower than RAM |
| HDD seek | ~10 ms | 100× slower than SSD |
| Send 1 KB over 1 Gbps network | ~10 μs | Within same data center |
| Round trip within same data center | ~0.5 ms | Typical microservice call |
| Round trip: US East → US West | ~40 ms | Cross-region replication |
| Round trip: US → Europe | ~80 ms | Transatlantic |
| Round trip: US → India | ~150 ms | Global CDN territory |
[!TIP] The key insight: RAM is roughly 1,000× faster than SSD, and SSD is roughly 100× faster than HDD. This is why caching (Redis, Memcached) is so effective: a cache hit moves a read from the ~150 μs SSD tier to the ~100 ns RAM tier, a jump of roughly three orders of magnitude.
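To make the caching argument concrete, here is a small sketch that computes effective read latency when a RAM cache sits in front of SSD-backed storage; the hit rates are illustrative assumptions, and the latencies come from the table above:

```python
# Effective read latency with a cache in front of SSD storage (nanoseconds).
RAM_NS = 100        # cache hit: served from memory
SSD_NS = 150_000    # cache miss: served from SSD

def effective_latency_ns(hit_rate: float) -> float:
    return hit_rate * RAM_NS + (1 - hit_rate) * SSD_NS

for hit_rate in (0.0, 0.8, 0.99):
    print(f"hit rate {hit_rate:.0%}: ~{effective_latency_ns(hit_rate):,.0f} ns")
# hit rate 0%: ~150,000 ns   hit rate 80%: ~30,080 ns   hit rate 99%: ~1,599 ns
```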
The Magic Number: 86,400
There are 86,400 seconds in a day (24 × 60 × 60). This is the single most useful number for system design estimation. Round it to ~100,000 for quick math:
- 1 million daily requests ÷ 100,000 ≈ 10 QPS
- 100 million daily requests ÷ 100,000 ≈ 1,000 QPS
- 1 billion daily requests ÷ 100,000 ≈ 10,000 QPS
For peak traffic, multiply average QPS by 2–5× depending on the application. Social media peaks during evenings; e-commerce peaks during sales events.
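As a quick sketch of that mental shortcut (rounding 86,400 seconds/day to 100,000; the helper name is illustrative):

```python
# Daily volume -> rough average QPS, using the ~100,000 seconds/day shortcut.
def daily_to_qps(daily_requests: float) -> float:
    return daily_requests / 100_000

for daily in (1_000_000, 100_000_000, 1_000_000_000):
    print(f"{daily:,} requests/day ≈ {daily_to_qps(daily):,.0f} QPS")
```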
The Estimation Formulas
QPS Estimation
QPS = (Daily Active Users × Actions per User) / 86,400
Peak QPS = QPS × Peak Multiplier (typically 2–5×)
Example: Twitter
300M DAU × 5 tweet views/day = 1.5 billion views/day
1,500,000,000 / 86,400 ≈ 17,000 QPS (avg)
Peak: 17,000 × 3 ≈ 50,000 QPS
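The same calculation in code, using the exact 86,400 and a peak multiplier (the helper names and the 3× multiplier are illustrative assumptions):

```python
# Average and peak QPS from DAU and actions per user (Twitter-style numbers).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_qps(dau: float, actions_per_user: float) -> float:
    return dau * actions_per_user / SECONDS_PER_DAY

def peak_qps(dau: float, actions_per_user: float, multiplier: float = 3.0) -> float:
    return avg_qps(dau, actions_per_user) * multiplier

print(f"avg:  ~{avg_qps(300e6, 5):,.0f} QPS")   # ~17,361
print(f"peak: ~{peak_qps(300e6, 5):,.0f} QPS")  # ~52,083
```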
Storage Estimation
Storage = Records/day × Record Size × Retention Period
Example: Instagram Photos
100M photos uploaded/day
Average photo: 2 MB (after compression)
Retention: forever (5 years for estimation)
Storage = 100M × 2 MB × 365 × 5 = 200 TB/day × 1,825 days ≈ 365 PB (~73 PB/year)
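A sketch of the same storage math (decimal units keep the arithmetic simple; the names are illustrative):

```python
# Storage = records/day x record size x retention (Instagram-style numbers).
MB = 10 ** 6  # decimal megabyte keeps back-of-the-envelope math simple

def storage_bytes(records_per_day: float, record_size: float, retention_days: float) -> float:
    return records_per_day * record_size * retention_days

print(f"~{storage_bytes(100e6, 2 * MB, 365 * 5) / 1e15:,.0f} PB over 5 years")  # ~365 PB
print(f"~{storage_bytes(100e6, 2 * MB, 365) / 1e15:,.0f} PB per year")          # ~73 PB
```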
Bandwidth Estimation
Bandwidth = QPS × Average Response Size
Example: Video Streaming (Netflix)
10,000 concurrent streams
Average bitrate: 5 Mbps
Bandwidth = 10,000 × 5 Mbps = 50 Gbps outbound
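As a one-line sketch (the stream count and bitrate are the assumptions above):

```python
# Outbound bandwidth = concurrent streams x average bitrate.
streams = 10_000
bitrate_mbps = 5
print(f"~{streams * bitrate_mbps / 1_000:,.0f} Gbps outbound")  # ~50 Gbps
```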
Memory (Cache) Estimation
The 80/20 rule: typically 20% of your data drives 80% of traffic. Cache the hot 20%:
Cache Memory = Daily requests × Average response size × 0.2
Example: URL Shortener
20,000 QPS × 500 bytes × 86,400 seconds × 0.2 = ~170 GB of cache
Fits comfortably in a single Redis cluster.
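A sketch of the cache-sizing formula with the numbers above (the 80/20 assumption is baked into the 0.2 hot fraction):

```python
# Cache memory = daily requests x average response size x hot fraction (0.2).
SECONDS_PER_DAY = 86_400

def cache_bytes(qps: float, response_size: float, hot_fraction: float = 0.2) -> float:
    return qps * SECONDS_PER_DAY * response_size * hot_fraction

print(f"~{cache_bytes(20_000, 500) / 1e9:,.0f} GB")  # ~173 GB
```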
Server Capacity Planning
Knowing how many servers you need is a common interview follow-up:
| Component | Typical Throughput (per node) | Memory |
|---|---|---|
| Web server (Node.js/Go) | ~10,000–50,000 QPS | 1–4 GB |
| PostgreSQL | ~5,000–10,000 QPS (reads) | Varies |
| MySQL (InnoDB) | ~3,000–8,000 QPS (mixed) | Varies |
| Redis | ~100,000–200,000 QPS | Up to 64 GB |
| Kafka (per partition) | ~10,000–100,000 msgs/sec | — |
| Elasticsearch | ~1,000–10,000 searches/sec | 32–64 GB |
| Cassandra | ~10,000–50,000 writes/sec | 16–32 GB |
Servers needed = Peak QPS / Throughput per server
Example: 50,000 peak QPS for a Node.js API
Each server handles ~15,000 QPS
Need: 50,000 / 15,000 ≈ 4 servers
With 2× safety factor: 8 servers
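In code, matching the worked numbers above (rounding up before applying the safety factor):

```python
# Servers needed = ceil(peak QPS / per-server throughput) x safety factor.
import math

def servers_needed(peak_qps: float, qps_per_server: float, safety_factor: int = 2) -> int:
    return math.ceil(peak_qps / qps_per_server) * safety_factor

print(servers_needed(50_000, 15_000))  # ceil(3.33) = 4 base servers x 2 = 8
```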
Worked Example 1: Design a Chat System
Let's estimate the scale for a WhatsApp-like service:
- DAU: 500 million
- Messages per user per day: 40
- Total messages/day: 500M × 40 = 20 billion
- Message QPS: 20B / 86,400 ≈ 230,000 writes/sec
- Average message size: 200 bytes (text)
- Daily storage: 20B × 200 bytes = 4 TB/day
- 5-year storage: 4 TB × 365 × 5 = ~7.3 PB
Architectural insight: 230K writes/sec → need a database optimized for high write throughput (Cassandra, not PostgreSQL). 7.3 PB → need storage tiering (hot SSD for recent, cold HDD/S3 for old messages).
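Putting the chat-system numbers through the earlier formulas, as a sketch (the inputs mirror the bullets above):

```python
# WhatsApp-style estimate: write QPS, daily storage, and 5-year storage.
SECONDS_PER_DAY = 86_400
dau, messages_per_user, message_bytes = 500e6, 40, 200

messages_per_day = dau * messages_per_user          # 20 billion
write_qps = messages_per_day / SECONDS_PER_DAY      # ~230,000
daily_tb = messages_per_day * message_bytes / 1e12  # ~4 TB/day
five_year_pb = daily_tb * 365 * 5 / 1e3             # ~7.3 PB
print(f"~{write_qps:,.0f} writes/s, ~{daily_tb:.0f} TB/day, ~{five_year_pb:.1f} PB over 5 years")
```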
Worked Example 2: Design a URL Shortener
- New URLs per day: 100 million
- Read:write ratio: 100:1 (URLs are created once, read many times)
- Write QPS: 100M / 86,400 ≈ 1,200 writes/sec
- Read QPS: 1,200 × 100 = 120,000 reads/sec
- URL record size: ~500 bytes (short URL + original URL + metadata)
- Daily storage: 100M × 500 bytes = 50 GB/day
- 10-year storage: 50 GB × 365 × 10 = ~180 TB
- Cache: hot 20% of the ~100M unique URLs read per day (with the 100:1 ratio, most reads hit recently created URLs) ≈ 100M × 500 bytes × 0.2 ≈ 10 GB (easily fits in Redis)
Architectural insight: 120K read QPS → heavy caching with Redis. A single Redis node can handle ~100K–200K QPS, so one node might suffice. 180 TB → needs a partitioned/sharded database (e.g., NoSQL or sharded PostgreSQL).
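The same end-to-end flow as a sketch (inputs mirror the bullets above; the cache line assumes the hot 20% of the ~100M URLs touched per day):

```python
# URL-shortener estimate: write QPS, read QPS, 10-year storage, cache size.
SECONDS_PER_DAY = 86_400
new_urls_per_day, read_write_ratio, record_bytes = 100e6, 100, 500

write_qps = new_urls_per_day / SECONDS_PER_DAY                       # ~1,200
read_qps = write_qps * read_write_ratio                              # ~120,000
storage_10y_tb = new_urls_per_day * record_bytes * 365 * 10 / 1e12   # ~180 TB
cache_gb = new_urls_per_day * record_bytes * 0.2 / 1e9               # hot 20% of daily URLs, ~10 GB
print(f"~{write_qps:,.0f} writes/s, ~{read_qps:,.0f} reads/s, "
      f"~{storage_10y_tb:,.0f} TB over 10y, ~{cache_gb:,.0f} GB cache")
```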
Worked Example 3: Design a Ride-Sharing App
- DAU: 20 million riders, 2 million drivers
- Rides per day: 10 million
- Location updates per driver: every 3 seconds → 2M × (86,400/3) = 57.6 billion location points/day
- Location QPS: 57.6B / 86,400 ≈ 670,000 writes/sec
- Location record: 100 bytes (lat, lng, timestamp, driver_id)
- Daily location storage: 57.6B × 100 bytes = 5.76 TB/day
Architectural insight: 670K writes/sec for ephemeral location data → use in-memory grid (Redis) for real-time tracking, only persist completed ride trajectories. Location data is write-heavy and temporary — perfect use case for a time-series database or in-memory store.
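A sketch of why an in-memory store works here: only the latest position per driver needs to stay hot, which is tiny compared to the full update stream (numbers mirror the bullets above):

```python
# Live tracking only needs the latest point per driver; history can go to cheap storage.
drivers = 2_000_000
location_bytes = 100           # lat, lng, timestamp, driver_id
updates_per_day = 86_400 // 3  # one update every 3 seconds per driver

latest_only = drivers * location_bytes                     # ~200 MB (fits in Redis)
full_history = drivers * updates_per_day * location_bytes  # ~5.76 TB/day
print(f"latest positions: ~{latest_only / 1e6:,.0f} MB")
print(f"full history:     ~{full_history / 1e12:.2f} TB/day")
```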
Quick Reference: Common Interview Estimates
| System | Key Metric | Approximate Value |
|---|---|---|
| Twitter | Timeline read QPS | ~300K–500K |
| Instagram | Photo storage/year | ~70 PB |
| WhatsApp | Message write QPS | ~230K |
| YouTube | Video upload/min | ~500 hours |
| Google Search | Search QPS | ~70K |
| Netflix | Bandwidth peak | ~100+ Gbps |
| Uber | Location update QPS | ~670K |
Common Mistakes
- ❌ Spending too long on exact math — the interviewer wants order-of-magnitude, not a calculator exercise. Round aggressively.
- ❌ Forgetting peak traffic — average QPS is meaningless without considering 3–5× spikes. Systems crash during peaks, not averages.
- ❌ Confusing bits and bytes — network bandwidth is in bits/s, storage is in bytes. 1 Gbps = 125 MB/s.
- ❌ Not connecting estimates to design — the whole point is: "Based on 230K writes/sec, we need Cassandra, not PostgreSQL." If your estimate doesn't inform a decision, it's wasted effort.
- ❌ Ignoring the read:write ratio — this ratio determines whether you invest in read replicas/caching (read-heavy) or write-optimized databases (write-heavy).
- ❌ Forgetting replication overhead — if you replicate data 3×, your actual storage need is 3× whatever you calculated.
[!TIP] Key Takeaways:
• Memorize: 86,400 sec/day, powers of 2, Jeff Dean latency numbers.
• QPS = DAU × actions ÷ 86,400. Multiply by 3–5× for peak.
• Storage = records × size × retention. Remember to multiply by replication factor.
• Cache the hot 20% (80/20 rule).
• Servers needed = Peak QPS ÷ throughput per server × 2 (safety factor).
• Always connect your estimate to an architectural decision — that's the whole point.