System Design: The Complete Guide
4. Interview Essentials

16. Back-of-the-Envelope Estimation

The numbers every engineer should know and the math shortcuts to estimate any system's scale in 5 minutes.

Mar 5, 2026

[!NOTE] Back-of-the-envelope estimation is the ability to quickly approximate a system's scale using simple arithmetic. It is not about precision—it is about getting within the right order of magnitude so you can make informed architectural decisions. Is it 100 QPS or 100,000 QPS? The answer determines whether you need a single server or a distributed cluster.

Powers of 2: The Foundation

Every estimation starts with knowing your units. Memorize this table:

| Power | Approximate Value | Name | Shorthand |
|---|---|---|---|
| 2^10 | ~1 Thousand | Kilobyte (KB) | 1 KB |
| 2^20 | ~1 Million | Megabyte (MB) | 1 MB |
| 2^30 | ~1 Billion | Gigabyte (GB) | 1 GB |
| 2^40 | ~1 Trillion | Terabyte (TB) | 1 TB |
| 2^50 | ~1 Quadrillion | Petabyte (PB) | 1 PB |

Latency Numbers Every Engineer Should Know

These are approximate latencies for common operations (originally compiled by Jeff Dean at Google). You don't need exact numbers—just the relative order:

| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | ~0.5 ns | Fastest possible access |
| L2 cache reference | ~7 ns | 14× slower than L1 |
| Main memory (RAM) reference | ~100 ns | The speed of Redis/Memcached |
| SSD random read | ~150 μs | 1,000× slower than RAM |
| HDD seek | ~10 ms | 100× slower than SSD |
| Send 1 KB over 1 Gbps network | ~10 μs | Within the same data center |
| Round trip within same data center | ~0.5 ms | Typical microservice call |
| Round trip: US East → US West | ~40 ms | Cross-region replication |
| Round trip: US → Europe | ~80 ms | Transatlantic |
| Round trip: US → India | ~150 ms | Global CDN territory |

[!TIP] The key insight: RAM is ~1,000× faster than SSD. SSD is ~100× faster than HDD. This is why caching (Redis, Memcached) is so effective—you are moving data from the 150 μs tier to the 100 ns tier. Every time you add a cache layer, you are jumping 3 orders of magnitude in speed.

The Magic Number: 86,400

There are 86,400 seconds in a day (24 × 60 × 60). This is the single most useful number for system design estimation. Round it to ~100,000 for quick math:

  • 1 million daily requests ÷ 100,000 ≈ 10 QPS
  • 100 million daily requests ÷ 100,000 ≈ 1,000 QPS
  • 1 billion daily requests ÷ 100,000 ≈ 10,000 QPS

For peak traffic, multiply average QPS by 2–5× depending on the application. Social media peaks during evenings; e-commerce peaks during sales events.
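The ÷100,000 shortcut is easy to sanity-check in code. A minimal sketch (function names are illustrative, not from the text):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def avg_qps(daily_requests: float) -> float:
    """Exact average QPS from a daily request count."""
    return daily_requests / SECONDS_PER_DAY

# The ~100,000 shortcut lands within ~15% of the exact value:
exact = avg_qps(100_000_000)      # ≈ 1,157 QPS
shortcut = 100_000_000 / 100_000  # = 1,000 QPS
```

That ~15% error is irrelevant at order-of-magnitude precision, which is why rounding 86,400 up to 100,000 is safe in an interview.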

The Estimation Formulas

QPS Estimation

QPS = (Daily Active Users × Actions per User) / 86,400
Peak QPS = QPS × Peak Multiplier (typically 2–5×)
Example: Twitter

300M DAU × 5 tweet views/day = 1.5 billion views/day
1,500,000,000 / 86,400 ≈ 17,000 QPS (avg)
Peak: 17,000 × 3 ≈ 50,000 QPS
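The Twitter numbers above can be reproduced with a small helper. A sketch under the same assumptions (3× peak multiplier; names are mine):

```python
SECONDS_PER_DAY = 86_400

def qps(dau: int, actions_per_user: float, peak_multiplier: float = 3.0):
    """Return (average QPS, peak QPS) for a daily workload."""
    avg = dau * actions_per_user / SECONDS_PER_DAY
    return avg, avg * peak_multiplier

avg, peak = qps(300_000_000, 5)  # ≈ 17,361 avg → "~17,000"; ≈ 52,083 peak → "~50,000"
```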

Storage Estimation

Storage = Records/day × Record Size × Retention Period
Example: Instagram Photos

100M photos uploaded/day
Average photo: 2 MB (after compression)
Retention: forever (5 years for estimation)
Storage = 100M × 2 MB × 365 × 5 = 100M × 2 MB × 1,825 = 365 PB total (~73 PB/year)
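The same arithmetic as a sketch in code (decimal units, since estimation does not care about the 1000-vs-1024 distinction):

```python
def storage_bytes(records_per_day: float, record_size_bytes: float,
                  retention_days: float) -> float:
    """Total raw storage: records/day × record size × retention."""
    return records_per_day * record_size_bytes * retention_days

PB = 1e15  # decimal petabyte
total = storage_bytes(100e6, 2e6, 365 * 5)  # Instagram-style numbers
print(total / PB)  # ≈ 365 PB over 5 years, i.e. ~73 PB/year
```

Remember this is raw storage; a 3× replication factor triples the figure.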

Bandwidth Estimation

Bandwidth = QPS × Average Response Size
Example: Video Streaming (Netflix)

10,000 concurrent streams
Average bitrate: 5 Mbps
Bandwidth = 10,000 × 5 Mbps = 50 Gbps outbound
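A one-liner makes the units explicit—bitrates are already in bits per second, so no byte conversion is needed (a sketch, names illustrative):

```python
def egress_gbps(concurrent_streams: int, bitrate_mbps: float) -> float:
    """Outbound bandwidth in Gbps (note: bits, not bytes)."""
    return concurrent_streams * bitrate_mbps / 1_000

print(egress_gbps(10_000, 5))  # 50.0 Gbps
```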

Memory (Cache) Estimation

The 80/20 rule: typically 20% of your data drives 80% of traffic. Cache the hot 20%:

Cache Memory = Daily requests × Average response size × 0.2
Example: URL Shortener

20,000 QPS × 500 bytes × 86,400 seconds × 0.2 ≈ 170 GB of cache
Fits comfortably in a single Redis cluster.
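As a minimal sketch, the 80/20 sizing formula in code (the 0.2 hot fraction is the rule-of-thumb assumption from the text):

```python
def hot_cache_gb(qps: float, avg_response_bytes: float,
                 hot_fraction: float = 0.2) -> float:
    """Memory needed to cache the hot fraction of one day's responses."""
    return qps * avg_response_bytes * 86_400 * hot_fraction / 1e9

print(round(hot_cache_gb(20_000, 500)))  # 173, i.e. "~170 GB"
```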

Server Capacity Planning

Knowing how many servers you need is a common interview follow-up:

| Component | Typical Throughput (per node) | Memory |
|---|---|---|
| Web server (Node.js/Go) | ~10,000–50,000 QPS | 1–4 GB |
| PostgreSQL | ~5,000–10,000 QPS (reads) | Varies |
| MySQL (InnoDB) | ~3,000–8,000 QPS (mixed) | Varies |
| Redis | ~100,000–200,000 QPS | Up to 64 GB |
| Kafka (per partition) | ~10,000–100,000 msgs/sec | — |
| Elasticsearch | ~1,000–10,000 searches/sec | 32–64 GB |
| Cassandra | ~10,000–50,000 writes/sec | 16–32 GB |
Servers needed = Peak QPS / Throughput per server
Example: 50,000 peak QPS for a Node.js API

Each server handles ~15,000 QPS
Need: 50,000 / 15,000 ≈ 4 servers
With 2× safety factor: 8 servers
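The capacity calculation above, sketched in code. Rounding up before applying the safety factor matches the worked numbers (partial servers don't exist):

```python
import math

def servers_needed(peak_qps: float, per_server_qps: float,
                   safety: float = 2.0) -> int:
    """Servers = ceil(peak / per-server throughput) × safety factor."""
    base = math.ceil(peak_qps / per_server_qps)
    return math.ceil(base * safety)

print(servers_needed(50_000, 15_000))  # 8 (4 base servers × 2× safety)
```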

Worked Example 1: Design a Chat System

Let's estimate the scale for a WhatsApp-like service:

  1. DAU: 500 million
  2. Messages per user per day: 40
  3. Total messages/day: 500M × 40 = 20 billion
  4. Message QPS: 20B / 86,400 ≈ 230,000 writes/sec
  5. Average message size: 200 bytes (text)
  6. Daily storage: 20B × 200 bytes = 4 TB/day
  7. 5-year storage: 4 TB × 365 × 5 = ~7.3 PB

Architectural insight: 230K writes/sec → need a database optimized for high write throughput (Cassandra, not PostgreSQL). 7.3 PB → need storage tiering (hot SSD for recent, cold HDD/S3 for old messages).
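The seven steps above fit in a few lines; a sketch using the same assumptions (500M DAU, 40 messages/user, 200-byte messages):

```python
SECONDS_PER_DAY = 86_400

dau, msgs_per_user, msg_bytes = 500_000_000, 40, 200
messages_per_day = dau * msgs_per_user          # 20 billion
write_qps = messages_per_day / SECONDS_PER_DAY  # ≈ 231,481 → "~230K writes/sec"
daily_tb = messages_per_day * msg_bytes / 1e12  # 4.0 TB/day
five_year_pb = daily_tb * 365 * 5 / 1_000       # ≈ 7.3 PB
```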

Worked Example 2: Design a URL Shortener

  1. New URLs per day: 100 million
  2. Read:write ratio: 100:1 (URLs are created once, read many times)
  3. Write QPS: 100M / 86,400 ≈ 1,200 writes/sec
  4. Read QPS: 1,200 × 100 = 120,000 reads/sec
  5. URL record size: ~500 bytes (short URL + original URL + metadata)
  6. Daily storage: 100M × 500 bytes = 50 GB/day
  7. 10-year storage: 50 GB × 365 × 10 = ~180 TB
  8. Cache: hot 20% of 100M daily URLs × 500 bytes ≈ 10 GB (easily fits in Redis)

Architectural insight: 120K read QPS → heavy caching with Redis. A single Redis node handles 200K QPS, so one node might suffice. 180 TB → needs partitioned/sharded database (e.g., NoSQL or sharded PostgreSQL).
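The URL shortener numbers as a sketch (the cache line sizes the hot 20% of the day's unique URLs, per the 80/20 rule; variable names are mine):

```python
new_urls_per_day = 100_000_000
write_qps = new_urls_per_day / 86_400              # ≈ 1,157 → "~1,200 writes/sec"
read_qps = write_qps * 100                         # ≈ 115,741 → "~120K reads/sec"
daily_gb = new_urls_per_day * 500 / 1e9            # 50.0 GB/day
ten_year_tb = daily_gb * 365 * 10 / 1_000          # 182.5 TB → "~180 TB"
hot_cache_gb = new_urls_per_day * 500 * 0.2 / 1e9  # ≈ 10 GB of hot URLs
```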

Worked Example 3: Design a Ride-Sharing App

  1. DAU: 20 million riders, 2 million drivers
  2. Rides per day: 10 million
  3. Location updates per driver: every 3 seconds → 2M × (86,400/3) = 57.6 billion location points/day
  4. Location QPS: 57.6B / 86,400 ≈ 670,000 writes/sec
  5. Location record: 100 bytes (lat, lng, timestamp, driver_id)
  6. Daily location storage: 57.6B × 100 bytes = 5.76 TB/day

Architectural insight: 670K writes/sec for ephemeral location data → use in-memory grid (Redis) for real-time tracking, only persist completed ride trajectories. Location data is write-heavy and temporary — perfect use case for a time-series database or in-memory store.
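The location-update arithmetic above, sketched under the same assumptions (2M drivers, one 100-byte update every 3 seconds):

```python
drivers = 2_000_000
update_interval_s = 3
updates_per_day = drivers * (86_400 // update_interval_s)  # 57.6 billion points/day
location_qps = updates_per_day / 86_400                    # ≈ 666,667 → "~670K writes/sec"
daily_tb = updates_per_day * 100 / 1e12                    # 5.76 TB/day of location data
```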

Quick Reference: Common Interview Estimates

| System | Key Metric | Approximate Value |
|---|---|---|
| Twitter | Timeline read QPS | ~300K–500K |
| Instagram | Photo storage/year | ~70 PB |
| WhatsApp | Message write QPS | ~230K |
| YouTube | Video uploaded/min | ~500 hours |
| Google Search | Search QPS | ~70K |
| Netflix | Peak bandwidth | ~100+ Gbps |
| Uber | Location update QPS | ~670K |

Common Mistakes

  • ❌ Spending too long on exact math — the interviewer wants order-of-magnitude, not a calculator exercise. Round aggressively.
  • ❌ Forgetting peak traffic — average QPS is meaningless without considering 3–5× spikes. Systems crash during peaks, not averages.
  • ❌ Confusing bits and bytes — network bandwidth is in bits/s, storage is in bytes. 1 Gbps = 125 MB/s.
  • ❌ Not connecting estimates to design — the whole point is: "Based on 230K writes/sec, we need Cassandra, not PostgreSQL." If your estimate doesn't inform a decision, it's wasted effort.
  • ❌ Ignoring the read:write ratio — this ratio determines whether you invest in read replicas/caching (read-heavy) or write-optimized databases (write-heavy).
  • ❌ Forgetting replication overhead — if you replicate data 3×, your actual storage need is 3× whatever you calculated.

[!TIP] Key Takeaways:
• Memorize: 86,400 sec/day, powers of 2, Jeff Dean latency numbers.
• QPS = DAU × actions ÷ 86,400. Multiply by 3–5× for peak.
• Storage = records × size × retention. Remember to multiply by replication factor.
• Cache the hot 20% (80/20 rule).
• Servers needed = Peak QPS ÷ throughput per server × 2 (safety factor).
• Always connect your estimate to an architectural decision — that's the whole point.
