15. How to Approach Any System Design Interview

[!IMPORTANT] System design interviews are not about memorizing architectures. They are about demonstrating structured thinking, making deliberate tradeoffs, and communicating clearly under time pressure. This chapter gives you the exact framework top candidates use. Follow it step by step, and you will never get lost in an interview again.

Why You Need a Framework

Most candidates fail system design interviews not because they lack knowledge, but because they ramble without structure. They jump into database schemas before understanding requirements, or spend 20 minutes on one component while ignoring everything else. Interviewers evaluate your process as much as your answer.

The framework below is used by engineers who have passed interviews at Google, Meta, Amazon, and Microsoft. It works for any question—whether you are designing Twitter, a parking garage, or a distributed cache.

The golden rule: A mediocre design delivered with excellent structure will always outperform a brilliant design delivered as a confused monologue. Structure is your safety net: even when you don't know the "right" answer, the framework keeps you moving forward and demonstrating competence.

The 5-Step Framework

Step	Time	Goal	% of Signal
Clarify Requirements	~5 min	Define scope, users, features, and constraints	15%
Back-of-the-Envelope Estimation	~5 min	Quantify scale: QPS, storage, bandwidth	10%
High-Level Design	~10 min	Draw the big boxes and arrows	25%
Deep Dive	~15 min	Zoom into 2–3 critical components	40%
Bottlenecks & Improvements	~5 min	Identify weaknesses and propose fixes	10%

Step 1: Clarify Requirements (5 min)

Before drawing a single box, ask questions. This step defines the scope of your entire design, so a mistake here carries through everything that follows. The interviewer is testing whether you can think like a product engineer, not just a coder.

Functional Requirements — What features must the system support?

"Should users be able to upload videos, or just watch them?"
"Do we need group chat, or only 1-on-1?"
"Should shortened URLs expire?"

Non-Functional Requirements — What quality attributes matter?

Scale: How many users? How many requests/second?
Latency: What is the acceptable response time?
Availability vs Consistency: What happens during failures? (CAP tradeoff)
Durability: Can we ever lose data?

Sample Q&A for "Design Twitter":

Your Question	Interviewer's Answer	Impact on Design
Who are the users?	300M DAU, mostly readers	Read-heavy → cache aggressively
What features are in scope?	Post tweets, follow, read timeline	Skip DMs, search, trending for now
How long should tweets be?	280 characters max	Small records → efficient storage
Do we need real-time delivery?	Feed can be eventually consistent	We can use pre-computed fan-out
Media support?	Images yes, video out of scope	Need CDN for image serving

[!TIP] Pro move: Write your requirements on the whiteboard (or shared doc) as a bulleted list. This shows the interviewer you are organized, and you can refer back to it throughout the interview to stay on track. Explicitly separate functional and non-functional requirements.

Step 2: Back-of-the-Envelope Estimation (5 min)

Quantify the scale of the system. This determines which architectural components you need. A system serving 100 users/day doesn't need sharding; one serving 100 million does.

Key numbers to estimate:

QPS (Queries Per Second): Daily active users × actions per user / 86,400 seconds
Peak QPS: Average QPS × 3 to 5 (traffic concentrates in peak hours)
Storage: Records per day × record size × retention period
Bandwidth: QPS × average response size
Memory (for caching): Total data read per day × 0.2 (80/20 rule: about 20% of the data serves about 80% of the reads)

Example: Twitter Timeline

DAU: 300M
Actions per user per day: 2 tweets (writes), 100 timeline reads
Write QPS: 300M × 2 / 86,400 ≈ 7,000 writes/sec
Read QPS:  300M × 100 / 86,400 ≈ 350,000 reads/sec
Peak Read QPS: 350K × 3 ≈ 1M reads/sec

Storage per tweet: ~300 bytes (text + metadata)
Daily storage: 600M tweets (300M × 2) × 300 bytes ≈ 180 GB/day
5-year storage: 180 GB × 365 × 5 ≈ 330 TB

Insight from numbers: Read:write ratio is 50:1 → read-heavy system → invest heavily in caching and read replicas. 1M peak read QPS → need multiple cache layers. 330 TB over 5 years → need partitioning/sharding.

Don't obsess over precision. The goal is to get within the right order of magnitude (is it 1,000 QPS or 1,000,000 QPS?) and connect each number to an architectural decision.
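
The arithmetic in the Twitter example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `estimate` and its parameters are illustrative, and the peak factor of 3 is the one used in the example above.

```python
SECONDS_PER_DAY = 86_400

def estimate(dau, writes_per_user, reads_per_user, bytes_per_record,
             retention_years, peak_factor=3):
    """Return rough capacity numbers from per-day averages."""
    write_qps = dau * writes_per_user / SECONDS_PER_DAY
    read_qps = dau * reads_per_user / SECONDS_PER_DAY
    daily_bytes = dau * writes_per_user * bytes_per_record
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * peak_factor,
        "read_write_ratio": reads_per_user / writes_per_user,
        "daily_storage_gb": daily_bytes / 1e9,
        "total_storage_tb": daily_bytes * 365 * retention_years / 1e12,
    }

# Inputs taken from the Twitter timeline example.
twitter = estimate(dau=300_000_000, writes_per_user=2, reads_per_user=100,
                   bytes_per_record=300, retention_years=5)
for name, value in twitter.items():
    print(f"{name}: {value:,.1f}")
```

The exact results (about 6,944 writes/sec, 347,222 reads/sec, 328.5 TB) round to the figures quoted above, which is all the precision an interview needs.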

Step 3: High-Level Design (10 min)

Draw the major components and how they connect. Start with a simple flow:

Client → Load Balancer → Web Servers → App Servers → Database
                                           ↓
                                         Cache

Then progressively add components based on your requirements:

Need search? → Add Elasticsearch.
Need async processing? → Add a Message Queue (Kafka/SQS).
Need global low-latency? → Add a CDN.
Need real-time updates? → Add WebSocket servers.
Need to handle media? → Add Blob Storage (S3) + CDN.
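
One way to internalize the list above is to treat it as a lookup from requirement to component. The sketch below is only a memory aid; the requirement keys (`"search"`, `"media"`, and so on) and the function name are made up for illustration.

```python
# Baseline flow from the simple diagram above.
BASELINE = ["Load Balancer", "Web Servers", "App Servers", "Database", "Cache"]

# Hypothetical requirement flags mapped to the components they pull in.
ADDITIONS = {
    "search": ["Elasticsearch"],
    "async_processing": ["Message Queue (Kafka/SQS)"],
    "global_low_latency": ["CDN"],
    "real_time_updates": ["WebSocket servers"],
    "media": ["Blob Storage (S3)", "CDN"],
}

def components_for(requirements):
    """Return the baseline plus any components the requirements call for."""
    components = list(BASELINE)
    for requirement in requirements:
        for component in ADDITIONS.get(requirement, []):
            if component not in components:  # e.g. CDN requested twice
                components.append(component)
    return components

print(components_for(["media", "global_low_latency"]))
```

The point is that every box on the diagram should trace back to a stated requirement; if it does not, leave it out.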

Example: Twitter High-Level Design

                    ┌──────────────────┐
                    │     Clients       │
                    │ (Mobile/Web App)  │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │  Load Balancer    │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     [Tweet Write      [Timeline Read  [User Service]
      Service]          Service]
              │              │              │
              ▼              ▼              ▼
     [Write DB       [Redis Cache     [User DB]
      (Primary)]      (Pre-computed
              │        timelines)]
              │              ↑
              └──────────────┘
              Fan-out on Write:
              Push to follower caches

Keep it simple at first. You will add complexity in Step 4. The interviewer wants to see that you can sketch the full picture before diving into details.
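
The fan-out-on-write arrow in the diagram can be illustrated with an in-memory sketch. Plain dictionaries stand in for the follower store and the Redis timeline cache, and the cap of 800 entries per timeline is an assumed value, not one from the design above.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached entries per user

followers = defaultdict(set)  # author_id -> set of follower ids
# user_id -> newest-first tweet ids; the oldest entries fall off the end.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))

def follow(follower_id, author_id):
    followers[author_id].add(follower_id)

def post_tweet(author_id, tweet_id):
    """Write path: push the new tweet id into every follower's timeline."""
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)

def read_timeline(user_id, limit=20):
    """Read path: one lookup, no merging at request time."""
    return list(timelines[user_id])[:limit]

follow("bob", "alice")
follow("carol", "alice")
post_tweet("alice", "t1")
post_tweet("alice", "t2")
print(read_timeline("bob"))  # ['t2', 't1']
```

Writes cost one cache update per follower, while reads stay cheap, which is the trade the read-heavy estimate in Step 2 argues for.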

Step 4: Deep Dive (15 min)

This is where you differentiate yourself. The interviewer will ask you to zoom into the most interesting or challenging components. Do not wait to be asked; proactively say: "I'd like to deep-dive into the timeline fan-out strategy, since that's the most challenging part of this system."

Common deep-dive areas:

Database schema and access patterns — What tables? What indexes? SQL or NoSQL? Partitioning strategy?
API design — What are the key endpoints? Request/response formats?
Data flow for critical operations — Walk through a single request end-to-end.
Scaling bottleneck — Where does the system break at 10x traffic? How do you fix it?
Consistency model — What happens during network partitions? What guarantees do you provide?

Example: API Design for Twitter

POST /api/v1/tweets
  Body: { "text": "Hello world", "media_ids": ["abc123"] }
  Response: { "tweet_id": "snowflake_123", "created_at": "..." }

GET /api/v1/timeline?cursor=tweet_12345&limit=20
  Response: { "tweets": [...], "next_cursor": "tweet_12325" }
  Note: cursor-based pagination (not offset), so pages stay stable as new tweets arrive

POST /api/v1/users/{user_id}/follow
  Response: 204 No Content
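
To show why the timeline endpoint uses a cursor rather than an offset, here is a minimal sketch of the server-side paging logic. It assumes tweet ids are integers that increase over time and that the cursor means "return tweets older than this id"; the function name and the linear scan are illustrative (a real store would seek on an index).

```python
def get_timeline_page(tweet_ids_desc, cursor=None, limit=20):
    """Return (page, next_cursor) from a list of ids sorted newest-first."""
    if cursor is None:
        start = 0
    else:
        # Position of the first tweet strictly older than the cursor.
        start = next(
            (i for i, tweet_id in enumerate(tweet_ids_desc) if tweet_id < cursor),
            len(tweet_ids_desc),
        )
    page = tweet_ids_desc[start:start + limit]
    has_more = start + limit < len(tweet_ids_desc)
    next_cursor = page[-1] if has_more else None
    return page, next_cursor

ids = list(range(12345, 12300, -1))        # 12345 down to 12301
page, nxt = get_timeline_page(ids, cursor=12345)
ids.insert(0, 12346)                       # a new tweet arrives between requests
page2, _ = get_timeline_page(ids, cursor=nxt)
print(page[0], page[-1], nxt)              # 12344 12325 12325
print(page2[0], page2[-1])                 # 12324 12305
```

With an offset, the newly inserted tweet would shift every item by one and the second page would repeat the last tweet of the first page; with a cursor the two pages do not overlap.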

For every decision, articulate the tradeoff: "I chose fan-out-on-write for the timeline because reads are 50:1 over writes, making it worth spending extra write time to pre-compute timelines. However, for celebrities with 50M followers, I'd switch to fan-out-on-read to avoid writing to 50M caches on every tweet."
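
The hybrid strategy in that quote can be sketched as follows. This is an illustration under assumptions, not a production design: the class name `HybridTimeline`, the in-memory dictionaries, and the follower threshold are all hypothetical, and the threshold is set to 2 in the demo only to keep it small.

```python
from collections import defaultdict

class HybridTimeline:
    def __init__(self, celebrity_threshold=100_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> followers
        self.following = defaultdict(set)          # user -> authors followed
        self.tweets_by_author = defaultdict(list)  # author -> [(ts, tweet_id)]
        self.cached = defaultdict(list)            # user -> pushed [(ts, tweet_id)]

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) > self.celebrity_threshold

    def post(self, author, tweet_id, ts):
        self.tweets_by_author[author].append((ts, tweet_id))
        if not self._is_celebrity(author):
            # Fan-out on write: push to each follower's cached timeline.
            for follower in self.followers[author]:
                self.cached[follower].append((ts, tweet_id))

    def read(self, user, limit=20):
        # A set removes duplicates if an author crossed the threshold
        # after some of their tweets had already been pushed.
        entries = set(self.cached[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                # Fan-out on read: pull celebrity tweets at request time.
                entries.update(self.tweets_by_author[author])
        return [tweet_id for _, tweet_id in sorted(entries, reverse=True)[:limit]]

tl = HybridTimeline(celebrity_threshold=2)
for fan in ("a", "b", "c"):
    tl.follow(fan, "star")
tl.follow("a", "bob")
tl.post("bob", "b1", ts=1)
tl.post("star", "s1", ts=2)
print(tl.read("a"))    # ['s1', 'b1']
print(tl.cached["b"])  # [] because the celebrity tweet was never pushed
```

Ordinary authors pay the write cost once per follower; celebrity tweets cost nothing at write time and are merged in when a follower reads.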

Step 5: Bottlenecks & Improvements (5 min)

Step back and critique your own design. This shows maturity and operational thinking:

"Our database is a single point of failure. I'd add read replicas and automatic failover."
"The fan-out service is a bottleneck for celebrity tweets. I'd implement a hybrid approach with fan-out-on-read for users with >100K followers."
"This API isn't rate-limited. We should add a rate limiter at the gateway."
"We don't have monitoring. I'd add distributed tracing, p99 latency alerting, and dashboard metrics."
"Cache invalidation is tricky. I'd use TTL-based expiration with write-through for consistency."

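The last suggestion above, TTL-based expiration combined with write-through, can be sketched in a few lines. The class name `WriteThroughTTLCache` is hypothetical, a plain dict stands in for the database, and the clock is injectable so the demo does not have to sleep.

```python
import time

class WriteThroughTTLCache:
    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self.store = store      # backing "database": any dict-like object
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so tests can control time
        self._entries = {}      # key -> (value, expires_at)

    def put(self, key, value):
        """Write-through: update the store, then the cache, in one call."""
        self.store[key] = value
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]              # fresh cache hit
        value = self.store[key]          # miss or expired: reload from store
        self._entries[key] = (value, self.clock() + self.ttl)
        return value

now = [0.0]
db = {}
cache = WriteThroughTTLCache(db, ttl_seconds=30, clock=lambda: now[0])
cache.put("user:1", "alice")
db["user:1"] = "alice-v2"      # a write that bypasses the cache
print(cache.get("user:1"))     # alice (stale until the TTL expires)
now[0] = 31.0
print(cache.get("user:1"))     # alice-v2 (expired entry reloaded)
```

Write-through keeps the cache correct for writes that go through it; the TTL bounds how long a write that bypasses it can stay stale.
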
What Interviewers Actually Evaluate

Criteria	What They Look For	Red Flags
Problem Navigation	Can you break an ambiguous problem into clear requirements?	Jumping into code immediately
Breadth	Do you cover all major components (storage, compute, network, cache)?	Only discussing the database
Depth	Can you go deep on 2–3 components with real technical detail?	Surface-level descriptions of everything
Tradeoff Articulation	Do you explain why you chose X over Y?	"I chose Redis because it's popular"
Communication	Is your explanation clear, structured, and easy to follow?	20-minute monologue with no check-ins
Scalability Thinking	Do you proactively identify bottlenecks and propose solutions?	Never mentioning what happens at 10x scale

The Vocabulary Cheat Sheet

Use these phrases during your interview to sound structured and experienced:

Scoping: "Before we dive in, let me clarify the requirements..."
Tradeoffs: "The tradeoff here is X vs Y. Given our requirements, I'd lean toward..."
Transitions: "Now that we have the high-level design, I'd like to deep-dive into..."
Acknowledging gaps: "One thing I haven't addressed yet is... Let me add that."
Self-critique: "A weakness in this design is... To fix it, I'd..."
Checking in: "Does this make sense so far? Would you like me to go deeper on any component?"

Common Mistakes

❌ Jumping into the solution without clarifying requirements: you might design the wrong system entirely.
❌ Over-engineering from the start: don't propose a 50-microservice architecture for a system with 1,000 users. Start simple, scale as needed.
❌ Ignoring non-functional requirements: if you never mention latency, availability, or consistency, the interviewer will notice.
❌ Monologuing for 20 minutes: system design is a conversation, not a lecture. Check in with the interviewer every 3-5 minutes.
❌ Being afraid to say "I don't know": it's far better to say "I'm not sure about the exact numbers, but I'd estimate..." than to make up facts.
❌ Forgetting monitoring and observability: production systems need logging, metrics, and alerting. Mentioning this shows operational maturity.
❌ Not drawing diagrams: always draw. A picture is worth a thousand words. Use boxes, arrows, and labels. Color-code if possible.
❌ Discussing only the happy path: what happens when a server crashes? When the network partitions? When traffic spikes 10x? Address failure scenarios.

Practice Template

Use this template for every practice session:

System: _______________
Time: 40 minutes

[5 min] Requirements:
  Functional: ...
  Non-Functional: ...

[5 min] Estimation:
  QPS: ___     Peak: ___
  Storage: ___ /day, ___ /year
  Bandwidth: ___

[10 min] High-Level Design:
  (Draw diagram here)

[15 min] Deep Dives:
  _______________ (why this component?)
  _______________ (key tradeoff?)
  _______________ (data model?)

[5 min] Bottlenecks & Improvements:
  SPOF: ___
  Scaling limit: ___
  Missing: ___

[!TIP] Key Takeaways:
• Use the 5-step framework for every question — it keeps you on track and demonstrates structured thinking.
• Spend most of your time on Steps 3 and 4 (design and deep dive), 25 of the 40 minutes, because that is where 65% of the signal is.
• Every decision should come with a tradeoff explanation — "I chose X because Y, accepting the cost of Z."
• Treat the interview as a collaboration, not a presentation — ask questions, check in, and adapt.
• Practice with the template above for at least 10 different systems before your interview.
