System Design: The Complete Guide

31. Design YouTube (Video Streaming)

Designing a video streaming platform: upload, transcode, store, and deliver video to billions of users.

Mar 5, 2026

[!NOTE] YouTube serves 1 billion hours of video per day to over 2 billion users. Over 500 hours of video are uploaded every minute. Designing a video streaming system tests your understanding of blob storage, content delivery networks, transcoding pipelines, and recommendation systems. This is a complex system design that touches many areas.

Step 1: Requirements

| Feature | Requirement |
| --- | --- |
| Upload | Support files up to 10 GB, resumable uploads |
| Transcoding | Convert to multiple resolutions (240p to 4K) and formats (H.264, VP9, AV1) |
| Streaming | Adaptive bitrate streaming (auto-adjust quality based on bandwidth) |
| CDN | Global delivery with low latency |
| Scale | 2B users, 500 hours uploaded/min, 1B hours watched/day |

Step 2: High-Level Design (v1)

  [Creator] → Upload → [API Server] → [Blob Storage (S3)]
                                           │
                                    [Transcoding Pipeline]
                                           │
                                    [CDN (CloudFront)]
                                           │
  [Viewer] ← Stream ← [CDN Edge Server]

Step 3: Upload Pipeline (Detailed)

  Creator → [Upload Service]
              │
              ├─→ [Blob Storage] (store raw video)
              │
              ├─→ [Metadata DB] (title, description, tags)
              │
              └─→ [Message Queue] → [Transcoding Workers]
                                        │
                                    [Multiple outputs]
                                    ├─→ 240p  H.264  → [Blob Storage]
                                    ├─→ 480p  H.264  → [Blob Storage]
                                    ├─→ 720p  VP9    → [Blob Storage]
                                    ├─→ 1080p VP9    → [Blob Storage]
                                    └─→ 4K    AV1    → [Blob Storage]
                                        │
                                    [Thumbnail Generator]
                                        │
                                    [CDN Warm-up]

Resumable Uploads

For large files (multi-GB), network interruptions are common. Resumable uploads (e.g. the open tus protocol) handle this:

  1. Client initiates upload, gets an upload URL.
  2. Client uploads in chunks (e.g., 8MB each).
  3. If interrupted, client asks "How much did you receive?" and resumes from that byte offset.
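The three steps above can be sketched in a few lines. This is a simplified in-memory model, not a real tus client: the `UploadServer` class stands in for the upload service, and `fail_after_chunks` simulates a dropped connection.

```python
# Sketch of tus-style resumable upload logic. A real client would speak
# HTTP (HEAD to read the current offset, PATCH to append a chunk) to an
# upload URL; here an in-memory "server" plays that role.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks, as in the steps above


class UploadServer:
    """Stands in for the upload service: remembers bytes received so far."""

    def __init__(self):
        self.received = bytearray()

    def offset(self):
        # tus: HEAD request returns the Upload-Offset header
        return len(self.received)

    def append(self, chunk):
        # tus: PATCH request carrying the next chunk
        self.received.extend(chunk)


def resumable_upload(data, server, fail_after_chunks=None):
    """Upload `data` in chunks, always resuming from the server's offset.

    `fail_after_chunks` simulates a network interruption mid-upload."""
    sent = 0
    offset = server.offset()  # "How much did you receive?"
    while offset < len(data):
        server.append(data[offset:offset + CHUNK_SIZE])
        offset = server.offset()
        sent += 1
        if fail_after_chunks is not None and sent >= fail_after_chunks:
            raise ConnectionError("network dropped")


# Simulate: upload is interrupted after one chunk, then resumed from the
# last byte offset -- the first 8 MB are never re-sent.
video = b"\x00" * (20 * 1024 * 1024)  # pretend 20 MB file
server = UploadServer()
try:
    resumable_upload(video, server, fail_after_chunks=1)
except ConnectionError:
    pass
resumable_upload(video, server)  # resumes at the server's offset
assert bytes(server.received) == video
```

The key property is that the client never tracks upload progress itself; the server's offset is the single source of truth, so a resume after any failure is just "ask, then continue from there".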

Step 4: Transcoding

A single uploaded video is transcoded into many variants:

| Codec | Quality | Efficiency | Browser Support |
| --- | --- | --- | --- |
| H.264/AVC | Good | Baseline | Universal |
| VP9 | Better | ~30% smaller than H.264 | Chrome, Firefox, Android |
| AV1 | Best | ~50% smaller than H.264 | Modern browsers (growing) |

Transcoding is massively CPU-intensive. YouTube uses a DAG (Directed Acyclic Graph) pipeline where steps can run in parallel:

Raw Video → [Split into segments]
                ├─→ [Segment 1] → Transcode 240p, 480p, 720p, ...
                ├─→ [Segment 2] → Transcode 240p, 480p, 720p, ...
                └─→ [Segment N] → Transcode 240p, 480p, 720p, ...
                                    │
                                    └─→ [Merge segments per resolution]

By splitting into segments, hundreds of workers transcode in parallel, reducing a 1-hour video from hours of processing to minutes.
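The fan-out/merge shape of this pipeline can be sketched with a worker pool. The `transcode` function here is a stub standing in for a real encoder (e.g. an ffmpeg invocation), and the resolution ladder is illustrative:

```python
# Split -> parallel transcode -> merge, with a stub encode step.
from concurrent.futures import ThreadPoolExecutor

RESOLUTIONS = ["240p", "480p", "720p", "1080p"]


def transcode(segment, resolution):
    # Placeholder for CPU-heavy encoding of one segment at one resolution.
    return f"{segment}@{resolution}"


def transcode_video(segments):
    """Fan (segment, resolution) pairs out to workers, then merge per resolution."""
    jobs = [(s, r) for s in segments for r in RESOLUTIONS]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda j: (j[1], transcode(*j)), jobs))
    # Merge: group transcoded segments back into one stream per resolution.
    merged = {r: [] for r in RESOLUTIONS}
    for resolution, output in results:
        merged[resolution].append(output)
    return merged


outputs = transcode_video(["seg1", "seg2", "seg3"])
assert outputs["720p"] == ["seg1@720p", "seg2@720p", "seg3@720p"]
```

Because every (segment, resolution) pair is independent, the job count — not the video length — determines wall-clock time once enough workers are available.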

Step 5: Adaptive Bitrate Streaming

The viewer's device and network conditions change constantly. Adaptive Bitrate Streaming (ABR) dynamically adjusts video quality:

  1. Video is encoded at multiple bitrates: 250kbps (240p) to 20Mbps (4K).
  2. Video is split into small segments (2–10 seconds each).
  3. A manifest file (HLS .m3u8 or DASH .mpd) lists all available quality levels and segment URLs.
  4. The player downloads the manifest, monitors its download speed, and requests the highest quality segment it can play smoothly.
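The manifest in step 3 is just a text file describing the renditions. A simplified HLS master playlist (variant URLs are illustrative) looks like:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=250000,RESOLUTION=426x240
240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=20000000,RESOLUTION=3840x2160
4k/playlist.m3u8
```

Each variant playlist then lists the URLs of that rendition's 2–10 second segments, so switching quality is just fetching the next segment from a different list.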
Player bandwidth: 5 Mbps
  → Request 1080p segment (4 Mbps bitrate)

Bandwidth drops to 2 Mbps:
  → Switch to 480p segment (1.5 Mbps bitrate)

Bandwidth recovers to 8 Mbps:
  → Switch to 1080p or 1440p
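The switching rule above boils down to "pick the highest rendition whose bitrate fits the measured bandwidth, with some headroom". A minimal sketch, using a bitrate ladder assumed from the numbers in this chapter (the 720p bitrate and the 20% headroom are illustrative):

```python
# Minimal ABR rendition picker: highest bitrate that fits within the
# measured bandwidth, with a safety margin so playback doesn't stall.
RENDITIONS = [  # (name, bitrate in Mbps), lowest to highest
    ("240p", 0.25),
    ("480p", 1.5),
    ("720p", 2.5),
    ("1080p", 4.0),
    ("4K", 20.0),
]


def pick_rendition(bandwidth_mbps, headroom=0.8):
    """Return the best rendition name playable at `bandwidth_mbps`."""
    budget = bandwidth_mbps * headroom  # leave 20% slack for variance
    best = RENDITIONS[0][0]            # never go below the lowest rung
    for name, bitrate in RENDITIONS:
        if bitrate <= budget:
            best = name
    return best


assert pick_rendition(5.0) == "1080p"  # 4 Mbps 1080p fits in 5 Mbps
assert pick_rendition(2.0) == "480p"   # drop quality when bandwidth drops
assert pick_rendition(8.0) == "1080p"  # recover when bandwidth recovers
```

Real players add hysteresis (buffer-level checks, switch-up delays) on top of this so quality doesn't oscillate on a jittery connection.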

Step 6: CDN Strategy

  • Popular videos: Cached at CDN edge servers worldwide. A viral video is served from 200+ edge locations.
  • Long-tail videos: Served from regional CDN nodes or origin. Not worth caching at every edge location.
  • Pre-warming: When a video from a popular creator is uploaded, proactively push it to CDN edges before viewers request it.

Real-world: YouTube uses its own CDN (Google Global Cache) with servers installed inside ISP networks. This means popular videos are served from a server inside your ISP's building, reducing latency to near-zero.

CDN Caching Strategy

YouTube CDN Tiers:

Tier 1: Google Global Cache (GGC)
  → Servers physically inside ISP data centers
  → Ultra-low latency (~1ms to user)
  → Caches the most popular videos per region

Tier 2: Google Edge PoPs (200+ worldwide)
  → Caches popular + moderately popular videos
  → ~5-20ms to user

Tier 3: Origin Data Centers
  → Stores all videos
  → Only hit for long-tail (rarely watched) content
  → ~50-200ms to user

Cache hit rate for popular videos: ~95%
Result: Most users never hit the origin server
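A request walking these tiers can be modeled as a chain of caches that backfill on a miss. This is a simplified sketch (real CDN fill policies are admission-controlled, not "cache everything on first miss"):

```python
# A request checks GGC -> edge PoP -> origin, and misses populate the
# caches on the way back so the next viewer in that region gets a hit.
class Tier:
    def __init__(self, name):
        self.name, self.cache = name, {}


def fetch(video_id, tiers, origin):
    """Return (data, tier name that served it), filling caches on a miss."""
    missed = []
    for tier in tiers:
        if video_id in tier.cache:
            return tier.cache[video_id], tier.name
        missed.append(tier)
    data = origin[video_id]      # tier 3: origin stores every video
    for tier in missed:          # backfill for subsequent viewers
        tier.cache[video_id] = data
    return data, "origin"


ggc, edge = Tier("GGC"), Tier("edge PoP")
origin = {"viral-cat": b"...", "obscure-talk": b"..."}
_, where = fetch("viral-cat", [ggc, edge], origin)
assert where == "origin"        # first viewer in the region hits origin
_, where = fetch("viral-cat", [ggc, edge], origin)
assert where == "GGC"           # every later viewer is served from the ISP cache
```

The ~95% hit rate falls out of skewed popularity: a small set of hot videos absorbs most requests, and after one fill those all resolve at tier 1 or 2.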

Bandwidth Cost Estimation

YouTube scale:

1 billion hours of video watched daily
Average bitrate: 2.5 Mbps
1 hour = 3,600 sec × 2.5 Mbps = 9,000 Mb = 1.125 GB

Daily outbound bandwidth:  1 billion hours × 1.125 GB = ~1.1 exabytes/day
At $0.01/GB (CDN bulk pricing):
  Daily cost: ~$11 million
  Monthly cost: ~$330 million just for video delivery

This is why Google builds its own CDN infrastructure and places servers inside ISPs (GGC program).
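The arithmetic above can be checked in a few lines:

```python
# Back-of-the-envelope check of the bandwidth cost estimate.
hours_per_day = 1e9            # 1 billion watch-hours/day
mbps = 2.5                     # average bitrate
gb_per_hour = 3600 * mbps / 8 / 1000   # Mb -> MB -> GB per watch-hour
daily_gb = hours_per_day * gb_per_hour # ~1.125 billion GB = ~1.1 EB/day
daily_cost = daily_gb * 0.01           # at $0.01/GB

assert round(gb_per_hour, 3) == 1.125
assert round(daily_cost / 1e6, 2) == 11.25   # ~$11M/day, ~$340M/month
```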

Thumbnail Generation Pipeline

For every uploaded video, YouTube generates multiple thumbnail candidates:

  • Extract frames at regular intervals (every 2 seconds)
  • Use ML to score each frame (face detection, text readability, visual appeal)
  • Generate 3 top candidates for the creator to choose from
  • Resize to multiple dimensions (120×90, 320×180, 480×360, 1280×720)
  • Store all variants in a CDN for instant serving

Scale: ~500 hours of video uploaded per minute × ~1800 frames/hour × 4 sizes = ~3.6 million thumbnails generated per minute.
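The candidate-selection step reduces to "score every extracted frame, keep the top k". A sketch with a stub scorer standing in for the ML model (the `sharpness` feature is hypothetical, purely for illustration):

```python
# Score extracted frames and keep the top 3 thumbnail candidates.
def score_frame(frame):
    # Placeholder for the ML score (face detection, text readability,
    # visual appeal); here a single made-up feature.
    return frame["sharpness"]


def top_thumbnails(frames, k=3):
    """Return the k best-scoring frames as thumbnail candidates."""
    return sorted(frames, key=score_frame, reverse=True)[:k]


# Frames sampled every 2 seconds, each with a stub quality score.
frames = [{"t": t, "sharpness": s}
          for t, s in [(0, 0.2), (2, 0.9), (4, 0.5), (6, 0.7), (8, 0.1)]]
best = top_thumbnails(frames)
assert [f["t"] for f in best] == [2, 6, 4]  # top 3 by score
```

Each surviving candidate is then resized to the four dimensions listed above and pushed to the CDN.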

Common Mistakes

  • ❌ Transcoding synchronously during upload — transcoding takes minutes to hours. Use a message queue and background workers.
  • ❌ Serving video directly from blob storage — without a CDN, every viewer fetches from the origin, creating massive bandwidth costs and latency.
  • ❌ Single-resolution encoding — users on 3G networks cannot stream 4K. Always provide multiple quality levels with adaptive bitrate.
  • ❌ Not supporting resumable uploads — large file uploads will fail on unreliable networks without resume capability.
  • ❌ Ignoring long-tail content — 80% of views go to 20% of videos. Optimize CDN caching for popular content but ensure long-tail still works from origin.

[!TIP] Key Takeaways:
• Upload: resumable, chunked uploads (tus protocol). Store raw in blob storage (S3).
• Transcode: DAG pipeline with parallel segment processing. Multiple codecs (H.264, VP9, AV1).
• Stream: Adaptive Bitrate Streaming (HLS/DASH). Manifest + segmented video at multiple bitrates.
• Deliver: Multi-tier CDN (ISP cache → edge PoP → origin). 95% cache hit rate for popular content.
• Thumbnails: ML-scored frame extraction, multiple sizes, CDN-cached.
• YouTube uses Google Global Cache servers inside ISP networks for ultra-low-latency delivery.
