[!NOTE] YouTube serves 1 billion hours of video per day to over 2 billion users. Over 500 hours of video are uploaded every minute. Designing a video streaming system tests your understanding of blob storage, content delivery networks, transcoding pipelines, and recommendation systems. This is a complex system design that touches many areas.
Step 1: Requirements
| Feature | Requirement |
|---|---|
| Upload | Support files up to 10GB, resumable uploads |
| Transcoding | Convert to multiple resolutions (240p to 4K) and formats (H.264, VP9, AV1) |
| Streaming | Adaptive bitrate streaming (auto-adjust quality based on bandwidth) |
| CDN | Global delivery with low latency |
| Scale | 2B users, 500 hours uploaded/min, 1B hours watched/day |
Step 2: High-Level Design (v1)
[Creator] → Upload → [API Server] → [Blob Storage (S3)]
│
[Transcoding Pipeline]
│
[CDN (CloudFront)]
│
[Viewer] ← Stream ← [CDN Edge Server]
Step 3: Upload Pipeline (Detailed)
Creator → [Upload Service]
│
├─→ [Blob Storage] (store raw video)
│
├─→ [Metadata DB] (title, description, tags)
│
└─→ [Message Queue] → [Transcoding Workers]
│
[Multiple outputs]
├─→ 240p H.264 → [Blob Storage]
├─→ 480p H.264 → [Blob Storage]
├─→ 720p VP9 → [Blob Storage]
├─→ 1080p VP9 → [Blob Storage]
└─→ 4K AV1 → [Blob Storage]
│
[Thumbnail Generator]
│
[CDN Warm-up]
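In code, the upload service's completion step might look like the sketch below. The blob client, metadata store, and queue interfaces are illustrative (boto3/SQS-style names), not YouTube's actual APIs:

```python
import json
import uuid

def handle_upload_complete(blob_client, metadata_db, queue,
                           raw_bytes, title, description, tags):
    """Persist the raw video, record metadata, enqueue transcoding."""
    video_id = str(uuid.uuid4())

    # 1. Store the raw video in blob storage (S3-style API).
    raw_key = f"raw/{video_id}.mp4"
    blob_client.put_object(Bucket="videos-raw", Key=raw_key, Body=raw_bytes)

    # 2. Record metadata; status stays PROCESSING until transcoding finishes.
    metadata_db.execute(
        "INSERT INTO videos (id, title, description, tags, status) "
        "VALUES (?, ?, ?, ?, 'PROCESSING')",
        (video_id, title, description, json.dumps(tags)),
    )

    # 3. Enqueue the job; transcoding workers pick it up asynchronously.
    queue.send_message(
        QueueUrl="transcode-jobs",
        MessageBody=json.dumps({"video_id": video_id, "raw_key": raw_key}),
    )
    return video_id
```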
Resumable Uploads
For large files (multi-GB), network interruptions are common, so uploads must be resumable. YouTube's resumable upload API follows the same pattern as the open tus protocol (a minimal client sketch follows the list):
- Client initiates upload, gets an upload URL.
- Client uploads in chunks (e.g., 8MB each).
- If interrupted, client asks "How much did you receive?" and resumes from that byte offset.
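A minimal client-side sketch of that resume loop, assuming a tus-style server that reports the last received byte via the Upload-Offset header (the headers and content type follow the tus convention; the upload URL is whatever step 1 returned):

```python
import os
import requests

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks, as in the example above

def resumable_upload(upload_url: str, path: str) -> None:
    """Upload a file in chunks, resuming from the server's last
    acknowledged byte offset after any interruption."""
    total = os.path.getsize(path)
    while True:
        # "How much did you receive?" — the server replies with Upload-Offset.
        offset = int(requests.head(upload_url).headers.get("Upload-Offset", 0))
        if offset >= total:
            return  # upload complete
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read(CHUNK_SIZE)
        try:
            requests.patch(
                upload_url,
                data=chunk,
                headers={
                    "Upload-Offset": str(offset),
                    "Content-Type": "application/offset+octet-stream",
                },
            )
        except requests.ConnectionError:
            continue  # network blip: re-query the offset and resume
```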
Step 4: Transcoding
A single uploaded video is transcoded into many variants:
| Codec | Quality | Efficiency | Browser Support |
|---|---|---|---|
| H.264/AVC | Good | Baseline | Universal |
| VP9 | Better | ~30% smaller than H.264 | Chrome, Firefox, Android |
| AV1 | Best | ~50% smaller than H.264 | Modern browsers (growing) |
Transcoding is massively CPU-intensive. YouTube uses a DAG (Directed Acyclic Graph) pipeline where steps can run in parallel:
Raw Video → [Split into segments]
├─→ [Segment 1] → Transcode 240p, 480p, 720p, ...
├─→ [Segment 2] → Transcode 240p, 480p, 720p, ...
└─→ [Segment N] → Transcode 240p, 480p, 720p, ...
│
└─→ [Merge segments per resolution]
By splitting into segments, hundreds of workers transcode in parallel, reducing a 1-hour video from hours of processing to minutes.
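A sketch of the fan-out on a single machine, using a process pool and ffmpeg. In production the pool would be a fleet of workers behind the message queue; the resolutions and CLI flags here are an illustrative subset:

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor

HEIGHTS = [240, 480, 720, 1080]  # output rungs (illustrative subset)

def transcode_segment(segment_path: str, height: int) -> str:
    """Transcode one segment to one resolution (H.264 here)."""
    out_path = f"{segment_path}.{height}p.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", segment_path,
         "-vf", f"scale=-2:{height}", "-c:v", "libx264", out_path],
        check=True,
    )
    return out_path

def transcode_all(segment_paths: list[str]) -> list[str]:
    """Fan every (segment, resolution) pair out across a worker pool."""
    with ProcessPoolExecutor() as pool:
        futures = [
            pool.submit(transcode_segment, seg, h)
            for seg in segment_paths
            for h in HEIGHTS
        ]
        return [f.result() for f in futures]
```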
Step 5: Adaptive Bitrate Streaming
The viewer's device and network conditions change constantly. Adaptive Bitrate Streaming (ABR) dynamically adjusts video quality:
- Video is encoded at multiple bitrates: 250kbps (240p) to 20Mbps (4K).
- Video is split into small segments (2–10 seconds each).
- A manifest file (HLS .m3u8 or DASH .mpd) lists all available quality levels and segment URLs.
- The player downloads the manifest, monitors its download speed, and requests the highest quality segment it can play smoothly.
Player bandwidth: 5 Mbps
→ Request 1080p segment (4 Mbps bitrate)
Bandwidth drops to 2 Mbps:
→ Switch to 480p segment (1.5 Mbps bitrate)
Bandwidth recovers to 8 Mbps:
→ Switch to 1080p or 1440p
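The rung selection in the trace above can be as simple as picking the highest bitrate that fits under the measured throughput with some headroom. Real players (e.g., dash.js) add smoothing and buffer-based heuristics; the ladder values below mirror the example:

```python
# Bitrate ladder from the manifest: (label, bitrate in Mbps) — example values.
LADDER = [("240p", 0.25), ("480p", 1.5), ("720p", 2.5),
          ("1080p", 4.0), ("4K", 20.0)]

SAFETY_MARGIN = 0.8  # spend only 80% of measured bandwidth, to absorb jitter

def pick_rung(measured_mbps: float) -> str:
    """Highest-quality rung whose bitrate fits under the throughput budget;
    the lowest rung is the floor if nothing fits."""
    budget = measured_mbps * SAFETY_MARGIN
    best = LADDER[0][0]
    for label, bitrate in LADDER:
        if bitrate <= budget:
            best = label
    return best

assert pick_rung(5.0) == "1080p"  # 5 × 0.8 = 4.0 Mbps budget → 1080p
assert pick_rung(2.0) == "480p"   # 2 × 0.8 = 1.6 Mbps budget → 480p
```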
Step 6: CDN Strategy
- Popular videos: Cached at CDN edge servers worldwide. A viral video is served from 200+ edge locations.
- Long-tail videos: Served from regional CDN nodes or origin. Not worth caching at every edge location.
- Pre-warming: When a video from a popular creator is uploaded, proactively push it to CDN edges before viewers request it (sketched after the note below).
Real-world: YouTube uses its own CDN (Google Global Cache) with servers installed inside ISP networks. This means popular videos are served from a server inside your ISP's own facility, cutting last-mile latency to around a millisecond.
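The pre-warming hook might look like this; the subscriber threshold and the cdn.push API are assumptions for illustration:

```python
POPULAR_THRESHOLD = 1_000_000  # subscriber count that triggers pre-warming (assumed)

def maybe_prewarm(cdn, creator, video_id: str, edge_locations: list[str]) -> None:
    """After transcoding, push a likely-popular video to edge caches
    before the first viewer ever requests it."""
    if creator.subscriber_count < POPULAR_THRESHOLD:
        return  # long-tail: let edges populate lazily on first request
    for edge in edge_locations:
        cdn.push(video_id, edge)  # hypothetical CDN push API
```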
CDN Caching Strategy
YouTube CDN Tiers:
Tier 1: Google Global Cache (GGC)
→ Servers physically inside ISP data centers
→ Ultra-low latency (~1ms to user)
→ Caches the most popular videos per region
Tier 2: Google Edge PoPs (200+ worldwide)
→ Caches popular + moderately popular videos
→ ~5-20ms to user
Tier 3: Origin Data Centers
→ Stores all videos
→ Only hit for long-tail (rarely watched) content
→ ~50-200ms to user
Cache hit rate for popular videos: ~95%
Result: Most users never hit the origin server
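A toy model of the tiered lookup: each miss falls through to the next tier, and a hit back-fills the faster tiers on the way out. Latencies follow the diagram above; eviction and capacity limits are omitted:

```python
class Tier:
    """One cache tier: a name, a nominal latency, and its contents."""
    def __init__(self, name: str, latency_ms: int):
        self.name = name
        self.latency_ms = latency_ms
        self.store: dict[str, bytes] = {}

# Ordered fastest-first, matching the tiers above.
ggc, edge, origin = Tier("GGC", 1), Tier("Edge PoP", 10), Tier("Origin", 100)

def fetch(video_id: str) -> tuple[bytes, int]:
    """Walk the tiers; on a hit, back-fill the faster tiers so the next
    request from this region is served closer to the user."""
    missed = []
    for tier in (ggc, edge, origin):
        if video_id in tier.store:
            data = tier.store[video_id]
            for faster in missed:  # promote toward the user
                faster.store[video_id] = data
            return data, tier.latency_ms
        missed.append(tier)
    raise KeyError(video_id)  # origin stores everything, so this is a bug
```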
Bandwidth Cost Estimation
YouTube scale:
1 billion hours of video watched daily
Average bitrate: 2.5 Mbps
1 hour = 3,600 sec × 2.5 Mbps = 9,000 Mb ÷ 8 = 1,125 MB ≈ 1.125 GB
Daily outbound bandwidth: 1 billion hours × 1.125 GB = ~1.1 exabytes/day
At $0.01/GB (CDN bulk pricing):
Daily cost: ~$11 million
Monthly cost: ~$330 million, just for video delivery
This is why Google builds its own CDN infrastructure and places servers inside ISPs (GGC program).
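The same estimate as a back-of-envelope script (the bitrate and the $0.01/GB price are the assumed figures above, not actual YouTube numbers):

```python
HOURS_PER_DAY = 1e9        # hours watched daily
AVG_BITRATE_MBPS = 2.5     # average bitrate (Mbps)
PRICE_PER_GB = 0.01        # assumed bulk CDN pricing ($/GB)

gb_per_hour = 3600 * AVG_BITRATE_MBPS / 8 / 1000  # 9,000 Mb → 1.125 GB
daily_gb = HOURS_PER_DAY * gb_per_hour            # ≈ 1.1 exabytes/day
daily_cost = daily_gb * PRICE_PER_GB              # ≈ $11 million/day

print(f"{gb_per_hour:.3f} GB/hour, {daily_gb / 1e9:.2f} EB/day, "
      f"${daily_cost / 1e6:.1f}M/day, ${30 * daily_cost / 1e6:.0f}M/month")
```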
Thumbnail Generation Pipeline
For every uploaded video, YouTube generates multiple thumbnail candidates:
- Extract frames at regular intervals (every 2 seconds)
- Use ML to score each frame (face detection, text readability, visual appeal)
- Generate 3 top candidates for the creator to choose from
- Resize to multiple dimensions (120×90, 320×180, 480×360, 1280×720)
- Store all variants in a CDN for instant serving
Scale: ~500 hours of video uploaded per minute × ~1800 frames/hour × 4 sizes = ~3.6 million thumbnails generated per minute.
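Steps 1 and 4 of the pipeline map directly onto ffmpeg, as in the sketch below; the ML scoring in step 2 is stubbed out, and all paths are illustrative:

```python
import subprocess

SIZES = [(120, 90), (320, 180), (480, 360), (1280, 720)]

def extract_candidate_frames(video_path: str, out_dir: str) -> None:
    """Step 1: one frame every 2 seconds (fps=0.5)."""
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", "fps=0.5",
         f"{out_dir}/frame_%05d.jpg"],
        check=True,
    )

def score_frame(frame_path: str) -> float:
    """Step 2 placeholder: a real pipeline would run face detection,
    text-readability, and visual-appeal models here."""
    raise NotImplementedError("plug in an ML scoring model")

def resize_thumbnail(frame_path: str, width: int, height: int) -> str:
    """Step 4: render one of the fixed thumbnail dimensions."""
    out_path = frame_path.replace(".jpg", f".{width}x{height}.jpg")
    subprocess.run(
        ["ffmpeg", "-y", "-i", frame_path,
         "-vf", f"scale={width}:{height}", out_path],
        check=True,
    )
    return out_path
```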
Common Mistakes
- ❌ Transcoding synchronously during upload — transcoding takes minutes to hours. Use a message queue and background workers.
- ❌ Serving video directly from blob storage — without a CDN, every viewer fetches from the origin, creating massive bandwidth costs and latency.
- ❌ Single-resolution encoding — users on 3G networks cannot stream 4K. Always provide multiple quality levels with adaptive bitrate.
- ❌ Not supporting resumable uploads — large file uploads will fail on unreliable networks without resume capability.
- ❌ Ignoring long-tail content — 80% of views go to 20% of videos. Optimize CDN caching for popular content but ensure long-tail still works from origin.
[!TIP] Key Takeaways:
• Upload: resumable, chunked uploads (tus-style protocol). Store raw in blob storage (S3).
• Transcode: DAG pipeline with parallel segment processing. Multiple codecs (H.264, VP9, AV1).
• Stream: Adaptive Bitrate Streaming (HLS/DASH). Manifest + segmented video at multiple bitrates.
• Deliver: Multi-tier CDN (ISP cache → edge PoP → origin). 95% cache hit rate for popular content.
• Thumbnails: ML-scored frame extraction, multiple sizes, CDN-cached.
• YouTube uses Google Global Cache servers inside ISP networks for ultra-low-latency delivery.