
System Design Interview

Design a Video Streaming Platform (YouTube / Netflix)

Upload, transcode, store, and stream terabytes of video to 2 billion users — which parts are hardest?

Key outcome: 1B hours watched daily
System Design · CDN · Blob Storage · Transcoding · Cassandra · Recommendation

The Interview Question

"Design a video streaming platform. Users can upload videos, which are processed and made available for streaming. Viewers can search for videos, view them at different quality levels, and the system should adapt to their network conditions."

This is one of the most infrastructure-heavy design questions. The interesting problems are in the upload pipeline, transcoding, and why adaptive bitrate streaming exists. Most candidates describe a database schema; the best candidates explain how video actually gets from a creator's camera to a viewer's screen.


Step 1: Requirements

Functional

  • Upload video (up to 4GB, any common format)
  • Process and transcode to multiple resolutions (360p, 480p, 720p, 1080p, 4K)
  • Stream video with quality adapting to viewer's network speed
  • Search by title, creator, tags
  • View counts, likes, comments

Non-functional

  • 500 hours of video uploaded per minute
  • 1 billion video views per day
  • Uploaded video available for streaming within 5 minutes
  • P99 stream start time: under 2 seconds
  • Global audience — viewers in every country

Step 2: Why You Never Stream From Your Origin Server

The single most important concept in video streaming. A naive design sends video directly from your servers to every viewer. This fails because:

1 hour of video at 1080p = ~4GB file (a bitrate of roughly 9Mbps)

1 million concurrent viewers each streaming this video:
  4GB × 1M connections = 4 petabytes served per hour of viewing
  ~9Mbps × 1M connections ≈ 9 terabits/second of sustained throughput

You cannot serve 9Tbps from a single origin.
You would need thousands of servers globally.
At ~$0.09/GB origin egress, that is ~$360,000 per hour in bandwidth costs.
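A quick back-of-envelope check of those numbers in code (the per-GB egress prices are illustrative list prices, not quotes):

# Back-of-envelope: why you cannot stream 1M concurrent viewers from origin.
GB = 1e9                          # bytes per gigabyte (decimal)
file_size_gb = 4                  # 1 hour of 1080p ≈ 4 GB
viewers = 1_000_000               # concurrent viewers of the same video

bitrate_mbps = file_size_gb * GB * 8 / 3600 / 1e6    # ≈ 8.9 Mbps per viewer
total_tbps = bitrate_mbps * viewers / 1e6            # ≈ 8.9 Tbps sustained

data_per_hour_gb = file_size_gb * viewers            # 4,000,000 GB = 4 PB per hour
origin_egress_per_gb = 0.09       # $/GB, illustrative cloud egress list price
cdn_egress_per_gb = 0.009         # $/GB, illustrative CDN rate at volume

print(f"throughput  ≈ {total_tbps:.1f} Tbps")
print(f"origin cost ≈ ${data_per_hour_gb * origin_egress_per_gb:,.0f}/hour")
print(f"CDN cost    ≈ ${data_per_hour_gb * cdn_egress_per_gb:,.0f}/hour")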

The solution: CDN (Content Delivery Network)

Upload → Origin servers (one region)
         ↓ CDN pull/push
         CDN edge nodes (200+ locations globally)
         ↓
         Viewer streams from nearest edge node (~10ms latency)

After a video is uploaded and transcoded, CDN edge nodes fetch its segments from the origin and cache them, either on first request (pull) or pushed ahead of demand for content expected to be popular. Viewers stream from the edge node nearest to them, not from your origin. You pay CDN egress costs (~$0.009/GB) instead of origin egress (~$0.09/GB).

A video cached at a CDN edge can serve a million viewers while the origin is read only once per segment per edge location.


Step 3: Adaptive Bitrate Streaming (HLS / DASH)

Video is not one file. It's hundreds of 2-10 second segments, with each segment encoded at multiple quality levels.

video.mp4 (original upload)
   ↓ transcoding pipeline
   ├── 360p/  segment_000.ts, segment_001.ts, ... segment_890.ts
   ├── 480p/  segment_000.ts, segment_001.ts, ... segment_890.ts
   ├── 720p/  segment_000.ts, segment_001.ts, ...
   ├── 1080p/ segment_000.ts, segment_001.ts, ...
   └── 4K/    segment_000.ts, segment_001.ts, ...
   
   manifest.m3u8 (HLS) or manifest.mpd (DASH):
     Lists all available quality levels and their segment URLs
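For intuition, here is a minimal sketch that builds the master-playlist text a packager might emit; the directory names and bandwidth figures are illustrative:

# Sketch: build the HLS master playlist that advertises every rendition.
# The player fetches this first, then picks a variant per measured bandwidth.
renditions = [
    # (directory, resolution, peak bandwidth in bits/s) -- illustrative figures
    ("360p",  "640x360",     800_000),
    ("480p",  "854x480",   1_400_000),
    ("720p",  "1280x720",  2_800_000),
    ("1080p", "1920x1080", 5_000_000),
]

lines = ["#EXTM3U"]
for name, resolution, bandwidth in renditions:
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
    lines.append(f"{name}/playlist.m3u8")   # variant playlist listing that rendition's segments

print("\n".join(lines))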

The player downloads the manifest first. Then, for each segment, it checks its current download speed:

Player buffer logic:
  Downloaded last segment at 8Mbps → request next segment at 1080p (needs 8Mbps)
  Download speed drops to 2Mbps   → switch to 480p  (needs 2Mbps)
  Download speed recovers to 5Mbps → switch back to 720p

No server involvement in quality switching — the player decides.
Seamless quality transitions mid-playback.

This is why video on YouTube never freezes — it degrades quality instead of buffering.
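A minimal sketch of the player-side selection rule, assuming illustrative per-rendition bitrates and a small safety margin (real players such as hls.js or Shaka Player layer buffer-occupancy heuristics on top of this):

# Sketch: pick the highest rendition whose bitrate fits the measured throughput,
# with a safety margin so a small dip doesn't immediately cause a stall.
RENDITIONS = [          # (name, required Mbps) -- illustrative figures
    ("360p", 1.0),
    ("480p", 2.0),
    ("720p", 5.0),
    ("1080p", 8.0),
]

def choose_rendition(measured_mbps: float, safety: float = 0.8) -> str:
    """Return the best rendition the connection can comfortably sustain."""
    budget = measured_mbps * safety
    best = RENDITIONS[0][0]                 # always fall back to the lowest quality
    for name, required in RENDITIONS:
        if required <= budget:
            best = name
    return best

assert choose_rendition(11.0) == "1080p"
assert choose_rendition(8.0) == "720p"     # more conservative than the table above
assert choose_rendition(2.0) == "360p"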


Step 4: Upload and Transcoding Pipeline

Creator uploads video:
   │
   ├─ 1. Client chunks the file (100MB chunks)
   │      Uploads each chunk → Blob Storage (S3/Azure Blob)
   │      Reassembles after all chunks confirmed
   │
   ├─ 2. Upload Service notifies Transcoding Queue (Kafka)
   │
   ├─ 3. Transcoding Workers consume from queue
   │      Spawn FFmpeg jobs for each target resolution
   │      Store output segments in Blob Storage
   │      (Parallelised: 5 resolutions × multiple workers = 5 min total for 1h video)
   │
   ├─ 4. After all resolutions complete:
   │      Generate manifest files (.m3u8 / .mpd)
   │      Write metadata to Video DB (title, duration, creator, status: READY)
   │
   └─ 5. Notify CDN to pre-fetch or wait for pull
         Video is now streamable
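Step 1 of that pipeline, the chunked upload, might look like the following sketch using S3 multipart upload via boto3; the bucket name, object key, and chunk size are assumptions, and retries and error handling are omitted:

import boto3

# Sketch: upload a large video file in 100MB parts so a dropped connection
# only forces a retry of one part, not the whole file. Names are illustrative.
CHUNK_SIZE = 100 * 1024 * 1024          # 100 MB
BUCKET = "raw-uploads"                  # hypothetical bucket
KEY = "videos/incoming/example.mp4"     # hypothetical object key

s3 = boto3.client("s3")

def multipart_upload(path: str) -> None:
    upload = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    parts = []
    with open(path, "rb") as f:
        part_number = 1
        while chunk := f.read(CHUNK_SIZE):
            resp = s3.upload_part(
                Bucket=BUCKET, Key=KEY, UploadId=upload["UploadId"],
                PartNumber=part_number, Body=chunk,
            )
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
            part_number += 1
    # S3 reassembles the parts into a single object only after this call.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )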

Why 5 minutes despite "500 hours uploaded per minute"?

Processing is parallelised. A 1-hour video is split into short chunks (say 10 seconds each, so ~360 chunks); each chunk is transcoded independently by a different worker. 360 chunks × 5 resolutions = ~1,800 small jobs, which finish in minutes when spread across the worker pool.
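A sketch of one such transcoding job, assuming FFmpeg is installed on the worker; the heights, bitrates, and output layout are illustrative:

import subprocess

# Sketch: transcode one source into one HLS rendition (segments + variant playlist).
# In production each (chunk, resolution) pair would be its own queue job.
RENDITIONS = {          # height -> video bitrate, illustrative settings
    360: "800k",
    480: "1400k",
    720: "2800k",
    1080: "5000k",
}

def transcode_rendition(source: str, height: int, out_dir: str) -> None:
    """Produce 2-second HLS segments plus a playlist for one resolution."""
    subprocess.run([
        "ffmpeg", "-i", source,
        "-vf", f"scale=-2:{height}",             # keep aspect ratio, set height
        "-c:v", "libx264", "-b:v", RENDITIONS[height],
        "-c:a", "aac",
        "-hls_time", "2",                        # 2-second segments
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%03d.ts",
        f"{out_dir}/playlist.m3u8",
    ], check=True)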


Step 5: Architecture Diagram

┌──────────────────────────────────────────────────────────────────┐
│  Creator                                                          │
│  (chunked upload)                                                 │
└─────────┬────────────────────────────────────────────────────────┘
          │
┌─────────▼────────────┐    ┌───────────────────────────────────┐
│  Upload Service       │───►│  Kafka: video.uploaded            │
└──────────────────────┘    └──────────────┬────────────────────┘
                                            │
                             ┌──────────────▼──────────────────┐
                             │   Transcoding Worker Pool       │
                             │   (FFmpeg, auto-scales)         │
                             └──────────────┬──────────────────┘
                                            │
          ┌─────────────────────────────────▼─────────────────────┐
          │              Blob Storage (S3 / Azure Blob)            │
          │  /videos/{video_id}/360p/seg_*.ts                       │
          │  /videos/{video_id}/720p/seg_*.ts                       │
          │  /videos/{video_id}/manifest.m3u8                       │
          └─────────────────┬─────────────────────────────────────┘
                            │  CDN pulls segments
          ┌─────────────────▼─────────────────────────────────────┐
          │              CDN Edge Nodes (200+ locations)           │
          └─────────────────┬─────────────────────────────────────┘
                            │
          ┌─────────────────▼─────────────────────────────────────┐
          │                 Viewer (adaptive bitrate player)        │
          └─────────────────────────────────────────────────────────┘

Step 6: Database Choices

Video metadata — PostgreSQL

videos
  id              UUID
  creator_id      UUID
  title           TEXT
  description     TEXT
  duration_secs   INT
  status          ENUM (processing, ready, deleted)
  manifest_url    TEXT
  view_count      BIGINT
  created_at      TIMESTAMPTZ

Small rows, relational, ACID. PostgreSQL handles billions of rows with proper partitioning.

Search — Elasticsearch

Full-text search across titles, descriptions, and tags. Video metadata is synced to Elasticsearch asynchronously after upload completes.
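A sketch of that asynchronous indexing step, assuming the official Elasticsearch Python client (v8-style API) and a hypothetical videos index; in the real pipeline this would run as a consumer of a "video ready" event:

from elasticsearch import Elasticsearch

# Sketch: index video metadata for full-text search once transcoding finishes.
es = Elasticsearch("http://localhost:9200")

def index_video(video: dict) -> None:
    es.index(
        index="videos",                  # hypothetical index name
        id=video["id"],
        document={
            "title": video["title"],
            "description": video["description"],
            "tags": video["tags"],
            "creator_id": video["creator_id"],
            "created_at": video["created_at"],
        },
    )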

Watch history — Cassandra

Partition key: (user_id, month)
Clustering key: watched_at DESC
Columns: video_id, watch_duration_secs, completed (boolean)

Write-heavy, time-series, no complex joins needed. Cassandra is the natural fit.
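The same table in sketch form as CQL, executed here through the DataStax Python driver; the contact point and keyspace are assumptions and the keyspace is assumed to exist already:

from cassandra.cluster import Cluster

# Sketch: one user's month of watch history lives in a single partition,
# stored newest-first so "recent history" reads are a single partition scan.
session = Cluster(["127.0.0.1"]).connect("streaming")   # hypothetical keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS watch_history (
        user_id             uuid,
        month               text,        -- e.g. '2024-05', bounds partition size
        watched_at          timestamp,
        video_id            uuid,
        watch_duration_secs int,
        completed           boolean,
        PRIMARY KEY ((user_id, month), watched_at)
    ) WITH CLUSTERING ORDER BY (watched_at DESC)
""")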

View count — Redis + periodic flush to PostgreSQL

View counters are written constantly: 1 billion views per day averages roughly 12,000 increments per second, with far higher peaks on popular videos. Writing directly to PostgreSQL on every view would crush it.

On each view: INCR viewcount:{video_id} in Redis
Every 30 seconds: background job reads Redis counters, 
                  batch UPDATE PostgreSQL videos SET view_count = ...
                  clear Redis counters
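A sketch of that pattern with redis-py and psycopg2; the key scheme and connection string are assumptions, and locking and failure handling are omitted:

import redis
import psycopg2

# Sketch: absorb view writes in Redis, then drain them into PostgreSQL in batches.
# Keys look like viewcount:{video_id}; values are plain integer counters.
r = redis.Redis()
pg = psycopg2.connect("dbname=videos")   # hypothetical connection string

def record_view(video_id: str) -> None:
    r.incr(f"viewcount:{video_id}")      # O(1) in-memory increment

def flush_counts() -> None:
    """Runs every ~30 seconds from a background worker."""
    with pg, pg.cursor() as cur:
        for key in r.scan_iter(match="viewcount:*"):
            video_id = key.decode().split(":", 1)[1]
            delta = int(r.getset(key, 0))            # read and reset the counter
            if delta:
                cur.execute(
                    "UPDATE videos SET view_count = view_count + %s WHERE id = %s",
                    (delta, video_id),
                )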

Step 7: The Recommendation System (High Level)

The recommendation feed is what keeps viewers watching. This is a separate complex system, but interviewers appreciate a high-level mention:

Two components:
  1. Candidate generation: 
     "Users like you watched these videos" 
     → Collaborative filtering (matrix factorisation, two-tower neural network)
     → Runs offline nightly, produces candidate lists per user

  2. Ranking:
     Re-rank candidates at request time using real-time signals:
     - current trending videos
     - user's recent watch history
     - user's current session context
     → Returns top 20 recommendations

The computation is too expensive to run at request time — candidates are pre-generated offline and stored in Redis per user.
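A sketch of the request-time serving path under those assumptions; all key names and scoring weights are made up, and the offline job that writes the candidate lists is not shown:

import json
import redis

# Sketch: fetch pre-generated candidates for a user, then re-rank them with
# cheap real-time signals at request time. All key names are hypothetical.
r = redis.Redis()

def recommend(user_id: str, limit: int = 20) -> list[str]:
    raw = r.get(f"rec:candidates:{user_id}")           # written nightly by the offline job
    candidates = json.loads(raw) if raw else []        # [{"video_id": ..., "score": ...}, ...]
    recently_watched = set(r.lrange(f"watch:recent:{user_id}", 0, 99))

    def rank(c: dict) -> float:
        score = c["score"]                             # offline collaborative-filtering score
        score += float(r.zscore("trending:videos", c["video_id"]) or 0.0) * 0.1
        if c["video_id"].encode() in recently_watched:
            score -= 10.0                              # don't resurface what was just watched
        return score

    return [c["video_id"] for c in sorted(candidates, key=rank, reverse=True)][:limit]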


What the Interviewer Is Actually Testing

  • Do you know that a CDN is mandatory, and can you explain why?
  • Can you describe adaptive bitrate streaming (HLS/DASH) and how quality switching works?
  • Do you describe the transcoding pipeline clearly — chunks, parallel workers, output segments?
  • Do you use the right databases for each data type (metadata vs search vs watch history vs counts)?
  • Do you handle view count without hammering the database on every view?
  • Do you address global availability (CDN edge nodes)?


Go Deeper

Case studies teach the "what". Our courses teach the "how" — the patterns behind these decisions, built up from first principles.
