YouTube

On-Site Round - System Design (45 min)

Question

Design YouTube.

Areas to cover:

Video upload/download (chunking, resumable uploads, object storage)

Recommendation (collaborative filtering + content-based)

Search (inverted index, ranking, autocomplete)

Scale (1B+ DAU, hot videos, CDN, sharding)

Consistency vs availability for view/count metrics

Transcoding pipeline, thumbnails, notifications

Explanation

This question tests whether you can design a large media platform end-to-end: ingest, process, serve, discover, and measure at massive scale.

A strong answer usually separates the system into five planes:

Ingestion plane (upload + storage)
Processing plane (transcoding + thumbnails + metadata)
Serving plane (video delivery via CDN)
Discovery plane (search + recommendation)
Analytics plane (views, engagement, counters)

High-Level Architecture

graph TD
    U[User Client] --> A[API Gateway]
    A --> UP[Upload Service]
    UP --> OBJ[(Object Storage)]
    UP --> MQ[(Event Queue)]

    MQ --> TR[Transcoding Workers]
    TR --> OBJ
    TR --> TH[Thumbnail Generator]
    TH --> OBJ
    TR --> MD[(Metadata DB)]

    U --> CDN[CDN/Edge]
    CDN --> OBJ

    A --> SRCH[Search Service]
    SRCH --> IDX[(Inverted Index)]

    A --> REC[Recommendation Service]
    REC --> FEAT[(User/Video Features)]

    A --> CNT[Counter Service]
    CNT --> TS[(Time-series/Counter Store)]

    MQ --> NOTIF[Notification Service]

Upload / Download

Upload

Client requests upload session.
Upload service returns pre-signed chunk URLs.
Client uploads chunks with retry/resume support.
After final commit, service emits video_uploaded event.
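The session bookkeeping behind these steps can be sketched as follows. This is a minimal in-memory model, not a real upload service: `UploadSession`, the 8 MiB `CHUNK_SIZE`, and the event dictionary shape are assumptions made for illustration, and pre-signed URL generation is left out entirely.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size (8 MiB), not from the source


@dataclass
class UploadSession:
    """Tracks which chunks of one upload have arrived (hypothetical model)."""
    video_id: str
    total_size: int
    received: set = field(default_factory=set)
    committed: bool = False

    @property
    def chunk_count(self) -> int:
        # Ceiling division: the last chunk may be shorter than CHUNK_SIZE.
        return -(-self.total_size // CHUNK_SIZE)

    def mark_received(self, index: int) -> None:
        if not 0 <= index < self.chunk_count:
            raise ValueError(f"chunk index {index} out of range")
        # Re-sending a chunk after a retry is harmless: set.add is idempotent.
        self.received.add(index)

    def missing_chunks(self) -> list:
        # A resuming client asks for this list and uploads only the gaps.
        return [i for i in range(self.chunk_count) if i not in self.received]

    def commit(self, events: list) -> bool:
        # Commit only once, and only when every chunk is present.
        if self.committed or self.missing_chunks():
            return False
        self.committed = True
        events.append({"type": "video_uploaded", "video_id": self.video_id})
        return True
```

Keeping the received set on the server is what makes resume cheap: after a dropped connection the client only needs `missing_chunks()` to know where to continue.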

Download/Playback

Player fetches manifest (HLS/DASH).
Segments served from CDN edge cache.
CDN misses pull from origin object storage.

Because popular segments stay cached at the edge, most playback requests never reach the origin, which keeps origin load low even for hot videos.
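To make the hit/miss behavior concrete, here is a small sketch of a single edge node as an LRU cache with origin fallback. `EdgeCache` and its `fetch_from_origin` callback are hypothetical names; a real CDN adds TTLs, request coalescing, and tiered caches that are not modeled here.

```python
from collections import OrderedDict


class EdgeCache:
    """Tiny LRU cache standing in for one CDN edge node (illustrative only)."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # callable: key -> bytes
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        # Cache miss: pull the segment from origin and keep a copy.
        self.misses += 1
        value = self.fetch_from_origin(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return value
```

With this shape, a segment requested many times costs the origin one fetch per edge node, which is the property the hot-video argument relies on.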

Transcoding Pipeline

video_uploaded event triggers async transcoding.
Generate multiple resolutions/bitrates (240p through 4K).
Create streaming manifests and thumbnails.
Persist metadata (status=ready) and publish notification event.

Failure handling:

Idempotent jobs keyed by video_id + profile.
Dead-letter queue for poison tasks.
Partial success is allowed: serve the lower resolutions if a higher-resolution profile fails.
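The three failure-handling rules above can be sketched in one function. Everything here is an assumption for illustration: `run_transcode_jobs`, the `transcode` callback, the in-memory `done` set and `dead_letter` list, and the retry budget of three attempts. A production pipeline would use a durable job store and a real queue.

```python
MAX_ATTEMPTS = 3  # assumed retry budget before dead-lettering


def run_transcode_jobs(video_id, profiles, transcode, done, dead_letter):
    """Run one job per profile; return the profiles that are ready to serve.

    `transcode(video_id, profile)` is a hypothetical worker call that raises
    on failure. `done` is a set of completed job keys; `dead_letter` is a list.
    """
    for profile in profiles:
        key = (video_id, profile)  # idempotency key
        if key in done:
            continue  # duplicate delivery of the event: skip finished work
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                transcode(video_id, profile)
                done.add(key)
                break
            except Exception as exc:
                if attempt == MAX_ATTEMPTS:
                    # Poison task: park it for inspection, stop retrying.
                    dead_letter.append({"key": key, "error": str(exc)})
    # Partial success: whatever finished can be listed in the manifest.
    return [p for p in profiles if (video_id, p) in done]
```

Because completion is recorded per `(video_id, profile)` key, replaying the same `video_uploaded` event does no extra work, and one failing profile does not block the others.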

Search Design

Index title, tags, channel, and transcript tokens into an inverted index.
Ranking combines lexical relevance + freshness + engagement priors.
Autocomplete uses prefix index + trending query boosts.

Read path must be low latency; indexing can be eventually consistent.
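A toy version of the index, ranking, and autocomplete pieces is sketched below. `SearchIndex` is a hypothetical class, the 0.5 engagement weight is made up, freshness is omitted for brevity, and the autocomplete does a linear scan where a real system would use a trie or sorted term dictionary.

```python
from collections import defaultdict


class SearchIndex:
    """Toy in-memory inverted index; scoring weights are placeholders."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of video ids
        self.engagement = {}              # video id -> prior in [0, 1]

    def add(self, video_id, text, engagement=0.0):
        self.engagement[video_id] = engagement
        for token in text.lower().split():
            self.postings[token].add(video_id)

    def search(self, query, limit=10):
        scores = defaultdict(float)
        for token in query.lower().split():
            for video_id in self.postings.get(token, ()):
                scores[video_id] += 1.0  # lexical match count
        for video_id in scores:
            scores[video_id] += 0.5 * self.engagement[video_id]
        # Sort by score descending, then by id for a stable order.
        return sorted(scores, key=lambda v: (-scores[v], v))[:limit]

    def autocomplete(self, prefix, limit=5):
        # Linear scan over indexed terms; fine for a sketch only.
        prefix = prefix.lower()
        return sorted(t for t in self.postings if t.startswith(prefix))[:limit]
```

The write path (`add`) and the read path (`search`) touch the same structure here, but in the design above they would be decoupled so that indexing lag never slows queries.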

Recommendation Design

Two-stage approach:

Candidate generation:
- collaborative filtering (similar users/videos)
- content-based signals (topic, embeddings, language)
Ranking:
- model combines watch history, retention, CTR, recency, diversity

Serving strategy:

Precompute candidate pools for active users.
Rank the top N candidates online using fresh context.
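The two stages can be sketched with plain dictionaries. The function names, the co-watch and topic lookup tables, and the linear weights are all assumptions; a real ranker would be a learned model, and diversity is not modeled here.

```python
def generate_candidates(user_id, co_watch, watched, topic_index, user_topics):
    """Union of collaborative and content-based candidates, minus seen videos."""
    candidates = set()
    for video_id in watched[user_id]:
        candidates.update(co_watch.get(video_id, ()))  # collaborative signal
    for topic in user_topics[user_id]:
        candidates.update(topic_index.get(topic, ()))  # content-based signal
    return candidates - set(watched[user_id])


def rank(candidates, features, top_n=3):
    """Score with a hand-written linear model; the weights are placeholders."""
    def score(video_id):
        f = features[video_id]
        return 0.5 * f["retention"] + 0.3 * f["ctr"] + 0.2 * f["recency"]

    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(candidates, key=lambda v: (-score(v), v))[:top_n]
```

Splitting the work this way keeps the expensive scoring step small: candidate generation can be precomputed offline, while `rank` runs per request on a short list.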

Scale (1B+ DAU)

Store immutable media in object storage; shard metadata by video_id.
Cache hot videos aggressively at the CDN; use multi-layer caches for manifests and metadata.
Partition queues and processing workers by region/video class.
Separate control-plane APIs from heavy data-plane traffic.

GenAI assist: classify likely-to-trend videos early and pre-warm CDN/cache tiers before traffic spikes.
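As one way to picture sharding metadata by video_id, the sketch below maps an id to a shard with a stable hash. `shard_for` is a hypothetical helper and the shard count is arbitrary. A cryptographic digest is used instead of Python's built-in `hash()` because string hashing is randomized per process by default.

```python
import hashlib


def shard_for(video_id: str, num_shards: int) -> int:
    """Map a video id to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    # The first 8 bytes are plenty of entropy for picking a shard.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo placement moves most keys when `num_shards` changes, so a design that expects frequent resharding would likely swap this for consistent hashing or a lookup directory.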

Consistency vs Availability (Views / Counts)

Use split semantics:

View ingestion path: highly available append (event log).
Public counters: eventually consistent aggregates (near-real-time).
Creator analytics: batch-computed numbers with anti-fraud corrections applied.

This keeps playback and event capture available while accepting slight lag in displayed counts.
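The three paths can be sketched in a single process. `ViewCounter` and its fields are hypothetical; the list stands in for a durable, partitioned event log, and the blocked-viewer set stands in for whatever the anti-fraud system produces.

```python
from collections import Counter


class ViewCounter:
    """Append-only view log plus a lagging aggregate (single-process sketch)."""

    def __init__(self):
        self.log = []            # stands in for a durable event log
        self.offset = 0          # how far the aggregator has read
        self.public = Counter()  # what viewers see; may lag the log

    def record_view(self, video_id, viewer_id):
        # Write path: append only, no read-modify-write on a shared counter.
        self.log.append((video_id, viewer_id))

    def aggregate(self):
        # Runs periodically; folds new events into the public counts.
        for video_id, _ in self.log[self.offset:]:
            self.public[video_id] += 1
        self.offset = len(self.log)

    def corrected_count(self, video_id, blocked_viewers):
        # Batch path: recount from the log, dropping flagged viewers.
        return sum(1 for v, viewer in self.log
                   if v == video_id and viewer not in blocked_viewers)
```

Between calls to `aggregate()` the public count is stale but the write path never blocks, which is the trade the section describes.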

Notifications

Trigger async fanout when:

Channel publishes video and notification policy allows.
Video reaches ready state.

The notification service then applies user preferences and rate limits before sending.
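A minimal sketch of the preference and rate-limit filtering during fanout follows. The `fan_out` function, the shape of the `subscribers`, `prefs`, and `sent_today` stores, and the daily limit of 5 are all assumptions for illustration.

```python
def fan_out(channel_id, video_id, subscribers, prefs, sent_today, daily_limit=5):
    """Return the notifications to send for one newly ready video.

    `prefs` maps user id -> set of muted channel ids; `sent_today` maps
    user id -> count already sent. Both are hypothetical stores.
    """
    outbox = []
    for user_id in subscribers.get(channel_id, ()):
        if channel_id in prefs.get(user_id, set()):
            continue  # user muted this channel
        if sent_today.get(user_id, 0) >= daily_limit:
            continue  # rate limit reached for this user
        sent_today[user_id] = sent_today.get(user_id, 0) + 1
        outbox.append({"user_id": user_id, "video_id": video_id})
    return outbox
```

For channels with very large subscriber lists, this loop would be split into batches and processed by queue workers so that the publish path stays asynchronous.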

Additional Complication Idea: Copyright Detection

Compute audio/video fingerprints at upload time and compare against rights-holder reference sets.
Block, monetize, or allow each match according to policy, based on match confidence and territory/license rules.
Re-scan catalog periodically as reference databases and policies evolve.

GenAI assist: use multimodal embeddings to catch transformed near-duplicates that exact fingerprinting can miss.
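The policy step that follows a fingerprint match can be sketched as a small decision function. The name `decide`, the two thresholds, and the rule that a licensed territory always allows are assumptions chosen for illustration, not a description of any real rights-management policy; the fingerprinting itself is out of scope here.

```python
def decide(match_confidence, territory, licensed_territories,
           block_threshold=0.9, claim_threshold=0.6):
    """Pick an action for one matched upload in one territory.

    Thresholds are placeholders; real policies are set per rights holder.
    """
    if territory in licensed_territories:
        return "allow"      # a license covers this territory
    if match_confidence >= block_threshold:
        return "block"      # strong match, no license
    if match_confidence >= claim_threshold:
        return "monetize"   # plausible match: route revenue to rights holder
    return "allow"          # weak match: treat as no match
```

Keeping this as a pure function of confidence and territory makes periodic catalog re-scans simple: the same uploads can be re-evaluated whenever reference data or thresholds change.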

Additional Complication Idea: Abuse, Spam, and Safety

Run ML and rules-based moderation on title/description/transcript/thumbnails both before and after publishing.
Keep trust/risk scores per account and apply rate limits, temporary holds, or stricter review.
Maintain human-review queues for borderline/high-impact enforcement decisions.

GenAI assist: use LLM/VLM classifiers to generate richer policy labels and reviewer-ready rationale across text, audio, and images.

Additional Complication Idea: Multi-Region Failover

Use active-active playback with global DNS/load balancing and regional CDNs.
Keep upload sessions region-local, with cross-region replication and resumable continuation on failover.
Replicate metadata asynchronously and use regional stickiness/origin fallback for read-after-write gaps.

GenAI assist: use an incident copilot to summarize telemetry and suggest failover/rollback runbook steps to operators.

Additional Complication Idea: Privacy and Data Retention

Separate PII from high-volume event data; apply field-level encryption and strict access controls.
Support delete/export workflows for watch history and user data with auditable completion state.
Enforce retention windows and downstream deletion propagation to analytics, caches, and backups per policy.

GenAI assist: use entity extraction to auto-detect and classify PII in user-generated content for policy routing.