TinyURL
On-Site Round 1 - Coding (45-60 min)🔗
Question
Design TinyURL.
Constraints and discussion points:
- Hashing strategy
- Base62 encoding for short codes
- Collision handling
- Idempotent create API for retries/concurrency
Explanation
TinyURL validates your ability to design APIs, data models, code generation, and collision-safe write paths.
A strong interview answer starts with assumptions, then API/data model, then edge cases and scale.
TinyURL - [System Design]🔗
Core API
POST /api/v1/urls
Request: { "long_url": "https://example.com/some/very/long/path" }
Response: { "short_url": "https://t.y/Ab3xQ9", "code": "Ab3xQ9" }
GET /:code
Response: 301 redirect to original long URL
Data Model
urls(
id BIGINT PRIMARY KEY,
code VARCHAR(8) NOT NULL,
long_url TEXT NOT NULL,
long_url_hash CHAR(64) NULL,
created_at TIMESTAMP NOT NULL,
expires_at TIMESTAMP NULL,
user_id BIGINT NULL,
deleted_at TIMESTAMP NULL
)
Indexes:
1. UNIQUE(code) -- main redirect lookup + collision protection
2. INDEX(user_id, created_at) -- user dashboard/list queries
3. UNIQUE(user_id, long_url_hash) -- optional idempotent create per user
4. INDEX(expires_at) -- optional cleanup/TTL jobs
High-Level Flow
graph TD
C[Client] --> A[API Service]
A --> G[Code Generator]
G --> E[Base62 Encoder]
A --> D[(SQL/NoSQL Store)]
C --> R[Read Redirect]
R --> K[(Cache)]
K --> D
R --> C
Collision Handling Approaches
- Generate candidate code, try insert with
UNIQUE(code), retry on conflict. - Salted hash + retry counter:
hash(long_url + attempt). - Deterministic ID approach: store row first, encode numeric
idto Base62, nearly collision-free at code layer.
Why Base62 (Napkin Math)
Base62 uses 0-9, a-z, and A-Z, so each character has 62 possibilities.
- 6-char code space:
62^6 = 56,800,235,584(~56.8B) - 7-char code space:
62^7 = 3,521,614,606,208(~3.5T) - 8-char code space:
62^8 = 218,340,105,584,896(~218T)
If the system stores 500M URLs:
- 6 chars gives high occupancy and more retry pressure over time.
- 7 chars gives comfortable headroom for growth.
For interview discussion, 7-8 chars is usually a practical default (depends on expected scale and TTL/expiry policy).
Hash Collision Deep Dive
Collisions can happen in two places:
- Hash collision: different long URLs produce same hash output.
- Code collision: different hash/ID inputs map to the same short code (after truncation or retries).
Mitigation strategy:
- Keep
UNIQUE(code)as final correctness guard. - On conflict, regenerate with
attemptsalt and retry (hash(url + ":" + attempt)). - Keep retry bounded, then increase code length or switch generator mode.
- Track collision rate metric; sudden increases usually indicate entropy bugs or saturation.
Birthday paradox intuition:
- Even with a large code space, collision probability rises non-linearly as occupancy grows.
- Therefore, production systems rely on unique constraints + retries, not probability alone.
Practical Notes
- Use a cache for hot redirects (
code -> long_url) to reduce read latency. - Add rate limiting and URL validation for abuse prevention.
- Add TTL support for expiring links when needed.
Expected Senior Follow-Up: Idempotency
For POST /api/v1/urls, retries should not create duplicate short links for the same tenant and normalized long URL.
Implementation approach:
- Normalize long URL deterministically.
- Compute
long_url_hash(e.g., SHA-256). - Enforce
UNIQUE(user_id, long_url_hash). - On unique conflict, return the existing code instead of creating a new one.
Additional Complication Idea: Multi-Region Writes
- Use a global allocator (or globally unique ID scheme) so regions do not generate colliding codes.
- Replicate
code -> long_urlmappings across regions asynchronously for low-latency local reads. - Global uniqueness does not imply instant cross-region visibility, so keep read-after-write mitigation (regional stickiness or origin fallback).
Additional Complication Idea: Abuse Handling
- Validate/normalize URL input and enforce domain/IP/user rate limits on create APIs.
- Run safe-browsing/phishing checks and maintain allow/block lists.
- Keep link status (
active,under_review,blocked) and enforce status checks during redirect.
Additional Complication Idea: Analytics Pipeline
- Redirect path should stay fast: emit click events asynchronously (queue/log) and return immediately.
- Enrich events offline/streaming (geo/device/referrer) and filter bots before aggregation.
- Store raw events with retention policy and expose product metrics from aggregated stores.
Additional Complication Idea: Hot Key Protection
- Cache hot links aggressively at CDN/edge and application cache layers.
- Use single-flight/request coalescing + stale-while-revalidate to prevent cache stampedes.
- Add per-key throttling/circuit breakers so viral links do not overload the origin store.
Additional Complication Idea: Custom Aliases (Paid Business Tier)
- Uniqueness and race conditions: treat alias as
code, enforceUNIQUE(code), return409 alias_takenon conflict. - Plan gating and quotas: only paid tenants can use custom aliases, with per-tenant limits and create/update rate limits.
- Brand safety and abuse: maintain reserved keywords, run profanity/phishing checks, and allow moderation flags.
- Lifecycle policy: keep aliases immutable by default and use tombstones (e.g., 60-90 days) before reuse.
- Namespace model: choose global alias uniqueness or tenant-scoped uniqueness (
UNIQUE(tenant_id, code)) with branded domains.
Additional Complication Idea: Encryption, Deletion + Expiration, GDPR
- Encryption: encrypt
long_urlat rest (KMS-managed keys), TLS in transit, and restrict key access by service role. - Soft delete vs hard delete: soft delete marks inactive immediately (
deleted_at) so redirects stop fast; hard delete asynchronously purges data from DB, caches, replicas, and analytics stores. - Expiration: use
expires_at; expired links return a terminal response (commonly410) and trigger cache invalidation/tombstone flow. - GDPR: maintain a deletion workflow with audit trail and SLA (for example, complete hard-delete propagation within policy window).
- Reuse safety: keep minimal tombstones before code/alias reuse to prevent link hijacking after delete/expire.
Sample Interface (Python)
import random
import string
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
class TinyUrlService:
def __init__(self, repository):
self.repository = repository
def _random_code(self, size=6):
return "".join(random.choice(BASE62) for _ in range(size))
def create_short_url(self, long_url: str) -> str:
for _ in range(5): # bounded retry for collision handling
code = self._random_code()
if self.repository.insert_if_absent(code=code, long_url=long_url):
return code
raise RuntimeError("could not allocate short code")
def resolve(self, code: str) -> str | None:
return self.repository.get_long_url(code)