TinyURL

On-Site Round 1 - Coding (45-60 min)🔗

Question

Design TinyURL.

Constraints and discussion points:

Hashing strategy

Base62 encoding for short codes

Collision handling

Idempotent create API for retries/concurrency

Explanation

TinyURL validates your ability to design APIs, data models, code generation, and collision-safe write paths.

A strong interview answer starts with assumptions, then API/data model, then edge cases and scale.

TinyURL - [System Design]🔗

Core API

POST /api/v1/urls
Request:  { "long_url": "https://example.com/some/very/long/path" }
Response: { "short_url": "https://t.y/Ab3xQ9", "code": "Ab3xQ9" }

GET /:code
Response: 301 redirect to original long URL

Data Model

urls(
  id BIGINT PRIMARY KEY,
  code VARCHAR(8) NOT NULL,
  long_url TEXT NOT NULL,
  long_url_hash CHAR(64) NULL,
  created_at TIMESTAMP NOT NULL,
  expires_at TIMESTAMP NULL,
  user_id BIGINT NULL,
  deleted_at TIMESTAMP NULL
)

Indexes:
1. UNIQUE(code)                      -- main redirect lookup + collision protection
2. INDEX(user_id, created_at)        -- user dashboard/list queries
3. UNIQUE(user_id, long_url_hash)    -- optional idempotent create per user
4. INDEX(expires_at)                 -- optional cleanup/TTL jobs

High-Level Flow

graph TD
    C[Client] --> A[API Service]
    A --> G[Code Generator]
    G --> E[Base62 Encoder]
    A --> D[(SQL/NoSQL Store)]
    C --> R[Read Redirect]
    R --> K[(Cache)]
    K --> D
    R --> C

Collision Handling Approaches

Generate candidate code, try insert with UNIQUE(code), retry on conflict.
Salted hash + retry counter: hash(long_url + attempt).
Deterministic ID approach: store row first, encode numeric id to Base62, nearly collision-free at code layer.

Why Base62 (Napkin Math)

Base62 uses 0-9, a-z, and A-Z, so each character has 62 possibilities.

6-char code space: 62^6 = 56,800,235,584 (~56.8B)
7-char code space: 62^7 = 3,521,614,606,208 (~3.5T)
8-char code space: 62^8 = 218,340,105,584,896 (~218T)

If the system stores 500M URLs:

6 chars gives high occupancy and more retry pressure over time.
7 chars gives comfortable headroom for growth.

For interview discussion, 7-8 chars is usually a practical default (depends on expected scale and TTL/expiry policy).

Hash Collision Deep Dive

Collisions can happen in two places:

Hash collision: different long URLs produce same hash output.
Code collision: different hash/ID inputs map to the same short code (after truncation or retries).

Mitigation strategy:

Keep UNIQUE(code) as final correctness guard.
On conflict, regenerate with attempt salt and retry (hash(url + ":" + attempt)).
Keep retry bounded, then increase code length or switch generator mode.
Track collision rate metric; sudden increases usually indicate entropy bugs or saturation.

Birthday paradox intuition:

Even with a large code space, collision probability rises non-linearly as occupancy grows.
Therefore, production systems rely on unique constraints + retries, not probability alone.

Practical Notes

Use a cache for hot redirects (code -> long_url) to reduce read latency.
Add rate limiting and URL validation for abuse prevention.
Add TTL support for expiring links when needed.

Expected Senior Follow-Up: Idempotency

For POST /api/v1/urls, retries should not create duplicate short links for the same tenant and normalized long URL.

Implementation approach:

Normalize long URL deterministically.
Compute long_url_hash (e.g., SHA-256).
Enforce UNIQUE(user_id, long_url_hash).
On unique conflict, return the existing code instead of creating a new one.

Additional Complication Idea: Multi-Region Writes

Use a global allocator (or globally unique ID scheme) so regions do not generate colliding codes.
Replicate code -> long_url mappings across regions asynchronously for low-latency local reads.
Global uniqueness does not imply instant cross-region visibility, so keep read-after-write mitigation (regional stickiness or origin fallback).

Additional Complication Idea: Abuse Handling

Validate/normalize URL input and enforce domain/IP/user rate limits on create APIs.
Run safe-browsing/phishing checks and maintain allow/block lists.
Keep link status (active, under_review, blocked) and enforce status checks during redirect.

Additional Complication Idea: Analytics Pipeline

Redirect path should stay fast: emit click events asynchronously (queue/log) and return immediately.
Enrich events offline/streaming (geo/device/referrer) and filter bots before aggregation.
Store raw events with retention policy and expose product metrics from aggregated stores.

Additional Complication Idea: Hot Key Protection

Cache hot links aggressively at CDN/edge and application cache layers.
Use single-flight/request coalescing + stale-while-revalidate to prevent cache stampedes.
Add per-key throttling/circuit breakers so viral links do not overload the origin store.

Additional Complication Idea: Custom Aliases (Paid Business Tier)

Uniqueness and race conditions: treat alias as code, enforce UNIQUE(code), return 409 alias_taken on conflict.
Plan gating and quotas: only paid tenants can use custom aliases, with per-tenant limits and create/update rate limits.
Brand safety and abuse: maintain reserved keywords, run profanity/phishing checks, and allow moderation flags.
Lifecycle policy: keep aliases immutable by default and use tombstones (e.g., 60-90 days) before reuse.
Namespace model: choose global alias uniqueness or tenant-scoped uniqueness (UNIQUE(tenant_id, code)) with branded domains.

Additional Complication Idea: Encryption, Deletion + Expiration, GDPR

Encryption: encrypt long_url at rest (KMS-managed keys), TLS in transit, and restrict key access by service role.
Soft delete vs hard delete: soft delete marks inactive immediately (deleted_at) so redirects stop fast; hard delete asynchronously purges data from DB, caches, replicas, and analytics stores.
Expiration: use expires_at; expired links return a terminal response (commonly 410) and trigger cache invalidation/tombstone flow.
GDPR: maintain a deletion workflow with audit trail and SLA (for example, complete hard-delete propagation within policy window).
Reuse safety: keep minimal tombstones before code/alias reuse to prevent link hijacking after delete/expire.

Sample Interface (Python)

import random
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


class TinyUrlService:
    def __init__(self, repository):
        self.repository = repository

    def _random_code(self, size=6):
        return "".join(random.choice(BASE62) for _ in range(size))

    def create_short_url(self, long_url: str) -> str:
        for _ in range(5):  # bounded retry for collision handling
            code = self._random_code()
            if self.repository.insert_if_absent(code=code, long_url=long_url):
                return code
        raise RuntimeError("could not allocate short code")

    def resolve(self, code: str) -> str | None:
        return self.repository.get_long_url(code)