Designing a URL shortening service with high throughput

Designing a URL shortener is the most frequently asked system design interview question at FAANG companies, and for good reason—it compresses nearly every foundational concept into a single problem: key generation algorithms, database selection, caching, read-heavy optimization, sharding, and collision handling. The system takes a long URL as input, generates a unique short alias (like tiny.url/u7yX9), and redirects users to the original URL when they access the short link. At production scale—100M daily active users generating 1M new URLs per day with a 100:1 read-to-write ratio—the URL shortener becomes a genuinely challenging distributed systems problem that tests your ability to make and defend architectural trade-offs under time pressure.

Key Takeaways

  • The URL shortener is a read-heavy system with a 100:1 read-to-write ratio. This asymmetry drives every major design decision: database selection, caching strategy, and scaling approach.
  • Key generation is the core technical challenge. Base62 encoding of auto-incrementing IDs is simple but predictable. MD5/SHA-256 hashing avoids predictability but requires collision handling. Pre-generated key ranges (like Twitter's Snowflake approach) eliminate collisions but add coordination complexity.
  • A NoSQL key-value store (DynamoDB, Cassandra) is the standard database choice. The data model is a simple short_code → long_url mapping with no relationships, making relational features unnecessary overhead.
  • A Redis cache with 90–95% hit ratio is essential. With 100M redirects per day, the cache absorbs 90–95M reads, reducing database load to 5–10M—a 10–20x reduction.
  • Interviewers evaluate trade-off reasoning, not a perfect answer. Why you chose 302 over 301, why NoSQL over SQL, and why Base62 over MD5 matters more than the choices themselves.

Step 1: Requirements and Scope

Functional requirements:

Shorten: Given a long URL, generate a unique, compact short alias using alphanumeric characters (a-z, A-Z, 0-9). Redirect: Given a short URL, redirect the user to the original long URL with minimal latency. Custom aliases: Optionally allow users to specify a custom short code (e.g., tiny.url/my-resume). Expiration: URLs expire after a configurable TTL (default: 5 years). Analytics (optional): Track click counts per shortened URL.

Non-functional requirements:

Low latency: Redirections complete in under 20ms (p99 under 200ms). High availability: 99.99% uptime—if the service is down, all shortened URLs are broken. Scalability: Handle 100M DAU and 1B+ stored URLs. Uniqueness: Short codes must be collision-free. Unpredictability: Short codes should not be easily guessable (security consideration).

Interview tip: Always clarify these requirements with your interviewer before designing. Asking "Is analytics a requirement?" or "Should short codes be unpredictable?" demonstrates structured thinking. Skipping requirements is the fastest way to design the wrong system.

Step 2: Back-of-Envelope Estimation

This step justifies every subsequent architectural decision.

Write traffic: 1M new URLs per day = ~12 writes per second (average). At 3x peak: ~36 writes per second. This is trivially low—almost any database can handle it.

Read traffic: 100:1 read-to-write ratio = 100M redirects per day = ~1,160 reads per second (average). At 3x peak: ~3,500 reads per second. This is moderate but becomes challenging with database latency requirements.

Storage: Each URL record is approximately 500 bytes (7-byte short code + 100-byte long URL + metadata). 1M new URLs/day × 365 days × 5 years = 1.825B URLs. 1.825B × 500 bytes = ~912 GB ≈ 1 TB total storage. Manageable on a single database node, but sharding adds fault tolerance.

Short code space: Using 7 characters from a Base62 alphabet (a-z, A-Z, 0-9): 62^7 = 3.5 trillion unique codes. At 1M new URLs per day, this lasts over 9,000 years—no exhaustion risk.

Bandwidth: 100M redirects × 500 bytes per redirect response ≈ 50 GB/day outbound = ~580 KB/s. Negligible.

Interview application: "Based on the estimation, our system is read-dominated at 100:1 ratio. Write throughput is trivially low at 36 QPS peak. The design challenge is optimizing read latency and availability, not write throughput. I would focus the architecture on caching and read replicas."

Step 3: API Design

EndpointMethodInputOutputNotes
/api/urlsPOST{ long_url, custom_alias?, expires_at? }{ short_url }Creates a shortened URL
/{short_code}GETShort code in URL path302 redirect to long URLThe core redirect operation
/api/urls/{short_code}DELETEShort code + API key{ success: true }Deletes a shortened URL

301 vs 302 redirect: This is a classic interview trade-off question.

301 (Permanent Redirect): The browser caches the mapping. Subsequent requests go directly to the long URL without hitting your server. Reduces server load but loses analytics visibility—you cannot track repeat clicks.

302 (Temporary Redirect): The browser always hits your server first. Every click is tracked. Higher server load but full analytics and the ability to update or expire links.

Decision: Use 302 if analytics or link expiration is a requirement (typical case). Use 301 only if minimizing server load is the absolute priority. "I would use 302 because our requirements include analytics and link expiration. If a link expires, a 301-cached browser would never learn about the expiration."

Step 4: Key Generation — The Core Deep Dive

Key generation is where interviewers spend the most time.

Three approaches exist, each with distinct trade-offs.

Approach 1: Base62 Encoding of Auto-Incrementing IDs

Generate a unique integer ID (auto-increment counter or distributed ID generator like Twitter Snowflake), then encode it in Base62 (a-z, A-Z, 0-9) to produce a compact string.

Example: ID 125,462,371 → Base62 → u7yX9

Pros: Zero collisions (IDs are unique by definition). Simple implementation. Short codes are compact.

Cons: Predictable—an attacker can enumerate all URLs by incrementing the ID. Centralized counter becomes a bottleneck unless distributed (Snowflake adds complexity).

Mitigation: Use a distributed ID generator (Snowflake) that combines timestamp + worker_id + sequence to generate unique IDs across multiple servers without coordination. Apply a shuffle or XOR operation before Base62 encoding to add unpredictability.

Approach 2: Hashing (MD5/SHA-256) with Truncation

Hash the long URL using MD5 or SHA-256, then take the first 7 characters (after Base62 encoding) as the short code.

Pros: Unpredictable. No centralized counter. Same long URL always produces the same hash (deduplication).

Cons: Collisions are possible—two different long URLs could produce the same 7-character prefix. Requires a collision resolution strategy (append a sequence number, rehash with salt, or check-and-retry).

Mitigation: On collision, append an incrementing counter to the input and rehash: hash(long_url + "1"), hash(long_url + "2"), until a unique code is found. In practice, with 3.5 trillion possible codes and 1.8B stored URLs, collision probability is extremely low.

Approach 3: Pre-Generated Key Ranges (Key Generation Service)

A dedicated Key Generation Service (KGS) pre-generates millions of unique short codes and stores them in a database. When a URL needs to be shortened, the application server requests a batch of unused keys from the KGS.

Pros: Zero collisions (keys are pre-checked for uniqueness). No computational overhead at request time. High throughput—key allocation is a simple database read.

Cons: Adds a service dependency (KGS). Requires coordination to ensure no two servers receive the same key batch. Keys are wasted if allocated but never used.

Interview recommendation: "I would use a pre-generated key range approach with batch allocation. Each application server requests a batch of 1,000 keys from the KGS on startup and allocates locally. When the batch is exhausted, it requests a new batch. This eliminates collisions, removes hash computation from the write path, and scales horizontally."

ApproachCollisionsPredictabilityThroughputComplexity
Base62 + Auto-IncrementNoneHigh (enumerable)High (local counter)Low
MD5/SHA-256 TruncationPossible (rare)LowMedium (hash computation)Medium
Pre-Generated Key RangesNoneLowVery high (batch allocation)Medium

Step 5: Database Selection

The data model is a simple key-value mapping: short_code → { long_url, creation_date, expiration_date, user_id }.

No joins, no complex queries, no relationships.

Decision: NoSQL key-value store (DynamoDB or Cassandra)

NoSQL fits because the access pattern is exclusively point lookups by primary key (short_code). There are no relationships between records. The system needs horizontal scalability to handle 1B+ records. DynamoDB provides single-digit millisecond latency for point reads and automatic sharding.

"I would use DynamoDB with short_code as the partition key. Point reads on the partition key return in single-digit milliseconds. DynamoDB automatically distributes data across partitions as the table grows. For a SQL alternative, PostgreSQL with a B-tree index on short_code would work at our scale but provides relational features we do not need."

Step 6: Caching Strategy

With a 100:1 read-to-write ratio, caching is the single most impactful optimization.

Cache layer: Redis cluster with 3–4 nodes, storing the most frequently accessed URL mappings.

Cache strategy: Read-through caching. On a redirect request, check Redis first. If the short_code exists in cache (cache hit), return the long URL immediately. If it does not exist (cache miss), query DynamoDB, return the result, and write it to Redis with a TTL matching the URL's expiration date.

Expected hit ratio: The 80/20 rule applies—20% of shortened URLs generate 80% of traffic. With a Redis cache sized to hold 20% of active URLs, the cache hit ratio reaches 90–95%. At 100M daily redirects, 90–95M are served from Redis at sub-millisecond latency. Only 5–10M hit DynamoDB.

Cache invalidation: URLs are immutable after creation (the mapping never changes), so cache invalidation is not a concern. The only event that invalidates a cached entry is URL expiration, handled by TTL.

"I would deploy a Redis cluster with 3 nodes, each holding approximately 10M hot URL mappings. At an estimated 95% hit ratio, Redis handles 95M of our 100M daily redirects at sub-millisecond latency. DynamoDB handles only 5M lookups per day—approximately 58 QPS, well within a single table's capacity."

Step 7: Scaling for High Throughput

Horizontal scaling of application servers: Run 5–10 stateless application servers behind a load balancer. Each server handles redirect requests independently. Auto-scale based on request rate.

Database sharding: Shard by short_code hash across 3–5 DynamoDB partitions (DynamoDB handles this automatically). Each partition stores approximately 600M URLs at 1.8B total. Sharding provides both throughput scaling and fault isolation.

Multi-region deployment: For a global user base, deploy read replicas in 2–3 regions (us-east, eu-west, ap-southeast). DynamoDB Global Tables replicate data across regions with single-digit millisecond lag. Route users to the nearest region via latency-based DNS (Route 53).

CDN for static redirect responses: For extremely popular shortened URLs (viral links), a CDN like CloudFront can cache the 302 redirect response at edge locations, reducing origin server load to near zero for hot URLs.

For structured practice on this problem and 17 other real-world system design case studies, Grokking the System Design Interview walks through the URL shortener with the same framework used in this article.

For advanced scaling patterns including multi-region active-active deployment and distributed ID generation at scale, Grokking the Advanced System Design Interview covers the depth that L6+ candidates need. The system design interview guide provides the broader 14-step framework applicable to any design problem.

Common Interview Follow-Up Questions

"How do you handle URL expiration without hurting redirect performance?" Lazy cleanup: check expiration only when a user clicks a link. If expired, return a 404 and delete the record. Run a background sweeper daily to clean up expired URLs that are never accessed. Do not scan the entire database on every request—that kills performance.

"How do you support custom aliases?" Treat custom alias creation as a conditional write: check uniqueness with a put-if-absent operation in DynamoDB. If the alias already exists, return an error. Validate alias format (alphanumeric, no reserved words, length 3–30 characters).

"How do you prevent abuse?" Rate limit URL creation by API key: 100 URLs per hour per key. Rate limit redirects by IP: 1,000 per minute per IP. Validate input URLs for valid protocols (HTTP, HTTPS only). Block URLs pointing to your own domain to prevent redirect loops.

"What happens if your key generation service goes down?" Each application server holds a pre-allocated batch of 1,000 keys locally. If the KGS is unavailable, servers continue operating until their local batch is exhausted—providing approximately 15–60 minutes of runway at normal traffic. This graceful degradation ensures the system continues serving during KGS outages.

"How would you add analytics without increasing redirect latency?" Fire analytics events asynchronously. On each redirect, write a click event to a Kafka topic. A separate analytics consumer processes events in batch, incrementing counters in a dedicated analytics database. The redirect path never waits for analytics writes.

Frequently Asked Questions

Why is the URL shortener the most common system design interview question?

It compresses nearly every foundational concept into a single problem: key generation, database selection, caching, read-heavy optimization, sharding, hash collision handling, and API design. It is simple enough to scope in 45 minutes but deep enough to differentiate L5 from L6 candidates based on the depth of their trade-off discussion.

Should I use Base62 or MD5 for key generation?

Base62 encoding of auto-incrementing IDs is simpler and guarantees zero collisions but produces predictable codes. MD5 hashing with truncation is unpredictable but requires collision handling. Pre-generated key ranges combine the best of both: zero collisions and unpredictability. Choose based on whether unpredictability is a requirement.

Why NoSQL over SQL for a URL shortener?

The data model is a simple key-value mapping with no relationships. Access patterns are exclusively point lookups by short_code. NoSQL key-value stores (DynamoDB, Cassandra) provide single-digit millisecond point reads, automatic sharding, and horizontal scalability without the overhead of relational features you do not need.

What cache hit ratio should I target?

90–95% for a URL shortener. The 80/20 rule applies: 20% of URLs generate 80% of traffic. Caching the hot 20% in Redis achieves 90%+ hit ratio, reducing database load by 10–20x.

Should I use 301 or 302 redirects?

Use 302 (temporary) if you need analytics, link expiration, or the ability to update mappings. Use 301 (permanent) only if minimizing server load is the absolute priority and you do not need click tracking. Most production URL shorteners use 302.

How do I handle hash collisions?

If using MD5/SHA-256 truncation, append an incrementing counter to the input and rehash until a unique code is found. With 3.5 trillion possible 7-character Base62 codes and under 2B stored URLs, collision probability is extremely low. Alternatively, use pre-generated keys to eliminate collisions entirely.

How much storage does a URL shortener need?

At 1M new URLs per day for 5 years: 1.825B URLs × 500 bytes each ≈ 1 TB. This fits on a single DynamoDB table but should be sharded for fault tolerance and throughput distribution.

How do I scale reads for a global URL shortener?

Layer three optimizations: Redis cache (95% hit ratio), DynamoDB Global Tables (multi-region replication), and CDN edge caching for viral URLs. Together, these ensure sub-20ms redirect latency worldwide with the origin database handling under 5% of total read traffic.

What is the biggest mistake candidates make on this question?

Spending too long on key generation and running out of time for caching, scaling, and trade-offs. Budget 5 minutes for requirements, 3 minutes for estimation, 10 minutes for high-level design, 15 minutes for deep dive (key generation + caching), and 10 minutes for scaling and trade-offs.

How do I prevent the URL shortener from being used for phishing?

Validate input URLs against a blocklist of known malicious domains. Scan submitted URLs with a URL reputation service (Google Safe Browsing API). Rate limit URL creation per API key. Display an interstitial warning page before redirecting to unknown domains.

TL;DR

A URL shortener is a read-heavy system (100:1 ratio) that generates unique short codes, stores short_code → long_url mappings, and redirects users with sub-20ms latency. At 100M DAU: ~12 writes/second and ~1,160 reads/second (3,500 peak). Use a pre-generated key range service for collision-free, unpredictable short codes. Store mappings in DynamoDB (simple key-value, no relationships, automatic sharding). Cache hot URLs in Redis with 95% hit ratio, reducing database load 20x. Use 302 redirects to maintain analytics visibility and link expiration support. Scale with multi-region DynamoDB Global Tables, CDN edge caching for viral links, and auto-scaling stateless application servers. The interview is won on trade-off reasoning—why 302 over 301, why NoSQL over SQL, why pre-generated keys over MD5—not on a perfect architecture.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
-

GET YOUR FREE

Coding Questions Catalog

Design Gurus Newsletter - Latest from our Blog
Boost your coding skills with our essential coding questions catalog.
Take a step towards a better tech career now!
Explore Answers
What is Twilio used for?
Is system design important for Java developers?
How do you solve system design problems?
Which certificate course is best for software engineering?
What are the 4 pillars of Dell?
How do you implement event-driven architecture in microservices?
Related Courses
Course image
Grokking the Coding Interview: Patterns for Coding Questions
Grokking the Coding Interview Patterns in Java, Python, JS, C++, C#, and Go. The most comprehensive course with 476 Lessons.
4.6
Discounted price for Your Region

$197

Course image
Grokking Modern AI Fundamentals
Master the fundamentals of AI today to lead the tech revolution of tomorrow.
3.9
Discounted price for Your Region

$72

Course image
Grokking Data Structures & Algorithms for Coding Interviews
Unlock Coding Interview Success: Dive Deep into Data Structures and Algorithms.
4
Discounted price for Your Region

$78

Image
One-Stop Portal For Tech Interviews.
Copyright © 2026 Design Gurus, LLC. All rights reserved.