Common system design interview questions for senior software roles
System design interview questions test your ability to architect large-scale distributed systems under constraints like scalability, fault tolerance, and latency.
For senior software engineers interviewing at FAANG companies and top startups, these questions are the single biggest factor determining your hiring level and compensation.
This guide covers the 15 most frequently asked system design interview questions, the framework to answer them, and the trade-offs interviewers expect senior candidates to discuss.
Key Takeaways
- Senior engineers face 1–2 system design rounds per interview loop, and performance here directly sets your offer level.
- The top 15 questions cover four categories: data-intensive systems, real-time systems, infrastructure components, and search/discovery.
- Interviewers evaluate your trade-off reasoning, not whether you pick the "right" architecture.
- A structured framework (clarify and estimate → high-level design → deep dive → scale and harden) beats memorized diagrams every time.
- You should spend the first 5 minutes asking clarifying questions, not drawing boxes.
What Senior Engineers Are Actually Evaluated On
System design interview questions for senior engineers differ from mid-level questions in scope, depth, and ownership.
At the junior level, interviewers check whether you know a load balancer exists. At the senior level, they expect you to explain why you'd pick an L7 over an L4, what the latency impact is, and how you'd handle failover.
| Evaluation Criteria | Junior Expectation | Senior Expectation |
|---|---|---|
| Requirements gathering | Accept given constraints | Drive the conversation, identify hidden requirements |
| High-level design | Correct component diagram | Justified architecture with explained trade-offs |
| Deep dives | Basic understanding | Production-grade depth in 2–3 areas |
| Scalability | Mention caching/sharding | Quantitative capacity planning with numbers |
| Trade-offs | Acknowledge they exist | Articulate specific pros/cons and pick a side |
| Operational concerns | Rarely discussed | Monitoring, alerting, deployment, failure recovery |
If you are building your foundational knowledge before tackling these questions, the Grokking System Design Fundamentals course covers every building block — from databases and caches to load balancers and message queues — with hands-on examples.
The 15 Most Common System Design Interview Questions
These questions appear repeatedly across Google, Meta, Amazon, Netflix, Microsoft, and high-growth startups. I've grouped them by category so you can study patterns, not just individual problems.
Category 1: Data-Intensive Systems
1. Design a URL Shortener (TinyURL)
Why it's asked: This is the most popular system design interview question across all levels. For senior engineers, interviewers use it as a warm-up and then push hard on collision handling, analytics at scale, and cache eviction.
Core components: API gateway, hashing service (Base62 encoding or MD5 truncation), key-value store (DynamoDB or Redis), 301/302 redirect logic.
Senior-level deep dives the interviewer will push:
- How do you handle hash collisions at 1 billion URLs? (Answer: check-and-retry with a counter suffix, or use a pre-generated key service that allocates unique IDs from a range.)
- Read-to-write ratio is roughly 100:1. How does that shape your caching strategy? (Answer: aggressive read-through cache with Redis; TTL based on access frequency.)
- Should you use 301 (permanent) or 302 (temporary) redirects? (Answer: 302 if you need analytics on every click; 301 if you want to reduce server load and let browsers cache.)
Scale reference: Bitly processes roughly 600 million link clicks per month. Your design should handle at least this order of magnitude.
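As a minimal sketch of the encoding step, here is Base62 from a numeric ID, assuming short codes come from unique IDs handed out by a key service (names are illustrative):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Inverse of encode_base62: short code back to numeric ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven Base62 characters cover 62^7 (about 3.5 trillion) IDs, comfortably beyond a billion URLs, which is why the key-service approach sidesteps collisions entirely.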
2. Design a News Feed (Facebook/Twitter Timeline)
Why it's asked: It tests your understanding of fan-out strategies, ranking algorithms, and the tension between consistency and latency.
The critical trade-off — fan-out on write vs. fan-out on read:
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Fan-out on write (push) | Pre-compute feed for each follower when a post is created | Fast reads, O(1) feed fetch | Expensive for users with millions of followers (celebrity problem) |
| Fan-out on read (pull) | Assemble feed at read time by querying followed users' posts | No wasted writes | Slow reads, high latency at scale |
| Hybrid (what Meta uses) | Push for normal users, pull for celebrities | Balances both | More complex to implement |
Senior-level follow-up: "A user follows 500 accounts and opens the app. Walk me through the exact data path from request to rendered feed, including cache layers." You should be able to trace the request through an API gateway, feed service, ranked feed cache (Redis/Memcached), and a fallback to the posts database with a merge-sort across followed users.
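The pull-side fallback in that data path is essentially a k-way merge. A sketch, assuming each followed user's posts arrive as a list of (timestamp, post_id) pairs already sorted newest-first:

```python
import heapq
from itertools import islice

def merge_feeds(per_user_posts, limit=20):
    """Merge each followed user's newest-first post list into one
    newest-first feed, keeping only the top `limit` entries.
    Each input list must be sorted by descending timestamp."""
    merged = heapq.merge(*per_user_posts, key=lambda p: p[0], reverse=True)
    return list(islice(merged, limit))
```

In production the inputs would come from per-user post caches, and a ranking model would rescore the merged candidates before rendering.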
3. Design a Key-Value Store (Distributed Cache)
Why it's asked: This is a pure distributed systems question. It reveals whether you understand consistent hashing, replication, conflict resolution, and the CAP theorem at a practical level.
What to cover: Consistent hashing with virtual nodes (the technique DynamoDB borrowed from Amazon's 2007 Dynamo paper), configurable quorum reads/writes (W + R > N for strong consistency), vector clocks or last-write-wins for conflict resolution, gossip protocol for failure detection.
Senior-level question: "You have a 5-node cluster and need to tolerate 2 node failures. What values of N, W, and R do you choose?" (Answer: N=5, W=3, R=3. This gives you strong consistency and survives 2 failures, but writes require 3 acknowledgments, increasing latency.)
Category 2: Real-Time Systems
4. Design a Chat Application (WhatsApp/Messenger)
Why it's asked: It tests real-time communication, message delivery guarantees, and encryption considerations.
Core architecture: WebSocket connections for real-time delivery, Kafka for durability, a chat service for routing, and Cassandra for message history (write-optimized, wide-column design).
The presence system: Tracking online/offline status for 2 billion users requires a heartbeat mechanism (30-second timeout) with Redis for state. The interviewer will ask how you handle the thundering herd when a server with 100K connections goes down.
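A toy in-memory version of the heartbeat logic, standing in for a Redis key with a TTL (in Redis each heartbeat would refresh the key with SET ... EX 30, and expiry means offline):

```python
import time

class PresenceTracker:
    """In-memory stand-in for a Redis-backed presence store: each
    heartbeat refreshes a per-user timestamp, and a user counts as
    online while the last heartbeat is within the TTL window."""
    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self.last_beat = {}

    def heartbeat(self, user_id, now=None):
        self.last_beat[user_id] = now if now is not None else time.time()

    def is_online(self, user_id, now=None):
        now = now if now is not None else time.time()
        beat = self.last_beat.get(user_id)
        return beat is not None and now - beat < self.ttl
```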
5. Design a Notification System
Why it's asked: It covers multiple delivery channels (push, SMS, email), priority queues, rate limiting, and delivery semantics.
Key design decision: Use a priority queue with separate workers per channel. High-priority notifications (security alerts, 2FA codes) skip the rate limiter. User preferences determine active channels per notification type.
Senior-level question: "How do you guarantee no duplicate notifications?" (Answer: idempotency key in a dedup cache with TTL. Before dispatching, check the cache. Use at-least-once delivery from Kafka and dedup at the consumer.)
6. Design a Real-Time Collaborative Editor (Google Docs)
Why it's asked: A staff-level question testing conflict resolution — specifically Operational Transformation (OT) or CRDTs.
The core problem: Two users edit simultaneously. Without conflict resolution, edits overwrite each other.
| Algorithm | Central Server? | Complexity | Used By |
|---|---|---|---|
| Operational Transformation (OT) | Yes | High | Google Docs |
| CRDTs | No | Medium | Figma, Yjs |
| Last-write-wins (naive) | No | Low | Not viable for collaboration |
Category 3: Infrastructure Components
7. Design a Rate Limiter
Why it's asked: Every API needs one, and the question reveals whether you understand the algorithms, their trade-offs, and distributed coordination challenges.
Algorithm comparison:
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token bucket | Tokens added at fixed rate; each request consumes a token | Allows bursts, smooth rate limiting | Requires tuning bucket size and refill rate |
| Sliding window log | Stores timestamp of each request in a sorted set | Precise, no boundary issues | Memory-intensive at high QPS |
| Fixed window counter | Counts requests per time window | Simple, low memory | Boundary spike problem (2x burst at window edges) |
| Sliding window counter | Weighted combination of current and previous window | Good accuracy, low memory | Approximate |
Senior-level concern: In a distributed system with multiple API servers, where does the rate limit state live? (Answer: centralized Redis with Lua scripts for atomic check-and-increment. At 100K+ QPS, consider local counters with periodic sync to reduce Redis round trips.)
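A single-node token bucket sketch; the distributed version moves this same state and logic into a Redis Lua script so check-and-decrement stays atomic:

```python
class TokenBucket:
    """Token-bucket rate limiter: tokens accrue at `rate` per second
    up to `capacity`; each allowed request consumes one token."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, clamped to bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```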
8. Design a Distributed Message Queue (Kafka)
Why it's asked: Message queues are foundational to event-driven architectures. This tests partitioning, consumer groups, ordering, and durability.
Key concepts: Topics and partitions, hash-based producer partitioning for per-key ordering, consumer groups for parallel processing, replication factor of 3, ISR (in-sync replicas) for durability, and offset management combined with idempotent producers and transactions for exactly-once semantics.
Scale reference: LinkedIn's Kafka clusters handle over 7 trillion messages per day.
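Hash-based partitioning is the mechanism behind per-key ordering: the same key always lands on the same partition, and a partition is consumed in order. A sketch (Kafka's default partitioner uses murmur2; md5 here is just a stable stand-in):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically, so all
    messages for one key preserve their relative order."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```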
9. Design a Content Delivery Network (CDN)
Why it's asked: CDNs reveal your understanding of caching hierarchies, DNS-based routing, and cache invalidation.
Architecture layers: Origin server → shield/mid-tier cache → edge PoPs. DNS or anycast routing directs users to the nearest edge. Cache invalidation uses TTLs combined with purge APIs.
Senior-level question: "A breaking news article goes viral and your origin is hammered despite the CDN. What's happening?" (Answer: cache miss stampede. Solution: request coalescing — the edge holds duplicate requests and serves them all from the single origin response.)
Category 4: Search, Discovery, and Data Processing
10. Design a Web Crawler
Why it's asked: It tests distributed task scheduling, URL deduplication at scale, and politeness policies.
Core architecture: URL frontier (priority queue with politeness constraints), DNS resolver with caching, HTML fetcher pool, content parser, URL extractor, and a deduplication store (Bloom filter — for 10 billion URLs, a Bloom filter with a 1% false-positive rate needs roughly 12 GB).
Key constraint: Politeness. Respect robots.txt and enforce per-domain crawl delays. A multi-queue frontier with one queue per domain solves this.
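The Bloom filter figure comes from the standard sizing formulas m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. A quick calculator:

```python
import math

def bloom_size(n_items: int, fp_rate: float):
    """Return (bits, hash_count) for an optimally sized Bloom filter
    holding n_items at the target false-positive rate."""
    m_bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    k_hashes = (m_bits / n_items) * math.log(2)
    return math.ceil(m_bits), round(k_hashes)
```

At 10 billion URLs and a 1% false-positive rate this works out to about 9.6 bits per URL, roughly 12 GB with 7 hash functions, which is why dedup fits in memory on a small cluster.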
11. Design a Search Autocomplete (Typeahead)
Why it's asked: It combines trie data structures, ranking signals, and latency requirements under 100ms.
Data structure: A trie (prefix tree) where each node stores the top-K completions by frequency. For 100 million unique queries, the trie fits in memory on a single machine (~10 GB). For freshness, a separate offline pipeline (MapReduce or Spark) rebuilds the trie periodically with updated frequency counts.
Senior-level question: "How do you personalize autocomplete?" (Answer: merge the global trie results with a per-user recent-queries cache stored in Redis, weighted by recency.)
12. Design a Video Streaming Platform (YouTube/Netflix)
Why it's asked: It spans upload pipelines, transcoding, adaptive bitrate streaming, and CDN delivery.
Core pipeline: Client uploads to blob storage (S3), triggering transcoding jobs via a message queue. The transcoder produces multiple resolutions and codecs (H.264, VP9, AV1). Playback uses adaptive bitrate streaming (HLS/DASH) where the player switches quality levels based on bandwidth.
13. Design a Ride-Sharing Service (Uber/Lyft)
Why it's asked: It covers geospatial indexing, real-time matching, and ETA estimation.
Geospatial indexing: Use a quadtree or geohash-based index to find nearby drivers. Uber built H3, a hexagonal hierarchical index, and Google's S2 library takes a similar approach; both partition the Earth's surface into cells at multiple resolutions.
Matching: Query the spatial index for available drivers within a radius, rank by ETA (via a routing engine like OSRM), and dispatch the closest one. At Uber's scale of 28 million rides per day, matching must complete within 2–3 seconds.
14. Design an E-Commerce Platform (Amazon)
Why it's asked: It lets interviewers probe any subsystem — catalog, cart, checkout, inventory, or payments.
The inventory problem is where seniors shine: When 1,000 users try to buy the last item simultaneously, how do you prevent overselling? (Answer: optimistic locking with a version counter on the inventory row. The first successful compare-and-swap wins. For flash sales, pre-allocate inventory tokens into a Redis queue — each dequeue is atomic.)
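The compare-and-swap can be sketched in a few lines; in SQL it is an UPDATE ... WHERE id = ? AND version = ? that succeeds only if exactly one row is affected:

```python
class InventoryRow:
    """Single inventory row guarded by a version counter. A purchase
    succeeds only if the caller's version still matches, i.e. no
    concurrent purchase committed since the caller read the row."""
    def __init__(self, stock: int):
        self.stock = stock
        self.version = 0

    def read(self):
        return self.stock, self.version

    def try_purchase(self, expected_version: int) -> bool:
        if self.version != expected_version or self.stock <= 0:
            return False  # stale read or sold out: caller must re-read and retry
        self.stock -= 1
        self.version += 1
        return True
```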
15. Design a Metrics and Logging System
Why it's asked: Observability separates senior candidates from mid-level ones. Mentioning monitoring unprompted signals production experience.
Architecture: Agents on each host ship logs/metrics to a collector (Fluentd/Vector), writing to a time-series DB (Prometheus) for metrics and Elasticsearch for logs. Grafana provides dashboards and alerting.
Scale reference: Uber ingests over 100 billion metrics data points per day.
The 4-Step Framework for Answering Any System Design Question
Whether you are designing a chat app or a CDN, this framework keeps your answer structured and your interviewer engaged. For a deeper walkthrough with 25+ practice problems, the Grokking the System Design Interview course walks you through each step with diagrams and real examples.
Step 1: Clarify Requirements (5 minutes)
Ask functional and non-functional requirements. Functional: "Should the chat support group messages or only 1:1?" Non-functional: "What's our latency target? What consistency model do we need?"
Calculate back-of-the-envelope estimates. If the system has 100 million DAU and each user sends 40 messages/day, that's 4 billion messages/day ≈ 46K writes per second.
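The arithmetic, spelled out:

```python
dau = 100_000_000           # daily active users
msgs_per_user = 40          # messages sent per user per day
seconds_per_day = 86_400

writes_per_day = dau * msgs_per_user             # 4 billion writes/day
writes_per_sec = writes_per_day / seconds_per_day  # ~46,296 writes/sec
```

Round to 46K writes/sec, then double or triple it for peak traffic when sizing the write path.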
Step 2: High-Level Design (10 minutes)
Draw the core components: clients, API gateway, application services, databases, caches, message queues. Define the API contracts (REST or gRPC endpoints). Identify the data model — which entities exist and how they relate.
Step 3: Deep Dive (20 minutes)
Pick 2–3 components and go deep. This is where you win or lose. If designing a chat system, deep-dive into message delivery and presence. If it's a news feed, focus on fan-out strategy and ranking.
Show trade-offs explicitly: "I'm choosing Cassandra here because our write volume is 10x reads, and its LSM-tree engine handles that well. The trade-off is eventual consistency, acceptable for feeds but not payments."
Step 4: Scale and Harden (10 minutes)
Address bottlenecks at 10x traffic. Add caching, database sharding, read replicas, and a CDN for static assets. Discuss failure modes: what if Redis goes down? What if a datacenter fails? Mention monitoring and alerting — this signals production readiness.
Sample Follow-Up Questions and Model Answers
Q: "Your database is the bottleneck. What do you do?" A: Profile queries to find hot keys. Add read replicas for read-heavy traffic. If writes are the bottleneck, shard on user_id with hash-based sharding. Migrate using dual-writes with a shadow table for validation.
Q: "How would you handle a region-level outage?" A: Active-active multi-region with async replication. DNS failover (Route 53 with 30-second TTL) redirects traffic. The trade-off is potential data loss equal to the replication lag window (typically under 1 second intra-continent).
Q: "Your cache hit ratio dropped from 95% to 60%. Diagnose this." A: Three likely causes: a new feature introduced access patterns the cache key schema doesn't cover, a deployment changed the key format (invalidating all entries), or the working set grew beyond cache capacity. Check eviction rate and memory usage to pinpoint.
Q: "A user posts a message and refreshes but doesn't see it. How do you fix this?" A: Read-after-write consistency. Route the author's own reads to the primary replica. Alternatively, set a "last write timestamp" cookie and only serve from replicas caught up past that timestamp.
Q: "How do you test this system before launch?" A: Load testing with realistic traffic (k6 or Locust). Chaos engineering (kill nodes, inject latency). Shadow traffic: replay production requests against the new system and diff outputs.
How to Prepare: A Study Plan for Senior Engineers
Weeks 1–2: Fundamentals. Study the building blocks — databases (SQL vs. NoSQL trade-offs), caching (write-through, write-behind, cache-aside), load balancing, message queues, and consistent hashing. The Ultimate System Design Interview Guide covers these fundamentals alongside real FAANG question breakdowns.
Weeks 3–4: Practice core questions. Work through the 15 questions in this guide. For each, spend 45 minutes designing on a whiteboard, then compare with reference solutions.
Weeks 5–6: Mock interviews. Practice with a partner or mock interview platform. Focus on follow-up questions — that's where senior candidates differentiate themselves.
Frequently Asked Questions
How many system design rounds are there for senior engineer interviews at FAANG?
Most FAANG companies include 1–2 system design rounds for senior (L5/E5) roles. Staff-level (L6/E6) candidates often face 2–3 rounds, sometimes with a dedicated deep dive on a single component for 60 minutes.
What is the difference between high-level design and low-level design in interviews?
High-level design covers the overall architecture: which services exist, how they communicate, and data flow. Low-level design zooms into a specific component: database schema, API contract, or internal algorithm. Senior candidates must demonstrate both.
How long should I spend on clarifying questions in a system design interview?
Aim for 3–5 minutes. Define scope, top 3 functional requirements, and scale (DAU, QPS, storage). Less than 2 minutes signals rushing. More than 7 signals stalling.
Which system design questions are asked most often at Google, Meta, and Amazon?
The most frequent: Design a URL shortener, news feed, chat/messaging system, notification system, and web crawler. Amazon also asks e-commerce problems (shopping cart, inventory, order processing).
Do I need to know specific technologies like Kafka or Redis for system design interviews?
You should know when to reach for them. "I'd use a message queue" is junior-level. "I'd use Kafka because we need ordered processing per partition key and durable replay" is senior-level.
How do I handle a system design question I've never seen before?
Apply your framework. Every system has users, an API layer, compute, storage, and communication between components. Clarify requirements, estimate scale, design the data model, build layer by layer. Novel questions test reasoning, not memorization.
Should I mention monitoring and observability in system design interviews?
Yes. Bringing up monitoring unprompted is one of the strongest signals of production experience. Mention latency p99, error rates, throughput metrics, and distributed tracing for debugging.
What's the biggest mistake senior engineers make in system design interviews?
Jumping into the solution without clarifying requirements. The second biggest: treating it as a monologue. Check in with your interviewer to show the collaboration expected at senior levels.
TL;DR
System design interview questions for senior engineers test your ability to architect scalable systems and articulate trade-offs. The 15 most common questions fall into four categories: data-intensive systems (URL shortener, news feed, key-value store), real-time systems (chat, notifications, collaborative editing), infrastructure components (rate limiter, message queue, CDN), and search/discovery systems (web crawler, autocomplete, video streaming). Use a 4-step framework: clarify requirements, sketch high-level design, deep-dive into 2–3 components, then address scaling and failure modes. What separates senior candidates is not knowing more components; it's knowing why you'd choose one over another and discussing production-level operational concerns.