System Design Cheat Sheet for Senior Engineer Interviews


On This Page
The Framework: How to Approach Any System Design Question
Step 1: Clarify Requirements
Step 2: Estimate Scale
Step 3: Design the High-level Architecture
Step 4: Go Deep into Components
Horizontal vs. Vertical Scaling
Load Balancers
Databases: SQL vs. NoSQL
SQL Databases (Relational)
NoSQL Databases (Non-Relational)
How to Choose
Caching
Cache Invalidation
Content Delivery Networks (CDNs)
Message Queues
Database Sharding and Replication
Replication
Sharding
Consistent Hashing
Rate Limiting and API Design
Putting It All Together
Key Takeaways
Most software engineering interviews have a moment that makes people freeze.
It is not the coding round. It is not the behavioral questions. It is when someone says, "Design a system that handles millions of users."
That single sentence can make even experienced developers go blank.
The problem is not that people are not smart enough.
The problem is that system design feels massive and vague.
Where do you even start?
Here is the thing.
System design interviews are not about memorizing architectures. They are about showing that you can think through problems at scale.
And the good part is that there is a set of core concepts that show up again and again.
If you understand those, you can tackle almost any question they throw at you.
This post is a cheat sheet. But not the shallow kind. It breaks down each concept so you actually understand what is happening behind the scenes.
The Framework: How to Approach Any System Design Question
Before jumping into specific concepts, you need a repeatable approach. Walking into a system design interview without a framework is like writing code without knowing the requirements.
Here is a simple four-step process that works for almost every question:
Step 1: Clarify Requirements
Ask questions.
What features does the system need?
How many users?
What matters more, low latency or strong consistency?
Step 2: Estimate Scale
Do rough math.
How many requests per second? How much data?
These numbers guide your design choices.
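As a sketch of the kind of back-of-envelope math interviewers expect, here is a rough estimate for a hypothetical photo-sharing service. Every input number below is a made-up assumption chosen for illustration, not a real benchmark:

```python
# Back-of-envelope estimate for a hypothetical photo-sharing service.
# All inputs are illustrative assumptions.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
seconds_per_day = 86_400

avg_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * 3            # rough rule of thumb: peak is ~2-3x average

photo_size_bytes = 500_000        # assume ~500 KB per uploaded photo
uploads_per_day = 1_000_000
storage_per_year_tb = photo_size_bytes * uploads_per_day * 365 / 1e12

print(f"average RPS: {avg_rps:,.0f}")    # ~2,315
print(f"peak RPS:    {peak_rps:,.0f}")   # ~6,944
print(f"storage/yr:  ~{storage_per_year_tb:,.0f} TB")
```

The exact numbers matter less than showing the method: turn user counts into requests per second and bytes per day, then let those figures justify later choices like caching or sharding.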
Step 3: Design the High-level Architecture
Sketch the major components. Clients, servers, databases, caches.
Show how they connect.
Step 4: Go Deep into Components
Pick the most critical parts and go deeper. This is where you show real understanding.
The rest of this post gives you the knowledge to fill in those steps with confidence.
Horizontal vs. Vertical Scaling
When a system starts getting more traffic, it needs to grow. There are two ways to do this.
Vertical scaling means making your existing machine more powerful. More CPU, more RAM, more storage. It is simpler, but it has a hard ceiling. There is only so big one machine can get. And if that machine goes down, everything goes down.
Horizontal scaling means adding more machines. Instead of one powerful server, you run ten smaller ones. This is harder to set up because your system has to coordinate across multiple machines. But it gives you the ability to keep growing without a limit and the ability to survive if one machine fails.
Almost every large-scale system uses horizontal scaling.
Load Balancers
Once you have multiple servers, you need something to decide which server handles each request. That is what a load balancer does. It sits between users and your servers, distributing traffic across them.
There are a few common strategies:
Round Robin sends each request to the next server in line. Simple, predictable, and works well when all servers are equally capable.
Least Connections sends the request to whichever server currently has the fewest active connections. This is smarter because it accounts for some requests taking longer than others.
IP Hashing uses the user's IP address to determine which server they go to. The same user tends to land on the same server, which helps if you store session data locally.
The key thing to understand is that load balancers solve two problems at once. They distribute work evenly, and they provide redundancy.
If one server dies, the load balancer stops sending traffic to it.
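The three strategies above can be sketched in a few lines each. This is a minimal in-memory illustration, not a real load balancer; the server names and the `active` connection counts are invented for the example:

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]   # hypothetical backend names

# Round Robin: cycle through the servers in order.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Least Connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}        # updated by the balancer in real life
def least_connections():
    return min(active, key=active.get)

# IP Hashing: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin())     # app-1 app-2
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))   # True: sticky
```

Notice the trade-off baked into each: round robin is stateless, least connections needs live connection counts, and IP hashing gives stickiness at the cost of less even distribution.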
Databases: SQL vs. NoSQL
This is one of the most common topics in system design interviews. You will almost always need to choose a database, and the interviewer wants to know why.
SQL Databases (Relational)
SQL databases store data in tables with rows and columns. They enforce a strict structure called a schema, which means every row in a table has the same columns.
SQL databases are great when your data has clear relationships. They support ACID transactions (atomicity, consistency, isolation, durability), which guarantee your data stays correct even when multiple operations happen at the same time. If you are building something where accuracy matters a lot, like payments or inventory, SQL is usually the safe bet.
Popular choices: PostgreSQL, MySQL.
NoSQL Databases (Non-Relational)
NoSQL databases are more flexible. They do not require a fixed schema. You can store different kinds of data in the same collection without everything following the same structure.
There are several types. Document stores (like MongoDB) save data as JSON-like objects. Key-value stores (like Redis) are extremely fast and map a key to a value. Wide-column stores (like Cassandra) are designed for massive amounts of data spread across many machines.
NoSQL databases scale horizontally more easily.
The trade-off is that they often sacrifice some consistency guarantees to achieve that scale.
How to Choose
If your data is structured and relationships matter, lean toward SQL.
If you need flexibility, speed, or horizontal scalability, lean toward NoSQL.
In an interview, always explain your reasoning.
Learn more about SQL vs NoSQL.
Caching
A cache is a fast, temporary storage layer that sits between your application and your database. Its whole purpose is to avoid doing the same expensive work twice.
Here is what happens without caching. Every time a user requests the same data, your server goes to the database and runs the query.
If a thousand users request the same thing, that is a thousand identical database queries. Wasteful.
With a cache, the first request goes to the database. The result gets stored in the cache.
The next 999 requests get their answer from the cache, which is dramatically faster because it stores data in memory.
Redis and Memcached are two popular caching tools. They store data in memory, making reads extremely fast.
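The read path described above is the cache-aside pattern. Here is a minimal sketch where a plain dict stands in for Redis, and `fake_db` and `query_count` exist only to make the example self-contained:

```python
# Cache-aside: check the cache first, fall back to the database on a miss,
# then populate the cache so the next request is served from memory.
cache = {}
fake_db = {"user:42": {"name": "Ada"}}   # stand-in for a real database
query_count = 0

def fetch_from_db(key):
    global query_count
    query_count += 1              # count how often we hit the "database"
    return fake_db[key]

def get(key):
    if key in cache:              # cache hit: no database work at all
        return cache[key]
    value = fetch_from_db(key)    # cache miss: do the expensive query once
    cache[key] = value
    return value

get("user:42")      # miss -> one database query
get("user:42")      # hit  -> served from memory
print(query_count)  # 1
```

A thousand identical requests would still cost exactly one database query; everything after the first is a cache hit.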
Cache Invalidation
The hardest part about caching is knowing when your cached data is outdated. This is called cache invalidation.
Write-through cache updates the cache every time the database is updated. Data is always fresh, but writes are slower.
Write-back cache writes to the cache first and updates the database later. Faster writes, but you risk losing data if the cache crashes.
TTL (Time to Live) gives each cached item an expiration time. After that time passes, the item is removed and the next request fetches fresh data.
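The TTL strategy is simple enough to sketch directly. This toy version stores an expiry timestamp next to each value; the key names and the very short TTL are chosen only so the example runs quickly:

```python
import time

TTL_SECONDS = 0.05        # artificially short so expiry is observable
ttl_cache = {}            # key -> (value, expires_at)

def cache_put(key, value):
    ttl_cache[key] = (value, time.monotonic() + TTL_SECONDS)

def cache_get(key):
    entry = ttl_cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() > expires_at:   # stale: evict, force a refetch
        del ttl_cache[key]
        return None
    return value

cache_put("price:btc", 64000)
print(cache_get("price:btc"))   # 64000 while fresh
time.sleep(0.1)
print(cache_get("price:btc"))   # None: expired, next request refetches
```

Real caches like Redis implement this for you (expiry set per key), but the mechanism is the same: stale data is bounded by the TTL you choose.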
In interviews, mentioning cache invalidation shows you understand caching is not a free lunch.
Content Delivery Networks (CDNs)
A CDN is a network of servers spread across different geographic locations.
When a user requests a static file (an image, a video, a CSS file), the CDN serves it from the server closest to that user.
Why does this matter? Because physical distance affects speed.
If your server is in New York and your user is in Tokyo, every request has to travel across the Pacific Ocean and back.
A CDN puts a copy of your static content near the user, making the response much faster.
In an interview, if the system involves serving media or static assets to a global audience, mentioning a CDN shows you are thinking about latency and user experience.
Message Queues
Not everything in a system needs to happen immediately. That is where message queues come in.
A message queue sits between two parts of your system.
One part (the producer) adds tasks to the queue.
Another part (the consumer) picks up tasks and processes them.
Why is this useful?
It decouples your components.
The producer does not need to wait for the consumer to finish. It drops the task in the queue and moves on. This makes your system faster for the user and more resilient.
If the consumer goes down temporarily, the tasks pile up in the queue and get processed when it comes back.
Popular tools: RabbitMQ, Apache Kafka, Amazon SQS.
In system design interviews, message queues are the answer whenever you need asynchronous processing, like sending emails, generating reports, or processing uploads.
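The producer/consumer shape can be shown in-process with Python's standard `queue` module standing in for a real broker; the email addresses and the "send email" task are invented for the example:

```python
import queue
import threading

# In-process stand-in for a message broker: the producer enqueues work and
# returns immediately; a background consumer processes it later.
tasks = queue.Queue()
processed = []

def producer(email):
    tasks.put(email)            # fire-and-forget: no waiting on delivery

def consumer():
    while True:
        email = tasks.get()
        if email is None:       # sentinel value: shut the worker down
            break
        processed.append(f"sent welcome email to {email}")

worker = threading.Thread(target=consumer)
worker.start()

for addr in ["a@example.com", "b@example.com"]:
    producer(addr)              # user-facing code moves on instantly

tasks.put(None)                 # tell the worker to stop
worker.join()
print(processed)
```

The decoupling is the point: if the consumer thread were down, tasks would simply accumulate in the queue until it came back, exactly as with RabbitMQ or SQS.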
Database Sharding and Replication
When a single database cannot handle the load, you have two main strategies.
Replication
Replication means creating copies of your database. You have one primary database that handles writes and one or more replicas that handle reads.
Since most applications read data far more often than they write, this takes a huge load off your primary database.
The trade-off is replication lag, a small delay between when data is written to the primary and when it appears on the replicas.
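A sketch of read/write splitting, with plain dicts standing in for databases. Calling `replicate()` by hand here simulates the asynchronous copy that a real system runs continuously; until it runs, replicas serve slightly stale data, which is replication lag made visible:

```python
# Read/write splitting in front of a replicated database (toy version).
primary = {}
replicas = [{}, {}]
_next = 0

def write(key, value):
    primary[key] = value          # all writes go to the primary

def replicate():
    # Real systems stream changes continuously and asynchronously.
    for r in replicas:
        r.update(primary)

def read(key):
    global _next
    r = replicas[_next]           # round-robin reads across replicas
    _next = (_next + 1) % len(replicas)
    return r.get(key)

write("user:1", "Ada")
print(read("user:1"))   # None: the replica has not caught up yet
replicate()
print(read("user:1"))   # "Ada": lag resolved
```

That `None` before `replicate()` is the interview talking point: replication buys read throughput at the price of a window where reads can be stale.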
Sharding
Sharding means splitting your data across multiple databases. Each shard holds a portion of the total data.
This allows your system to handle far more data because no single database carries the full load. But sharding adds complexity. Cross-shard queries become difficult.
And picking the right shard key (the rule that decides which data goes where) is critical.
A bad shard key leads to uneven distribution.
In interviews, bring up sharding when the data volume is clearly too large for one database. But also mention the downsides.
Consistent Hashing
When you distribute data across multiple servers, you need a way to decide which server holds what.
Simple hashing (key modulo number of servers) works until you add or remove a server. When that happens, almost all the data needs to be remapped.
Consistent hashing solves this. It arranges servers on a virtual ring.
When you need to find where data lives, you hash the key and walk clockwise around the ring until you hit a server.
When you add or remove a server, only a small fraction of the data needs to move, not everything. This concept shows up frequently in interview questions about distributed systems.
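The ring can be implemented in a few lines. This is a minimal sketch (no virtual nodes, which production systems add to smooth out the distribution); node and key names are invented:

```python
import bisect
import hashlib

def h(s):
    # Map any string to a position on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key):
        # Walk clockwise: first node at or after the key's hash position,
        # wrapping around to the start of the ring if necessary.
        positions = [pos for pos, _ in self.ring]
        i = bisect.bisect(positions, h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])

moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")   # a fraction, not nearly all
```

Compare that with `hash(key) % 3` going to `% 4`, where roughly three quarters of all keys would land on a different server.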
Rate Limiting and API Design
When you design a system that exposes an API, you need to protect it from being overwhelmed.
Rate limiting controls how many requests a user or client can make within a given time window.
Without rate limiting, a single user or a malicious bot could flood your system with requests and take it down for everyone. The most common algorithm is the token bucket. It gives each user a bucket that fills with tokens at a steady rate. Each request costs one token. When the bucket is empty, requests are rejected until more tokens accumulate.
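The token bucket described above fits in one small class. The capacity and rate below are arbitrary example numbers, and real limiters track one bucket per user or API key:

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`; each request
    spends one token. An empty bucket means the request is rejected."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill lazily, based on time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # bucket empty: reject until it refills

bucket = TokenBucket(capacity=3, rate=10)   # burst of 3, 10 req/s sustained
print([bucket.allow() for _ in range(5)])   # first 3 pass, then rejections
```

The capacity sets how large a burst you tolerate; the rate sets the sustained throughput once the burst is spent.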
Mentioning rate limiting in an interview signals that you think about system reliability and abuse prevention.
Learn about the most crucial aspects of a system design interview.
Putting It All Together
System design is not about knowing every tool. It is about understanding what problems exist and which tools solve them.
When you walk into an interview, you are not expected to build a perfect system on a whiteboard. You are expected to think clearly, communicate your reasoning, and show that you understand trade-offs.
Every concept in this post connects to a real problem:
- Too much traffic for one server? Horizontal scaling and load balancers.
- Slow database reads? Add a cache.
- Database overloaded? Replication for reads, sharding for data volume.
- Global users experience lag? Use a CDN.
- Tasks that do not need to be instant? Message queue.
- Distributing data across servers? Consistent hashing.
- System getting abused? Rate limiting.
For a complete prep guide, check out Grokking the System Design Interview course by DesignGurus.io.
Key Takeaways
- Always start with a framework. Clarify requirements, estimate scale, sketch the architecture, then deep dive.
- Horizontal scaling is the default for large-scale systems. Vertical scaling has a hard ceiling.
- Choose your database intentionally. SQL for structured, relational data. NoSQL for flexibility and scale.
- Caching speeds things up but introduces cache invalidation challenges. Know the strategies.
- Message queues decouple components and enable asynchronous processing for non-urgent tasks.
- Sharding and replication are how databases scale, but they come with complexity.
- Trade-offs matter more than perfect answers. Interviewers want to see critical thinking about design decisions.