System Design Cheat Sheet for Senior Engineer Interviews


On This Page
The Framework: How to Approach Any System Design Question
Step 1: Clarify Requirements
Step 2: Estimate Scale
Step 3: Design the High-level Architecture
Step 4: Go Deep into Components
Horizontal vs. Vertical Scaling
Load Balancers
Databases: SQL vs. NoSQL
SQL Databases (Relational)
NoSQL Databases (Non-Relational)
How to Choose
Caching
Cache Invalidation
Content Delivery Networks (CDNs)
Message Queues
Database Sharding and Replication
Replication
Sharding
Consistent Hashing
Rate Limiting and API Design
Putting It All Together
Key Takeaways
Most software engineering interviews have a moment that makes people freeze.
It is not the coding round. It is not the behavioral questions. It is when someone says, "Design a system that handles millions of users."
That single sentence can make even experienced developers go blank.
The problem is not that people are not smart enough.
The problem is that system design feels massive and vague.
Where do you even start?
Here is the thing.
System design interviews are not about memorizing architectures. They are about showing that you can think through problems at scale.
And the good part is that there is a set of core concepts that show up again and again.
If you understand those, you can tackle almost any question they throw at you.
This post is a cheat sheet. But not the shallow kind. It breaks down each concept so you actually understand what is happening behind the scenes.
The Framework: How to Approach Any System Design Question
Before jumping into specific concepts, you need a repeatable approach. Walking into a system design interview without a framework is like writing code without knowing the requirements.
Here is a simple four-step process that works for almost every question:
Step 1: Clarify Requirements
Ask questions.
What features does the system need?
How many users?
What matters more, low latency or strong consistency?
Step 2: Estimate Scale
Do rough math.
How many requests per second? How much data?
These numbers guide your design choices.
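As a sketch of the kind of back-of-envelope math interviewers expect, here is a rough estimate for a hypothetical photo-sharing service. Every input number below is a made-up assumption chosen for illustration, not a real benchmark:

```python
# Back-of-envelope estimate for a hypothetical photo-sharing service.
# All inputs are illustrative assumptions.
daily_active_users = 10_000_000
requests_per_user_per_day = 20
seconds_per_day = 86_400

avg_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * 3            # rough rule of thumb: peak is ~2-3x average

photo_size_bytes = 500_000        # assume ~500 KB per uploaded photo
uploads_per_day = 1_000_000
storage_per_year_tb = photo_size_bytes * uploads_per_day * 365 / 1e12

print(f"average RPS: {avg_rps:,.0f}")    # ~2,315
print(f"peak RPS:    {peak_rps:,.0f}")   # ~6,944
print(f"storage/yr:  ~{storage_per_year_tb:,.0f} TB")
```

The exact numbers matter less than showing the method: turn user counts into requests per second and bytes per day, then let those figures justify later choices like caching or sharding.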
Step 3: Design the High-level Architecture
Sketch the major components. Clients, servers, databases, caches.
Show how they connect.
Step 4: Go Deep into Components
Pick the most critical parts and go deeper. This is where you show real understanding.
The rest of this post gives you the knowledge to fill in those steps with confidence.
Horizontal vs. Vertical Scaling
When a system starts getting more traffic, it needs to grow. There are two ways to do this.
Vertical scaling means making your existing machine more powerful. More CPU, more RAM, more storage. It is simpler, but it has a hard ceiling. There is only so big one machine can get. And if that machine goes down, everything goes down.
Horizontal scaling means adding more machines. Instead of one powerful server, you run ten smaller ones. This is harder to set up because your system has to coordinate across multiple machines. But it gives you the ability to keep growing without a limit and the ability to survive if one machine fails.
Almost every large-scale system uses horizontal scaling.
Load Balancers
Once you have multiple servers, you need something to decide which server handles each request. That is what a load balancer does. It sits between users and your servers, distributing traffic across them.
There are a few common strategies:
Round Robin sends each request to the next server in line. Simple, predictable, and works well when all servers are equally capable.
Least Connections sends the request to whichever server currently has the fewest active connections. This is smarter because it accounts for some requests taking longer than others.
IP Hashing uses the user's IP address to determine which server they go to. The same user tends to land on the same server, which helps if you store session data locally.
The key thing to understand is that load balancers solve two problems at once. They distribute work evenly, and they provide redundancy.
If one server dies, the load balancer stops sending traffic to it.
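The three strategies above can be sketched in a few lines each. This is a minimal in-memory illustration, not a real load balancer; the server names and the `active` connection counts are invented for the example:

```python
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]   # hypothetical backend names

# Round Robin: cycle through the servers in order.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Least Connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}        # updated by the balancer in real life
def least_connections():
    return min(active, key=active.get)

# IP Hashing: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin())     # app-1 app-2
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))   # True: sticky
```

Notice the trade-off baked into each: round robin is stateless, least connections needs live connection counts, and IP hashing gives stickiness at the cost of less even distribution.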
Databases: SQL vs. NoSQL
This is one of the most common topics in system design interviews. You will almost always need to choose a database, and the interviewer wants to know why.
SQL Databases (Relational)
SQL databases store data in tables with rows and columns. They enforce a strict structure called a schema, which means every row in a table has the same columns.
SQL databases are great when your data has clear relationships. They support ACID transactions (atomicity, consistency, isolation, durability), which guarantee your data stays correct even when multiple operations happen at the same time. If you are building something where accuracy matters a lot, like payments or inventory, SQL is usually the safe bet.
Popular choices: PostgreSQL, MySQL.
NoSQL Databases (Non-Relational)
NoSQL databases are more flexible. They do not require a fixed schema. You can store different kinds of data in the same collection without everything following the same structure.
There are several types. Document stores (like MongoDB) save data as JSON-like objects. Key-value stores (like Redis) are extremely fast and map a key to a value. Wide-column stores (like Cassandra) are designed for massive amounts of data spread across many machines.
NoSQL databases scale horizontally more easily.
The trade-off is that they often sacrifice some consistency guarantees to achieve that scale.
How to Choose
If your data is structured and relationships matter, lean toward SQL.
If you need flexibility, speed, or horizontal scalability, lean toward NoSQL.
In an interview, always explain your reasoning.
Learn more about SQL vs NoSQL.
Caching
A cache is a fast, temporary storage layer that sits between your application and your database. Its whole purpose is to avoid doing the same expensive work twice.
Here is what happens without caching. Every time a user requests the same data, your server goes to the database and runs the query.
If a thousand users request the same thing, that is a thousand identical database queries. Wasteful.
With a cache, the first request goes to the database. The result gets stored in the cache.
The next 999 requests get their answer from the cache, which is dramatically faster because it stores data in memory.
Redis and Memcached are two popular caching tools. They store data in memory, making reads extremely fast.
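The read path described above is the cache-aside pattern. Here is a minimal sketch where a plain dict stands in for Redis, and `fake_db` and `query_count` exist only to make the example self-contained:

```python
# Cache-aside: check the cache first, fall back to the database on a miss,
# then populate the cache so the next request is served from memory.
cache = {}
fake_db = {"user:42": {"name": "Ada"}}   # stand-in for a real database
query_count = 0

def fetch_from_db(key):
    global query_count
    query_count += 1              # count how often we hit the "database"
    return fake_db[key]

def get(key):
    if key in cache:              # cache hit: no database work at all
        return cache[key]
    value = fetch_from_db(key)    # cache miss: do the expensive query once
    cache[key] = value
    return value

get("user:42")      # miss -> one database query
get("user:42")      # hit  -> served from memory
print(query_count)  # 1
```

A thousand identical requests would still cost exactly one database query; everything after the first is a cache hit.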
Cache Invalidation
The hardest part about caching is knowing when your cached data is outdated. This is called cache invalidation.
Write-through cache updates the cache every time the database is updated. Data is always fresh, but writes are slower.
Write-back cache writes to the cache first and updates the database later. Faster writes, but you risk losing data if the cache crashes.
TTL (Time to Live) gives each cached item an expiration time. After that time passes, the item is removed and the next request fetches fresh data.
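The TTL strategy is simple enough to sketch directly. This toy version stores an expiry timestamp next to each value; the key names and the very short TTL are chosen only so the example runs quickly:

```python
import time

TTL_SECONDS = 0.05        # artificially short so expiry is observable
ttl_cache = {}            # key -> (value, expires_at)

def cache_put(key, value):
    ttl_cache[key] = (value, time.monotonic() + TTL_SECONDS)

def cache_get(key):
    entry = ttl_cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() > expires_at:   # stale: evict, force a refetch
        del ttl_cache[key]
        return None
    return value

cache_put("price:btc", 64000)
print(cache_get("price:btc"))   # 64000 while fresh
time.sleep(0.1)
print(cache_get("price:btc"))   # None: expired, next request refetches
```

Real caches like Redis implement this for you (expiry set per key), but the mechanism is the same: stale data is bounded by the TTL you choose.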
In interviews, mentioning cache invalidation shows you understand caching is not a free lunch.
Content Delivery Networks (CDNs)
A CDN is a network of servers spread across different geographic locations.
When a user requests a static file (an image, a video, a CSS file), the CDN serves it from the server closest to that user.
Why does this matter? Because physical distance affects speed.
If your server is in New York and your user is in Tokyo, every request has to travel across the Pacific Ocean and back.
A CDN puts a copy of your static content near the user, making the response much faster.
In an interview, if the system involves serving media or static assets to a global audience, mentioning a CDN shows you are thinking about latency and user experience.
Message Queues
Not everything in a system needs to happen immediately. That is where message queues come in.
A message queue sits between two parts of your system.
One part (the producer) adds tasks to the queue.
Another part (the consumer) picks up tasks and processes them.
Why is this useful?
It decouples your components.
The producer does not need to wait for the consumer to finish. It drops the task in the queue and moves on. This makes your system faster for the user and more resilient.
If the consumer goes down temporarily, the tasks pile up in the queue and get processed when it comes back.
Popular tools: RabbitMQ, Apache Kafka, Amazon SQS.
In system design interviews, message queues are the answer whenever you need asynchronous processing, like sending emails, generating reports, or processing uploads.
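The producer/consumer shape can be shown in-process with Python's standard `queue` module standing in for a real broker; the email addresses and the "send email" task are invented for the example:

```python
import queue
import threading

# In-process stand-in for a message broker: the producer enqueues work and
# returns immediately; a background consumer processes it later.
tasks = queue.Queue()
processed = []

def producer(email):
    tasks.put(email)            # fire-and-forget: no waiting on delivery

def consumer():
    while True:
        email = tasks.get()
        if email is None:       # sentinel value: shut the worker down
            break
        processed.append(f"sent welcome email to {email}")

worker = threading.Thread(target=consumer)
worker.start()

for addr in ["a@example.com", "b@example.com"]:
    producer(addr)              # user-facing code moves on instantly

tasks.put(None)                 # tell the worker to stop
worker.join()
print(processed)
```

The decoupling is the point: if the consumer thread were down, tasks would simply accumulate in the queue until it came back, exactly as with RabbitMQ or SQS.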
Database Sharding and Replication
When a single database cannot handle the load, you have two main strategies.
Replication
Replication means creating copies of your database. You have one primary database that handles writes and one or more replicas that handle reads.
Since most applications read data far more often than they write, this takes a huge load off your primary database.
The trade-off is replication lag, a small delay between when data is written to the primary and when it appears on the replicas.
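A sketch of read/write splitting, with plain dicts standing in for databases. Calling `replicate()` by hand here simulates the asynchronous copy that a real system runs continuously; until it runs, replicas serve slightly stale data, which is replication lag made visible:

```python
# Read/write splitting in front of a replicated database (toy version).
primary = {}
replicas = [{}, {}]
_next = 0

def write(key, value):
    primary[key] = value          # all writes go to the primary

def replicate():
    # Real systems stream changes continuously and asynchronously.
    for r in replicas:
        r.update(primary)

def read(key):
    global _next
    r = replicas[_next]           # round-robin reads across replicas
    _next = (_next + 1) % len(replicas)
    return r.get(key)

write("user:1", "Ada")
print(read("user:1"))   # None: the replica has not caught up yet
replicate()
print(read("user:1"))   # "Ada": lag resolved
```

That `None` before `replicate()` is the interview talking point: replication buys read throughput at the price of a window where reads can be stale.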
Sharding
Sharding means splitting your data across multiple databases. Each shard holds a portion of the total data.
This allows your system to handle far more data because no single database carries the full load. But sharding adds complexity. Cross-shard queries become difficult.
And picking the right shard key (the rule that decides which data goes where) is critical.
A bad shard key leads to uneven distribution.
In interviews, bring up sharding when the data volume is clearly too large for one database. But also mention the downsides.
Consistent Hashing
When you distribute data across multiple servers, you need a way to decide which server holds what.
Simple hashing (key modulo number of servers) works until you add or remove a server. When that happens, almost all the data needs to be remapped.
Consistent hashing solves this. It arranges servers on a virtual ring.
When you need to find where data lives, you hash the key and walk clockwise around the ring until you hit a server.
When you add or remove a server, only a small fraction of the data needs to move, not everything. This concept shows up frequently in interview questions about distributed systems.
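The ring can be implemented in a few lines. This is a minimal sketch (no virtual nodes, which production systems add to smooth out the distribution); node and key names are invented:

```python
import bisect
import hashlib

def h(s):
    # Map any string to a position on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key):
        # Walk clockwise: first node at or after the key's hash position,
        # wrapping around to the start of the ring if necessary.
        positions = [pos for pos, _ in self.ring]
        i = bisect.bisect(positions, h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])

moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")   # a fraction, not nearly all
```

Compare that with `hash(key) % 3` going to `% 4`, where roughly three quarters of all keys would land on a different server.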
Rate Limiting and API Design
When you design a system that exposes an API, you need to protect it from being overwhelmed.
Rate limiting controls how many requests a user or client can make within a given time window.
Without rate limiting, a single user or a malicious bot could flood your system with requests and take it down for everyone. The most common algorithm is the token bucket. It gives each user a bucket that fills with tokens at a steady rate. Each request costs one token. When the bucket is empty, requests are rejected until more tokens accumulate.
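The token bucket described above fits in one small class. The capacity and rate below are arbitrary example numbers, and real limiters track one bucket per user or API key:

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`; each request
    spends one token. An empty bucket means the request is rejected."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill lazily, based on time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # bucket empty: reject until it refills

bucket = TokenBucket(capacity=3, rate=10)   # burst of 3, 10 req/s sustained
print([bucket.allow() for _ in range(5)])   # first 3 pass, then rejections
```

The capacity sets how large a burst you tolerate; the rate sets the sustained throughput once the burst is spent.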
Mentioning rate limiting in an interview signals that you think about system reliability and abuse prevention.
Learn about the most crucial aspects of a system design interview.
Putting It All Together
System design is not about knowing every tool. It is about understanding what problems exist and which tools solve them.
When you walk into an interview, you are not expected to build a perfect system on a whiteboard. You are expected to think clearly, communicate your reasoning, and show that you understand trade-offs.
Every concept in this post connects to a real problem:
- Too much traffic for one server? Horizontal scaling and load balancers.
- Slow database reads? Add a cache.
- Database overloaded? Replication for reads, sharding for data volume.
- Global users experience lag? Use a CDN.
- Tasks that do not need to be instant? Message queue.
- Distributing data across servers? Consistent hashing.
- System getting abused? Rate limiting.
For a complete prep guide, check out Grokking the System Design Interview course by DesignGurus.io.
Key Takeaways
- Always start with a framework. Clarify requirements, estimate scale, sketch the architecture, then deep dive.
- Horizontal scaling is the default for large-scale systems. Vertical scaling has a hard ceiling.
- Choose your database intentionally. SQL for structured, relational data. NoSQL for flexibility and scale.
- Caching speeds things up but introduces cache invalidation challenges. Know the strategies.
- Message queues decouple components and enable asynchronous processing for non-urgent tasks.
- Sharding and replication are how databases scale, but they come with complexity.
- Trade-offs matter more than perfect answers. Interviewers want to see critical thinking about design decisions.