Arslan Ahmad

Top 12 System Design Trade-offs Every Interviewee Must Master in 2025

Discover the top 12 system design trade-offs every developer must know in 2025. Learn how to balance scalability, consistency, and performance to excel in system design interviews.

System design is all about balancing competing priorities – there’s rarely a perfect solution that maximizes everything.

In a system design interview, you’ll be expected to discuss how you navigate system design trade-offs and justify your decisions. Understanding these trade-offs not only helps you build scalable, reliable systems, but also demonstrates to interviewers that you can reason about why you choose a particular design over another.

In 2025, modern architectures (think microservices, distributed databases, cloud platforms) make classic trade-offs more relevant than ever.

In this post, we will outline 12 fundamental system design trade-offs – from scalability vs consistency to latency vs throughput – that every aspiring system designer should know.

1. Scalability vs. Consistency

Scalability is a system’s ability to handle increased load by adding resources (e.g. more servers or CPU) without sacrificing performance.

Consistency means all users see the same data at any given time (especially in distributed systems).

The trade-off here: achieving massive scalability (often via distributing data across nodes) can make it hard to maintain strict consistency across all those nodes.

In practice, the more you scale out a system, the more you risk data being momentarily out-of-sync.

  • The Conflict: To scale horizontally, we partition data across servers. But partitioning means one user’s data might live on a different server than another’s, making it challenging to instantly keep all copies of data in sync. Ensuring strict global consistency introduces latency or complexity, which can limit scalability. On the flip side, a single strongly-consistent database can ensure everyone sees the same data, but it might become a bottleneck as load grows.

Example:

Imagine a social network like Twitter. It needs to serve millions of users concurrently (high scalability), so data is replicated and sharded globally. Twitter could ensure every tweet and like is immediately consistent worldwide, but that would slow things down or require stopping the world on each update. Instead, they accept slight delays in propagation (inconsistency) so that the system can scale and remain fast for users. In contrast, a bank’s core transaction system will sacrifice some scalability (e.g. by funneling writes through a primary database) to guarantee that your account balance is always correct and up-to-date for every read.

Large distributed systems often balance on a spectrum between scalability and data consistency. If your goal is to handle internet-scale traffic (YouTube, Twitter), you may design for eventual consistency to stay responsive under load.

If absolute correctness is paramount (banking, inventory), you may accept scaling limits to keep data consistent. This theme is closely related to the CAP theorem, which we’ll cover next.

2. Consistency vs. Availability (The CAP Theorem)

In distributed systems, the CAP Theorem formalizes one of the most crucial trade-offs: Consistency (C) vs. Availability (A) under network Partitions (P).

In simple terms, when parts of your system can’t talk to each other (a network partition), you have to choose between: ensuring consistency (all nodes show the same latest data) or ensuring availability (the system continues to serve requests).

You can’t have both at the same time in a partition scenario.

  • Consistency (C): Every read receives the most recent write. All nodes instantly agree on data (no stale reads). Choosing consistency might mean if a few servers can’t communicate, they stop accepting requests to avoid serving outdated data (thus sacrificing availability during the partition).

  • Availability (A): Every request receives some response – it may not be the very latest data, but the system stays up no matter what. Choosing availability means even if parts of the cluster are down or partitioned, the service remains operational (possibly serving older data).

  • Partition Tolerance (P): The system can continue working despite network splits. In modern distributed systems, partition tolerance is usually non-negotiable – networks will fail, so we design for it.

According to CAP, you must pick C or A when P occurs.

For instance, many NoSQL databases (like Cassandra or DynamoDB) choose to remain Available in a partition at the expense of consistency; they might serve slightly stale data rather than an error.

On the other hand, a system like a distributed SQL database might prefer Consistency, denying requests during a partition rather than risking contradictory data.

Example:

In an e-commerce site’s inventory service, a CP design (Consistency + Partition tolerance) would never allow two customers to buy the last item simultaneously – if a network split happens, one partition might temporarily stop sales to keep inventory counts consistent. A more availability-focused design (AP) might allow both sales to proceed on each side of the partition, keeping the site running, and reconcile the double-selling later. Which is better? It depends on the business – many high-scale services (social media, caching systems) favor availability so the user experience is never interrupted, whereas financial or ordering systems often lean towards consistency to avoid data anomalies.

Be ready to discuss which side of CAP you’d favor for a given system. Interviewers love to ask, for example, “During a network failure, would you rather your service be consistent or still up?”

The answer hinges on the application’s needs – social networks and content caches often prefer availability (you can tolerate slightly stale posts), while banking or critical data may prefer consistency (better to refuse a request than show wrong data).
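To make this concrete in code, here is a minimal sketch (a toy `Replica` class, not any real database’s API) of how a node might behave when it can’t reach a quorum of its peers: a CP-leaning node refuses the request rather than risk stale data, while an AP-leaning node answers from its possibly stale local copy.

```python
class Replica:
    """Toy replica that picks CP or AP behavior during a network partition."""

    def __init__(self, mode, peers):
        self.mode = mode            # "CP" or "AP"
        self.peers = peers          # other replicas in the cluster
        self.local_data = {}        # local (possibly stale) copy of the data
        self.is_reachable = True

    def read(self, key):
        reachable = [p for p in self.peers if p.is_reachable]
        quorum = len(self.peers) // 2 + 1
        if len(reachable) + 1 >= quorum:
            return self.local_data.get(key)          # normal operation
        if self.mode == "CP":
            # Consistency first: refuse to answer rather than serve stale data.
            raise RuntimeError("partitioned: cannot guarantee the latest value")
        # Availability first: answer from the local copy, stale or not.
        return self.local_data.get(key)

node = Replica("CP", peers=[Replica("CP", []), Replica("CP", [])])
for peer in node.peers:
    peer.is_reachable = False       # simulate a network partition around this node

try:
    node.read("user:42")
except RuntimeError as err:
    print("CP node:", err)          # errors out instead of risking stale data

node.mode = "AP"
print("AP node:", node.read("user:42"))   # stays up, returns its local (maybe stale) value
```

Real systems implement this with heartbeats, leader election, and quorum protocols, but the decision being made is the same C-vs-A call.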

3. Strong vs. Eventual Consistency

This trade-off zooms in on how consistency is achieved in distributed data storage. Strong consistency means after any write operation, all reads will see that write (no matter which replica they read from) – it’s like the system instantly syncs everywhere.

Eventual consistency means writes will propagate to all nodes eventually, but reads might see out-of-date data in the interim. Essentially, strong consistency gives the most up-to-date, accurate view of data at all times, while eventual consistency favors system availability and performance by allowing a lag.

  • Strong Consistency: Guarantees that as soon as a transaction or update is done, any subsequent read will return the latest value. This typically requires mechanisms like locks, master-slave coordination, or consensus protocols to ensure all nodes agree before proceeding. It’s critical for systems where conflicting or stale data is unacceptable – for example, accounting systems or bank transfers (when you move money, your balances update immediately and consistently or the transaction won’t commit). The downside is increased latency (you often have to wait for confirmations) and reduced fault tolerance (if a replica is down, some systems block writes to maintain consistency).

  • Eventual Consistency: Allows that after an update, different parts of the system might see different data for a short time, but given a bit of time (and no new updates), all replicas converge to the same state. This approach embraces the realities of distributed systems: it’s okay for data to be slightly stale as long as it becomes consistent later. The big advantage is higher availability and often performance – replicas don’t have to coordinate synchronously on every write. Many NoSQL and caching systems use eventual consistency, meaning they’ll serve data fast from a local replica even if a recent update hasn’t arrived there yet.

Example:

A social media feed is a classic example where eventual consistency is acceptable: when you post a photo on Instagram or a tweet on Twitter, not all your followers will see it instantly at the exact same millisecond.

Some might see it after a few seconds or might momentarily see an outdated like count – that’s eventual consistency at work, trading immediate perfection for scalability.

The system ensures that soon everyone’s feed reflects the new post, but it doesn’t block the entire app waiting for that to happen. In contrast, financial transactions (say transferring $100 between bank accounts) use strong consistency – it’s crucial that once the transfer is done, any balance inquiry or further transaction sees the updated balances immediately.

You wouldn’t want one ATM saying you have $100 less while another hasn’t gotten the memo yet!

Strong vs eventual consistency is essentially about latency/availability vs accuracy. In an interview, if your design involves a distributed database or cache, you should mention if you’ll use strong consistency (and how, e.g. single leader, two-phase commit) or eventual consistency (multi-master, async replication) and why.

For instance, “I’d choose eventual consistency for the user profile pictures service – it’s okay if a profile update takes a few seconds to propagate, and this way the system remains fast and partition-tolerant.”

Showing you understand when eventual consistency is acceptable (and the slight risk of stale data it brings) will score you points.
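One common way to tune between these two modes in leaderless, Dynamo-style stores is quorum configuration: with N replicas, a write is acknowledged by W of them and a read consults R. If R + W > N, every read overlaps at least one replica holding the latest write (effectively strong reads); smaller R and W trade that guarantee for lower latency and higher availability. Below is a minimal in-memory sketch of the idea (toy classes, not any particular database’s API):

```python
import time

class KVReplica:
    def __init__(self):
        self.store = {}                       # key -> (timestamp, value)

    def put(self, key, value, ts):
        self.store[key] = (ts, value)

    def get(self, key):
        return self.store.get(key, (0, None))

def quorum_write(replicas, key, value, w):
    ts = time.time()
    for rep in replicas[:w]:                  # acknowledge once W replicas have it
        rep.put(key, value, ts)
    return ts

def quorum_read(replicas, key, r):
    versions = [rep.get(key) for rep in replicas[:r]]
    return max(versions)[1]                   # newest timestamp wins

replicas = [KVReplica() for _ in range(3)]    # N = 3
quorum_write(replicas, "balance", 100, w=2)   # W = 2

print(quorum_read(replicas, "balance", r=2))  # R + W > N: guaranteed to see 100
print(replicas[2].get("balance"))             # third replica is still stale: (0, None)
```

With R = W = 1 the same cluster becomes eventually consistent: fast and always answering, but a read may land on the stale replica until background replication catches up.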

4. Latency vs. Throughput

When we talk about performance, we often distinguish between latency and throughput.

Latency is the time it takes to handle a single request or operation (how fast is each interaction).

Throughput is the total amount of work done per unit time (how many operations can we handle per second).

There’s an inherent trade-off: systems optimized for ultra-low latency might do less work in parallel or in bulk, whereas systems optimized for throughput often batch or pipeline work, incurring more latency per task.

  • Low Latency focus means the system responds immediately to each request. This is crucial in use cases like high-frequency trading, online gaming, or ride-hailing dispatch – where every millisecond of delay matters to user experience. To achieve low latency, designs might avoid expensive computations, use in-memory caches, or process requests more serially (to give full attention to one task at a time). The trade-off is you might not achieve the absolute maximum throughput because you’re handling things one-by-one very quickly.

  • High Throughput focus means the system maximizes the volume of work done, often by processing tasks in batches or concurrently. This is ideal for data processing pipelines, batch jobs, or any scenario where doing more is better even if each unit of work takes a bit longer. Techniques like buffering, batching, and asynchronous pipelines can drastically improve throughput (tasks per second) but at the cost of each individual task waiting in a queue or batch longer (higher latency per task).

Uber’s ride matching system prioritizes low latency – when a user requests a ride, the system must find a driver within seconds. It would be unacceptable to batch ride requests for a minute just to maybe dispatch more efficiently – users would be stuck waiting.

Each request is handled as fast as possible, possibly sacrificing some overall throughput.

On the other hand, a data analytics service (say, processing logs to generate reports) might prioritize throughput: it could collect data for a few minutes and crunch it in one go using big batch jobs. Each report might take a bit longer to be ready (latency), but the system processes millions of records per second in aggregate (throughput).

Another example: online gaming vs. analytics – a multiplayer game must be super responsive to each action (latency is critical), whereas a nightly analysis of player behavior can crunch data in bulk where a higher latency is fine if it means analyzing more data in one go.

Latency vs throughput is a classic performance trade-off.

In interviews, relate it to the system’s goals: “For a chat application, I’ll prioritize low latency so messages feel instant. But for computing analytics or backing up data, I can batch operations to maximize throughput.”

Mention techniques like batching (increases throughput, adds latency) or parallelism. Understanding this balance shows you know how to optimize for a system’s primary requirement (fast response vs. high volume).

Always tie it to the use case: if users care about snappy interactions, lean towards low latency; if it’s about processing large data or high load, consider throughput optimizations.
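Here’s a small sketch of the batching idea (the `flush` function below is a hypothetical stand-in for a database or network call with a fixed per-call cost): sending each record immediately minimizes per-record latency, while buffering and flushing in batches raises throughput because the fixed overhead is paid once per batch instead of once per record.

```python
import time

PER_CALL_OVERHEAD = 0.005          # assumed fixed cost (e.g. one network round trip)

def flush(records):
    """Hypothetical write to a database or downstream service."""
    time.sleep(PER_CALL_OVERHEAD)

def send_individually(records):
    for rec in records:
        flush([rec])               # lowest latency per record, one round trip each

def send_in_batches(records, batch_size=100):
    buffer = []
    for rec in records:
        buffer.append(rec)         # each record waits in the buffer (added latency)
        if len(buffer) >= batch_size:
            flush(buffer)          # one round trip amortized over the whole batch
            buffer = []
    if buffer:
        flush(buffer)

records = list(range(200))
start = time.time(); send_individually(records); one_by_one = time.time() - start
start = time.time(); send_in_batches(records);   batched = time.time() - start
print(f"one-by-one: {one_by_one:.2f}s, batched: {batched:.2f}s")
```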

5. Horizontal vs. Vertical Scaling

Scaling a system can be done in two fundamental ways: vertical scaling (scale-up) and horizontal scaling (scale-out). This trade-off is about how you add resources to handle more load.

  • Vertical Scaling (Scale-Up): Add more power to a single server – e.g. add CPU cores, memory, or use a bigger machine. It’s straightforward: no changes to application architecture are usually needed, and a single server can handle more load. However, there’s a hard limit – every machine has finite capacity, so you can only get so big – and it creates a single point of failure (if that one beefy machine dies, your whole service might go down). Cost also tends to rise nonlinearly for high-end hardware.

  • Horizontal Scaling (Scale-Out): Add more servers to distribute the load. Instead of one supermachine, you have many ordinary machines working together. This allows virtually unlimited scaling (in theory, just keep adding servers) and eliminates single hardware bottlenecks. But it introduces complexity: you now have a distributed system. You must handle load balancing, data partitioning, inter-node communication, and consistency issues as discussed earlier. Managing 100 servers is much more complex than 1 server, but it can handle far more traffic overall.

Many startups begin with vertical scaling. Imagine an early-stage web app on a single server – to handle more users, you upgrade to a larger instance with more RAM/CPU. It’s simple and effective up to a point.

As the user base keeps growing, eventually you can’t get a bigger single machine (or it becomes too expensive), so you move to horizontal scaling – deploy the app on multiple servers and put a load balancer in front.

Companies like Facebook, Google, etc., operate at such huge scale that horizontal scaling is the only option – they use thousands of commodity servers rather than one huge mainframe. However, for your personal project or MVP, vertical scaling might be perfectly fine and simpler.

Vertical vs horizontal scaling is often one of the first trade-offs to mention when discussing scalability. Interviewers appreciate when you say something like, “We can initially scale vertically because it’s simple, but it has limits. For long-term scalability, we’d design for horizontal scaling, adding more servers and handling the distributed system complexity.”

This shows you understand the practical approach: start simple, but know how to scale out when needed.

Also mention fault tolerance – horizontal scaling can improve resilience (one node down doesn’t take out the whole system), whereas vertical scaling concentrates risk in one node.
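Horizontal scaling usually shows up in designs as a load balancer spreading requests over a pool of interchangeable servers. Here’s a toy round-robin sketch (plain Python functions standing in for real app servers) just to illustrate that adding capacity means adding another entry to the pool rather than buying a bigger machine:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin dispatcher over a pool of interchangeable servers."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def handle(self, request):
        server = next(self._servers)          # spread load evenly across the pool
        return server(request)

def make_server(name):
    return lambda request: f"{name} handled {request}"

pool = [make_server(f"app-{i}") for i in range(3)]   # scaling out = growing this list
lb = RoundRobinBalancer(pool)

for req in ["GET /home", "GET /feed", "GET /profile", "GET /home"]:
    print(lb.handle(req))
```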

6. Monolithic vs. Microservices Architecture

This is an architectural trade-off between building one unified application or breaking it into many small services. Monolithic architecture means the entire application (all modules/features) is built as a single deployable unit.

Microservices architecture means the application is split into a set of smaller, independent services that communicate via APIs.

Both approaches are used in industry, and each has pros/cons:

  • Monolith: All-in-one codebase and deployment. Monoliths are simple to develop and deploy initially – you run one thing, one codebase to understand. There’s no overhead of network calls between services, so communication is faster (function calls in-process) and it’s easier to make wide-ranging changes (since all code is together). Monoliths work great for small teams and straightforward applications. However, as the application grows large, a monolith can become a big ball of mud – tightly coupled, harder for many teams to work on simultaneously, and scaling can be challenging because you have to scale the whole app together. Deployments become riskier (one bug can bring down everything) and adopting new technologies is harder (since everything is one stack).

  • Microservices: The application is broken into independently deployable services, each responsible for a specific business capability. For example, an e-commerce site might have separate microservices for user accounts, product catalog, orders, payments, etc. The big advantage is scalability and agility – each service can be scaled on its own (e.g. if the product search service needs more CPU it can scale up separately) and developed by separate teams in parallel. It also enhances fault isolation (if one service crashes, others can often continue running). Microservices enable using the best-suited technology for each service. The downsides? Increased complexity in operations. Now you have many moving parts: inter-service communication (often over network), potential data consistency issues between services, complex debugging across service boundaries, and the need for robust monitoring, deployment, and orchestration tooling. Basically, you’ve traded a straightforward monolith for a distributed system of many pieces.

Example:

Twitter famously started as a monolith (a Ruby on Rails app) and encountered the “Fail Whale” when it couldn’t scale.

Over time, Twitter and many other companies (Amazon, Netflix, Uber) broke their applications into microservices to scale to millions of users and deploy features faster.

For instance, Netflix has hundreds of microservices powering the streaming platform.

On the other hand, a small startup might deliberately start with a monolith for speed of development – one codebase is easier to get off the ground.

As the product matures, they might evolve into microservices when the monolith becomes a bottleneck. There are also hybrid approaches (modular monoliths, microservices for some parts of the system) showing that it’s not one-size-fits-all.

Monolith vs microservices is about simplicity vs scalability/complexity. Interviewers often ask which architecture you’d choose for a given scenario.

A smart answer is: “For a small-scale or initial version, a monolith keeps things simple and faster to develop. As requirements and team size grow, moving to microservices can help each component scale and be managed independently – but we have to deal with the added complexity in deployment and consistency.”

Showing that you understand the operational overhead of microservices (and the need for things like service discovery, load balancing, monitoring, etc.) is key. Also, emphasize that microservices are not automatically “better” – they solve certain scaling and team autonomy problems at the cost of new challenges.

Use the buzzwords if appropriate: monoliths can lead to tight coupling, microservices promote loose coupling and independent scaling but introduce distributed systems problems.

7. SQL vs. NoSQL Databases

When designing storage, a classic decision is relational (SQL) vs non-relational (NoSQL) databases. This is essentially a trade-off between the strong consistency & rich querying of SQL databases and the flexibility & scalability of NoSQL databases.

  • SQL Databases: Relational databases (MySQL, PostgreSQL, Oracle, etc.) use structured schemas (tables with predefined columns) and support powerful SQL queries and joins. They enforce ACID properties for transactions, which ensures strict consistency and integrity (great for banking, financial apps). SQL databases excel at complex queries (aggregations, multi-table joins) and ensure data is highly structured. However, scaling SQL databases horizontally can be hard – typically they scale vertically (bigger machines) or via read-replicas/sharding with significant effort. In other words, out-of-the-box, a single SQL instance can become a bottleneck at very large scale. Also, rigid schemas make iterative development a bit slower (need to migrate schemas for changes).

  • NoSQL Databases: This is a broad category (document stores like MongoDB, key-value stores like Redis, wide-column like Cassandra, graph DBs, etc.), but in general NoSQL systems favor flexible schemas or schema-less data, and they often sacrifice some traditional SQL capabilities (like complex joins or sometimes ACID consistency) to achieve massive scalability and high performance on distributed clusters. Many NoSQL databases are designed to scale horizontally easily – you can spread data across many nodes (partitioning) and handle huge volumes of reads/writes by adding servers. They often prefer eventual consistency models (for availability) and may not support multi-item transactions (or if they do, with limited scope). The trade-off is that you may have to handle data consistency and complex querying in your application or via additional systems, because NoSQL might not join or enforce all relationships for you.

Example:

Banking systems almost exclusively use SQL databases (or NewSQL) because of the need for absolute accuracy and complex transactions – e.g., recording a money transfer requires multiple updates that must all succeed or fail together, and you might need strong foreign key relationships, etc. SQL’s ACID guarantees shine here.

In contrast, modern web companies like Twitter, Facebook, and Netflix heavily use NoSQL for certain use cases: Netflix uses Cassandra (a NoSQL wide-column store) to reliably serve streaming data to millions of users across the globe.

Cassandra is designed to scale horizontally across data centers and remain available – perfect for a recommendation engine or user activity feed that can tolerate eventual consistency. In fact, Facebook originally built Cassandra to power its inbox search, handling billions of messages with low latency.

Meanwhile, they might still use MySQL for things like user accounts or financial records where consistency is crucial – many large systems actually use a mix of SQL and NoSQL for different components, choosing each based on trade-offs.

SQL vs NoSQL often boils down to structure & consistency vs flexibility & scale.

Interviewers might ask something like “Would you use SQL or NoSQL for this design, and why?” A good answer touches on data model and requirements: “If we need complex queries and transactions (e.g. an ad bidding system with relationships), a SQL database provides consistency and a rich query language. But if we’re dealing with a high-write, scalable scenario like logging or caching user sessions, a NoSQL store might be better suited due to easier horizontal scaling. Actually, we could even use both: SQL for critical data, NoSQL for the scalable portions.”

Mention things like ACID vs BASE: SQL is ACID-compliant, whereas many NoSQL follow BASE (Basically Available, Soft state, Eventual consistency) which aligns with the CAP theorem trade-offs we discussed.

Showing you know real-life uses (e.g. “MongoDB for flexible user profiles, Postgres for transactions”) makes your argument concrete.
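To illustrate the ACID side, here is a sketch of a money transfer using Python’s built-in sqlite3 module as a stand-in for a relational database: both balance updates commit together or not at all. A key-value or document store would typically give you fast single-key reads and writes instead, leaving multi-record invariants like this to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    # The `with conn:` block is a transaction: commit on success, rollback on any error.
    with conn:
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")   # triggers a rollback
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, "alice", "bob", 200)
print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())
# [('alice', 300), ('bob', 300)] – either both updates land, or neither does
```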

8. Performance vs. Data Freshness (Caching Trade-off)

Caching is an essential technique in system design to improve performance, but it introduces a fundamental trade-off: speed vs. data freshness.

When you use a cache (storing copies of data in memory or closer to users), you get faster reads and can handle more load, but the risk is serving stale data (data that’s no longer up-to-date with the source of truth).

The system must balance delivering responses quickly and ensuring the data isn’t outdated.

  • Caching for Performance: By storing frequently accessed data in a fast storage layer (memory, CDN, etc.), systems can respond to reads very quickly without hitting the slower backend or database each time. Caching also offloads work from databases, allowing higher throughput. For example, a cache can cut down response times for an API from 200ms to 20ms, dramatically improving user experience.

  • Stale Data and Consistency: The downside is caches might serve old data if the underlying data has changed. The moment you copy data, you’ve introduced the chance that the copy diverges from the source. Data freshness refers to how up-to-date the cached data is compared to the source. If an item is updated in the database but the cache hasn’t been refreshed, users may temporarily see incorrect or old information. Essentially, caching favors performance and scalability at the cost of immediate consistency and accuracy of data.

  • Managing the Trade-off: We mitigate this with cache invalidation strategies, TTL (time-to-live) expirations, and cache update policies, but it’s impossible to have a cache that’s both always fresh and always fast unless you update it in real time (which often negates the performance benefit). The core trade-off caching offers is stale data vs. speed, and you as the designer choose how much staleness is tolerable. For instance, you might accept that a user’s profile page can show data that is a few minutes old (in exchange for faster load times), but you wouldn’t want a cache for bank account balances unless it’s updated instantly on each transaction.

Example:

YouTube caching – Ever notice how view counts on a YouTube video sometimes lag behind or a freshly uploaded video takes a bit to appear everywhere?

YouTube leverages a global content delivery network (CDN) to cache video content and metadata across the world. This allows users to stream videos with low latency from a nearby server rather than the distant origin.

The trade-off is that some data (like the exact view count or newly added comments) might not update immediately on all servers.

They refresh and sync these periodically or on certain triggers, but it’s a conscious decision: delivering smooth video playback (performance) is prioritized over instant global consistency of every statistic.

Another example: web API responses – developers often cache expensive API results.

Suppose an API gives the latest trending topics and updates every 60 seconds. Caching that result for even 30 seconds means users see super fast responses, but someone might see a topic list that’s up to 30 seconds old.

In most cases, that’s fine. However, for something like stock prices or live sports scores, even a 30-second cache might be unacceptable – those might need real-time freshness and thus less caching.

When discussing caching in interviews, always mention the trade-off between performance and freshness.

For instance, “I’ll use a cache (Redis) to store user session data for quick access. It will greatly improve response times, but I need to consider cache invalidation carefully to avoid stale sessions. Caching is essentially trading immediate consistency for speed, so I’ll set a short TTL and update the cache on writes to minimize staleness.”

This shows you are aware that caching isn’t free magic – it comes with the complexity of maintaining consistency.

Also, mention real-world examples or strategies: the cache-aside pattern, write-through vs write-back (each has implications for freshness), and that famous quote: “There are only two hard things in Computer Science: cache invalidation and naming things.”

It earns a chuckle and shows you grasp that keeping cached data fresh is hard. But in the end, caching, used wisely, yields massive performance gains – just be ready to discuss how you keep data reasonably fresh and what level of staleness is acceptable for the application.
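Here’s a minimal cache-aside sketch with a TTL, using an in-memory dict as a stand-in for Redis (the `load_from_db` and `save_to_db` functions are hypothetical placeholders for the source of truth): reads hit the cache while the entry is fresh, fall back to the database otherwise, and writes update the database and invalidate the cached copy.

```python
import time

CACHE_TTL_SECONDS = 30
_cache = {}                        # key -> (expires_at, value); stand-in for Redis

def load_from_db(key):
    """Hypothetical slow read from the source of truth."""
    time.sleep(0.05)
    return f"value-for-{key}"

def save_to_db(key, value):
    """Hypothetical write to the source of truth."""
    pass

def get(key):
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                      # fresh hit: fast path
    value = load_from_db(key)                # miss or expired: slow path
    _cache[key] = (time.time() + CACHE_TTL_SECONDS, value)
    return value

def update(key, value):
    save_to_db(key, value)                   # write the source of truth first
    _cache.pop(key, None)                    # invalidate so the next read refetches
```

Shortening the TTL or invalidating on every write keeps data fresher; lengthening it squeezes out more performance – exactly the dial discussed above.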

9. Synchronous vs. Asynchronous Processing

This trade-off concerns how tasks are executed and how users or systems wait for results. Synchronous processing means a request or task is handled in-line and the caller waits until it’s finished.

Asynchronous processing means tasks are handled in the background or in parallel, and the caller doesn’t wait – the work is queued or executed concurrently, with results handled later. The choice between sync and async affects user experience, throughput, and complexity.

  • Synchronous Processing: In a synchronous flow, each action is completed fully before moving to the next. The client (or next step) is blocked waiting for the operation to finish. This is simpler to reason about and appropriate when the result of one step is needed immediately for the next step. However, it can lead to inefficiency if tasks could be done in parallel or if some tasks involve waiting (e.g. waiting for I/O). Too much synchronous dependency can also degrade user experience – for example, a web page that locks up until a long request finishes is using synchronous logic on the client side.

  • Asynchronous Processing: In async, tasks don’t block the flow. You might send a request and get a quick acknowledgement, then the heavy lifting is done in the background (possibly on another thread, or via a message queue, etc.). The client can do other things and be notified of completion later (via callback, polling for result, etc.). Asynchronous designs can improve throughput and responsiveness by utilizing wait times – while one task waits on I/O, another can run. It’s great for user experience when a task is non-critical to continue, or when you want to parallelize work. The downside is added complexity: you need to manage callbacks or event loops, handle out-of-order execution, and ensure eventual consistency of results.

Example (User-facing):

When you purchase something online, often the payment process is synchronous – you click “Pay” and the website waits (you see a loading spinner) until the payment gateway confirms success or failure.

This makes sense because you shouldn’t move on to “Download product” until payment is verified; it’s a critical step that must complete before proceeding.

On the other hand, when you upload a photo to a social network, the app typically lets you continue browsing or posting other things while the upload happens in the background – you don’t have to stare at a progress bar (or sometimes the upload happens asynchronously and you get a notification if it fails). The photo upload is a great candidate for async: it might take time, but the user can do other actions, and the system will handle the upload completion when ready.

Example (System/internal):

Consider a logging or analytics service. If every time a user clicks a button your app synchronously writes to an analytics database, the user’s action is slowed down by that logging.

Instead, you can log asynchronously: put a log message onto a queue and immediately return to the user.

A background worker will process the queue and write to the analytics store. This way, from the user’s perspective the action was fast (they’re not waiting on the logging), and the logging happens eventually behind the scenes.

Many high-scale systems use asynchronous pipelines (message queues, task queues like RabbitMQ, Kafka, Celery, etc.) to handle non-critical tasks asynchronously – it decouples the work from user-facing request/response cycle.
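Here’s a minimal sketch of that pattern using Python’s standard queue and threading modules as an in-process stand-in for a real broker like RabbitMQ or Kafka: the request handler only enqueues the event and returns immediately, while a background worker drains the queue and does the slow write.

```python
import queue
import threading
import time

events = queue.Queue()

def handle_click(user_id, button):
    # Fast path: enqueue and return; the user never waits on analytics logging.
    events.put({"user": user_id, "button": button, "ts": time.time()})
    return "ok"

def analytics_worker():
    while True:
        event = events.get()       # blocks until an event arrives
        time.sleep(0.1)            # stand-in for a slow write to the analytics store
        print("recorded", event)
        events.task_done()

threading.Thread(target=analytics_worker, daemon=True).start()

print(handle_click("u42", "buy"))  # returns instantly
events.join()                      # only so this demo waits for the background work
```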

Sync vs async should come up whenever you have potentially long-running tasks or opportunities to parallelize.

In an interview, if you mention using a queue or background workers, you’re invoking this trade-off.

A good approach is: “For sending confirmation emails, I wouldn’t do it synchronously in the user signup request. I’d enqueue an email task to be handled asynchronously – this keeps the signup flow fast for the user, and the email will go out a few seconds later.”

This shows you value user experience (fast response) by using async where appropriate. Conversely, clarify when sync is needed: “However, when the user hits ‘Save’ on their profile, I will do that synchronously and confirm it’s saved before telling them success.”

Knowing what must be sync vs what can be async is a key design decision.

Also mention that asynchronous systems often require careful monitoring and retry logic (since work is happening in the background, you need to ensure it eventually completes).

But when done right, async processing greatly improves throughput and user responsiveness by not having everyone wait in line for tasks that can be done independently.

10. Stateful vs. Stateless Architecture

When designing servers or services, a crucial consideration is whether they should be stateful or stateless.

Stateful services maintain some memory of past client interactions (state) – for example, a session, or data cached in memory per user.

Stateless services treat each request independently, with no reliance on stored context from previous interactions. This trade-off impacts scalability, complexity, and how you handle things like user sessions.

  • Stateful Systems: A stateful server remembers information between requests. Classic examples: a web server that keeps a user’s session in memory (so it knows who you are after you login), or a game server that keeps track of players’ positions in memory. Stateful design can simplify certain things – you don’t need to re-send or re-compute context on every request. However, it makes scaling and fault tolerance harder. If a particular server instance is holding a user’s session state, all that user’s requests must go to the same server (this is called “sticky sessions”), which complicates load balancing. If that server goes down, the state might be lost or the user’s session is disrupted. It also means the server has to manage memory carefully to hold potentially many sessions. In short, stateful services can become bottlenecks and single points of failure, and adding more servers doesn’t seamlessly spread the load if state isn’t shared.

  • Stateless Systems: A stateless server, on the other hand, handles each request without any memory of previous ones. For a web application, this means no session stored on the server – instead, any needed context is passed in with each request (for example, a token or all necessary data). The advantage is massive scalability and simplicity of scaling: you can route any request to any server, since no “memory” is required. If one server goes down, it doesn’t take any unique state with it – another server can handle the next request just fine. This is why most modern web services (especially REST APIs) are stateless by design – e.g., each API call contains an auth token and all info needed; servers don’t remember you between calls. The drawback is sometimes more overhead per request (since clients might need to send more data, or the server might have to recompute things that a stateful server could have cached in memory). Also, truly stateless designs might push state management elsewhere (e.g. the database or client).

Example:

HTTP and REST APIs are stateless by protocol – every HTTP request is meant to be independent. If you log into a website and then navigate to another page, how does the server know you’re the same user?

In a stateless design, it’s because your browser sends a token or cookie on each request, which the server checks against a database or in-memory store. The server itself doesn’t retain a session object; it relies on that token (the state is externalized).

Many large-scale systems use an external session store or JWT tokens so that their web servers remain stateless and easy to scale.

By contrast, older architectures or certain games might use stateful servers – e.g., a multiplayer game server that holds all players’ state in memory must have players reconnect to the same server.

If that server crashes, the game might end because the state was lost. That’s why some online games have “reconnect” issues – the server was stateful. Modern designs often find ways to make even games more stateless or at least replicate state.

Another simpler example: shopping cart on a website – a stateful approach keeps the cart in the server’s session memory; a stateless approach keeps the cart in a client-side cookie or in a database, and each request fetches or is accompanied by the cart data.

Most sites now choose stateless for web servers (store cart server-side in a DB or cache, and identify by a cookie token, rather than keep it in one server’s memory).

Stateless vs stateful often comes up in context of web service design. It’s best to lean towards stateless service architecture for scalability – mention that: “We’ll design the web servers to be stateless, so we can easily add more servers behind a load balancer without worrying about session affinity. User session data will be stored in Redis or a database, not on individual servers.”

This shows you know how to achieve horizontal scaling. If something does need to be stateful (say, a real-time game or a long-lived connection), discuss how you’d handle that (like sticky sessions, or replication of state to avoid single point of failure).

Also mention the benefits: stateless services are easier to auto-scale and recover – any server can handle any request.

This concept is fundamental in cloud environments (like serverless or containers) where you might spin up new instances anytime – those instances better not require some pre-loaded state.

In summary, stateless = easier scaling, stateful = must handle where state lives.
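A rough sketch of the stateless approach (a plain dict standing in for Redis or a database, and a hypothetical handler rather than any specific framework’s API): every request carries a token, and any server can resolve it against the shared store, so no instance keeps per-user state in its own memory.

```python
SESSION_STORE = {"token-abc": {"user_id": 42, "cart": ["book"]}}   # stand-in for Redis

def handle_request(headers, path):
    # No server-local session: all context comes from the request plus the shared store.
    token = headers.get("Authorization")
    session = SESSION_STORE.get(token)
    if session is None:
        return 401, "not logged in"
    if path == "/cart":
        return 200, session["cart"]
    return 404, "not found"

# Any instance of this handler, on any server, gives the same answer.
print(handle_request({"Authorization": "token-abc"}, "/cart"))     # (200, ['book'])
```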

11. Normalization vs. Denormalization (Database Design)

When designing a database schema, especially in relational databases, there’s a trade-off between normalization and denormalization of data.

Normalization means structuring your data to eliminate redundancy (usually by creating more tables and relationships).

Denormalization means deliberately allowing duplicate data or combining tables to optimize read performance at the expense of some redundancy.

  • Normalization (Normalized Schema): A fully normalized database might have many tables, each with a very specific purpose, linked by foreign keys. For example, in a normalized design, you might have a Users table and an Orders table, with a user’s orders referenced by a user ID (rather than storing user info in each order record). Normalization reduces data redundancy and ensures data integrity (update in one place). The benefit is you don’t have inconsistencies – e.g., the user’s name is stored once, so you can’t have two orders with two different copies of the name. It also typically saves storage space. The downside is complex queries and more joins: to gather data spread across many tables, you end up joining multiple tables, which can be slower in read-heavy scenarios. Highly normalized schemas are write-friendly (each piece of data is written in one place) but can be read-unfriendly when the data is needed together frequently.

  • Denormalization (De-normalized Schema): This involves merging data that would normally be in separate tables, or storing derived/duplicate data in advance. You’re basically trading data integrity and storage efficiency for query speed. Denormalization can dramatically speed up read performance because you reduce the need for joins – all the needed data might be in one table or one document. The cost is duplicate data: for instance, you might store a user’s name and email alongside each order in an Orders table so that fetching orders doesn’t require joining with Users. This means if the user changes their name, you have to update it in many places (risk of inconsistency if you miss one). It increases storage use as well. But for read-heavy systems (like analytics dashboards or content feeds), denormalization is often worth it.

Example:

A classic example is a social media feed or blog.

A normalized design would have separate tables for Posts, Comments, Users, etc. To display a feed of posts with the latest comments and user info, you’d have to join posts with users and comments tables – potentially slow if you’re doing it on the fly for every user’s feed.

A denormalized approach might store some user info and a few latest comments within the post record or in a pre-joined form, so that showing a feed is just a single lookup.

Indeed, many large-scale systems use denormalization or caching layers to serve feeds quickly. Another example from commerce: an order invoice – you might store the product price and name at the time of purchase inside the Order record (denormalized from the Products table). This way the order history doesn’t change if the product price or name changes later, and you don’t have to join to display the order.

However, it duplicates data from the Products table. In contrast, a normalized approach might just reference the product ID and always look up current name/price (which could be wrong if price changed). Many systems denormalize for the sake of preserving historical data and speeding up reads.

Normalization vs denormalization is a trade-off between data integrity vs read performance (speed).

In an interview, if talking about database schema, mention: “I’d start with a normalized schema for clarity and integrity, but if we face performance issues on heavy read queries, we might introduce denormalization or caching – for example, storing an aggregate or duplicating some read-intensive fields to avoid costly joins.” If the question involves big data or a read-heavy use case, you can proactively say, “We might use a denormalized schema or NoSQL store for the feed, since we prioritize fast reads and can tolerate some data duplication.” Also, mention the maintenance aspect: with denormalization, “we’ll need to update or invalidate derived data whenever the source changes, which adds complexity.”

Demonstrating awareness of this trade-off shows that you can optimize database design for the use case, and that you know the cost of doing so (which is more complex writes/updates).

This is often tied to SQL vs NoSQL discussions too, since many NoSQL databases encourage denormalized, document-style storage for performance.
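To make the read-path difference concrete, here is a toy sketch using in-memory dicts as stand-ins for tables or documents: the normalized layout stores the user’s name once and joins on every read, while the denormalized layout copies the name into each order so a read is a single lookup – at the cost of updating every copy when the name changes.

```python
# Normalized: one copy of each fact; reads join across "tables".
users  = {1: {"name": "Ada"}}
orders = [{"order_id": 100, "user_id": 1, "total": 30}]

def order_view_normalized(order_id):
    order = next(o for o in orders if o["order_id"] == order_id)
    return {**order, "user_name": users[order["user_id"]]["name"]}   # join at read time

# Denormalized: the name is copied into each order; a read is a single lookup.
orders_denorm = {100: {"user_id": 1, "user_name": "Ada", "total": 30}}

def rename_user(user_id, new_name):
    users[user_id]["name"] = new_name
    for order in orders_denorm.values():      # extra write work: keep every copy in sync
        if order["user_id"] == user_id:
            order["user_name"] = new_name

print(order_view_normalized(100))             # join on read
print(orders_denorm[100])                     # single lookup, duplicated data
```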

12. Batch Processing vs. Stream Processing

For handling data processing workloads, especially large data or event flows, a key design choice is batch vs. stream processing.

Batch processing involves collecting data over a period and processing it all at once (in batches).

Stream processing involves processing data continuously in real-time as it arrives. This trade-off influences system complexity, timeliness of results, and throughput.

  • Batch Processing: In batch, you accumulate a set of data, then run a job to process that entire set. It’s like doing work in chunks. Batch jobs might run on a schedule (e.g. every hour, or nightly). The advantage is efficiency for large volumes – the system can optimize processing on big chunks of data, and it can run when load is low (say, process logs at midnight). It also simplifies design sometimes: you don’t need your system to be always on handling events; it just kicks off jobs as scheduled. The downside is latency – results are ready only after the batch job completes. If you only run a recommendation algorithm once a day at 2 AM, then a user’s recommendations won’t update until the next day even if their behavior changed at noon. Batch is excellent for scenarios where some delay is acceptable, like generating a daily report, payroll processing, or aggregating stats at the end of the day.

  • Stream Processing (Real-time): In stream processing, data is processed element-by-element or in small windows, continuously. As soon as an event comes in, the system processes it (or within seconds). This enables real-time analytics or immediate reactions. The benefit is obviously timeliness – you can detect patterns or update metrics within seconds or minutes. For example, fraud detection systems often use stream processing to catch fraudulent transactions as they happen. The challenge is that streaming systems are typically more complex to build and operate. You need frameworks or pipelines that are always running, and you have to handle out-of-order events, exactly-once processing or duplicates, etc. It can also be more expensive to process every event in real-time versus doing one big batch, depending on scale.

Example:

Credit card processing often uses a combination: traditional billing might be batch (processing all transactions at the end of the day to settle accounts), whereas fraud detection is stream – as each transaction flows in, it’s evaluated in real-time for anomalies.

Another example: analytics dashboards – a startup might start with batch processing web analytics (compute yesterday’s traffic numbers each night).

If they need to display up-to-the-minute data, they move to a streaming approach (using something like Kafka + Storm/Flink or Spark Streaming) to update analytics in real-time.

Uber and Lyft likely use streaming for calculating ETA and surge pricing in real-time based on live data from riders and drivers, but they might use batch processing for things like end-of-day financial reconciliation or reporting.

  • Technologies: If relevant, you can mention technologies: batch processing is often associated with systems like Hadoop MapReduce or Spark jobs, whereas stream processing might involve Kafka, Apache Flink, Apache Storm, or cloud services for real-time data. For the interview context, the main point is conceptual: “Batch = process later in chunks, Stream = process now continuously.”

Batch vs stream is about throughput and simplicity vs real-time responsiveness.

In an interview, if you have a data processing component, clarify if it will be batch or streaming and why. For example: “User feed generation can be done in batches every few minutes to simplify things and aggregate changes, but for the chat notifications, we need stream processing to deliver messages instantly.”

Or “Our system will initially use batch processing for daily reports (easier to implement and sufficient), but if requirements change to real-time, we could shift to a streaming architecture using Kafka for event ingestion.”

Showing awareness of this trade-off indicates you understand the timing dimension of system design. Not everything needs real-time processing – batch might drastically simplify a solution if real-time isn’t necessary.

Conversely, some scenarios absolutely need streaming (e.g. alerting systems). If you can discuss both and justify one, you’re in good shape.
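As a toy illustration for a simple metric (page views per URL): the batch version waits for the whole log and aggregates it in one pass, while the streaming version updates a running count as each event arrives, so the numbers are always current.

```python
from collections import Counter

events = [{"url": "/home"}, {"url": "/feed"}, {"url": "/home"}]

# Batch: run once over the accumulated data (e.g. a nightly job).
def batch_page_views(all_events):
    return Counter(e["url"] for e in all_events)

# Stream: update state as each event arrives; counts are always up to date.
running_counts = Counter()

def on_event(event):
    running_counts[event["url"]] += 1

print(batch_page_views(events))    # results only once the whole batch is processed
for e in events:
    on_event(e)                    # each event is reflected immediately
print(running_counts)
```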

Conclusion

System design is inherently about trade-offs – improving one aspect of a system often means a compromise in another.

As a recap, here are the key points to remember from our top 12 trade-offs:

  • Scalability vs Consistency: More scale (distribution) can mean harder consistency. Decide what matters more based on the app (global social feed vs. bank ledger).

  • CAP Theorem (Consistency vs Availability): In a network failure, you’ll sacrifice one. Choose based on whether downtime or stale data is less acceptable for your system.

  • Strong vs Eventual Consistency: Instant synchronization (strong) vs. accept slight delays (eventual) for better performance. Not every system needs every piece of data in real-time.

  • Latency vs Throughput: Optimize for quick response or for total work done. Recognize what the primary goal of your system is – fast interactions or high-volume processing.

  • Horizontal vs Vertical Scaling: Scale out with more servers for high growth, but manage complexity; scale up for simplicity but mind the limits.

  • Monolith vs Microservices: One deployable unit for simplicity, or many small services for flexibility and independent scaling – consider team size, complexity, and growth plans.

  • SQL vs NoSQL: Structured, ACID databases for complex queries and consistency vs schema-flexible, distributed databases for massive scale and speed.

  • Caching (Performance vs Freshness): Use caches to speed up responses and handle load, but implement invalidation/refresh policies to control staleness of data.

  • Synchronous vs Asynchronous: Blocking, immediate operations vs background, eventual processing. Improve user experience and throughput by making non-critical tasks async when possible.

  • Stateful vs Stateless: Keeping session state can simplify logic but hinders scaling and failover. Embrace stateless services where feasible for easier scaling and resilience.

  • Normalization vs Denormalization: Clean, normalized data models prevent redundancy but may require slow joins; denormalize (or pre-aggregate) when you need fast reads, at the cost of data duplication and complex updates.

  • Batch vs Stream Processing: Process data in bulk for efficiency when real-time results aren’t needed, but switch to streaming for low-latency, continuous processing of incoming data when freshness is key.

Mastering these trade-offs will give you the confidence to tackle system design interviews in 2025.

Interviewers don’t expect a “perfect” design – they want to see that you understand the consequences of your choices and can articulate why you’d choose one approach over another.

By referring to these trade-offs (and even dropping relevant examples like “we might use eventual consistency like how Twitter does for feeds, to remain available under load”), you demonstrate a holistic understanding of system design.

Finally, remember that technology trends evolve, but these core concepts remain.

Whether it’s a traditional web app or a cutting-edge AI service, you’ll always be balancing things like consistency vs availability, speed vs accuracy, simplicity vs scalability. Show that you can find the right balance for the problem at hand.

Good luck with your interviews, and may your system designs be ever in harmony with their trade-offs!

FAQs

1. What are system design trade-offs?

System design trade-offs are the balancing acts engineers perform when building software architectures. Improving one aspect (like scalability) often impacts another (like consistency or complexity). Understanding these trade-offs helps you choose the best design for specific business requirements.

2. Why are system design trade-offs important for interviews?

Employers look for engineers who understand the consequences of design decisions. By demonstrating knowledge of trade-offs—like monolith vs. microservices or consistency vs. availability—you show you can design robust, scalable systems.

3. What is the CAP theorem, and how does it relate to system design?

The CAP theorem states that in the presence of a network partition, a distributed system can only provide either consistency (C) or availability (A), but not both. It’s essential in system design to decide which attribute matters more for your application.

4. When should I choose SQL versus NoSQL databases?

Use SQL (relational) databases for structured data, complex queries, and strong consistency. Use NoSQL when you need flexible schemas, high write throughput, or easy horizontal scalability.

5. How do I handle cache invalidation?

Cache invalidation ensures that stale data is updated or removed at the right time. Common strategies include setting time-to-live (TTL) values, using write-through or write-around policies, or actively invalidating caches when the underlying data changes.

6. Is microservices architecture always better than a monolith?

Not always. Microservices are great for large, complex projects with multiple teams, but they add operational complexity. Monoliths can be simpler for smaller applications or early-stage startups. The best choice depends on your team size, app complexity, and scaling requirements.

7. What is the difference between latency and throughput?

Latency measures how quickly a single request is handled (speed per request), while throughput measures the total number of requests handled over a period (total volume). High throughput can sometimes increase latency, and vice versa.

8. Why might I use eventual consistency in a design?

Eventual consistency is valuable when your system needs to remain highly available and can tolerate slight delays in data synchronization. Many large-scale systems—like social media feeds—adopt this approach to handle massive traffic.

9. Which is easier to scale: vertical or horizontal scaling?

Vertical scaling (upgrading a single machine) is simpler initially but has a hard limit on hardware capacity. Horizontal scaling (adding more machines) is more flexible long-term but introduces distributed system complexity.

10. How can I improve the reliability of a stateful service?

You can replicate state across multiple servers, use sticky sessions with load balancers, or design services to externalize state (e.g., store session data in a shared database or cache). Each approach has trade-offs around complexity, cost, and data consistency.

11. What’s the difference between batch and stream processing?

Batch processing handles data in large chunks on a schedule, often suitable for analytics or reports. Stream processing handles data in real-time, ideal for time-sensitive tasks like fraud detection or live analytics.

12. Where can I learn more about system design trade-offs?

Resources like “Designing Data-Intensive Applications” by Martin Kleppmann, online courses, and official documentation of distributed systems (e.g., AWS, Google Cloud) are excellent ways to deepen your knowledge and practice real-world scenarios.
