When to choose SQL versus NoSQL databases for a design problem
SQL vs NoSQL is the most frequently discussed database trade-off in system design interviews. SQL databases (PostgreSQL, MySQL, Aurora) store data in structured tables with predefined schemas, enforce ACID transactions, and excel at complex queries with joins. NoSQL databases (DynamoDB, Cassandra, MongoDB, Redis) store data in flexible formats (key-value, document, wide-column, graph), scale horizontally by design, and trade strict consistency for availability and throughput. In every system design interview, your database choice is a scored trade-off discussion—interviewers care less about which you pick and more about whether you can explain why given the requirements.
Key Takeaways
- Start every database decision with the workload, not the technology. Ask: What is the access pattern? What is the read-to-write ratio? Is ACID compliance required? How large will the data grow?
- SQL is the right default when you need transactions, complex queries with joins, data integrity, or when the data model has well-defined relationships.
- NoSQL is the right choice when you need horizontal scalability, high write throughput, flexible schemas, or simple key-value access patterns at massive scale.
- Many production systems use both. Netflix uses MySQL for billing (needs ACID) and Cassandra for content metadata (needs write throughput). This is the mature answer in interviews.
- Never say "NoSQL because it scales better" without qualification. Facebook scales MySQL to petabytes. Uber built their payment system on PostgreSQL. The tool matters less than the reasoning.
Why This Question Matters in Interviews
Database selection is the easiest trade-off discussion to score points on because every system design question requires at least one database. Interviewers use this decision to evaluate three things: whether you understand data modeling fundamentals, whether you can match a database to an access pattern, and whether you can articulate trade-offs under pressure.
A weak answer: "I would use NoSQL because it scales."
This earns zero points because it shows no reasoning about the specific workload.
A strong answer: "I would use DynamoDB here because our access pattern is simple key-value lookup by short URL, we need to handle 100,000 reads per second, and horizontal scalability is critical. The trade-off is that we lose the ability to do ad-hoc joins—if we later need analytics across URL mappings, I would add a separate Redshift data warehouse and sync via Kinesis."
The difference is specificity: named technology, stated access pattern, quantified scale, explicit trade-off, and mitigation plan.
The Core Comparison
| Dimension | SQL (Relational) | NoSQL (Non-Relational) |
|---|---|---|
| Data model | Tables with rows and columns, predefined schema | Key-value, document, wide-column, graph; flexible schema |
| Relationships | First-class support via foreign keys and joins | No native joins; denormalized or application-level joins |
| Consistency | Strong (ACID: Atomicity, Consistency, Isolation, Durability) | Typically eventual (BASE: Basically Available, Soft-state, Eventually consistent) |
| Scalability | Primarily vertical; horizontal requires sharding effort | Horizontal by design; add nodes to scale |
| Query flexibility | Rich SQL with joins, aggregations, subqueries | Limited to primary key lookups and predefined access patterns |
| Schema | Fixed; changes require migrations | Flexible; schema evolves with data |
| Write throughput | Limited by lock contention and schema validation | High; append-only logs and no schema validation overhead |
| Best for | Transactions, complex relationships, data integrity | High throughput, simple access patterns, massive horizontal scale |
| Examples | PostgreSQL, MySQL, Aurora, SQL Server, Spanner | DynamoDB, Cassandra, MongoDB, Redis, Bigtable |
When to Choose SQL
You Need ACID Transactions
Any system where data correctness is non-negotiable requires SQL. Payment processing, banking, inventory management, and order fulfillment involve operations that must be atomic—either the entire transaction succeeds or the entire transaction rolls back.
If two concurrent requests try to deduct from the same account balance, SQL's isolation guarantees prevent double-spending.
Interview example: "For the payment service, I need ACID transactions to ensure that debiting the buyer and crediting the seller happen atomically. If either fails, the entire transaction rolls back. PostgreSQL with serializable isolation handles this correctly."
You Need Complex Queries With Joins
When your data has well-defined relationships—users have orders, orders have line items, line items reference products—SQL's join capability is a natural fit. Running a query like "Find all orders from users in California who purchased product X in the last 30 days" is trivial in SQL and painful in NoSQL (requiring multiple round-trips and application-level joining).
Interview example: "The admin dashboard needs to run ad-hoc analytics queries joining users, orders, and products. SQL handles this natively. With NoSQL, I would need to denormalize the data or maintain a separate analytics store."
Your Data Model Has Strong Relationships
E-commerce catalogs, ERP systems, social graphs with complex queries, and financial ledgers all have deeply relational data. SQL schemas enforce referential integrity—you cannot create an order for a non-existent user if the foreign key constraint is in place.
Scale Is Manageable
If your system handles thousands to tens of thousands of requests per second (not millions), SQL on a managed service like Aurora handles it well. Aurora supports up to 128 TB of storage, 15 read replicas, and automatic failover. Most systems never outgrow this.
Real-world validation: Facebook scales MySQL to manage petabytes of data across their infrastructure. Uber built their payment system on PostgreSQL. SQL's scalability ceiling is higher than most candidates assume.
When to Choose NoSQL
You Need Horizontal Scalability at Massive Scale
When your system needs to handle millions of writes per second or store petabytes of data, NoSQL databases like Cassandra and DynamoDB are designed for horizontal scaling. You add nodes to increase capacity without architectural changes.
Interview example: "Our chat system processes billions of messages per day. Cassandra handles this because its LSM-tree storage engine is optimized for write-heavy workloads, and we can add nodes linearly as message volume grows."
Your Access Pattern Is Simple
If every query is a single-key lookup—"given user_id, return profile" or "given short_url, return long_url"—a key-value store like DynamoDB is faster, cheaper, and simpler than SQL. You do not need the overhead of schema validation, join support, and query planning.
Interview example: "The URL shortener's access pattern is a direct key-value lookup: given the short code, return the destination URL. DynamoDB gives us single-digit millisecond latency for this pattern with zero capacity planning via on-demand mode."
You Need Flexible or Evolving Schemas
Content management systems, user-generated content platforms, and IoT data pipelines often deal with data whose structure varies per record or evolves frequently. Document databases like MongoDB store each record as a JSON-like document with no fixed schema, eliminating the need for migration scripts.
You Need High Write Throughput
NoSQL databases skip schema validation on writes and often use append-only storage models, making writes significantly faster. Cassandra's architecture—writes go to a commit log and memtable before flushing to disk—enables sustained write throughput that relational databases struggle to match at scale.
Real-world example: Gojek migrated from PostgreSQL to Cassandra to handle 100 million transactions per month. The write-heavy workload exceeded what PostgreSQL could sustain with acceptable latency.
The Four Types of NoSQL Databases
Not all NoSQL databases are the same.
Knowing the four types and when to use each is a significant interview differentiator.
| Type | How It Stores Data | Best For | Examples |
|---|---|---|---|
| Key-Value | Simple key → value pairs | Session storage, caching, URL shorteners, user preferences | DynamoDB, Redis, Riak |
| Document | JSON/BSON documents with nested fields | Content management, user profiles, catalogs with varying attributes | MongoDB, Firestore, CouchDB |
| Wide-Column | Rows with dynamic columns grouped by column families | Time-series data, IoT telemetry, activity logs, messaging | Cassandra, HBase, Bigtable |
| Graph | Nodes and edges representing entities and relationships | Social networks, recommendation engines, fraud detection | Neo4j, Amazon Neptune |
Interview tip: When you say "I would use NoSQL," specify which type. "I would use a wide-column store like Cassandra" is far stronger than "I would use NoSQL."
The Decision Framework: A 5-Question Checklist
Use this checklist during interviews to justify your database choice systematically.
1. Do I need ACID transactions? Yes → SQL. Payment processing, banking, order management. No → Continue to question 2.
2. Do I need complex joins or ad-hoc queries? Yes → SQL. Dashboards, analytics, admin panels, reports. No → Continue to question 3.
3. Is my access pattern simple (single-key or narrow range)? Yes → Key-value or wide-column NoSQL. URL lookups, session storage, message retrieval by conversation_id. No → Consider document or graph NoSQL.
4. Do I need to scale beyond what a single machine (or managed cluster) can handle? Yes → NoSQL is the natural fit for horizontal scaling. No → SQL with read replicas likely suffices.
5. Does my schema change frequently? Yes → Document NoSQL offers flexibility. No → SQL's fixed schema provides data integrity benefits.
This checklist is not absolute—real systems are nuanced—but it provides a structured way to narrate your decision in an interview.
For structured practice applying this decision framework across 30+ system design problems, Grokking the System Design Interview includes database selection rationale in every solution. For deeper coverage of distributed database architectures—multi-region replication, tunable consistency, and sharding strategies—the system design interview guide covers the advanced trade-offs interviewers expect at senior levels.
Using Both: The Polyglot Persistence Pattern
The most mature interview answer is often: "I would use both." Real production systems frequently use multiple databases, each chosen for a specific workload. This is called polyglot persistence.
Netflix uses MySQL for billing and user account data (needs ACID and relational integrity) and Cassandra for content metadata, viewing history, and personalization data (needs massive write throughput and horizontal scale).
Uber uses PostgreSQL for payment processing (needs strong transactions) and uses Cassandra and MySQL for trip data and marketplace systems across their infrastructure.
Twitter uses MySQL for user accounts and social graph (relational data with complex queries) and Manhattan (a custom key-value store) for tweet storage (simple access pattern at massive write volume).
Interview application: "For this e-commerce system, I would use PostgreSQL for the order and payment services because they require ACID transactions—an order placement that debits payment and decrements inventory must be atomic. For the product catalog and search, I would use MongoDB because product attributes vary by category (electronics have specs, clothing has sizes) and the schema needs to be flexible. For session storage, I would use Redis because it provides sub-millisecond reads for frequently accessed session data."
This multi-database answer demonstrates that you think about database selection per workload, not as a one-size-fits-all decision.
Interview Follow-Up Questions
"Why not just use PostgreSQL for everything?"
"PostgreSQL could technically serve most workloads at moderate scale, and I would start with PostgreSQL for simplicity. The limitation appears at very high write throughput—beyond 50,000 writes per second, PostgreSQL requires complex sharding setups that are operationally expensive. For the message storage component, which handles billions of writes per day, Cassandra's native horizontal scaling is a better fit. I am choosing operational simplicity for each workload over architectural uniformity."
"Isn't eventual consistency a problem?"
"It depends on the use case. For the chat message store, eventual consistency means a message might appear on a second device with a 100ms delay. That is imperceptible to users. For the payment ledger, eventual consistency could cause a double-charge, which is unacceptable—that is why I use PostgreSQL with serializable transactions for payments."
"How would you handle the need for analytics on NoSQL data?"
"I would set up a change data capture (CDC) pipeline from Cassandra to a data warehouse like Redshift or BigQuery. Operational writes go to Cassandra for speed; analytical queries run against the warehouse. This separates the write-optimized path from the read-optimized analytics path—a common pattern called CQRS."
For advanced database architecture patterns including CQRS, multi-region replication, and tunable consistency models, Grokking the Advanced System Design Interview covers these production-scale decisions in detail.
Frequently Asked Questions
When should I use SQL vs NoSQL in a system design interview?
Use SQL when you need ACID transactions, complex joins, or strict data integrity. Use NoSQL when you need horizontal scalability, high write throughput, or simple key-value access patterns. Start by identifying the access pattern and consistency requirements, then choose the database that matches.
Can SQL databases scale horizontally?
Yes, but with more effort. Techniques like read replicas, sharding, and partitioning enable horizontal scaling for SQL. Facebook scales MySQL to petabytes. However, NoSQL databases like Cassandra and DynamoDB are architecturally designed for horizontal scaling, making it operationally simpler at massive scale.
What is the difference between ACID and BASE?
ACID (Atomicity, Consistency, Isolation, Durability) guarantees that transactions are reliable and complete. BASE (Basically Available, Soft-state, Eventually consistent) relaxes these guarantees in exchange for higher availability and scalability. SQL databases follow ACID. Most NoSQL databases follow BASE, though some (like MongoDB) offer optional ACID transactions.
Should I always choose NoSQL for large-scale systems?
No. Scale alone does not determine the database choice. The access pattern matters more. A system with 100,000 complex analytical queries per second is better served by SQL with read replicas than by NoSQL, which would require application-level joins. Match the database to the workload, not to a generic notion of scale.
What are the four types of NoSQL databases?
Key-value stores (DynamoDB, Redis) for simple lookups. Document stores (MongoDB, Firestore) for flexible schemas. Wide-column stores (Cassandra, Bigtable) for time-series and write-heavy workloads. Graph databases (Neo4j, Neptune) for relationship-heavy data like social networks.
Can I use both SQL and NoSQL in one system?
Yes, and production systems routinely do. This is called polyglot persistence. Netflix uses MySQL for billing and Cassandra for content metadata. Use each database for the workload it handles best rather than forcing one database to serve all access patterns.
How do I handle joins with NoSQL databases?
NoSQL databases do not support native joins. You have three options: denormalize data (store redundant copies to avoid joins), perform application-level joins (query multiple collections and merge in code), or use a separate analytical store (sync data to a SQL-based data warehouse for complex queries).
Is MongoDB SQL or NoSQL?
MongoDB is a NoSQL document database. It stores data as JSON-like documents (BSON) with flexible schemas. Since version 4.0, MongoDB supports multi-document ACID transactions, blurring the traditional SQL/NoSQL boundary. However, its architecture is fundamentally document-oriented, not relational.
What database should I use for a chat system?
Use a wide-column store like Cassandra for message storage (write-heavy, simple access pattern: get messages by conversation_id sorted by timestamp). Use Redis for user presence tracking (sub-millisecond reads). Use PostgreSQL for user accounts and authentication (needs ACID). This multi-database approach matches each component to its optimal storage engine.
How do I discuss database choice in an interview without sounding rehearsed?
Tie every choice to the specific requirements you clarified. Say "Given our 10:1 read-to-write ratio and the need for complex product queries with joins, PostgreSQL is the right fit here" rather than "SQL is good for relational data." Specificity sounds like engineering experience; generality sounds like textbook recitation.
TL;DR
SQL vs NoSQL is the most common database trade-off in system design interviews.
SQL databases (PostgreSQL, MySQL, Aurora) provide ACID transactions, complex joins, and data integrity—choose them for payment systems, order management, and relational data.
NoSQL databases (DynamoDB, Cassandra, MongoDB, Redis) provide horizontal scalability, high write throughput, and flexible schemas—choose them for simple access patterns at massive scale, write-heavy workloads, and evolving data models.
The strongest interview answer often uses both: SQL for transactional components, NoSQL for high-throughput components, and a data pipeline connecting them for analytics. Always start with the workload and access pattern, not the technology name.
GET YOUR FREE
Coding Questions Catalog

$197

$72

$78