On this page

What is a NoSQL database?

Why NoSQL became a thing

The four main types of NoSQL databases

  1. Key-value stores
  2. Document databases
  3. Wide-column (column-family) databases
  4. Graph databases

A quick note on other NoSQL types

When to choose NoSQL over SQL — a decision framework

The trade-offs you're accepting

How NoSQL connects to the CAP theorem

Real interview answers using NoSQL

Common follow-up questions to expect

Putting it all together

Keep learning

NoSQL Databases in System Design: When to Use Them and Why

Arslan Ahmad
When should you choose NoSQL over SQL? Complete guide to NoSQL types (key-value, document, column, graph), use cases, trade-offs, and system design interview answers.

If you've ever sat in a system design interview and the interviewer casually asked, "What database would you use here?" — you know the panic. You start listing names (MongoDB! Cassandra! Redis!), hoping one of them sounds right. But sounding right isn't the same as being right.

Here's the truth: senior engineers don't memorize databases. They memorize when to use each type. Once you understand the trade-offs, the specific product becomes a detail you can look up.

This guide walks you through the NoSQL landscape the way I wish someone had walked me through it when I was preparing for my first FAANG interview. By the end, you'll know:

  • What NoSQL actually means (hint: it's not "no SQL allowed")
  • The four main types of NoSQL databases and when each one shines
  • The trade-offs you're agreeing to when you choose NoSQL over SQL
  • How to explain your database choice in a system design interview without getting caught out

Let's get into it.

What is a NoSQL database?

"NoSQL" is a little misleading as a name. It doesn't mean "no SQL" — it means "not only SQL." The category covers any database that doesn't use the traditional relational table structure (rows and columns tied together by foreign keys).

Instead of forcing your data into rigid tables, NoSQL databases store data in formats that match how applications actually use it: documents that look like the JSON your API returns, key-value pairs that look like a dictionary in code, columns grouped by how you read them, or graphs that mirror real-world relationships.

The big shift in thinking is this: with SQL, you design your schema first and shape your application around it. With NoSQL, you design around your access patterns first, and the "schema" follows.

That's it. Everything else — the scalability, the flexibility, the performance — flows from that one design principle.

Why NoSQL became a thing

NoSQL wasn't invented because SQL was bad. SQL is still the right choice for most applications — think banking, e-commerce orders, anything where you need strong guarantees that data is consistent across tables.

NoSQL became popular for three reasons, all tied to the scale the internet reached in the 2000s:

1. Horizontal scalability. Traditional SQL databases scale by buying a bigger machine (vertical scaling). At some point, there's no bigger machine. NoSQL databases were designed from day one to scale horizontally: add more commodity servers, and capacity grows roughly in proportion, as long as the data partitions evenly. If you want to go deeper on this specific problem, I wrote about it in scaling SQL databases horizontally.

2. Schema flexibility. When you're shipping a product and iterating fast, changing a SQL schema is painful — migrations, downtime, locked tables. NoSQL databases let you add fields to documents without touching the rest of the data. Move fast, break fewer things.

3. Access pattern optimization. A relational database is a generalist. It tries to be good at everything. NoSQL databases are specialists — a document database is really good at retrieving whole documents by ID, a graph database is really good at traversing relationships, a column database is really good at scanning millions of rows for analytics. When your app has one dominant access pattern, a specialist wins.

Here's the catch, and this is where interview candidates often get tripped up: you pay for these benefits with weaker consistency guarantees. We'll get to that in the trade-offs section.

The four main types of NoSQL databases

Most of the NoSQL universe falls into four categories. If you can explain these four types and give one real-world example for each, you've covered 90% of what a system design interviewer will ask.

1. Key-value stores

Think of a key-value store as a giant distributed hash map. You give it a key, it gives you back a value. That's the whole API.

The value can be anything — a string, a JSON blob, a binary object — but the database treats it as an opaque bag of bytes. It doesn't index the contents or let you query by anything other than the key.

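Why this is powerful: Lookups are about as cheap as reads get. If you know the key, the database hashes it to find the owning node and fetches the value in roughly constant time, with no query planning and no index traversal. At scale, that is very hard to beat.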
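To make the "giant hash map" idea concrete, here is a minimal single-process sketch. The `KeyValueStore` class and the `session:` key format are my own illustrative names, not any real product's API, and a real store adds replication, partitioning, and durability on top of this.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy single-process stand-in for a key-value store (illustrative only)."""

    def __init__(self) -> None:
        # A Python dict is a hash map: average O(1) get/put by key.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The value is opaque bytes; the store never looks inside it.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:abc123", b'{"user_id": 42, "role": "admin"}')
session = store.get("session:abc123")   # found by key
missing = store.get("session:unknown")  # None: there is no such key
```

Notice there is no way to ask "which sessions belong to admins?" without scanning every value. That limitation is the price of the simple API.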

When to use it:

  • Session management: user logged in? Store session_id → user_data. Every page load looks it up in milliseconds.
  • Shopping carts: user_id → cart_items. The cart is always retrieved whole; no need for complex queries.
  • User preferences: user_id → preferences_json.
  • Distributed caching: the classic use case. Cache a database query result or an API response by some hash of the inputs.

Examples: Amazon DynamoDB, Redis, Memcached, Riak.

One important split inside this category: in-memory key-value stores like Redis and Memcached serve data from RAM, which is blazingly fast but means data can be lost when the server restarts. Redis can be configured to persist to disk (snapshots or an append-only file); Memcached has no built-in persistence. Disk-backed stores like DynamoDB are slightly slower but durable by default. Pick based on whether you can tolerate losing the data: fine for a cache you can rebuild, not fine for the only copy of a shopping cart.

The interview framing: "I'd use a key-value store here because every read is by the session ID, I never query by anything else, and I need single-digit-millisecond latency at the 99th percentile. DynamoDB would give me that with automatic replication and very little ops overhead. If I needed sub-millisecond reads, I'd put an in-memory store like Redis in front of it."
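The distributed caching use case listed above usually follows a cache-aside pattern: check the cache, fall back to the slow source on a miss, then store the result with an expiry. Here's a rough in-process sketch of that flow. `CacheAside`, `slow_load`, and the TTL value are hypothetical names I made up for illustration; in production the dictionary would be replaced by a store like Redis or Memcached.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CacheAside:
    """Toy cache-aside wrapper with a per-entry time-to-live (illustrative only)."""

    def __init__(self, loader: Callable[[str], List[str]], ttl_seconds: float) -> None:
        self._loader = loader          # the slow source of truth, e.g. a database query
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def get(self, key: str, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]            # cache hit: skip the slow source
        value = self._loader(key)      # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value


calls: List[str] = []

def slow_load(user_id: str) -> List[str]:
    calls.append(user_id)              # record every trip to the slow source
    return [f"post-for-{user_id}"]

cache = CacheAside(slow_load, ttl_seconds=5)
first = cache.get("u1", now=0.0)   # miss: loads from the source
second = cache.get("u1", now=1.0)  # hit: served from the cache
third = cache.get("u1", now=6.0)   # entry expired: loads again
```

The `now` parameter exists only so the example is deterministic; real callers would let it default to the clock.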

2. Document databases

A document database stores data as documents — usually JSON or BSON (a binary version of JSON). Each document can have its own structure, and you can query by any field inside it (not just the key).

This is the NoSQL type that feels most natural to web developers, because the documents look exactly like the objects you're already passing around in your code.

Why this is powerful: Flexibility. You don't have to declare your schema upfront. If product requirements change and you need a new field, you just start writing it — old documents without that field are still valid.
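Here's a small sketch of what that flexibility looks like, using plain Python dictionaries as stand-ins for documents. The `find` helper and the sample products are made up for illustration; a real document database would use indexes instead of scanning the whole collection.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]
_MISSING = object()  # sentinel so a stored None is not confused with an absent field

# Each document carries only the fields that make sense for it.
products: List[Document] = [
    {"_id": 1, "type": "book", "title": "Example Novel", "author": "A. Writer"},
    {"_id": 2, "type": "phone", "title": "Example Phone", "battery_mah": 4500},
    {"_id": 3, "type": "shirt", "title": "Plain Tee", "size": "M"},
]


def find(collection: List[Document], **criteria: Any) -> List[Document]:
    """Return documents whose fields equal every criterion.

    A document that lacks a queried field simply does not match.
    """
    return [
        doc for doc in collection
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]


# A new requirement arrives: start writing an extra field. No migration needed,
# and the older documents without that field remain valid.
products.append(
    {"_id": 4, "type": "book", "title": "Another Novel", "author": "B. Writer", "audiobook": True}
)

books = find(products, type="book")
with_audio = find(products, audiobook=True)
```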

When to use it:

  • Product catalogs — every product has different attributes (a book has an author, a phone has battery life, a shirt has a size), so a rigid SQL schema is painful. Documents handle the variation naturally.
  • User profiles — the canonical "users table" becomes impossible once profile fields vary by account type, region, or product.
  • Content management systems — blog posts, articles, comments. Each has different fields.
  • Event logging and audit trails where the shape of the event varies.

Examples: MongoDB, Amazon DocumentDB, CouchDB, Firebase Firestore.

What to watch for in interviews: Document databases are great until you need to join two kinds of documents together. They can do it, but they're not optimized for it, and query performance suffers. If your data has lots of relationships, consider the graph database section below instead.

3. Wide-column (column-family) databases

This is the category that trips people up the most, because the name is confusing. Wide-column databases look like SQL tables from a distance — rows and columns — but they behave very differently under the hood.

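Here's the key insight: a wide-column database stores each row under a partition (row) key and groups related columns into column families that are stored together. When you query, the database can pull just the column family or column range you need without loading the rest of the row. And, critically, different rows can have different columns. One row might have 5 columns filled in, another might have 500. (Don't confuse this with columnar analytics engines, which store each column contiguously across all rows. The names are similar, but the storage layout is different.)

Why this is powerful: Two reasons. First, these systems are typically built on log-structured storage, so writes are cheap appends and reads within one partition are sequential, which keeps both fast as data spreads across many nodes. Second, the wide-row flexibility makes this great for time-series data, where each row (e.g., a user or a sensor) has a growing list of events as columns, kept in sorted order.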
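One way to picture the wide-row model: every row key owns its own sorted list of columns, and a read asks for a slice of that list. The sketch below is a toy in-memory version under that assumption. `WideRowTable` and the sensor names are hypothetical, and real systems add partitioning across nodes, compaction, and replication.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Column = Tuple[int, float]  # (timestamp, value); the timestamp acts as the column name


class WideRowTable:
    """Toy wide-row table: row key -> columns kept sorted by timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Column]] = defaultdict(list)

    def write(self, row_key: str, timestamp: int, value: float) -> None:
        # Keep each row's columns sorted so range reads are a cheap slice.
        bisect.insort(self._rows[row_key], (timestamp, value))

    def slice(self, row_key: str, start: int, end: int) -> List[Column]:
        """Return columns with start <= timestamp <= end for one row."""
        columns = self._rows.get(row_key, [])
        lo = bisect.bisect_left(columns, (start, float("-inf")))
        hi = bisect.bisect_right(columns, (end, float("inf")))
        return columns[lo:hi]


table = WideRowTable()
table.write("sensor-1", 30, 21.5)   # writes can arrive out of order
table.write("sensor-1", 10, 20.0)
table.write("sensor-1", 20, 20.7)
table.write("sensor-2", 15, 18.0)   # a different row with its own columns

recent = table.slice("sensor-1", 10, 20)
```

The query "readings for one sensor in a time range" is fast because it touches one row and one contiguous slice. A query like "all readings above 21 degrees across every sensor" has no such shortcut.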

When to use it:

  • Time-series data: sensor readings, financial tick data, application metrics. Each row is an entity; each column is a timestamped event.
  • Analytics pipelines: "what's the average purchase amount by country over the last 30 days?" works well when the table is modeled so the scan stays inside known partitions and reads only the columns it needs.
  • Recommendation engines: storing user-item interaction matrices.
  • Messaging at scale: Cassandra is famous for this, having powered Discord's message store (before Discord moved to ScyllaDB), Netflix's metadata, and Instagram's inbox.

Examples: Apache Cassandra, HBase, Google Bigtable, ScyllaDB, Amazon Keyspaces.

The trade-off: these databases give you blazing-fast writes and scans, but they're bad at ad-hoc queries that filter on non-key fields. You have to design your queries upfront and model the data to match.

4. Graph databases

Graph databases store data as nodes (entities) connected by edges (relationships). You can traverse from one node to another by following edges, and the database is optimized to do this traversal really, really fast.

Why this is powerful: In SQL, finding "friends of friends of Alice" requires joining a users table to itself multiple times, and the cost grows with every extra hop. In a graph database, it's a single query that follows edges, and each hop costs roughly the same no matter how large the rest of the graph is. For deep, relationship-heavy queries, that difference can be dramatic.
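The traversal itself is easy to sketch with an adjacency map and a breadth-first search. This is an illustration of the idea, not how any particular graph database is implemented; the `friends` data and the `reachable_within` helper are made up for the example.

```python
from collections import deque
from typing import Dict, Set

# Adjacency map: each person points at their direct friends.
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def reachable_within(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Return every node reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node in distance if node != start}


direct = reachable_within(friends, "alice", 1)
friends_of_friends = reachable_within(friends, "alice", 2) - direct
```

Each hop only follows the edges of the nodes already found, so the work depends on the size of the neighborhood rather than the size of the whole dataset. The same "within N hops" check is the core of the fraud detection use case listed below.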

When to use it:

  • Social networks — friends, followers, mutual connections.
  • Recommendation engines — "users who bought X also bought Y" via collaborative filtering.
  • Fraud detection — "is this transaction connected to known fraudulent accounts through any chain of intermediaries?"
  • Knowledge graphs — connecting concepts, entities, and facts (this is how Google's knowledge panels work).
  • Network and IT operations — modeling dependencies between services, servers, and data flows.

Examples: Neo4j, Amazon Neptune, ArangoDB, JanusGraph.

In interviews, a graph database is a strong default answer for social features, recommendation flows, and fraud detection. If an interviewer asks "how would you store the friend graph for a social network?" and you say "I'd use Neo4j because friend traversal is a native operation instead of repeated joins," you've just signaled senior-level thinking.

A quick note on other NoSQL types

You'll occasionally hear about other categories: time-series databases (InfluxDB, Prometheus, TimescaleDB) that are specialized for timestamped data, and ledger databases (Amazon QLDB) that provide immutable, cryptographically verifiable logs. These are narrower specializations: conceptually, a time-series database applies time-based optimizations to a wide-row layout, and a ledger database adds an append-only contract to a document store. The categories blur at the edges, too; TimescaleDB, for example, is built on PostgreSQL and speaks SQL. You probably won't need to reach for them in a generic system design interview, but knowing they exist is worth a few points.

When to choose NoSQL over SQL — a decision framework

In interviews, the wrong answer is "NoSQL, because scale." The right answer walks through a few questions out loud.

Ask yourself these, in order:

1. Do I need strict ACID transactions across multiple rows? If yes — banking, inventory, orders — stick with SQL. Most NoSQL databases give you atomic writes on a single document or row, but multi-document transactions are either limited or slow. For the deep dive on why this matters, see ACID and database transactions.

2. Is my schema stable, or does it evolve constantly? If the schema is stable and relationships are clear (users have orders, orders have line items), SQL's JOINs and referential integrity are a gift, not a burden. If the schema changes weekly because product is still figuring out what fields to collect, a document database saves you painful migrations.

3. What's my dominant access pattern? Write this down before picking a database. "I read by user ID 95% of the time and occasionally scan all users in a region" points to a key-value store with a secondary index. "I need to find all products matching 8 different filter criteria" points to SQL. "I need to find shortest paths between users" points to a graph database.

4. What scale am I realistically designing for? Be honest. A well-indexed SQL database on a single modest server can often handle thousands of simple queries per second, sometimes tens of thousands. That's more than most apps will ever need. Don't reach for NoSQL "in case we grow to Twitter scale." Design for today + 10x, not today + 1000x.

5. Can I tolerate eventual consistency? NoSQL databases that scale horizontally often default to eventual consistency, meaning a write to one node might not be visible on another node for a short window (usually milliseconds, but longer under load or during a network problem). For a like counter, that's fine. For a bank balance, it's a lawsuit. Read eventual vs strong consistency if this concept is fuzzy.

In an interview, verbalize this framework. An interviewer would much rather hear you think through access patterns than hear you pattern-match "large scale → Cassandra."
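If it helps to see the questions as a checklist, here is a deliberately simplified sketch that encodes them as code. Everything in it (the `Workload` fields, the access pattern labels, the order of the checks) is my own simplification for illustration. It skips the scale question entirely, and a real decision involves far more nuance than a handful of booleans.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of the answers to the framework questions."""
    needs_multi_row_acid: bool
    schema_changes_often: bool
    dominant_access: str  # "key_lookup", "multi_filter", "relationship_traversal",
                          # "time_series_append", or "whole_document"
    tolerates_eventual_consistency: bool


def suggest_store(w: Workload) -> str:
    # Questions 1 and 5: strict transactions or no tolerance for stale reads -> SQL.
    if w.needs_multi_row_acid or not w.tolerates_eventual_consistency:
        return "relational (SQL)"
    # Question 3: let the dominant access pattern pick the specialist.
    if w.dominant_access == "relationship_traversal":
        return "graph"
    if w.dominant_access == "key_lookup":
        return "key-value"
    if w.dominant_access == "time_series_append":
        return "wide-column"
    if w.dominant_access == "multi_filter":
        return "relational (SQL)"
    # Question 2: a fast-changing schema read as whole documents -> document store.
    if w.schema_changes_often or w.dominant_access == "whole_document":
        return "document"
    return "relational (SQL)"


payments = Workload(True, False, "multi_filter", False)
sessions = Workload(False, False, "key_lookup", True)
friend_graph = Workload(False, False, "relationship_traversal", True)
catalog = Workload(False, True, "whole_document", True)
```

The point is the order of the questions, not the function. In the interview you say these checks out loud instead of returning a string.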


The trade-offs you're accepting

Every choice in system design costs something. NoSQL is no exception. Here's what you give up when you pick NoSQL over SQL:

No joins (or very limited joins). In SQL, joining orders to users is trivial. In most NoSQL databases, you either denormalize (store the user data inside each order document) or do the join in application code. Denormalization means data can go out of sync; application-level joins add latency. Neither is free.
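A tiny example shows both costs. The data and the `order_with_user` helper below are invented for illustration: the order embeds a copy of the user's email (denormalization), and the helper performs the extra lookup that an application-level join requires.

```python
from typing import Any, Dict, List

users: Dict[str, Dict[str, Any]] = {
    "u1": {"name": "Ada", "email": "ada@old.example"},
}

# Denormalized: each order carries its own copy of the user's email.
orders: List[Dict[str, Any]] = [
    {"order_id": "o1", "user_id": "u1", "user_email": "ada@old.example", "total": 40},
]


def order_with_user(order: Dict[str, Any], user_table: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Application-level join: one extra lookup per order, but always current."""
    user = user_table[order["user_id"]]
    return {**order, "user_email": user["email"]}


# The user changes their email. Only the users record is updated.
users["u1"]["email"] = "ada@new.example"

stale_email = orders[0]["user_email"]                           # embedded copy is now out of date
fresh_email = order_with_user(orders[0], users)["user_email"]   # costs a second read
```

With denormalization you must remember to update every copy. With the application-level join you pay an extra round trip on every read.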

Weaker consistency guarantees. Most NoSQL databases at scale offer eventual consistency by default. Different nodes may disagree for a brief window after a write. Many support strong consistency as an opt-in, but at the cost of latency. The CAP theorem explains the fundamental reason: in a distributed system, when a network partition occurs, you have to pick between consistency and availability. Many NoSQL databases pick availability by default.

Less mature tooling. SQL has 50 years of tooling — ORMs, migration frameworks, query optimizers, monitoring. NoSQL tooling has gotten much better, but it's still patchier. Expect to write more custom code.

Application-level integrity checks. SQL enforces referential integrity for you (you can't insert an order for a user that doesn't exist). In NoSQL, you enforce that in application code, which means bugs become data corruption.

Each NoSQL database is its own thing. SQL is a standard. MongoDB, Cassandra, and DynamoDB have wildly different APIs, data models, and operational characteristics. Switching between them is a rewrite, not a config change.

None of these are dealbreakers if you chose NoSQL for the right reasons. They're dealbreakers if you chose NoSQL because it was trendy.

How NoSQL connects to the CAP theorem

This comes up in senior interviews a lot. Every database sits somewhere on the CAP triangle, and understanding where tells you a lot about its behavior.

CP systems (consistency + partition tolerance): These refuse writes when they can't guarantee consistency across nodes. MongoDB (in its default config) and HBase lean this way. Good when wrong data is worse than no data.

AP systems (availability + partition tolerance): These keep accepting writes even during network partitions and reconcile differences later. Cassandra, DynamoDB, and Riak lean this way. Good when availability matters more than perfect consistency (think: a social feed that can afford to show slightly stale likes).

CA systems (consistency + availability, but not partition tolerance): These only exist in theory for distributed systems — you can't really build a distributed database that isn't partition tolerant, because partitions happen whether you plan for them or not. SQL databases on a single node approximate CA, but they lose availability when that node fails.

If an interviewer asks "why did you choose DynamoDB?" and you say "because for this use case, I need the system to keep taking writes during a regional network issue even if it means the read-your-writes guarantee is briefly violated" — that's a senior answer.
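In practice, the CP versus AP split often comes down to how many replicas must acknowledge each read and write. Here's a minimal sketch of the usual quorum arithmetic, where N is the replica count, W the write acknowledgements required, and R the replicas consulted per read. The function names are mine, and real databases expose this through their own settings (Cassandra's consistency levels are one example).

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def can_accept_write(reachable_replicas: int, w: int) -> bool:
    """During a partition, a write succeeds only if W replicas are still reachable."""
    return reachable_replicas >= w


# Three replicas, quorum reads and writes: reads always see the latest write,
# but the minority side of a partition must refuse writes (CP-leaning).
strict = quorum_overlaps(n=3, w=2, r=2)
strict_minority_write = can_accept_write(reachable_replicas=1, w=2)

# Three replicas, W=1 and R=1: every side keeps accepting writes (AP-leaning),
# but a read may hit a replica that has not seen the write yet.
relaxed = quorum_overlaps(n=3, w=1, r=1)
relaxed_minority_write = can_accept_write(reachable_replicas=1, w=1)
```

The same knobs explain the latency cost of strong consistency: a larger W or R means waiting for more replicas on every operation.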

Real interview answers using NoSQL

Let me show you how to actually say these things in a system design interview. Here are four canonical interview questions and the NoSQL reasoning I'd use for each.

Q: "Design a social media news feed."

"The feed is a write-heavy, eventually-consistent workload. I'd use Cassandra for the main feed data because I can partition by user ID, get linear scalability, and tolerate eventual consistency — if a new post takes 200ms to appear on a follower's feed, no one notices. I'd also use Redis as a cache for the top of each active user's feed, because hot users will be read tens of thousands of times per second. I wouldn't use SQL here — the access pattern is simple (get posts for user X), there are no complex joins, and the scale is too high for a relational database to handle gracefully."

Q: "Design a ride-sharing service like Uber."

"For the driver location data — updated every few seconds by millions of drivers — I'd use a key-value store with geospatial indexing like Redis with its geo commands, or DynamoDB with geohashes. Writes are dominant and the reads are always 'find drivers near this point.' For the trip and payment records, though, I'd switch to SQL — payments need ACID transactions, and the data volume is much smaller."

Q: "Design a product catalog for an e-commerce site."

"Product catalogs are the classic document database use case. Different product categories have wildly different attributes — a book has a page count and an ISBN, a phone has battery capacity and screen size — and a SQL schema that accommodates all of that gets ugly fast. MongoDB lets each product document have exactly the fields that product type needs. I'd index on category and a few common filters, and use a separate search service like Elasticsearch for full-text search."

Q: "Design fraud detection for a payments platform."

"Fraud detection is inherently about relationships — 'is this account connected to any known fraudulent account through any chain of transactions?' That's a graph traversal problem, and SQL's recursive CTEs don't scale for this. I'd use Neo4j to store account-transaction-account relationships. When a new transaction comes in, a single graph query can check whether any known bad actor is within N hops."

Notice the pattern: in each answer, I name the access pattern first, explain why SQL struggles with it, and pick the specific NoSQL type that fits. That structure is more impressive to an interviewer than the specific database choice.
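The geohash idea from the ride-sharing answer deserves a quick illustration, because interviewers sometimes ask how it works. A geohash interleaves longitude and latitude bits and encodes them in base 32, so nearby points usually share a prefix and "find drivers near this point" becomes a key-prefix lookup. The sketch below implements the standard encoding; the function name `geohash_encode` is my own.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


origin_cell = geohash_encode(0.0, 0.0, precision=5)
sample_cell = geohash_encode(57.64911, 10.40744, precision=4)
```

In a key-value store, the cell string (or a prefix of it) becomes part of the key, and a "nearby" query reads the rider's cell plus its neighbors. Points on opposite sides of a cell boundary can have different prefixes, which is why the neighboring cells matter.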

Common follow-up questions to expect

Once you've picked a NoSQL database in an interview, the follow-ups tend to land on the same handful of areas. Be ready for these:

  • "How would you handle a hot partition in your key-value store?" (Answer: composite keys, write sharding, backoff/retry logic.)
  • "How does your database handle a node failure?" (Answer: replication factor, quorum reads/writes, failover.)
  • "How do you shard this data?" (Answer: pick a partition key with good cardinality and even distribution; see database sharding.)
  • "What happens if two clients write to the same key simultaneously?" (Answer: last-write-wins vs vector clocks vs conflict-free replicated data types depending on the database.)
  • "How do you query data you didn't design the key for?" (Answer: secondary indexes, denormalization, or a separate search service.)
  • "When would you add a cache in front of this database?" (Answer: when reads dominate, when the same keys are hit repeatedly, or when the database latency is the system bottleneck.)

If you can answer those six without hesitating, you'll ace most of the database portion of your interview.
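The hot partition answer is the one candidates most often struggle to explain, so here's a small sketch of write sharding. The idea is to spread writes for one hot key across several sub-keys and sum them on read. The shard count, key format, and helper names are all hypothetical choices for this example, and the dictionary stands in for the real store.

```python
import hashlib
from collections import defaultdict
from typing import Dict

NUM_SHARDS = 8  # hypothetical; tune to how hot the key is

counters: Dict[str, int] = defaultdict(int)  # stand-in for the key-value store


def sharded_key(hot_key: str, request_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Pick a sub-key for this write by hashing the request id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_shards
    return f"{hot_key}#{shard}"


def record_like(post_id: str, request_id: str) -> None:
    # Each write lands on one of NUM_SHARDS keys instead of a single hot key.
    counters[sharded_key(post_id, request_id)] += 1


def total_likes(post_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Reads pay for the spread: they must gather every shard and add them up.
    return sum(counters.get(f"{post_id}#{shard}", 0) for shard in range(num_shards))


for i in range(1000):
    record_like("post-1", f"req-{i}")

likes = total_likes("post-1")
```

The trade-off is visible in the code: writes get cheaper and more evenly distributed, while reads become a small fan-out.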

Putting it all together

Here's the one-sentence version of this whole article: NoSQL isn't a better database — it's a different set of trade-offs, and the job of a system designer is to match the trade-offs to the problem.

When you walk into an interview, you don't need to have memorized every feature of every database. You need to be able to ask the right questions about the access pattern, explain the trade-offs clearly, and pick a specific database that matches. If you can do that, the interviewer will trust you with the bigger design questions that follow.

And honestly? Most senior engineers in the field operate exactly like this too. The difference between a junior and a senior engineer isn't that the senior memorized more database docs — it's that the senior knows how to ask "what am I optimizing for?" before reaching for a tool.

Good luck with your interviews. You've got this.

Keep learning

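If you want to go deeper on the database topics this post touched, the related articles mentioned along the way are good next steps: scaling SQL databases horizontally, ACID and database transactions, eventual vs strong consistency, and database sharding.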

For the full system design interview roadmap, start with my complete system design interview guide.

Copyright © 2026 Design Gurus, LLC. All rights reserved.