Grokking System Design Fundamentals
NoSQL Databases

NoSQL databases, also known as "Not Only SQL" databases, are a diverse group of non-relational databases designed to address the limitations of traditional SQL databases, particularly in terms of scalability, flexibility, and performance under specific workloads. NoSQL databases do not adhere to the relational model and typically do not use SQL as their primary query language. Instead, they employ various data models and query languages, depending on the specific type of NoSQL database being used.

The key characteristics of NoSQL databases include their schema-less design, which allows for greater flexibility in handling data; horizontal scalability, which makes it easier to distribute data across multiple servers; and their ability to perform well under specific workloads, such as high write loads or large-scale data storage and retrieval.

Types of NoSQL Databases

NoSQL databases can be broadly categorized into the following six types, each with its unique data model and use cases:

1. Key-value databases

Key-value databases store data as key-value pairs, where the key is a unique identifier and the value is the associated data. These databases excel in scenarios requiring high write and read performance for simple data models, such as session management and real-time analytics.

Use cases: Session management, user preferences, and product recommendations.

Examples: Amazon DynamoDB (in the cloud), Azure Cosmos DB, Redis.
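
The key-value model described above can be sketched in a few lines of Python. This is only an illustrative in-process toy, not how DynamoDB or Redis are implemented; the class name `KeyValueStore`, the `ttl_seconds` parameter, and the `session:42` key format are assumptions made for the example.

```python
import time
from typing import Any, Optional


class KeyValueStore:
    """Toy in-process key-value store with optional per-key expiry (hypothetical)."""

    def __init__(self) -> None:
        # key -> (value, expiry time on the monotonic clock, or None for no expiry)
        self._data = {}

    def put(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict the expired entry on read
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# A session is looked up only by its key; the store never inspects the value.
store.put("session:42", {"user_id": 7, "cart": ["sku-1"]}, ttl_seconds=1800)
session = store.get("session:42")
```

Every operation addresses exactly one key, which is what makes this model easy to partition and fast for workloads such as session management.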

2. In-memory key-value databases

Unlike disk-based databases, these databases keep their data primarily in memory. By avoiding disk access on reads and writes, they achieve very low response times. The trade-off is durability: because the data lives in main memory, it can be lost when a process or server fails. To reduce that risk, in-memory databases can persist data to disk by appending each operation to a log or by taking periodic snapshots.

Examples: Redis, Memcached, Amazon ElastiCache.
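
To make the "log each operation" idea concrete, here is a minimal sketch of an in-memory store that appends every write to a file and replays that file on startup. The `LoggedStore` class, the JSON-lines record format, and the file name are assumptions for illustration; real systems use their own formats and add features such as log compaction.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory dict whose writes are also appended to an operation log (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged operation in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "set":
                    self.data[record["key"]] = record["value"]
                elif record["op"] == "del":
                    self.data.pop(record["key"], None)

    def _append(self, record):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to push the record to disk

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self.data.pop(key, None)


log_path = os.path.join(tempfile.mkdtemp(), "store.log")
first = LoggedStore(log_path)
first.set("a", 1)
first.set("b", 2)
first.delete("a")

# Creating a second instance on the same log simulates a restart after a crash.
recovered = LoggedStore(log_path)
```

Reads still come purely from memory; only writes pay the cost of the log append. A snapshot-based design would instead dump the whole dictionary periodically and accept losing the writes made since the last snapshot.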

Figure: Types of NoSQL databases

3. Document databases

Document databases are structured similarly to key-value databases, except that the data is stored as documents in a format such as JSON, BSON, XML, or YAML. Each document can contain nested fields, arrays, and other complex data structures, providing a high degree of flexibility in representing hierarchical and related data.

Use cases: User profiles, product catalogs, and content management.

Examples: MongoDB, Amazon DocumentDB, CouchDB.
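
As a rough illustration of the document model, the sketch below stores two differently shaped product documents in one collection and filters on a nested field. The helper names `get_path` and `find`, the dotted-path convention, and the sample data are assumptions for this example; they do not mirror any particular database's API.

```python
# Two documents in the same collection with different shapes (no shared schema).
products = [
    {
        "_id": "p1",
        "name": "Laptop",
        "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]},
        "tags": ["electronics"],
    },
    {
        "_id": "p2",
        "name": "T-shirt",
        "sizes": ["S", "M", "L"],
        "tags": ["clothing"],
    },
]


def get_path(document, dotted_path):
    """Follow a dotted path such as 'specs.ram_gb'; return None if any part is missing."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, dotted_path, expected):
    """Return every document whose value at dotted_path equals expected."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]


matches = find(products, "specs.ram_gb", 16)
```

The T-shirt document simply has no `specs` field, so it is skipped rather than causing an error, which is the flexibility the schema-less design provides.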

4. Wide-column databases

Wide-column databases are based on tables but without a strict column format. Rows do not need a value in every column, and segments of rows and columns containing different data formats can be combined.

Use cases: Telemetry, analytics data, messaging, and time-series data.

Examples: Cassandra, Accumulo, Azure Table Storage, HBase.
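
The "rows do not need a value in every column" property can be sketched with a dictionary of dictionaries. The `WideColumnTable` class and the sensor data are hypothetical; real wide-column stores add column families, timestamps per cell, and on-disk sorted storage.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy sparse table: each row stores only the columns it actually has (hypothetical)."""

    def __init__(self):
        # row key -> {column name -> value}; absent columns take up no space
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy so callers cannot mutate the stored row by accident.
        return dict(self.rows.get(row_key, {}))

    def columns(self):
        """All column names seen across every row."""
        return sorted({name for row in self.rows.values() for name in row})


table = WideColumnTable()
table.put("sensor-1", "temp_c", 21.5)
table.put("sensor-1", "humidity", 40)
table.put("sensor-2", "battery_pct", 87)  # a column sensor-1 never uses
```

The table as a whole has three columns, yet each row holds only the cells that were written to it.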

5. Graph databases

Graph databases map the relationships between data using nodes and edges. Nodes represent the individual entities (such as people or products), and edges represent the relationships between them.

Use cases: Social graphs, recommendation engines, and fraud detection.

Examples: Neo4j, Amazon Neptune, Azure Cosmos DB (through its Gremlin API).
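
A small sketch of the node-and-edge model, using a "friends of friends" suggestion as the traversal. The `Graph` class, the `FRIEND` relationship label, and the sample names are assumptions for illustration; a real graph database would express this in its query language instead.

```python
from collections import defaultdict


class Graph:
    """Toy labeled graph stored as adjacency sets (hypothetical)."""

    def __init__(self):
        # node -> set of (relationship label, neighbor node)
        self.edges = defaultdict(set)

    def add_edge(self, source, relationship, target):
        self.edges[source].add((relationship, target))

    def neighbors(self, node, relationship):
        return {target for label, target in self.edges.get(node, set()) if label == relationship}


def friends_of_friends(graph, person):
    """People two FRIEND hops away who are not already direct friends."""
    direct = graph.neighbors(person, "FRIEND")
    second_hop = set()
    for friend in direct:
        second_hop |= graph.neighbors(friend, "FRIEND")
    return second_hop - direct - {person}


g = Graph()
friendships = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("dave", "carol"),
    ("carol", "erin"),
]
for a, b in friendships:
    # Friendship is mutual, so store the edge in both directions.
    g.add_edge(a, "FRIEND", b)
    g.add_edge(b, "FRIEND", a)

suggestions = friends_of_friends(g, "alice")
```

Each hop is a direct lookup from a node to its neighbors, whereas the same question in a relational schema would typically need a self-join per hop.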

6. Time series databases

These databases store data in time-ordered streams. Data is not sorted by value or id but by the time of collection, ingestion, or other timestamps included in the metadata.

Use cases: Industrial telemetry, DevOps, and Internet of Things (IoT) applications.

Examples: Graphite, Prometheus, Amazon Timestream.
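
The time-ordered layout can be sketched with a list kept sorted by timestamp, which makes "give me everything in this window" a pair of binary searches. The `TimeSeries` class and the integer timestamps are assumptions for this example; production systems add compression, retention policies, and downsampling.

```python
import bisect


class TimeSeries:
    """Toy series that keeps points sorted by timestamp (hypothetical)."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        # Insert at the sorted position so late-arriving points stay in time order.
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._values.insert(index, value)

    def range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


cpu = TimeSeries()
cpu.append(100, 0.20)
cpu.append(160, 0.35)
cpu.append(130, 0.25)  # arrives late but is stored in time order
cpu.append(220, 0.90)

window = cpu.range(120, 200)
```

Because the data is ordered by time rather than by id or value, a window query touches only the contiguous slice it needs.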

Here are some well-known NoSQL databases:

  • MongoDB: A document-oriented database that uses the BSON format for data storage and supports horizontal scaling through sharding.

  • Redis: An in-memory, key-value store that supports various data structures and offers fast performance for caching, message queues, and real-time analytics.

  • Apache Cassandra: A highly scalable, distributed wide-column store that provides high availability and fault tolerance, designed for handling large-scale data across many commodity servers.

  • Neo4j: A graph database that offers powerful query capabilities for traversing complex relationships and analyzing connected data.

Pros and cons of using NoSQL databases

  1. Flexibility and schema-less design: One of the primary advantages of NoSQL databases is their schema-less design, which allows for greater flexibility in handling diverse and dynamic data models. This makes it easier to adapt to changing requirements and accommodate new data types without the need for extensive schema modifications, as is often the case with SQL databases.

  2. Horizontal scalability: NoSQL databases are designed to scale horizontally, enabling the distribution of data across multiple servers, often with built-in support for data replication, sharding, and partitioning. This makes NoSQL databases well-suited for large-scale applications with high write loads or massive amounts of data, where traditional SQL databases may struggle to maintain performance and consistency.

  3. Performance under specific workloads: NoSQL databases can offer superior performance under specific workloads, such as high write loads, large-scale data storage and retrieval, and complex relationships. By choosing a NoSQL database tailored to the needs of a particular application, developers can optimize performance and resource utilization while maintaining an appropriate level of data consistency and reliability.

  4. CAP theorem and trade-offs: The CAP theorem states that a distributed data store can provide at most two of the following three guarantees at the same time: Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in a distributed system, the practical choice is between consistency and availability while a partition lasts. Many NoSQL databases favor Availability and Partition Tolerance and settle for “eventual consistency,” in which replicas may briefly disagree but converge once updates have propagated. While this may be acceptable in some applications, it can lead to challenges in maintaining data integrity and reconciling conflicting updates in scenarios where strong consistency is required.

  5. Query complexity and expressiveness: While some NoSQL databases offer powerful query languages and capabilities, they may not be as expressive or versatile as SQL when it comes to complex data manipulation and analysis. This can be a limiting factor in applications that require sophisticated querying, joining, or aggregation of data. Additionally, developers may need to learn multiple query languages and paradigms when working with different types of NoSQL databases.
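
The horizontal scalability point in the list above rests on partitioning (sharding) keys across servers. Here is a minimal sketch of hash-based sharding, where each "server" is just a dictionary. The names `shard_for` and `ShardedStore`, the choice of SHA-256, and the shard count of 4 are assumptions for illustration.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy cluster in which each shard is a separate dict (hypothetical)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


cluster = ShardedStore(shard_count=4)
for user_id in range(1000):
    cluster.put(f"user:{user_id}", {"id": user_id})

# How many keys landed on each shard.
sizes = [len(shard) for shard in cluster.shards]
```

Simple modulo hashing like this moves most keys to a different shard whenever the shard count changes, which is why distributed stores commonly use consistent hashing or range-based partitioning instead.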

When to Use NoSQL (and When Not To)

Now the big question: given these pros and cons, when is a NoSQL database a good fit for your project, and when should you stick (or switch back) to SQL? There is no one-size-fits-all answer, but we can outline common scenarios for each.

Scenarios Where NoSQL Excels (Good Fits)

You should consider a NoSQL database (or a specific type of NoSQL) when your application or use-case has these characteristics:

  • Huge Scale of Data or Traffic: If you expect to handle very large volumes of data (think terabytes to petabytes) or sustain very high read/write throughput (thousands or millions of operations per second), NoSQL databases are generally better equipped for horizontal scaling. For example, if you’re building the next Facebook or a global IoT sensor network, a single SQL server likely won’t cut it. NoSQL systems like Cassandra, DynamoDB, or MongoDB Atlas can partition data across many servers and keep performance up. In fact, many big companies (Amazon, Google, Netflix, Facebook) rely on NoSQL for their core data because of the sheer scale. If you anticipate “web scale” growth, NoSQL is a strong contender.

  • Flexible or Evolving Schema Needs: If your data model is likely to evolve rapidly or each entity is slightly different (sparse attributes), a document store or other schema-less NoSQL can save you a lot of headache. Startups and agile teams often don’t want to lock down a schema on day one. NoSQL lets you adjust as you learn more about your data. Also, if you are dealing with unstructured data (text, logs, JSON from external sources) that doesn’t fit neatly into columns, a NoSQL store will accommodate it easily. For example, storing user-generated content or JSON configurations – it’s more natural to put those in a document DB or key-value store without forcing structure.

  • Data is Naturally a Collection of Documents or Objects: If your data can be thought of as self-contained documents (like catalog entries, user profiles, blog posts, etc.), a document database may align better with your use case. You’ll be able to retrieve or update a whole document in one operation, which often corresponds nicely to how your application works (e.g., fetch the whole profile in one go). Similarly, if your data is a graph of relationships or a series of time-stamped events, a graph DB or wide-column DB (respectively) might be the intuitive choice because they model the domain more closely.

  • High Availability and Geo-Distribution are Priorities: When you need an always-on, globally distributed system, NoSQL designs (which often replicate data to multiple nodes and even multiple data centers) shine. For instance, if you want a database that can stay up even if one data center goes down, some NoSQL databases can replicate across data centers with eventual consistency. Or if you need low latency for users on different continents, a multi-region distributed NoSQL store might serve reads from the nearest region. Some SQL databases can do multi-region, but typically with more complexity or cost. NoSQL often makes this a core feature (e.g., CouchDB replication, Cosmos DB offering global distribution with tunable consistency).

  • Cache or Real-Time Analytics Layer: NoSQL is a great choice for caching and real-time analytics because of its speed. Using an in-memory NoSQL store like Redis as a cache in front of a relational DB is a common pattern (best of both worlds). For real-time analytics on streaming data (click streams, logs), a wide-column store (like HBase) or a document store can ingest at high rates. Some modern “NewSQL” or streaming systems blur this line, but if you’re piecing together your architecture, NoSQL components are well-suited to these parts.

  • Specific use cases that map to NoSQL types:

    • If you need full-text search or complex text queries, a search engine like Elasticsearch (often categorized with NoSQL stores) might be used – it’s a specialized document store optimized for text querying.
    • If you need to store sessions, user cache, feature flags, etc., a key-value store (Redis, etc.) is typically a good fit.
    • If you’re implementing something like an event-sourced system or logging where you append lots of events, a wide-column or document store can keep these events and let you query by key (like by user or aggregate id).
    • If your main problem is graph-like (social network, recommendation with relationships, network graph), a graph DB will likely simplify your life compared to contorting a SQL DB to do the same.
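
The "cache in front of a relational DB" pattern mentioned in the list above is often implemented as cache-aside: read from the cache first, fall back to the database on a miss, and invalidate the cached entry on writes. The sketch below uses an in-memory SQLite database and a plain dictionary standing in for a cache such as Redis; the function names, the `user:<id>` key format, and the `stats` counter are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
db.commit()

cache = {}               # stands in for an external key-value cache
stats = {"db_reads": 0}  # counts how often we had to hit the database


def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:  # cache hit: no database round trip
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return None
    user = {"id": row[0], "name": row[1]}
    cache[key] = user  # populate the cache for later reads
    return user


def rename_user(user_id, new_name):
    db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    db.commit()
    cache.pop(f"user:{user_id}", None)  # invalidate so the next read sees the new name


first = get_user(1)   # miss: loads from the database
second = get_user(1)  # hit: served from the cache
rename_user(1, "Ada Lovelace")
third = get_user(1)   # miss again because the entry was invalidated
```

The relational database remains the source of truth, while the key-value layer absorbs repeated reads. A real deployment would also set expiry times on cached entries to bound staleness.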

In short, use NoSQL when it solves a problem that is hard to solve with SQL. A good litmus test: Ask “what requirement do I have that a relational DB cannot easily fulfill?”. If you have a clear answer (like “I need to handle 100k writes a second on a flexible schema across 5 regions”), that points toward NoSQL. NoSQL is often the right tool for big, messy, or fast-changing data needs.

Scenarios Where NoSQL May Not Be the Best Choice (When to Avoid or Be Cautious)

You might avoid using NoSQL or at least not replace your SQL database in scenarios like these:

  • Strong Need for Multi-Object Transactions and Consistency: If your application absolutely requires consistent, transactional updates to multiple pieces of data, a traditional RDBMS is still the gold standard. For example, in financial systems (banking, accounting) where accuracy and consistency are paramount, the ACID transactions of SQL databases prevent errors (e.g., not losing money between accounts during a transfer). Can it be done with NoSQL? Sometimes yes, but it’s more complex and risky. If each step of a business operation must commit or roll back all-or-nothing, SQL is often simpler. NoSQL with eventual consistency could allow states that violate business rules temporarily. So, for things like inventory management, banking, or any system where you cannot tolerate anomalies, SQL might be safer unless the NoSQL database explicitly supports the needed level of ACID and you trust it.

  • Relational Data with Complex Queries: If your data is inherently relational and you need to do a lot of JOINs or aggregations across different entities, a relational database might serve you better. Think of analytics dashboards that join customers, orders, products, etc., or an app that frequently does complex filters and grouping on data. SQL databases and their query optimizers are very good at these tasks. NoSQL solutions often require workarounds like doing multiple queries and merging in code, or maintaining redundant data specifically for certain queries. If you find yourself needing ad-hoc reporting or the flexibility to ask new questions of the data via queries, SQL is extremely powerful. NoSQL might feel limiting in such exploratory or highly relational querying scenarios. For example, an e-commerce team using a SQL data warehouse can write arbitrary SQL to get insights. If the data were in a pure NoSQL store, they might have to export it to analyze it.

  • Small or Medium Scale Applications (when relational is enough): If your application is not hitting scaling limits, a single (or replicated) relational database is often simpler and perfectly adequate. There’s a famous saying in engineering: “You are not Google.” It means that the scale problems Google or Amazon face are extreme edge cases – 99% of applications can run fine on a well-tuned relational database. If you have, say, a few million records and moderate traffic, Postgres or MySQL can handle that on a single server or a simple cluster. Introducing NoSQL might add unnecessary complexity. Many projects prematurely adopt NoSQL out of hype and then struggle with its constraints, when an SQL database would have been easier and just as effective for their size. So if you’re dealing with data that comfortably fits in one machine’s RAM/disk and load that one machine can handle, the need for NoSQL’s horizontal scaling may not justify the cost of its limitations.

  • Need for Ad-hoc Analytics or BI: Organizations that rely heavily on ad-hoc queries and data analysis, and that use tools like Tableau or Power BI, usually prefer SQL databases or data warehouses. While some NoSQL databases (like MongoDB) provide connectors for BI tools (by presenting a SQL interface or exporting data), the result can be kludgy. If your use case involves a lot of reporting, joins across multiple data sets, and complex calculations on the fly, a relational setup (or a hybrid approach where NoSQL data is periodically copied to a relational store for analysis) might be better. In many cases NoSQL is not a replacement for analytical databases (OLAP systems), which often still use columnar SQL databases (like Snowflake). So consider what you’ll do with the data. If analysis is key, you might avoid going full NoSQL.

  • Tight Consistency or Validation Requirements: If your data model benefits from the enforcement of schema and constraints (like foreign keys, uniqueness, data type enforcement), SQL is ideal because it won’t let bad data in easily. NoSQL, being schema-less, might allow all sorts of inconsistent data unless your application checks everything. For example, in a SQL DB, you can declare that every order must have a valid customer_id that references an existing customer, and the database will enforce that. In a NoSQL database, you have to implement such checks in your code and maybe backfill when inconsistencies inevitably occur. If the risk of inconsistent or invalid data is a big concern (say, in healthcare or finance records), the guardrails of SQL are very valuable. It might be better not to remove those guardrails by going NoSQL.

  • Team’s Familiarity and Operational Considerations: If your development and operations team is very experienced with SQL databases and not with the particular NoSQL technology, and if the learning curve could cause delays or mistakes, weigh that in. SQL is tried-and-true; NoSQL might require new skills (data modeling differently, understanding eventual consistency issues, etc.). Operating a large NoSQL cluster can also be non-trivial (though managed cloud services ease that). If you don’t have a clear need for NoSQL, introducing it just adds a new technology to maintain. As one Stack Overflow discussion pointed out, sometimes it’s easier to find an SQL DBA or developer than a specialized NoSQL one. So consider the human factor: use the tool your team can excel with, unless the project’s needs push you elsewhere.

  • Systems with Stable, Structured Schema: If your data is well-structured and unlikely to change often, so that a fixed schema is a feature (for data integrity) rather than a problem, there’s less reason to go NoSQL. A well-defined schema can be an advantage for ensuring all data conforms. For example, if you have core business data (like a product catalog with well-defined attributes, or a banking system with fixed fields), a relational schema ensures every record has the required fields and types. In NoSQL you might inadvertently miss adding a field in some records and not know until later.
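
Two of the guardrails from the list above, all-or-nothing transactions and database-enforced constraints, can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table layouts, the `place_order` and `transfer` helpers, and the amounts are hypothetical, and SQLite stands in for any relational database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total_cents INTEGER NOT NULL)"
)
db.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
db.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
db.executemany("INSERT INTO accounts (id, balance_cents) VALUES (?, ?)", [(1, 1000), (2, 0)])
db.commit()


def place_order(customer_id, total_cents):
    """Insert an order; the database rejects unknown customers."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
                (customer_id, total_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def transfer(source_id, target_id, amount_cents):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with db:
            db.execute(
                "UPDATE accounts SET balance_cents = balance_cents + ? WHERE id = ?",
                (amount_cents, target_id),
            )
            db.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ? WHERE id = ?",
                (amount_cents, source_id),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK failed: the source would go negative
        return False


order_ok = place_order(1, 2500)        # customer 1 exists
order_rejected = place_order(999, 2500)  # no such customer
transfer_ok = transfer(1, 2, 400)
transfer_rejected = transfer(1, 2, 5000)  # overdraft: the credit to account 2 is rolled back too

order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
balances = db.execute("SELECT id, balance_cents FROM accounts ORDER BY id").fetchall()
```

In the failed transfer the credit to the target account had already been applied inside the transaction, yet the rollback removes it, so no money is created. In a schema-less store without multi-record transactions, the application code would have to provide both protections itself.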

In summary, avoid or rethink NoSQL when your problem can be handled elegantly by a relational database and you gain no strong upside from NoSQL. Don’t use NoSQL just because it’s trendy – use it because it’s necessary. A common sensible approach is to start with relational (if it fits initially) and only move to NoSQL if and when you identify concrete needs that relational can’t meet (scale, flexibility, etc.). There’s also the approach of polyglot persistence: using multiple databases for different purposes in the same application. For instance, using a SQL database for core transactional data, but a NoSQL database for a caching layer or for logging or for a recommendation engine. That way you get each technology for what it’s best at.
