System Design

Learn System Design

How to Learn System Design?

Functional vs. Non-functional Requirements

What are Back-of-the-Envelope Estimations?

Things to Avoid During System Design Interview

System Design Basics

Introduction to Load Balancing

Load Balancing Algorithms

Uses of Load Balancing

Load Balancer Types

Stateless vs. Stateful Load Balancing

High Availability and Fault Tolerance

Scalability and Performance

Challenges of Load Balancers

Introduction to API Gateway

Usage of API gateway

Advantages and disadvantages of using API gateway

Scalability

Availability

Latency and Performance

Concurrency and Coordination

Monitoring and Observability

Resilience and Error Handling

Fault Tolerance vs. High Availability

HTTP vs. HTTPS

TCP vs. UDP

HTTP: 1.0 vs. 1.1 vs 2.0 vs. 3.0

URL vs. URI vs. URN

Introduction to DNS

DNS Resolution Process

DNS Load Balancing and High Availability

Introduction to Caching

Why is Caching Important?

Types of Caching

Cache Replacement Policies

Cache Invalidation

Cache Read Strategies

Cache Coherence and Consistency Models

Caching Challenges

Cache Performance Metrics

What is CDN?

Origin Server vs. Edge Server

CDN Architecture

Push CDN vs. Pull CDN

Introduction to Data Partitioning

Partitioning Methods

Data Sharding Techniques

Benefits of Data Partitioning

Common Problems Associated with Data Partitioning

What is a Proxy Server?

Uses of Proxies

VPN vs. Proxy Server

What is Redundancy?

What is Replication?

Replication Methods

Data Backup vs. Disaster Recovery

Introduction to CAP Theorem

Components of CAP Theorem

Trade-offs in CAP Theorem

Examples of CAP Theorem in Practice

Beyond CAP Theorem

System Design Trade-offs in Interviews

Introduction to Databases

SQL Databases

NoSQL Databases

SQL vs. NoSQL

ACID vs BASE Properties

Real-World Examples and Case Studies

SQL Normalization and Denormalization

In-Memory Database vs. On-Disk Database

Data Replication vs. Data Mirroring

Database Federation

Indexes

What are Indexes?

Types of Indexes

Introduction to Bloom Filters

Benefits & Limitations of Bloom Filters

Variants and Extensions of Bloom Filters

Applications of Bloom Filters

Difference Between Long-Polling, WebSockets, and Server-Sent Events

Why Quorum?

What is Quorum?

What is Heartbeat?

What is Checksum?

Uses of Checksum

What is Leader and Follower Pattern?

What is Security and Privacy?

What is Authentication?

What is Authorization?

Authentication vs. Authorization

OAuth vs. JWT for Authentication

What is Encryption?

What are DDoS Attacks?

Introduction to Messaging System

Introduction to Kafka

Messaging patterns

Popular Messaging Queue Systems

RabbitMQ vs. Kafka vs. ActiveMQ

Scalability and Performance

What is a Distributed File System?

Architecture of a Distributed File System

Key Components of a DFS

Batch Processing vs. Stream Processing

XML vs. JSON

Synchronous vs. Asynchronous Communication

Push vs. Pull Notification Systems

Microservices vs. Serverless Architecture

Message Queues vs. Service Bus

Stateful vs. Stateless Architecture

Event-Driven vs. Polling Architecture

Quiz

Importance of Discussing Trade-offs

Strong vs Eventual Consistency

Latency vs Throughput

ACID vs BASE Properties in Databases

Read-Through vs Write-Through Cache

Batch Processing vs Stream Processing

Load Balancer vs. API Gateway

API Gateway vs Direct Service Exposure

Proxy vs. Reverse Proxy

API Gateway vs. Reverse Proxy

SQL vs. NoSQL

Primary-Replica vs Peer-to-Peer Replication

Data Compression vs Data Deduplication

Server-Side Caching vs Client-Side Caching

REST vs RPC

Polling vs. Long-Polling vs. WebSockets vs. Webhooks

CDN Usage vs Direct Server Serving

Serverless Architecture vs Traditional Server-based

Stateful vs Stateless Architecture

Hybrid Cloud Storage vs All-Cloud Storage

Token Bucket vs Leaky Bucket

Read Heavy vs Write Heavy System

Quiz

System Design Interviews - A step by step guide

System Design Master Template

Designing a URL Shortening Service like TinyURL

Quiz - Designing URL Shortner

Designing Pastebin

Quiz - Designing Pastebin

Designing Instagram

Quiz - Designing Instagram

Designing Dropbox

Quiz - Designing Dropbox

Designing Facebook Messenger

Quiz - Designing Facebook Messenger

Designing Twitter

Quiz - Designing Twitter

Designing Youtube or Netflix

Quiz - Designing Youtube

Designing Typeahead Suggestion

Quiz - Designing Typeahead Suggestion

Designing an API Rate Limiter

Quiz - Designing an API Rate Limiter

Designing Twitter Search

Quiz - Designing Twitter Search

Designing a Web Crawler

Quiz - Designing a Web Crawler

Designing Facebook’s Newsfeed

Quiz - Designing Facebook’s Newsfeed

Designing Yelp or Nearby Friends

Quiz - Designing Yelp or Nearby Friends

Designing Uber backend

Quiz - Designing Uber backend

Designing Ticketmaster

Quiz - Designing Ticketmaster

Dynamo: Introduction

High-Level Architecture

Data Partitioning

Replication

Vector Clocks and Conflicting Data

The Life of Dynamo’s put() & get() Operations

Anti-entropy Through Merkle Trees

Gossip Protocol

Dynamo Characteristics and Criticism

Summary: Dynamo

Quiz: Dynamo

Mock Interview: Dynamo

YouTube Likes Counter

Quiz

Cassandra: Introduction

High-level Architecture

Replication

Cassandra Consistency Levels

Gossiper

Anatomy of Cassandra's Write Operation

Anatomy of Cassandra's Read Operation

Compaction

Tombstones

Summary: Cassandra

Quiz: Cassandra

Mock Interview: Cassandra

Messaging Systems: Introduction

Kafka: Introduction

High-level Architecture

Kafka: Deep Dive

Consumer Groups

Kafka Workflow

Role of ZooKeeper

Controller Broker

Kafka Delivery Semantics

Kafka Characteristics

Summary: Kafka

Quiz: Kafka

Mock Interview: Kafka

Chubby: Introduction

High-level Architecture

Design Rationale

How Chubby Works

File, Directories, and Handles

Locks, Sequencers, and Lock-delays

Sessions and Events

Master Election and Chubby Events

Caching

Database

Scaling Chubby

Summary: Chubby

Quiz: Chubby

Mock Interview: Chubby

Hadoop Distributed File System: Introduction

High-level Architecture

Deep Dive

Anatomy of a Read Operation

Anatomy of a Write Operation

Data Integrity & Caching

Fault Tolerance

HDFS High Availability (HA)

HDFS Characteristics

Summary: HDFS

Quiz: HDFS

Mock Interview: HDFS

Google File System: Introduction

High-level Architecture

Single Master and Large Chunk Size

Metadata

Master Operations

Anatomy of a Read Operation

Anatomy of a Write Operation

Anatomy of an Append Operation

GFS Consistency Model and Snapshotting

Fault Tolerance, High Availability, and Data Integrity

Garbage Collection

Criticism on GFS

Summary: GFS

Quiz: GFS

Mock Interview: GFS

BigTable: Introduction

BigTable Data Model

System APIs

Partitioning and High-level Architecture

SSTable

GFS and Chubby

Bigtable Components

Working with Tablets

The Life of BigTable's Read & Write Operations

Fault Tolerance and Compaction

BigTable Refinements

BigTable Characteristics

Summary: BigTable

Quiz: BigTable

Mock Interview: BigTable

Design Reddit

Quiz

Designing a Notification System

Quiz

Design Google calendar (Medium)

Quiz

Design a Recommendation System for Netflix

Quiz

Design Gmail

Quiz

Design Google News, a Global News Aggregator System (Medium)

Quiz

Design Unique ID Generator (Easy)

Quiz

Design Code Judging System like LeetCode (Medium)

Quiz

Design Payment System

Quiz

Design a Flash Sale for an E-commerce Site (Hard)

Quiz

Design a Reminder Alert System

Quiz

Introduction: System Design Patterns

1. Bloom Filters

2. Consistent Hashing

3. Quorum

4. Leader and Follower

5. Write-ahead Log

6. Segmented Log

7. High-Water Mark

8. Lease

9. Heartbeat

10. Gossip Protocol

11. Phi Accrual Failure Detection

12. Split Brain

13. Fencing

14. Checksum

15. Vector Clocks

16. CAP Theorem

17. PACELC Theorem

18. Hinted Handoff

19. Read Repair

20. Merkle Trees

Quiz

What are Indexes?

What are Indexes?

database indexing

data retrieval optimization

query performance

table scans

+3

hard
·
6 min
·Updated Jan 2025

Indexes are well known when it comes to databases. Sooner or later there comes a time when database performance is no longer satisfactory. One of the very first things you should turn to when that happens is database indexing.

The goal of creating an index on a particular table in a database is to make it faster to search through the table and find the row or rows that we want. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Example: A library catalog

A library catalog is a register that contains the list of books found in a library. The catalog is organized like a database table generally with four columns: book title, writer, subject, and date of publication. There are usually two such catalogs: one sorted by the book title and one sorted by the writer name. That way, you can either think of a writer you want to read and then look through their books or look up a specific book title you know you want to read in case you don’t know the writer’s name. These catalogs are like indexes for the database of books. They provide a sorted list of data that is easily searchable by relevant information.

Simply saying, an index is a data structure that can be perceived as a table of contents that points us to the location where actual data lives. So when we create an index on a column of a table, we store that column and a pointer to the whole row in the index. Let's assume a table containing a list of books, the following diagram shows how an index on the 'Title' column looks like:

Image
Database Index

Just like a traditional relational data store, we can also apply this concept to larger datasets. The trick with indexes is that we must carefully consider how users will access the data. In the case of data sets that are many terabytes in size, but have very small payloads (e.g., 1 KB), indexes are a necessity for optimizing data access. Finding a small payload in such a large dataset can be a real challenge, since we can’t possibly iterate over that much data in any reasonable time. Furthermore, it is very likely that such a large data set is spread over several physical devices—this means we need some way to find the correct physical location of the desired data. Indexes are the best way to do this.

Purpose of Database Indexes

a. Faster Data Retrieval: Indexes significantly speed up query execution by providing a more efficient means of locating data, which can lead to a reduction in the number of disk I/O operations and CPU usage.

b. Sorting and Ordering: Indexes can be used to quickly sort and order the data in a table based on specific criteria, which can be useful for reporting or displaying data in a specific order.

How Indexes Improve Query Performance

a. Reduced Table Scans: By using an index, the database can avoid full table scans, which require reading every row in a table. Instead, the database can directly access the indexed columns, reducing the amount of data that needs to be read and processed.

b. Efficient Data Access: Indexes provide a more efficient means of accessing data by organizing it in a way that allows the database to quickly locate the rows that meet the query criteria.

c. Index Selectivity: Indexes with high selectivity can improve query performance by reducing the number of rows that need to be accessed. High selectivity means that the index can effectively filter out a large number of rows, thereby reducing the amount of work required to process a query.

How Indexes decrease write performance?

It's important to note that while indexes can significantly improve query performance, they also come with some overhead. Indexes require additional storage space and can slow down write operations, such as INSERT, UPDATE, and DELETE, since the indexes must be updated along with the table data. Therefore, it's essential to strike a balance between the number of indexes and their impact on query performance and storage requirements.

When adding rows or making updates to existing rows for a table with an active index, we not only have to write the data but also have to update the index. This will decrease the write performance. This performance degradation applies to all insert, update, and delete operations for the table. For this reason, adding unnecessary indexes on tables should be avoided and indexes that are no longer used should be removed.

To summarize, adding indexes is about improving the performance of search queries. If the goal of the database is to provide a data store that is often written to and rarely read from, in that case, decreasing the performance of the more common operation, which is writing, is probably not worth the increase in performance we get from reading.

Mark as read
PreviousDatabase Federation
NextTypes of Indexes
Discussion
Have a question or insight about this topic? Share it with the community.
Reading Progress
0%

On This Page