System Design Interview Crash Course
Design a Distributed Cache (like Redis)
On this page

1. Problem Definition and Scope

2. Clarify functional requirements

3. Clarify non-functional requirements

4. Back of the envelope estimates

5. API design

6. High level architecture

7. Data model

8. Core flows end to end

Flow 1: Saving Data (The "SET" Request)

Flow 2: Retrieving Data (The "GET" Request)

Flow 3: The Cache Miss (Database Rescue)

9. Caching and read performance

10. Storage, indexing and media

11. Scaling strategies

12. Reliability, failure handling and backpressure

13. Security, privacy and abuse

14. Bottlenecks and next steps

Summary


Here is the step-by-step design for a Distributed Cache (like Redis or Memcached).

1. Problem Definition and Scope

We are designing a distributed, in-memory key-value store. The system acts as a high-speed buffer between application services and the primary database to reduce latency and database load.

  • User Groups: Backend microservices and application servers (not human end-users).

  • Main Actions: Storing data (Put), retrieving data (Get), and removing data (Delete).

  • Scope:

    • We will focus on the distributed aspect: sharding data across multiple servers.

    • We will focus on Availability and Partition Tolerance (AP), favoring speed over strict consistency.

    • Out of Scope: We will not cover complex data structures (like Sorted Sets, Geo-indexes) or advanced features like Pub/Sub and Lua scripting, focusing strictly on the core Key-Value functionality.


2. Clarify functional requirements

Must Have:

  • Put (Key, Value, TTL): Clients can store a binary value with a string key and an optional expiration time (Time-To-Live).

  • Get (Key): Clients can retrieve the value if it exists and hasn't expired.

  • Delete (Key): Clients can explicitly remove a key.

  • Eviction: The system must automatically remove the Least Recently Used (LRU) items when memory is full.

  • High Availability: If a node fails, data should remain accessible via replicas.

Nice to Have:

  • Persistence: Periodic snapshots to disk to allow "warm" restarts.

  • Cluster Awareness: Clients should know which node holds which key to avoid network hops.


3. Clarify non-functional requirements

  • Latency: Ultra-low. Reads and writes should be < 1 ms (excluding network time) at the 99th percentile.

  • Throughput: High scale. We target 10 million QPS (Queries Per Second) globally.

  • Availability: 99.9% ("Three nines"). Since this is a cache, temporary failures are acceptable (the DB is the source of truth), but the system should be robust.

  • Consistency: Eventual consistency is acceptable for replicas, but the primary shard should provide strong consistency (read-your-own-write).

  • Data Size: We need to store roughly 10 TB of data in RAM.


4. Back of the envelope estimates

Let's size the system for the 10 TB and 10M QPS targets.

  • Storage Estimates:
    • Target Data: 10 TB.
    • Replication Factor: 3 (1 Primary + 2 Replicas for HA).
    • Total RAM Required: 10 TB * 3 = 30 TB.
    • RAM per Node: Assume 64 GB physical RAM, with ~50 GB usable for cache (leaving space for OS).
    • Total Nodes: 30,000 GB / 50 GB = 600 nodes.
  • Throughput Estimates:
    • Total QPS: 10 Million.
    • QPS per Node: 10,000,000 / 600 ≈ 16,666 QPS.
    • Feasibility: A high-performance cache (like Redis) can handle 50k-100k QPS per core, so our per-node load is comfortably low. The system is memory-bound, not CPU-bound.
  • Bandwidth Estimates:
    • Average Object Size: 1 KB.
    • Total Bandwidth: 10M QPS × 1 KB ≈ 10 GB/s.
    • Bandwidth per Node: 10 GB/s / 600 ≈ 17 MB/s. This is trivial for modern 10 Gbps NICs.
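
As a sanity check, the arithmetic above can be reproduced in a few lines. The figures are the assumptions stated in this section, not measurements:

```python
# Capacity estimate sanity check (all inputs are the assumptions from this section).
data_tb = 10                      # target dataset held in RAM
replication_factor = 3            # 1 primary + 2 replicas
usable_gb_per_node = 50           # of 64 GB physical RAM per node

total_gb = data_tb * 1000 * replication_factor          # 30,000 GB
nodes = total_gb / usable_gb_per_node                    # 600 nodes

total_qps = 10_000_000
qps_per_node = total_qps / nodes                         # ~16,666 QPS per node

avg_object_kb = 1
total_bandwidth_gb_s = total_qps * avg_object_kb / 1_000_000    # ~10 GB/s cluster-wide
bandwidth_per_node_mb_s = total_bandwidth_gb_s * 1000 / nodes   # ~17 MB/s per node

print(nodes, qps_per_node, total_bandwidth_gb_s, bandwidth_per_node_mb_s)
```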

5. API design

While real caches use binary protocols over raw TCP (like Redis's RESP) for performance, we will define a simple REST-style API for clarity.

1. Set Value

  • Method: PUT /v1/keys/{key}
  • Body: { "value": "base64_data", "ttl_seconds": 3600 }
  • Response: 201 Created or 413 Payload Too Large.

2. Get Value

  • Method: GET /v1/keys/{key}
  • Response:
    • 200 OK: { "value": "base64_data", "ttl": 3590 }
    • 404 Not Found: Key missing or expired.

3. Delete Value

  • Method: DELETE /v1/keys/{key}
  • Response: 204 No Content.
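
A minimal sketch of how a service might call this API, assuming a hypothetical cluster endpoint (cache.internal) and using Python's requests library. A production client would use a binary protocol and pooled connections instead:

```python
import base64
import requests

BASE_URL = "http://cache.internal/v1/keys"   # hypothetical cluster endpoint

def cache_set(key: str, value: bytes, ttl_seconds: int = 3600) -> bool:
    # PUT /v1/keys/{key} with a base64-encoded value and an optional TTL
    body = {"value": base64.b64encode(value).decode(), "ttl_seconds": ttl_seconds}
    resp = requests.put(f"{BASE_URL}/{key}", json=body, timeout=0.05)
    return resp.status_code == 201

def cache_get(key: str) -> bytes | None:
    # GET /v1/keys/{key}; 404 means the key is missing or expired
    resp = requests.get(f"{BASE_URL}/{key}", timeout=0.05)
    if resp.status_code == 404:
        return None
    return base64.b64decode(resp.json()["value"])

def cache_delete(key: str) -> None:
    # DELETE /v1/keys/{key}; returns 204 whether or not the key existed
    requests.delete(f"{BASE_URL}/{key}", timeout=0.05)
```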

6. High level architecture

To achieve sub-millisecond latency, we will use a Smart Client architecture. This avoids a centralized Load Balancer (which adds an extra network hop) and allows the application to route requests directly to the correct shard.

Components:

  1. Smart Client (SDK): A library embedded in the application servers (Service A, Service B). It holds a "Cluster Map" and uses Consistent Hashing to pick the right server.

  2. Cache Nodes: The 600 servers holding data in RAM. They are "shared-nothing" (they don't know about each other).

  3. Cluster Manager: A highly available service (like ZooKeeper or Etcd) that monitors the health of cache nodes. It pushes the "Cluster Map" updates to the Smart Clients.
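
A rough sketch of the smart-client side, with hypothetical node names. The routing here uses simple modulo hashing for brevity; Section 11 upgrades it to consistent hashing so that topology changes move only a small fraction of keys:

```python
import hashlib

class SmartClient:
    """Embedded in each application server; routes requests straight to the owning node."""

    def __init__(self, cluster_map):
        # cluster_map is the node list pushed down by the Cluster Manager (ZooKeeper/etcd).
        self.cluster_map = cluster_map

    def on_cluster_update(self, new_map):
        # Called when the Cluster Manager pushes a new topology (node added, removed, failed).
        self.cluster_map = new_map

    def node_for(self, key):
        # Simplified placement for illustration; Section 11 replaces this with consistent hashing.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.cluster_map[h % len(self.cluster_map)]

client = SmartClient([f"cache-{i}.internal" for i in range(600)])
print(client.node_for("user:123"))   # the request goes directly to this node
```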


7. Data model

Since this is an in-memory store, the "data model" refers to the internal Data Structures used in RAM to ensure O(1) performance.

1. Hash Map (The Index)

  • Purpose: Provides instant lookup for a key.
  • Key: String (the user's key).
  • Value: A pointer to the CacheEntry object in the heap.

2. Doubly Linked List (The Eviction Queue)

  • Purpose: Tracks the usage order of items for LRU (Least Recently Used) eviction.
  • Head: Most Recently Used (MRU).
  • Tail: Least Recently Used (LRU).
  • Mechanism: Every time a key is accessed, we move it to the Head. When memory is full, we chop off the Tail.
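
A minimal single-node sketch of how the two structures cooperate. This is illustrative Python, not Redis's actual implementation; it combines the hash map index, the doubly linked list for LRU ordering, and a per-entry expiration timestamp:

```python
import time

class Node:
    __slots__ = ("key", "value", "expires_at", "prev", "next")
    def __init__(self, key, value, expires_at):
        self.key, self.value, self.expires_at = key, value, expires_at
        self.prev = self.next = None

class LRUCache:
    """Hash map for O(1) lookup + doubly linked list for O(1) LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}                        # key -> Node (the index)
        self.head = Node(None, None, None)   # sentinel on the MRU side
        self.tail = Node(None, None, None)   # sentinel on the LRU side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def put(self, key, value, ttl_seconds=None):
        if key in self.map:
            self._unlink(self.map[key])      # overwrite: drop the old list entry
        elif len(self.map) >= self.capacity:
            lru = self.tail.prev             # memory full: evict from the back of the line
            self._unlink(lru)
            del self.map[lru.key]
        expires = time.time() + ttl_seconds if ttl_seconds else None
        node = Node(key, value, expires)
        self.map[key] = node
        self._push_front(node)               # the newest item sits at the head (MRU)

    def get(self, key):
        node = self.map.get(key)
        if node is None:
            return None
        if node.expires_at and time.time() > node.expires_at:
            self._unlink(node)               # lazy deletion of an expired key
            del self.map[key]
            return None
        self._unlink(node)                   # freshness update: move back to the head
        self._push_front(node)
        return node.value
```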

8. Core flows end to end

In a distributed cache, speed is everything. We cannot afford delays from "middlemen" or complex coordination. We use a Smart Client architecture.

Think of the client application like a delivery driver with a perfect GPS. It knows exactly which server holds the data before it even leaves the garage.

Here are the detailed steps for the three most critical scenarios: Saving Data, Retrieving Data, and Handling Missing Data.

Flow 1: Saving Data (The "SET" Request)

Scenario: A user logs into your website. The backend (Service A) wants to save their session user:123 with data {"status": "active"} for 1 hour.

  1. The "GPS" Calculation (Hashing)
    • The Problem: We have 600 servers. We can't just pick one at random, or we'll never find the data again.
    • The Solution: The Smart Client (a library inside Service A) takes the key user:123 and runs it through a math formula (Hashing).
    • The Result: The math tells us: "This specific key belongs to Server #42."
  2. Direct Delivery (Routing)
    • Service A sends the data directly to Server #42 over an already-open network connection.
    • Why is this fast? We skip load balancers and gateways entirely; it is a straight line from client to server.
  3. The "Bouncer" Check (Memory Eviction)
    • The request arrives at Server #42. Before saving, the server checks its RAM usage.
    • If RAM is full: The server needs to make space. It looks at its internal "Freshness Line" (the LRU list).
    • It grabs the item at the very back of the line (the data that hasn't been touched in the longest time).
    • It deletes that old item to free up space. This is called Eviction.
  4. Storing in Memory
    • Now that there is space, Server #42 saves your data in two internal structures:
      • The Dictionary (Hash Map): Saves the data so we can find it instantly later.
      • The Freshness Line (Linked List): Adds user:123 to the very front of the line. This marks it as the "Freshest" item, ensuring it won't be deleted anytime soon.
  5. "Fire and Forget" Backup (Async Replication)
    • Server #42 saves the data to its RAM and immediately sends a "Success" message back to Service A.
    • Background Task: A few milliseconds after replying, Server #42 quietly copies the data to a backup server (Server #43).
    • Trade-off: We prioritize speed over perfect safety. If Server #42 crashes in that split second before the backup finishes, the data is lost. For a cache, this is acceptable.
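
A sketch of step 5 on the server side, reusing the LRUCache from Section 7 and assuming a hypothetical send_to_replica() network call. The key point is that the reply to the client goes out before replication happens:

```python
import queue
import threading

replication_queue = queue.Queue()

def send_to_replica(key, value):
    # Placeholder for the real network call to the replica node (Server #43).
    pass

def handle_set(cache, key, value, ttl_seconds=None):
    cache.put(key, value, ttl_seconds)        # store in local RAM (may trigger LRU eviction)
    replication_queue.put((key, value))       # hand the copy off to the background worker
    return "OK"                               # reply immediately: we never wait for the replica

def replication_worker():
    # Background thread: drains the queue and forwards each entry to the replica.
    while True:
        key, value = replication_queue.get()
        try:
            send_to_replica(key, value)
        except ConnectionError:
            pass                              # cache data is expendable; drop it on failure

threading.Thread(target=replication_worker, daemon=True).start()
```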

Flow 2: Retrieving Data (The "GET" Request)

Scenario: The user refreshes the page. Service A needs to read user:123.

  1. The "GPS" Calculation (Again)
    • Service A runs the same math formula on user:123.
    • It gets the same result: Server #42. It sends the "Get" request there.
  2. The Lookup
    • Server #42 looks in its Dictionary (Hash Map).
    • If the key is missing, it returns 404 Not Found.
    • If the key is found, it grabs the data.
  3. The "Sniff Test" (Expiration Check)
    • The server looks at the "Time-To-Live" label on the data.
    • Is it expired? If the current time is past the expiration time, the data is "rotten." The server deletes it immediately and returns 404 Not Found. This is called Lazy Deletion.
    • Is it fresh? If yes, we proceed.
  4. The "Freshness" Update (Crucial Step)
    • Even though we are only reading data, the server updates its memory.
    • It finds user:123 in the Freshness Line. It might have drifted toward the back.
    • The server moves user:123 back to the very front.
    • Why? Because you just used it! We want to protect popular data from being evicted.
  5. The Response
    • Server #42 sends the data back to Service A. The whole process typically takes less than 1 millisecond.
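
Using the LRUCache sketch from Section 7, the freshness update in step 4 is what protects a frequently read key from eviction. For example:

```python
cache = LRUCache(capacity=2)
cache.put("user:123", {"status": "active"}, ttl_seconds=3600)
cache.put("user:456", {"status": "idle"})

cache.get("user:123")                        # the read moves user:123 back to the head (MRU)
cache.put("user:789", {"status": "new"})     # cache is full: evicts user:456, the LRU entry

assert cache.get("user:456") is None         # evicted
assert cache.get("user:123") is not None     # the popular key survived
```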

Flow 3: The Cache Miss (Database Rescue)

Scenario: What happens if the cache is empty or the data was evicted? This is the standard "Look-Aside" pattern.

  1. Cache Fail: Service A asks Server #42 for user:123. Server #42 says 404 Not Found.

  2. Database Fetch: Service A connects to the primary database (like PostgreSQL), which is slower but permanent. It finds the user data there.

  3. Cache Refill: Service A sends a SET command (Flow 1) to Server #42 with the data it just found.

  4. Result: The next time anyone asks for user:123, the cache will have it ready.
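
In application code, the look-aside pattern might look like this sketch, where cache is the smart client and db.fetch_user() stands in for a hypothetical database call:

```python
def get_user(user_id: str, cache, db):
    key = f"user:{user_id}"
    value = cache.get(key)                        # 1. try the cache first
    if value is not None:
        return value                              # cache hit: sub-millisecond path
    value = db.fetch_user(user_id)                # 2. cache miss: fall back to the database
    if value is not None:
        cache.put(key, value, ttl_seconds=3600)   # 3. refill so the next read is fast
    return value
```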


9. Caching and read performance

This section details how we optimize the internal engine.

  • Single-Threaded Event Loop:
    • We use a single thread (like Redis) with I/O Multiplexing (epoll or kqueue).
    • Why? A single thread needs no locks (mutexes), which avoids contention and context-switch overhead. Memory operations take nanoseconds, so one core can keep up; the bottleneck is the network, not the CPU.
  • Memory Fragmentation:
    • Allocating/freeing random sizes (10 bytes, then 5KB) creates holes in RAM ("Swiss Cheese" memory).
    • Solution: Use a specialized allocator like jemalloc. It categorizes memory into size classes (e.g., small, large, huge) to minimize fragmentation.
  • Expiration Strategy:
    • Lazy Deletion: Checked during GET (as described in Flow 2).
    • Active Sampling: A background job runs 10 times/sec. It picks 20 random keys with TTLs. If any are expired, it deletes them. This ensures expired keys that are never accessed again don't clutter RAM forever.
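
A simplified sketch of the active sampling loop, reusing the LRUCache from Section 7. A real implementation keeps a separate table of keys that carry TTLs rather than scanning the whole map; this is only meant to show the idea:

```python
import random
import time

def active_expiration_cycle(cache, sample_size=20):
    # One pass: sample a few keys that have a TTL and delete any that have expired.
    keys_with_ttl = [k for k, node in cache.map.items() if node.expires_at is not None]
    if not keys_with_ttl:
        return
    now = time.time()
    for key in random.sample(keys_with_ttl, min(sample_size, len(keys_with_ttl))):
        node = cache.map[key]
        if node.expires_at < now:
            cache._unlink(node)      # same cleanup as eviction: unlink + remove from the map
            del cache.map[key]

def run_active_expiration(cache):
    while True:
        active_expiration_cycle(cache)
        time.sleep(0.1)              # roughly 10 passes per second
```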

10. Storage, indexing and media

  • Primary Storage: RAM (Volatile).
  • Persistence (Optional):
    • To survive restarts, we can use Snapshots (RDB). Every 15 minutes, the process forks and dumps the entire memory to a disk file.
    • On startup, the node reads this file to "warm up" the cache.
  • Media Handling:
    • We do not store large images or videos in the cache. It fills RAM too fast and blocks the network.
    • Strategy: Store the media in Object Storage (S3) and cache only the URL (~100 bytes) in this system.

11. Scaling strategies

We use Consistent Hashing to distribute data across the 600 nodes.

  • The Ring:
    • Imagine a circle of hash values from 0 to 2^32.
    • Each Server is placed on the ring based on hash(server_ip).
    • A Key is stored on the first server found moving clockwise from hash(key).
  • Virtual Nodes:
    • Problem: If we have few nodes, data might be unevenly distributed (e.g., Node A gets 40% of data, Node B gets 10%).
    • Solution: Each physical server appears as 100 "Virtual Nodes" scattered randomly on the ring. This ensures a uniform distribution of keys and load.
  • Adding/Removing Nodes:
    • When a new node joins, it only "steals" a small slice of keys from its immediate neighbors on the ring; the vast majority of keys stay exactly where they are. This minimizes data movement. A sketch of the ring follows below.
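
A minimal consistent hashing ring with virtual nodes might look like the following sketch (the node addresses and the MD5-based ring hash are illustrative assumptions, not the only choice):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache nodes via a hash ring with virtual nodes (a minimal sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.positions = []          # sorted ring positions
        self.owner = {}              # ring position -> physical node
        for node in nodes:
            for i in range(vnodes):  # each physical server appears vnodes times on the ring
                pos = self._hash(f"{node}#{i}")
                self.owner[pos] = node
                self.positions.append(pos)
        self.positions.sort()

    @staticmethod
    def _hash(value):
        # A 32-bit ring position in the range 0 .. 2^32 - 1.
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

    def get_node(self, key):
        pos = self._hash(key)
        # First virtual node clockwise from the key's position, wrapping around the circle.
        idx = bisect.bisect_left(self.positions, pos) % len(self.positions)
        return self.owner[self.positions[idx]]

ring = ConsistentHashRing([f"cache-{i}.internal" for i in range(600)], vnodes=100)
print(ring.get_node("user:123"))     # the same key always maps to the same node
```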

12. Reliability, failure handling and backpressure

  • Failure Detection:
    • The Cluster Manager sends heartbeats (pings) to all Cache Nodes.
    • If a Primary Node misses 3 heartbeats, it is marked as "Dead".
  • Failover:
    • The Cluster Manager promotes one of the Replicas to be the new Primary.
    • The new Topology Map is pushed to all Smart Clients.
  • Thundering Herd (Cache Stampede):
    • Problem: If a popular key expires, 10,000 requests might hit the DB simultaneously.
    • Solution: Request Coalescing (Singleflight) in the Client SDK. If 100 threads want the same missing key, the SDK makes one request to the DB and shares the result with all 100 threads.
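
A sketch of request coalescing inside the client SDK: the first thread to miss becomes the "leader" and queries the database, while concurrent callers for the same key simply wait for its result. Here loader stands in for whatever function fetches the value from the database:

```python
import threading

class SingleFlight:
    """Coalesces concurrent loads of the same missing key into a single DB call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}                       # key -> (Event, result holder)

    def load(self, key, loader):
        with self._lock:
            entry = self._in_flight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._in_flight[key] = entry
                leader = True                      # this thread will do the real work
            else:
                leader = False                     # another thread is already loading
        event, result = entry
        if leader:
            try:
                result["value"] = loader(key)      # only one thread hits the database
            finally:
                with self._lock:
                    del self._in_flight[key]
                event.set()                        # wake up every waiting follower
        else:
            event.wait()                           # followers just wait for the leader
        return result.get("value")

# Usage inside the SDK (db.fetch_user is a hypothetical database call):
# flight = SingleFlight()
# value = flight.load("user:123", lambda k: db.fetch_user(k))
```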

13. Security, privacy and abuse

  • Network Isolation: The cache cluster runs in a private VPC. Ports are not exposed to the public internet.

  • Authentication: Clients must provide a secure token/password upon connection (AUTH <password>).

  • Encryption: Use TLS for data in transit. This adds slight latency but is required for compliance.

  • Namespace Isolation: If multiple teams share the cluster, prefix keys (e.g., billing:user:1, shipping:user:1) to prevent collisions.

14. Bottlenecks and next steps

  • Bottleneck: The "Celebrity" Key:
    • If "Taylor Swift's Profile" is read 1 million times/sec, it will overwhelm the single shard holding it.
    • Next Step: Implement Local Caching (Near Cache). The Smart Client detects hot keys and caches them in the application's local memory for a few seconds.
  • Bottleneck: RAM Cost:
    • RAM is expensive.
    • Next Step: Implement Tiered Storage. Keep hot data in RAM and warm data on Flash/NVMe SSDs (using a tool like RocksDB or Redis on Flash). This lowers cost significantly.

Summary

  1. Architecture: Distributed In-Memory Key-Value store using Smart Clients.

  2. Scale: Handles 10M QPS and 10TB data using 600 shards.

  3. Data Structure: Hash Map + Doubly Linked List for O(1) access and LRU eviction.

  4. Reliability: Uses Async Replication and Consistent Hashing to minimize impact of node failures.
