Caching strategies and distributed caching solutions for high-traffic web services

Distributed caching is a technique where frequently accessed data is stored in a shared, in-memory layer spread across multiple nodes, sitting between your application servers and your database. It reduces read latency from tens of milliseconds to sub-millisecond, cuts database load by 80–99%, and is the single most impactful scaling lever for high-traffic web services.

If you're preparing for a system design interview or building production systems, caching is foundational.

Key Takeaways

  • Cache-aside (lazy loading) is the most common distributed caching strategy and the default answer in interviews unless the problem demands something else.
  • Cache invalidation — not caching itself — is the hard problem. Choose between TTL-based expiration, event-driven invalidation, and delete-on-write based on your consistency requirements.
  • Redis is the default choice for most teams in 2026. Memcached still wins for simple, high-throughput string caching at massive scale.
  • Consistent hashing distributes keys across cache nodes and minimizes reshuffling when nodes join or leave the cluster.
  • A multi-tier cache (local L1 + distributed L2) gives you microsecond hot-path reads with cluster-wide consistency for the broader working set.
  • In interviews, always discuss eviction policies, cache stampede mitigation, and failure modes — these separate strong candidates from average ones.

Why Distributed Caching Matters for High-Traffic Systems

A single database read to PostgreSQL or MySQL takes 1–10ms. A Redis or Memcached read takes 0.1–0.5ms. At 100,000 requests per second, that difference is the gap between a responsive application and a crashed database.

Every major tech company relies on distributed caching at scale. Meta's TAO cache serves billions of social graph reads per second. Netflix caches personalization data in EVCache (built on Memcached) across multiple AWS regions to serve 260+ million subscribers. Twitter caches timelines in Redis to serve 500 million tweets per day.

The value is simple: memory is faster than disk, and a shared cache avoids the inconsistency of local per-server caches. When your application scales beyond a single server, a distributed cache becomes the coordination point for fast, consistent reads.

If you're building this foundation from scratch, the Grokking System Design Fundamentals course walks through caching alongside other core building blocks like load balancing and database replication.

The Four Core Caching Strategies

Every caching pattern answers one question: who keeps the cache and database in sync?

1. Cache-Aside (Lazy Loading)

The application owns all cache logic. On a read, the app checks the cache first. On a miss, it queries the database, writes the result to the cache, and returns. On a write, the app updates the database and either invalidates or ignores the cache entry.

```python
def get_user(user_id):
    # Step 1: Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    # Step 2: Cache miss — query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # Step 3: Populate cache with TTL
    redis.setex(f"user:{user_id}", 3600, serialize(user))
    return user
```

When to use it: Read-heavy workloads where stale data is tolerable for short windows. This is the default pattern for most web applications.

Trade-off: The first request after a miss or expiration is slow (cold start penalty). Data can be stale between the write to the database and the next cache refresh.

2. Write-Through Cache

Every write goes to the cache and the database synchronously before the write is acknowledged. The cache always contains the latest data.

When to use it: Systems where read-after-write consistency is critical, like shopping carts or account balances.

Trade-off: Higher write latency because every write hits two systems. Cache fills with data that may never be read.
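The write-through flow can be sketched in a few lines. This is a minimal illustration, not a library API: plain dicts stand in for the cache and the database, and the function names are invented.

```python
cache = {}
database = {}

def write_through(key, value):
    database[key] = value   # durable write first
    cache[key] = value      # then the cache, synchronously
    return value            # acknowledged only after both succeed

def read(key):
    return cache.get(key)   # reads are always served from the cache
```

The key property is that a read immediately after a write always sees the new value, at the cost of paying for both writes on the request path.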

3. Write-Behind (Write-Back) Cache

Writes go to the cache immediately and are asynchronously flushed to the database in batches. The application gets fast write acknowledgment.

When to use it: Write-heavy workloads like analytics event ingestion or activity logging where brief data loss on cache failure is acceptable.

Trade-off: Risk of data loss if the cache node crashes before flushing. Adds complexity with background flush workers and retry logic.
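A minimal write-behind sketch, again with dicts standing in for the cache and database and a deque as the flush buffer. In production, `flush` would run in a background worker with retry logic; everything named here is illustrative.

```python
from collections import deque

cache = {}
database = {}
dirty = deque()  # keys waiting to be persisted

def write_behind(key, value):
    cache[key] = value   # fast acknowledgment: cache only
    dirty.append(key)    # remember to persist later

def flush(batch_size=100):
    # A background worker drains the dirty queue in batches.
    for _ in range(min(batch_size, len(dirty))):
        key = dirty.popleft()
        database[key] = cache[key]
```

Any keys still in `dirty` when the cache node dies are the data-loss window the trade-off above describes.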

4. Read-Through Cache

The cache itself is responsible for loading data on a miss — the application never talks to the database directly for reads. The cache acts as the primary read interface.

When to use it: When you want to decouple the application from data-fetching logic, common in CDN architectures and some ORM frameworks.

Trade-off: Tightly couples your caching layer to your data model. Harder to debug because the fetch logic lives inside the cache abstraction.
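The difference from cache-aside is that the fetch logic lives inside the cache object itself. A toy sketch, with an invented loader function in place of a real database query:

```python
class ReadThroughCache:
    """The cache owns the fetch logic; callers never touch the DB directly."""

    def __init__(self, loader):
        self._store = {}
        self._loader = loader  # e.g. a function that queries the database

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache fills itself on a miss
        return self._store[key]

def fake_db_loader(key):
    return f"row-for-{key}"

users = ReadThroughCache(fake_db_loader)
```

Application code only ever calls `users.get(...)`, which is exactly the coupling (and the debugging opacity) described above.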

Caching Strategies Comparison Table

| Strategy | Write Path | Read Path | Consistency | Write Latency | Best For |
|---|---|---|---|---|---|
| Cache-Aside | App writes to DB, invalidates cache | App checks cache → DB on miss | Eventual (TTL-bound) | Low (DB only) | General-purpose read-heavy apps |
| Write-Through | App writes to cache + DB synchronously | Always served from cache | Strong | High (two writes) | Read-after-write critical paths |
| Write-Behind | App writes to cache; async flush to DB | Always served from cache | Weak (async lag) | Very low (cache only) | High-throughput write-heavy systems |
| Read-Through | App writes to DB; cache loads on next read | Cache auto-fetches on miss | Eventual | Low (DB only) | CDN, transparent caching layers |

Cache Invalidation: The Hard Problem

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He was right about the first one.

Invalidation means ensuring the cache reflects reality. Get it wrong and users see stale prices, phantom inventory, or outdated permissions. There are three primary approaches.

TTL-Based Expiration

Every cache entry gets a time-to-live. After expiration, the next read fetches fresh data. This works well when you can tolerate bounded staleness. Common pattern: 60-second TTL for user profiles, 5-minute TTL for product listings, 24-hour TTL for static configuration.

Event-Driven Invalidation

When data changes, a change event (via a message queue, CDC stream, or database trigger) notifies the cache to delete or refresh the affected keys. Netflix uses CDC from their databases and an internal event bus to propagate invalidation events to EVCache clusters, keeping data fresh within seconds.
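The consumer side of this pattern is small: on each change event, delete the affected key. The event shape and handler name below are invented for illustration; in production the events would arrive from a CDC tool or message broker rather than a direct call.

```python
cache = {"user:7": {"name": "old"}}

def handle_change_event(event):
    """Consumer for change events from a queue or CDC stream (simulated here)."""
    if event["op"] in ("update", "delete"):
        cache.pop(f"{event['table']}:{event['id']}", None)

# An upstream producer (e.g. a CDC pipeline) would emit events like:
handle_change_event({"op": "update", "table": "user", "id": 7})
```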

Delete-on-Write (Idempotent Invalidation)

On any write, delete the cache key rather than trying to update it. The next read triggers a cache-aside refill. Meta recommends this for large-scale caching because it eliminates race conditions: two concurrent writes can't leave the cache inconsistent if both simply delete the key.

Interview tip: Lead with delete-on-write plus TTL as a safety net. It's the simplest correct answer for most scenarios.
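Delete-on-write plus a cache-aside refill fits in a few lines. Dicts stand in for Redis and the database; the names are illustrative:

```python
cache = {}
database = {}

def update_user(user_id, fields):
    database[user_id] = fields   # 1. write the source of truth
    cache.pop(user_id, None)     # 2. delete (don't update) the cache key
    # The next read falls through to the DB and refills the cache.

def get_user(user_id):
    if user_id not in cache:
        cache[user_id] = database[user_id]  # cache-aside refill
    return cache[user_id]
```

Because deletion is idempotent, two concurrent writers both deleting the key can never leave a stale value behind the way two concurrent cache updates can.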

Cache Eviction Policies

When cache memory is full, the eviction policy decides which entries to discard.

| Policy | How It Works | Best For | Used By |
|---|---|---|---|
| LRU (Least Recently Used) | Evicts the entry not accessed for the longest time | General-purpose workloads | Redis (approximated LRU), Memcached |
| LFU (Least Frequently Used) | Evicts the entry with the fewest accesses | Skewed popularity distributions | Redis 4.0+ |
| TTL | Evicts entries after a fixed time-to-live | Time-sensitive data (sessions, tokens) | All major caches |
| Random | Evicts a random entry | Uniform access patterns, simplicity | Redis (optional policy) |
| ARC (Adaptive Replacement Cache) | Dynamically balances recency and frequency | Workloads with shifting access patterns | ZFS, IBM DS8000 |

Redis uses an approximated LRU by default. It samples a configurable number of keys (default 5) and evicts the least recently used among the sample. Starting with Redis 4.0, you can switch to LFU, which is better for workloads where a few keys dominate access (a Zipfian distribution, which describes most web traffic).

Distributed Caching Architecture: Consistent Hashing

A single cache node is a single point of failure and a capacity ceiling. Distributed caches partition data across multiple nodes. The standard technique is consistent hashing.

In consistent hashing, both cache nodes and keys are mapped onto a hash ring. A key is stored on the first node encountered clockwise from the key's position on the ring. When a node is added or removed, only ~1/N of keys need to be remapped (where N is the number of nodes), compared to a naive modular hash where nearly all keys shift.

Virtual nodes solve the problem of uneven distribution. Instead of placing each physical server at one point on the ring, you place 100–200 virtual nodes per server. Amazon's 2007 Dynamo paper popularized this approach, and Cassandra adopted it for its partitioning layer. Redis Cluster uses a fixed set of 16,384 hash slots distributed across nodes — a similar concept with a different implementation.
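A toy hash ring with virtual nodes makes the mechanics concrete. MD5 is used here only as a convenient uniform hash; the class and node names are invented for illustration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=150):
        self.ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")       # 150 points per physical node
                bisect.insort(self.ring, (h, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        h = self._hash(key)
        # First ring point clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Removing a node deletes only that node's ring points, so keys that mapped to the surviving nodes keep their placement, which is the ~1/N remapping property.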

Redis vs Memcached: Choosing Your Cache Engine

This is one of the most common questions in system design interviews and real architectural decisions. Here's how they compare in 2026.

| Feature | Redis / Valkey | Memcached |
|---|---|---|
| Data structures | Strings, lists, sets, sorted sets, hashes, streams, bitmaps | Strings only |
| Persistence | RDB snapshots + AOF (append-only file) | None — purely volatile |
| Replication | Built-in primary-replica replication | Application-level only |
| Clustering | Redis Cluster with automatic sharding (16,384 slots) | Client-side consistent hashing |
| Threading | Single-threaded command execution (threaded I/O since 6.0) | Multi-threaded, leverages multiple cores |
| Pub/Sub | Built-in | Not supported |
| Licensing (2026) | AGPLv3 (Redis 8.0+); Valkey is BSD-licensed fork | BSD — fully open source |
| Best for | Complex data models, pub/sub, persistence needs | Simple key-value caching at extreme throughput |

The Valkey factor: In 2024, Redis Labs changed its license away from BSD, eventually landing on AGPLv3 in 2025. The Linux Foundation forked Redis 7.2.4 as Valkey, a production-ready, BSD-licensed, drop-in replacement now used by AWS ElastiCache as its default engine.

When to pick Memcached: Pure key-value string caching (session tokens, HTML fragments, API response blobs) at extreme throughput. Netflix's EVCache is built on Memcached for this reason.

When to pick Redis/Valkey: Any use case requiring sorted sets, pub/sub, persistence, or data structures beyond strings. For most teams and most interview answers, Redis is the default.

Multi-Tier Caching: L1 + L2 Architecture

Production systems at scale rarely use a single cache tier. The standard architecture uses two layers:

L1: Local in-memory cache (per application server). Libraries like Caffeine (Java) or simple LRU dictionaries. Response time: microseconds. Capacity: a few hundred MB to a few GB per server.

L2: Distributed cache (Redis/Memcached cluster). Shared across all application servers. Response time: sub-millisecond over the network. Capacity: tens to hundreds of GB across the cluster.

The read path: check L1 first → on L1 miss, check L2 → on L2 miss, query the database and populate both L1 and L2. Writes invalidate both layers. The L1 absorbs repeated reads for hot data (e.g., trending content), while the L2 provides cluster-wide consistency for the broader working set.
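The read path above can be sketched with dicts standing in for each tier (a real L1 would be an LRU structure like Caffeine, and the L2 a Redis cluster; all names here are illustrative):

```python
l1 = {}                                    # per-process cache
l2 = {}                                    # distributed cache (shared)
database = {"feed:1": ["post-a", "post-b"]}

def get_feed(key):
    if key in l1:            # fastest: in-process, microseconds
        return l1[key]
    if key in l2:            # sub-millisecond network hop
        l1[key] = l2[key]    # promote hot keys into L1
        return l2[key]
    value = database[key]    # slowest path: the database
    l2[key] = value          # populate both tiers on the way back
    l1[key] = value
    return value

def invalidate(key):
    l1.pop(key, None)        # writes invalidate both layers
    l2.pop(key, None)
```

Note that in a real deployment each application server has its own L1, so L1 invalidation needs a broadcast (e.g. pub/sub), while the shared L2 needs only one delete.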

The Cache Stampede Problem (Thundering Herd)

When a popular cache key expires, hundreds of concurrent requests simultaneously find a cache miss and all query the database at once. This cache stampede can overload your database and cascade into a full outage. Three proven mitigations:

1. Locking (Mutex): The first request acquires a distributed lock and fetches from the database. Others wait or receive a stale value. Redis's SET key value NX EX 5 is commonly used as the lock.

2. Probabilistic Early Expiration (XFetch): Each request has a small random probability of proactively refreshing the cache before TTL actually expires, increasing as TTL approaches zero.

3. Request Coalescing: Multiple concurrent misses for the same key collapse into a single database query. Go's singleflight package is a common implementation.

Mentioning cache stampede mitigation unprompted in interviews is a strong signal that you think about failure modes, not just the happy path.

Caching in System Design Interviews

Caching questions appear in nearly every system design interview. For a structured walkthrough with dozens of practice problems, the Grokking the System Design Interview course covers caching in the context of real designs like URL shorteners, Twitter feeds, and chat applications.

Sample Interview Dialog

Interviewer: "You're designing a news feed for 500 million users. How would you handle caching?"

Strong answer: "I'd use a multi-tier cache. L1 in-process cache on each feed server stores feeds for the most active users (top 1% by request frequency). L2 is a Redis Cluster holding precomputed feeds for all active users. For invalidation, when a user publishes a post, I'd fan-out-on-write to update follower caches asynchronously, with a 5-minute TTL safety net. For celebrity users with millions of followers, I'd switch to fan-out-on-read. LRU eviction works well since engagement follows a power law."

Interviewer follow-up: "What happens if the Redis cluster goes down?"

Strong answer: "The L1 caches absorb a portion of read traffic. For the rest, I'd serve stale data from a backup tier or CDN-cached version. The database is protected by a circuit breaker limiting concurrent reads. Recovery means warming the cache from the database in priority order — most active users first. I'd alert if cache hit ratio drops below 95%."

Common Follow-Up Questions Interviewers Ask

  1. "How do you decide what to cache?" — Cache data that is read frequently, expensive to compute, and tolerant of short staleness windows.
  2. "How do you size the cache?" — Use the 80/20 rule: 20% of your data typically serves 80% of reads. Estimate the working set size and provision 1.5–2x that for headroom.
  3. "What's your cache hit ratio target?" — 95%+ for read-heavy services. Below 90% usually means your eviction policy or key design needs work.
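The sizing heuristic from question 2 as back-of-envelope arithmetic. The row count and entry size below are invented for illustration:

```python
total_rows = 50_000_000
avg_entry_bytes = 2_000   # serialized value + key + per-entry overhead
hot_fraction = 0.20       # 80/20 rule: ~20% of data serves most reads
headroom = 1.5            # provision 1.5-2x the working set

working_set_gb = total_rows * hot_fraction * avg_entry_bytes / 1e9
provisioned_gb = working_set_gb * headroom
print(f"working set = {working_set_gb:.0f} GB, provision = {provisioned_gb:.0f} GB")
```

Here the working set is 20 GB, so a 30 GB cache (1.5x) is a reasonable starting point before measuring real hit ratios and eviction rates.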

For advanced distributed systems topics like multi-region caching, geo-replicated consistency, and cache coherence protocols, the Grokking the Advanced System Design Interview goes deeper into the patterns used by systems like DynamoDB, Cassandra, and Google Spanner.

Frequently Asked Questions About Distributed Caching

What is a distributed cache and how does it work?

A distributed cache is a shared, in-memory data store spread across multiple nodes on a network. Data is partitioned across nodes using consistent hashing, and replication provides fault tolerance. Redis Cluster and Memcached are the two most widely used implementations.

What is the difference between cache-aside and read-through caching?

In cache-aside, the application manages checking the cache and querying the database on a miss. In read-through, the cache itself handles the database fetch transparently. Cache-aside gives more control; read-through gives cleaner application code.

When should I use Redis vs Memcached in 2026?

Use Redis (or its BSD-licensed fork, Valkey) when you need data structures beyond strings, persistence, pub/sub, or built-in clustering. Use Memcached for pure key-value string caching at maximum multi-threaded throughput. Most teams default to Redis for its versatility.

How do you handle cache invalidation in a microservices architecture?

Use delete-on-write: when any service writes to the database, it deletes the corresponding cache key. Combine this with a TTL safety net so that even if an invalidation message is lost, the cache self-heals. For cross-service invalidation, publish events to a message broker (Kafka, RabbitMQ) that consuming services subscribe to.

What is a cache stampede and how do you prevent it?

A cache stampede occurs when a popular key expires and hundreds of concurrent requests simultaneously hit the database. Prevent it with distributed locking, probabilistic early expiration (refresh before TTL hits), or request coalescing (collapse duplicate misses into one query).

How do you decide what data to cache?

Cache data that is read far more often than written, expensive to fetch, and tolerant of short staleness. Common examples: user sessions, product catalog entries, API responses, and computed aggregations.

What is consistent hashing and why is it used in distributed caches?

Consistent hashing maps keys and nodes onto a circular hash space. Each key is assigned to the nearest node clockwise on the ring. When nodes are added or removed, only ~1/N of keys are remapped, minimizing disruption during scaling events.

How large should a distributed cache be?

Size your cache to hold the working set. The 80/20 rule is a good heuristic: 20% of your data typically serves 80% of requests. If your cache hit ratio is consistently below 95% and eviction rates are high, add more capacity.

Can a cache replace a database?

No. A cache is an acceleration layer, not a persistence layer. Even Redis with persistence is not a substitute for a database. Caches optimize for read speed; databases optimize for durability, transactions, and complex queries.

TL;DR

Distributed caching stores frequently accessed data in shared, in-memory nodes to reduce database load and serve reads in sub-millisecond time. The four core strategies are cache-aside, write-through, write-behind, and read-through — cache-aside is the default. Invalidation is the hardest part: use delete-on-write plus TTL safety nets. Redis/Valkey is the standard choice in 2026; Memcached wins for pure key-value throughput.

In interviews, discuss eviction policies, consistent hashing, cache stampede prevention, and multi-tier architecture. Target a 95%+ hit ratio.

TAGS
System Design Interview
System Design Fundamentals
CONTRIBUTOR
Design Gurus Team
Copyright © 2026 Design Gurus, LLC. All rights reserved.