How do you prevent cache stampede/dogpile under load?

Cache stampede, also called dogpile, happens when many requests hit a key right as it expires. The cached value vanishes and the data store or downstream service gets flooded. The fix is not one trick: you combine small, safe patterns that smooth expirations, coalesce work, and protect the source of truth.

Introduction

Cache stampede is the surge of concurrent misses for the same key after expiry. Picture a hot product page or a trending feed item that many users request at the same time. One key flips from cached to missing and hundreds of threads rush to rebuild it. Without coordination, you overload the database or microservice and latency spikes.

Why It Matters

In production systems you design for the common case and the worst minute. A single hot key that expires can generate a burst that dwarfs normal traffic. Stampede control improves tail latency, error rate, and cost stability. For the system design interview, this topic shows that you understand cache-aside, read patterns, and resilience under load. It also shows pragmatic thinking about expiry mechanics, which is where many real incidents begin.

How It Works Step by Step

  1. Use a per-key lock so that only one worker rebuilds a value; everyone else waits or serves stale data. In Redis you can approximate this with SET NX and a short expiry. If you acquire the lock, rebuild the value, set it, and release the lock by deleting it only if you still own it (see the first sketch after this list).

  2. Add a soft TTL with a grace window. Keep two timestamps in the value: a hard expiry and a soft expiry. After the soft expiry you still serve the cached value, but you trigger a background refresh. This is the stale-while-revalidate pattern (sketched together with items 3 and 4 after this list).

  3. Probabilistic early refresh. Before a key reaches its soft expiry, allow a small percentage of requests to refresh it early, with a probability that increases as expiry nears. This spreads refresh work over time and avoids a coordinated burst the moment the key expires.

  4. Add jitter to TTLs. Instead of expiring all items at exactly five minutes, add a random delta such as plus or minus ten percent. This reduces herd behavior across a large set of similar keys.

  5. Request coalescing inside the application. Maintain an in-process map of in-flight fetches per key. If a request for key K is already rebuilding, new callers await the same future rather than starting extra rebuilds (see the coalescing sketch below).

  6. Hedge with a small stale-serve policy. If the lock queue grows or rebuild time exceeds a threshold, serve a recently stale copy and continue refreshing in the background. Users see a slightly older value but the system stays responsive.

  7. Apply rate limiting and backoff around the origin. A token bucket or leaky bucket protects databases and feature services. When tokens are exhausted, either queue with a cap or serve stale data (see the token bucket sketch below).

  8. Use tiered caches: a tiny in-process L1 with a short TTL plus a shared L2 such as Redis or Memcached. When L1 misses, L2 often hits, so only a fraction of requests reaches the database. You still apply the same stampede controls on L2 keys.

  9. Negative caching. If a lookup often misses, cache the miss result for a brief time so you do not batter the database for non-existent keys (a short sketch appears below).

  10. Warm hot keys proactively. For top feeds or product pages, compute and set the cache ahead of traffic using a scheduled job or a stream of hot-key metrics.

  11. Safe invalidation. When the source of truth mutates, use write-through for small values or event-driven invalidation to delete or refresh the right keys. This keeps caches fresh without periodic cliff expirations.

  12. Instrument and alarm. Track rebuild count per key, lock contention, stale-serve rate, and refresh latency. Graph percentiles and watch for sudden growth.
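
Here is step 1 as code: a minimal sketch of the per-key lock using the redis-py client. The key naming, TTLs, and the rebuild_value callable are assumptions for illustration, not a standard API.

```python
import uuid

import redis

r = redis.Redis()

# Atomic check-and-delete: release the lock only if we still own it.
RELEASE = r.register_script("""
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
""")

def get_or_rebuild(key, rebuild_value, lock_ttl=10, value_ttl=300):
    cached = r.get(key)
    if cached is not None:
        return cached

    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())
    # SET NX EX: only one caller wins; the expiry frees the lock if we crash.
    if r.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            value = rebuild_value()  # the expensive origin call
            r.set(key, value, ex=value_ttl)
            return value
        finally:
            RELEASE(keys=[lock_key], args=[token])

    # Lost the race: wait briefly and retry, or serve a stale copy if you keep one.
    return None
```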
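
Steps 2 through 4 can be combined in one read path. The sketch below assumes a Redis-style cache object with get and set(..., ex=...), stores the soft expiry in a small JSON envelope, and uses illustrative window sizes; tune them to your workload.

```python
import json
import random
import threading
import time

HARD_TTL = 300     # seconds until the entry is evicted entirely
GRACE = 60         # soft window: serve stale while refreshing
EARLY_WINDOW = 30  # start probabilistic early refresh this close to soft expiry

def jittered(ttl, spread=0.10):
    """Step 4: desynchronize expirations with +/- 10 percent random jitter."""
    return ttl * random.uniform(1 - spread, 1 + spread)

def store(cache, key, value):
    soft_expiry = time.time() + jittered(HARD_TTL - GRACE)
    envelope = json.dumps({"value": value, "soft_expiry": soft_expiry})
    cache.set(key, envelope, ex=int(jittered(HARD_TTL)))

def get(cache, key, fetch_from_origin):
    raw = cache.get(key)
    if raw is None:
        value = fetch_from_origin()  # true miss: rebuild inline
        store(cache, key, value)
        return value

    entry = json.loads(raw)
    remaining = entry["soft_expiry"] - time.time()

    # Step 3: as soft expiry nears, a growing fraction of requests refreshes early.
    early = remaining < EARLY_WINDOW and random.random() > remaining / EARLY_WINDOW

    if remaining <= 0 or early:
        # Step 2: stale-while-revalidate - serve the old value, refresh off-thread.
        # In production, guard this with the lock or coalescing sketches so that
        # only one refresh actually runs.
        threading.Thread(
            target=lambda: store(cache, key, fetch_from_origin()),
            daemon=True,
        ).start()
    return entry["value"]
```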
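
Step 5, request coalescing, fits in a few lines with concurrent.futures. This is a sketch of the in-process map of in-flight fetches; coalesced_fetch and rebuild are hypothetical names.

```python
import threading
from concurrent.futures import Future

_inflight: dict[str, Future] = {}
_guard = threading.Lock()

def coalesced_fetch(key, rebuild):
    """Concurrent callers for the same key share a single rebuild."""
    with _guard:
        fut = _inflight.get(key)
        if fut is not None:
            owner = False
        else:
            fut = Future()
            _inflight[key] = fut
            owner = True

    if not owner:
        return fut.result()  # wait on the rebuild already in flight

    try:
        value = rebuild()
        fut.set_result(value)
        return value
    except Exception as exc:
        fut.set_exception(exc)
        raise
    finally:
        with _guard:
            _inflight.pop(key, None)  # clean up to avoid unbounded map growth
```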
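
For step 7, a token bucket in front of rebuild calls is enough to cap origin load. A minimal sketch, assuming rebuilds check allow() first and fall back to stale data when rejected.

```python
import threading
import time

class TokenBucket:
    """Caps the rate of rebuild calls that reach the origin."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# Usage: guard origin calls; on rejection, serve stale or queue with a cap.
bucket = TokenBucket(rate=50, capacity=100)
# if bucket.allow(): value = fetch_from_origin()
# else:              value = serve_recent_stale()  # hypothetical fallback
```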
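
And step 9, negative caching, needs only a sentinel and a short TTL. The sentinel value, TTLs, and query_db callable below are assumptions for the example.

```python
NEGATIVE_TTL = 30            # short, so newly created rows show up quickly
MISS_SENTINEL = b"__MISS__"  # must never collide with a real value

def lookup(cache, key, query_db):
    """Cache 'not found' briefly so repeated misses skip the database."""
    cached = cache.get(key)
    if cached == MISS_SENTINEL:
        return None          # known to be missing; no database hit
    if cached is not None:
        return cached

    row = query_db(key)
    if row is None:
        cache.set(key, MISS_SENTINEL, ex=NEGATIVE_TTL)
        return None
    cache.set(key, row, ex=300)
    return row
```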

Real-World Example

Think of an Amazon style product detail page in a flash sale. The page aggregates price, inventory, reviews, and recommendations. Each piece is cached at multiple layers. Without stampede control, an expiry at the exact start of the sale would send a spike to the pricing service and the catalog database.

A robust setup looks like this. The CDN and edge cache use a short hard TTL plus stale while revalidate. The application layer uses request coalescing so many threads wait for a single rebuild per key. L2 Redis stores values with a soft TTL and a grace period. A small percentage of requests perform early refresh as the key nears soft expiry. The origin services are shielded by token bucket limiters. If a rebuild is slow the app serves a recent stale value for a brief window. Hot product keys are warmed five minutes before the event using a scheduled job informed by analytics. Incidents are minimized and median latency stays flat.

Common Pitfalls and Trade-offs

  • Locking without an expiry. A lock that never expires can deadlock rebuilds when a holder crashes. Always set a short expiry and verify ownership on release.

  • Serving stale without limits. Serving stale forever hides real freshness problems and can surface incorrect data. Bound staleness with a grace period and audit freshness.

  • Over-eager early refresh. If the probability is too high you simply shift the burst earlier. Start near zero and increase it as expiry nears.

  • One giant global lock. Coarse locks harm throughput. Scope locking to a specific key and keep the critical section tiny.

  • Ignoring misses. Negative-cache common empty results; otherwise you may stampede on non-existent keys too.

  • No instrumentation. Without metrics you cannot tune refresh probability, grace windows, or jitter range, and you will either over-cache or overload origins.

Interview Tip

Expect a prompt like this: the feed item for a celebrity post expires at the same time for millions of users, so how will you keep p99 latency stable? Answering "add a cache" and moving on invites a deeper follow-up. Discuss soft TTL with a grace window, per-key request coalescing, and a rate limiter in front of the feed builder. Bonus points for early refresh and TTL jitter.

Key Takeaways

  • Stampede control is a combination of small patterns that coordinate rebuilds and smooth expiry.

  • Per-key locks and request coalescing prevent duplicated work.

  • A soft TTL with stale-while-revalidate keeps latency stable during refresh.

  • Jitter and probabilistic early refresh spread load over time.

  • Protect the origin with rate limiters and hedge with brief stale serves.

Comparison Table

| Approach | Goal | Best for | Risks and Trade-offs |
| --- | --- | --- | --- |
| Per-key lock | Ensure only one writer rebuilds the cache value | Hot keys with expensive rebuilds | Lock leaks if not expired correctly |
| Request coalescing | Share one in-flight fetch across threads | Application servers handling concurrent requests | Memory map growth if not cleaned properly |
| Soft TTL and grace period | Serve stale data while rebuilding cache | Read-heavy workloads with minor staleness tolerance | Users may see slightly outdated data |
| Early refresh (probabilistic) | Refresh cache before expiry with small probability | Extremely hot keys | Overly aggressive refresh may increase origin load |
| TTL jitter | Desynchronize cache expirations | Large sets of similar keys | Harder to predict exact data freshness |
| Rate limiter | Protect database or origin service | Expensive downstream operations | May delay or drop requests during peaks |
| Negative cache | Avoid repeated misses for non-existent data | Frequently queried empty results | Must use short TTL to avoid masking new data |
| Tiered cache (L1/L2) | Absorb load across multiple cache layers | Systems with varied access patterns | Harder to manage coherence and invalidation |

FAQs

Q1. What is cache stampede or dogpile?

It is a surge of concurrent cache misses for the same key after expiry. Many workers try to rebuild at once and overload the source of truth.

Q2. Which single pattern should I pick first?

Start with per-key locks combined with request coalescing. This stops duplicate rebuilds with minimal changes.

Q3. What is stale while revalidate in caching?

Serve the current cached value after soft expiry and trigger a background refresh. Users get fast responses while the cache becomes fresh again.

Q4. How do I choose TTL values?

Pick TTL based on data volatility and rebuild cost. Add jitter to spread expiries. Add a grace window if slight staleness is acceptable.

Q5. How do I protect the database during a spike?

Use a token bucket or leaky bucket limiter in front of rebuild calls. If tokens run out, queue with a cap or serve a recent stale value.

Q6. Is negative caching safe?

Yes if TTL is short and you recheck periodically. It removes repeated work for common empty results while staying correct.

Further Learning

Build a deeper foundation in cache patterns and resilience with our beginner-friendly program Grokking System Design Fundamentals. If you want guided practice with real interview-scale problems and complete solutions, explore Grokking Scalable Systems for Interviews.
