# How do you mitigate hot keys in a sharded cache?
A hot key is a cache entry that receives outsized traffic compared to the average key. In a sharded cache, consistent hashing places that key on a single shard, so every request piles onto that one node. The result is uneven utilization, long queues on the owning shard, and cascading failures that ripple into downstream services and databases. This guide explains practical ways to mitigate hot keys that work in both interviews and production.
## Why It Matters
Hot keys show up in real systems whenever you have celebrities, trending items, popular home pages, or global feature flags. The problem appears at any scale once traffic is skewed. Interviewers love this topic because it tests your ability to reason about uneven load, distributed systems, and scalable architecture beyond the happy path. A strong answer proves you can diagnose heavy hitters, select the right mitigation, and discuss the cost of each choice.
## How It Works Step by Step
1. **Measure the skew.** Start with visibility. Track per-key request counts and per-shard queue depth. Use approximate heavy-hitter detection to keep the cost low, such as a count-min sketch or top-K with sampling. Add alarms for any single key that exceeds a set percentage of total QPS.
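Here is a minimal sketch of that detection in Python, assuming a sampled stream of key accesses; the sketch dimensions and the 1% alert threshold are illustrative choices, not fixed rules.

```python
import hashlib

class CountMinSketch:
    """Approximate per-key counters in fixed memory (depth rows x width columns)."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key: str):
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counts, so the minimum across rows is the
        # tightest (over-)estimate of the true count.
        return min(self.table[row][col] for row, col in self._buckets(key))

# Usage: feed sampled accesses through the sketch and alarm on dominance.
sketch = CountMinSketch()
seen = 0

def record(key: str) -> None:
    global seen
    seen += 1
    sketch.add(key)
    if seen > 10_000 and sketch.estimate(key) > 0.01 * seen:
        print(f"possible hot key: {key}")  # hook this into real alerting instead
```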
2. **Add a small near cache on each app node.** An L1, per-process cache absorbs repeated reads for hot items without extra network hops. Keep the TTL short and add random jitter so keys do not expire together. This alone flattens many spikes.
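A toy version of such a near cache, assuming single-threaded access (a production version needs locking and a real eviction policy); the TTL and size numbers are placeholders.

```python
import random
import time

class NearCache:
    """Tiny per-process cache: short TTL plus jitter so entries expire staggered."""

    def __init__(self, base_ttl: float = 2.0, jitter: float = 0.5, max_entries: int = 10_000):
        self.base_ttl, self.jitter, self.max_entries = base_ttl, jitter, max_entries
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.entries.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]
        self.entries.pop(key, None)  # drop expired entry
        return None

    def put(self, key, value) -> None:
        if len(self.entries) >= self.max_entries:
            # Crude FIFO eviction; swap in LRU for real workloads.
            self.entries.pop(next(iter(self.entries)))
        ttl = self.base_ttl + random.uniform(0, self.jitter)
        self.entries[key] = (value, time.monotonic() + ttl)
```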
3. **Collapse duplicate requests.** Request coalescing means only one thread fetches a missing key while the others wait for that result. Use a single-flight style lock keyed by the cache key. This removes the stampede on a miss or at expiry.
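A thread-based single-flight sketch, loosely modeled on Go's singleflight package; the follower timeout is the illustrative guard discussed in the pitfalls below.

```python
import threading

class SingleFlight:
    """Coalesce concurrent fetches: one caller loads, the rest wait for its result."""

    def __init__(self):
        self.lock = threading.Lock()
        self.inflight = {}  # key -> (done event, shared result box)

    def do(self, key, fetch, timeout: float = 2.0):
        with self.lock:
            entry = self.inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self.inflight[key] = entry
        done, box = entry
        if leader:
            try:
                box["value"] = fetch(key)
            except Exception as exc:
                box["error"] = exc  # surface the same failure to all waiters
            finally:
                with self.lock:
                    self.inflight.pop(key, None)
                done.set()
        elif not done.wait(timeout):
            # Never strand followers behind a stuck leader.
            raise TimeoutError(f"coalesced fetch for {key!r} timed out")
        if "error" in box:
            raise box["error"]
        return box["value"]
```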
4. **Stripe the hot key across replicas.** Write the same value under N salted keys, such as `user123:profile:0` through `user123:profile:{N-1}`. On read, pick one replica at random or round-robin. On write, update all replicas in parallel or via a background fanout. This spreads load across shards at the cost of write amplification and more memory.
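A sketch of both paths, assuming a generic cache client that exposes `get(key)` and `set(key, value, ttl)`; the replica count of 8 and the key format are arbitrary.

```python
import random

N_REPLICAS = 8  # size this to the observed skew; each salt lands on a different shard

def replica_keys(base_key: str, n: int = N_REPLICAS):
    return [f"{base_key}:{i}" for i in range(n)]

def read_hot(cache, base_key: str):
    # Reads spread uniformly across the salted copies.
    return cache.get(random.choice(replica_keys(base_key)))

def write_hot(cache, base_key: str, value, ttl: int = 60) -> None:
    # Writes fan out to every copy: this is the amplification you pay for.
    for key in replica_keys(base_key):
        cache.set(key, value, ttl)
```

A random pick is stateless and good enough at high QPS; round-robin needs a shared counter but guarantees an even spread.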
5. **Refresh ahead with serve stale.** Do not let a hot key expire in the middle of peak traffic. Refresh it before the TTL runs out using a background task. If a miss happens anyway, serve stale for a short window while refreshing in the background. Add TTL jitter to avoid synchronized expiry.
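One way to combine the two, again assuming a generic object cache: the entry carries its own freshness deadline, so the store TTL can outlive it and leave a stale-serving grace window. The timings are illustrative.

```python
import random
import threading
import time

def get_with_refresh_ahead(cache, key, load, ttl=30, lead=5, stale_grace=10):
    def store():
        value = load(key)
        jitter = random.uniform(0, lead)  # stagger refreshes across nodes
        entry = {"value": value, "fresh_until": time.time() + ttl - jitter}
        cache.set(key, entry, ttl + stale_grace)  # entry survives past freshness
        return value

    entry = cache.get(key)
    if entry is None:
        return store()  # true cold miss: load synchronously
    if time.time() > entry["fresh_until"]:
        # Stale but present: serve the old value, refresh in the background.
        # Pair this with single-flight so only one refresh runs per key.
        threading.Thread(target=store, daemon=True).start()
    return entry["value"]
```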
6. **Use read replicas or multi-cluster reads.** Some caches support replica reads. For extreme hot spots, replicate the shard and allow reads from followers. You trade strict freshness for higher read capacity.
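With Redis Cluster, for example, the redis-py client can opt into replica reads with a single flag; the host below is a placeholder.

```python
from redis.cluster import RedisCluster

# read_from_replicas routes read commands to follower nodes, trading strict
# freshness for extra read capacity on the hot shard.
cache = RedisCluster(host="cache.internal", port=6379, read_from_replicas=True)

profile = cache.get("celebrity:123:profile")  # may be answered by a replica
```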
7. **Isolate the key to a dedicated tier.** Move a few top hot keys to a separate small pool with more resources. This is heavyweight but effective when a handful of keys dominate traffic.
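The routing layer can be as simple as a curated allowlist; the key names and client objects here are hypothetical.

```python
HOT_TIER_KEYS = {"celebrity:123:profile", "flag:global_banner"}  # small, curated top-K

def pick_cache(key: str, default_cache, hot_cache):
    # Route the handful of dominant keys to their own pool; everything else stays put.
    return hot_cache if key in HOT_TIER_KEYS else default_cache
```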
8. **Edge cache where possible.** For content served over HTTP, place a CDN or regional cache in front and push invalidations on updates. This removes load before it ever reaches your cache shards.
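Standard Cache-Control directives do most of the work; the values below are illustrative, and tag-based purging depends on your CDN.

```python
def profile_cache_headers(profile_id: str) -> dict:
    return {
        # s-maxage lets shared caches (the CDN) hold the object for 30 seconds;
        # stale-while-revalidate serves the old copy during background refresh.
        "Cache-Control": "public, s-maxage=30, stale-while-revalidate=60",
        # Some CDNs (e.g. Fastly) support tag-based purges via a surrogate key.
        "Surrogate-Key": f"profile-{profile_id}",
    }
```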
9. **Change the data shape.** If one key aggregates everything, split the value into smaller parts or paginate. Many small keys fetched in parallel are easier to scale than one giant hot key.
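A sketch of the parallel-fetch side, with hypothetical key names and a fixed chunk count for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def get_feed(cache, user_id: str, n_chunks: int = 4):
    # Instead of one giant "feed:{user_id}" blob, fetch fixed-size chunks in parallel.
    chunk_keys = [f"feed:{user_id}:chunk:{i}" for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        chunks = pool.map(cache.get, chunk_keys)
    return [item for chunk in chunks if chunk for item in chunk]
```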
10. **Rate limit and degrade gracefully.** When all else fails, throttle requests for a short period or serve a simplified response. Protect the system first, then recover.
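A per-key token bucket is the classic building block here; the rate and burst numbers are placeholders to tune against real traffic.

```python
import time

class TokenBucket:
    """Throttle one hot key: allow a steady rate with a bounded burst."""

    def __init__(self, rate: float = 500.0, burst: float = 1000.0):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def handle(bucket: TokenBucket, full_response, degraded_response):
    # Protect the system first: serve a simplified payload when over budget.
    return full_response() if bucket.allow() else degraded_response()
```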
## Real-World Example
Consider a social app's profile page for a global celebrity. The profile object is cached and lives on a single shard. A live event starts; traffic spikes for that one profile and overwhelms its shard while other shards sit idle. A practical fix is to replicate the profile cache value into 32 salted keys. The app reads a random replica, and writes fan out to all replicas on update. A background job refreshes ahead every few seconds with jitter. Each app node keeps a tiny near cache so most reads never leave the process. If the celebrity posts a new update, the write path bumps the version and triggers a fast refresh, while stale-while-revalidate serves the old value for a brief window. The shard imbalance disappears and the database load stays flat.
## Common Pitfalls and Trade-offs
- Replication multiplies memory and write cost. Use it only for a small top-K set.
- Coalescing requires careful locking. If the leading worker fails, all waiters might time out. Add timeouts and circuit breakers.
- Refresh ahead without jitter creates synchronized storms. Always randomize.
- Multi-cluster reads relax consistency. Know your freshness needs and service level goals.
- Edge caching adds another invalidation path. Keep a single source of truth for versioning and purge rules.
- Isolating keys in a special tier increases operational complexity. Document ownership and on-call plans.
## Interview Tip
Expect a follow-up like this: a single cache shard is at 90% CPU due to one hot key, and you can add only one new technique before the next peak. What do you choose, and why? A strong answer picks request coalescing or read striping with replicas, explains the trade-off, and shows how you would verify the fix with metrics and a rollback plan.
## Key Takeaways
- Hot keys are a skew problem, not a global capacity problem.
- Start with visibility, then remove stampedes with a near cache and coalescing.
- Stripe reads across replicas for a tiny top-K set and refresh ahead with jitter.
- Use edge caching and isolation when traffic is extreme.
- Always measure tail latency and shard balance before and after any change.
## Strategy Comparison
| Strategy | Best For | Core Idea | Pros | Cons |
|---|---|---|---|---|
| Near cache on app nodes | Repeated reads from same node | Store small in-memory cache with short TTL and jitter | Reduces network hops and absorbs spikes | Risk of stale data and added memory use |
| Request coalescing | Miss storms or expiry stampedes | One worker fetches value, others wait | Eliminates duplicate requests | Needs careful lock and timeout handling |
| Striped replicas of hot key | Read-heavy hot spots | Write same value to multiple salted keys | Balances load across shards | Write amplification and memory overhead |
| Refresh ahead + Serve stale | Predictable expiry patterns | Refresh before TTL and serve stale window | Prevents thundering herd | Requires background jobs and version control |
| Replica reads / multi-cluster | Extreme read hot spots | Allow reads from replica shards | High read capacity, better availability | Lower consistency and higher cost |
| Key isolation tier | One or two dominant keys | Move to separate high-capacity tier | Limits blast radius | Adds operational complexity |
| Edge cache / CDN | HTTP content or global reads | Cache near user edge with invalidations | Offloads load early | Complex purge logic |
| Data model split | Large monolithic keys | Break data into smaller chunks | Parallelism and efficient caching | Increases application logic complexity |
## FAQs
Q1. What is a hot key in a cache?
A hot key is a cache entry that gets a disproportionately large share of requests compared to other keys, which overloads the shard that owns it.
Q2. How do I detect hot keys quickly?
Collect per-key request counts with sampling and use heavy-hitter sketches to track the top-K without full cardinality scans.
Q3. Is replication of hot keys always the best fix?
No. Replication helps read-heavy keys but increases memory and write cost. Start with a near cache and coalescing first.
Q4. How does serve stale help with hot keys?
It avoids herd effects at expiry. Clients receive a slightly old value while a background task refreshes the key.
Q5. Can a CDN solve all hot key issues?
A CDN helps for HTTP content but you still need proper invalidation and an origin cache plan for dynamic data.
Q6. When should I isolate a key to its own tier?
When a single key dominates traffic for long periods and simple methods fail. Isolation caps blast radius while you redesign.
## Further Learning
To strengthen your system design foundation, explore:
- Grokking System Design Fundamentals: Learn the building blocks of caching, consistency, and scalability with practical diagrams.
- Grokking Scalable Systems for Interviews: Master real-world scaling problems and design strategies used in FAANG-level interviews.