How do you mitigate hot keys in a sharded cache?

A hot key is a cache entry that receives outsized traffic compared to the average key. In a sharded cache, consistent hashing places that key on a single shard, so all requests pile onto that one node and create an imbalance. The result is uneven utilization, long queues on one node, and cascading failures that ripple into downstream services and databases. This guide explains practical ways to mitigate hot keys that work in both interviews and production.

Why It Matters

Hot keys show up in real systems whenever you have celebrities, trending items, popular home pages, or global feature flags. The problem appears at any scale once traffic is skewed. Interviewers love this topic because it tests your ability to reason about uneven load, distributed systems, and scalable architecture beyond the happy path. A strong answer proves you can diagnose heavy hitters, select the right mitigation, and discuss the cost of each choice.

How It Works Step by Step

  1. Measure the skew

    Start with visibility. Track per-key request counts and per-shard queue depth. To keep cost low, use approximate heavy-hitter detection such as a count-min sketch or top-K with sampling. Add alarms for any single key that exceeds a set percentage of total QPS.
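
Below is a minimal sketch of approximate heavy-hitter detection. The count-min sketch, its width and depth, and the one-percent alert threshold are illustrative assumptions to tune for your traffic, not a prescribed configuration.

```python
import random

class CountMinSketch:
    """Approximate per-key counters in fixed memory (estimates may overcount)."""
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        self.seeds = [random.randrange(1 << 31) for _ in range(depth)]  # one hash seed per row

    def add(self, key, count=1):
        """Record hits for key and return its current (over)estimate."""
        estimate = float("inf")
        for row, seed in enumerate(self.seeds):
            idx = hash((seed, key)) % self.width
            self.tables[row][idx] += count
            estimate = min(estimate, self.tables[row][idx])
        return estimate

sketch = CountMinSketch()
total_requests = 0
suspected_hot = {}  # key -> latest estimate; feed this into alarms

def record(key, alert_fraction=0.01):
    """Flag any key estimated above 1% of total QPS (threshold is an assumption)."""
    global total_requests
    total_requests += 1
    estimate = sketch.add(key)
    if estimate / total_requests > alert_fraction:
        suspected_hot[key] = estimate
```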

  2. Add a small near cache on each app node

    Local per-process (L1) caching absorbs repeated reads for hot items without extra cross-network hops. Keep TTLs short and add random jitter so keys do not expire together. This alone flattens many spikes.
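
A sketch of such a near cache, assuming a `loader` callback that falls through to the shared cache; the TTL and jitter values are placeholders to tune.

```python
import random
import time

class NearCache:
    """Tiny per-process cache with short, jittered TTLs."""
    def __init__(self, base_ttl=2.0, jitter=0.5):
        self.store = {}  # key -> (value, expires_at)
        self.base_ttl, self.jitter = base_ttl, jitter

    def get(self, key, loader):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # fresh local hit, no network hop
        value = loader(key)                      # fall through to the shared cache
        ttl = self.base_ttl + random.uniform(0, self.jitter)  # jitter avoids sync expiry
        self.store[key] = (value, time.monotonic() + ttl)
        return value

near = NearCache()
# profile = near.get("user:123:profile", load_from_shared_cache)
```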

  3. Collapse duplicate requests

    Request coalescing means only one thread fetches a missing key while the others wait for that result. Use a single-flight style lock keyed by the cache key. This removes the stampede on a miss or at expiry.
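
A minimal single-flight sketch using only the standard library; the timeout value is an assumption, and a production version would also add circuit breaking.

```python
import threading
from concurrent.futures import Future

class SingleFlight:
    """First caller for a key fetches; concurrent callers wait on the same Future."""
    def __init__(self):
        self.lock = threading.Lock()
        self.inflight = {}  # key -> Future

    def do(self, key, fetch, timeout=5.0):
        with self.lock:
            fut = self.inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self.inflight[key] = fut
        if leader:
            try:
                fut.set_result(fetch(key))       # one fetch serves everyone
            except Exception as exc:
                fut.set_exception(exc)           # waiters see the same failure
            finally:
                with self.lock:
                    self.inflight.pop(key, None)
        return fut.result(timeout=timeout)       # waiters time out rather than hang
```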

  4. Stripe the hot key across replicas

    Write the same value under N salted keys, e.g. user:123:profile:0 through user:123:profile:N-1. On read, pick one replica at random or round-robin. On write, update all replicas in parallel or via a background fanout. This spreads load across shards at the cost of write amplification and more memory.
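
A sketch of read/write striping, assuming a hypothetical `cache` client with get/set methods; the replica count of 32 matches the example later in this guide.

```python
import random

N_REPLICAS = 32  # only for a small top-K set; 32x write amplification

def replica_keys(key):
    return [f"{key}:{i}" for i in range(N_REPLICAS)]

def read_striped(cache, key):
    # e.g. "user:123:profile" -> "user:123:profile:17", chosen at random
    return cache.get(f"{key}:{random.randrange(N_REPLICAS)}")

def write_striped(cache, key, value, ttl=30):
    # Update all replicas; in production, run these in parallel or via a
    # background fanout to bound write latency.
    for rk in replica_keys(key):
        cache.set(rk, value, ex=ttl)
```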

  5. Refresh ahead with serve stale

    Do not let a hot key expire in the middle of peak traffic. Refresh it before the TTL elapses using a background task. If a miss happens anyway, serve stale for a short window while refreshing in the background. Add TTL jitter to avoid synchronized expiry.
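
A sketch of refresh-ahead with serve-stale semantics: each entry carries a soft (refresh) deadline and a hard TTL, and past the soft deadline we return the stale value while refreshing behind the scenes. The TTLs and jitter are assumed values.

```python
import random
import threading
import time

class RefreshAheadCache:
    def __init__(self, soft_ttl=10.0, hard_ttl=60.0, jitter=3.0):
        self.store = {}        # key -> (value, soft_deadline, hard_deadline)
        self.soft_ttl, self.hard_ttl, self.jitter = soft_ttl, hard_ttl, jitter
        self.refreshing = set()

    def _put(self, key, value):
        now = time.monotonic()
        soft = now + self.soft_ttl + random.uniform(0, self.jitter)  # jittered refresh
        self.store[key] = (value, soft, now + self.hard_ttl)

    def get(self, key, loader):
        now = time.monotonic()
        entry = self.store.get(key)
        if entry is None or entry[2] <= now:      # true miss or hard-expired
            self._put(key, loader(key))
            return self.store[key][0]
        value, soft, _ = entry
        if soft <= now and key not in self.refreshing:
            self.refreshing.add(key)              # serve stale, refresh in background
            def refresh():
                try:
                    self._put(key, loader(key))
                finally:
                    self.refreshing.discard(key)
            threading.Thread(target=refresh, daemon=True).start()
        return value
```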

  6. Use read replicas or multi cluster reads

    Some caches support replica reads. For extreme hot spots, replicate the shard and allow reads from followers. You trade strict freshness for higher read capacity.
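
For example, redis-py's cluster client can opt into replica reads; the host below is a placeholder, and you should confirm the exact option name and behavior for your client version.

```python
from redis.cluster import RedisCluster

# read_from_replicas routes read commands to follower nodes as well,
# trading strict freshness for extra read capacity.
rc = RedisCluster(host="cache.example.internal", port=6379,
                  read_from_replicas=True)
profile = rc.get("user:123:profile")  # may return a slightly stale value
```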

  7. Isolate the key to a dedicated tier

    Move a few top hot keys to a separate small pool with more resources. This is heavyweight but effective when a handful of keys dominate traffic.
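
The routing itself can be a small lookup against a config-driven hot set; the key names here are illustrative.

```python
# Keep this set tiny, sourced from config, and reviewed regularly.
HOT_KEYS = {"user:123:profile", "flags:global"}

def pool_for(key, main_pool, hot_pool):
    # The dedicated tier gets more CPU/memory and its own alarms and on-call docs.
    return hot_pool if key in HOT_KEYS else main_pool
```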

  8. Edge cache where possible

    For content served over HTTP, place a CDN or regional cache in front. Push invalidations on updates. This removes load before it reaches your cache shards.
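
The origin can signal this with standard HTTP caching headers; the values below are illustrative, not a recommendation.

```python
# s-maxage controls shared caches (CDN); stale-while-revalidate smooths expiry.
# Pair these with explicit purges on update so the CDN never serves old data long.
headers = {
    "Cache-Control": "public, s-maxage=30, stale-while-revalidate=15",
    "ETag": '"profile-v42"',  # lets the CDN revalidate cheaply instead of refetching
}
```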

  9. Change the data shape

    If one key aggregates everything, split the value into smaller parts or paginate. Many small keys fetched in parallel are easier to scale than one giant hot key.
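
A sketch of splitting one aggregate into chunked keys fetched in parallel; the chunk names and the `cache` client are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNKS = ["header", "posts:page:0", "counters", "pinned"]

def read_profile(cache, user_id):
    # Fetch all chunks concurrently; each small key hashes to its own shard,
    # so no single shard owns the whole aggregate.
    keys = [f"user:{user_id}:{chunk}" for chunk in CHUNKS]
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        values = list(pool.map(cache.get, keys))
    return dict(zip(CHUNKS, values))
```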

  10. Rate limit and degrade gracefully

    When all else fails, throttle requests for a short period or serve a simplified response. Protect the system first, then recover.
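
A simple per-key token bucket sketch with a degraded fallback; the rate and burst numbers are assumptions to size against shard capacity.

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `burst`; each request spends one."""
    def __init__(self, rate=1000.0, burst=200.0):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket()

def get_profile(key, fetch, fallback):
    # Protect the shard first; restore full fidelity once load drops.
    return fetch(key) if bucket.allow() else fallback(key)
```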

Real World Example

Consider a social app's profile page for a global celebrity. The profile object is cached and lives on a single shard. A live event starts. Traffic spikes for that one profile and overwhelms its shard while other shards sit idle. A practical fix is to replicate the profile cache value into thirty-two salted keys. The app reads a random replica, and writes fan out to all replicas on update. A background job refreshes ahead every few seconds with jitter. Each app node keeps a tiny near cache so most reads never leave the process. If the user posts a new update, the write path bumps the version and triggers a fast refresh, while stale-while-revalidate serves the old value for a brief window. The shard imbalance disappears and the database load stays flat.

Common Pitfalls and Trade-offs

  • Replication multiplies memory and write cost. Use it only for a small top-K set.

  • Coalescing requires careful locks. If the worker fails, all waiters might time out. Add timeouts and circuit breakers.

  • Refresh ahead without jitter creates synchronized storms. Always randomize.

  • Multi-cluster reads relax consistency. Know your freshness needs and service-level goals.

  • Edge caching adds another invalidation path. Keep a single source of truth for versioning and purge rules.

  • Isolating keys in a special tier increases operational complexity. Document ownership and on call plans.

Interview Tip

Expect a follow-up like this: a single cache shard is at ninety percent CPU due to one hot key, and you can add only one new technique before the next peak. What do you choose, and why? A strong answer picks request coalescing or read striping with replicas, explains the trade-off, and shows how you would verify the fix with metrics and a rollback plan.

Key Takeaways

  • Hot keys are a skew problem, not a global capacity problem.

  • Start with visibility, then remove stampedes with a near cache and coalescing.

  • Stripe reads across replicas for a tiny top-K set and refresh ahead with jitter.

  • Use edge caching and isolation when traffic is extreme.

  • Always measure tail latency and shard balance before and after any change.

Comparison of Strategies

| Strategy | Best For | Core Idea | Pros | Cons |
| --- | --- | --- | --- | --- |
| Near cache on app nodes | Repeated reads from same node | Small in-memory cache with short TTL and jitter | Reduces network hops and absorbs spikes | Risk of stale data and added memory use |
| Request coalescing | Miss storms or expiry stampedes | One worker fetches value, others wait | Eliminates duplicate requests | Needs careful lock and timeout handling |
| Striped replicas of hot key | Read-heavy hot spots | Write same value to multiple salted keys | Balances load across shards | Write amplification and memory overhead |
| Refresh ahead + serve stale | Predictable expiry patterns | Refresh before TTL and serve stale window | Prevents thundering herd | Requires background jobs and version control |
| Replica reads / multi-cluster | Extreme read hot spots | Allow reads from replica shards | High read capacity, better availability | Lower consistency and higher cost |
| Key isolation tier | One or two dominant keys | Move to separate high-capacity tier | Limits blast radius | Adds operational complexity |
| Edge cache / CDN | HTTP content or global reads | Cache near user edge with invalidations | Offloads load early | Complex purge logic |
| Data model split | Large monolithic keys | Break data into smaller chunks | Parallelism and efficient caching | Increases application logic complexity |

FAQs

Q1. What is a hot key in a cache?

A hot key is a cache entry that gets a disproportionately large share of requests compared to others, overloading the shard that owns it.

Q2. How do I detect hot keys quickly?

Collect per-key request counts with sampling and use heavy-hitter sketches to track the top K without full cardinality scans.

Q3. Is replication of hot keys always the best fix?

No. Replication helps read-heavy keys but increases memory and write cost. Start with a near cache and coalescing first.

Q4. How does serve stale help with hot keys?

It avoids herd effects at expiry. Clients receive a slightly old value while a background task refreshes the key.

Q5. Can a CDN solve all hot key issues?

A CDN helps for HTTP content but you still need proper invalidation and an origin cache plan for dynamic data.

Q6. When should I isolate a key to its own tier?

When a single key dominates traffic for long periods and simple methods fail. Isolation caps blast radius while you redesign.
