How do you implement graceful degradation and load shedding?
Graceful degradation keeps your product useful under stress by dimming or turning off non-critical features while protecting core flows and SLOs. Load shedding is the protective gate that refuses new work when the system is near saturation so that admitted requests finish fast. Used together, they keep latency predictable, prevent cascading failures across distributed systems, and make your system design interview answers sound practical and senior-level.
Why It Matters
Traffic is bursty, tail latency dominates user experience, and every dependency can fail at the worst time. Graceful degradation gives you a plan to serve something useful when parts of the stack are slow or unavailable. Load shedding protects the system by saying no early instead of timing out late. Together they improve reliability, maintain SLOs under overload, and reduce pager noise. Interviewers listen for this playbook because real systems win by protecting the most important work, not by trying to do everything during a spike.
How It Works Step by Step
1. Map critical journeys and optional features. List what must never break (checkout, sign in, video start from click) and what can degrade (recommendations, avatars, high-resolution images, live counters).
2. Choose overload signals. Pick fast, local signals that predict pain: queue depth per worker, concurrent requests, CPU saturation, memory pressure, and user-facing latency at P95 or P99. Avoid averages.
3. Add admission control at the edge. Reject early at the CDN or API gateway using a token bucket or leaky bucket. Prefer fixed concurrency limits per route. Return a short 503 with Retry-After, or 429 for client-controlled pacing. A minimal token-bucket sketch follows this list.
4. Classify traffic into priority classes. Gold for core flows, silver for important but deferrable work, bronze for background or decorative features. Give each class its own quota and queue limits. Never allow bronze to starve gold; see the per-class quota sketch below.
5. Implement graceful policies feature by feature. Lower video bitrate, shrink image sizes, hide non-essential widgets, collapse expensive joins, switch to cached or slightly stale data, and reduce time windows or result counts. Make each policy a flag that can be flipped automatically.
6. Add a brownout controller. Tie feature flags to health metrics. As P99 rises or queue depth grows, progressively disable optional features. When the system recovers, restore features smoothly to avoid oscillation. A brownout controller sketch appears after this list.
7. Set strict timeouts, cancellation, and fallbacks. Use short server-side timeouts with request cancellation. Return partial responses without blocking on slow subcalls. Provide safe defaults when a dependency is unavailable, as in the timeout-with-fallback sketch below.
8. Shed load where it is cheapest. Discard at the door rather than mid-pipeline. Use head-based admission, drop the oldest entries in low-priority queues, or apply random early drop to bronze traffic. Log a compact event for later analysis.
9. Propagate backpressure across services. When a downstream is hot, upstreams slow down. Convert 429 or 503 responses into smaller retry budgets and exponential backoff with jitter. Keep idempotency keys to avoid duplicate effects. A backoff-with-jitter sketch closes out the examples below.
10. Plan for capacity and autoscaling, but do not rely on it. Autoscaling helps but reacts after the spike. Degradation and shedding keep the system safe until new capacity arrives.
11. Observe and alert on user-centric metrics. Track P95 and P99 latency, error rate, queue depth, shed rate, brownout level, and SLO compliance. Alert on trends before hard limits. Export clear events when a policy triggers.
12. Test with game days and failure injection. Run synthetic spikes and dependency slowdowns. Verify that policies trigger, that gold requests stay fast, and that recovery is smooth.
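To make step 3 concrete, here is a minimal sketch of token-bucket admission at the edge. The route names, capacities, and refill rates are illustrative assumptions, not values from any specific gateway; in practice this logic lives in the CDN or API gateway.

```python
import time

class TokenBucket:
    """Per-route token bucket: refill continuously, admit while tokens remain."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def try_admit(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical per-route buckets so one chatty route cannot exhaust another's budget.
buckets = {"/checkout": TokenBucket(200, 100), "/recommendations": TokenBucket(50, 20)}

def handle(route: str):
    bucket = buckets.get(route)
    if bucket and not bucket.try_admit():
        # Short rejection with a pacing hint; keep the body tiny.
        return 429, {"Retry-After": "1"}, "rate limited"
    return 200, {}, "ok"

print(handle("/recommendations"))
```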
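For step 4, a sketch of per-class concurrency quotas, assuming the gold, silver, and bronze classes named above. The quota numbers are hypothetical and would be tuned from measured capacity; the point is that each class sheds within its own budget, so bronze can never crowd out gold.

```python
import threading

QUOTAS = {"gold": 300, "silver": 100, "bronze": 25}  # illustrative limits

class ClassAdmission:
    """Track in-flight requests per priority class and reject beyond each quota."""

    def __init__(self, quotas: dict):
        self.quotas = quotas
        self.in_flight = {cls: 0 for cls in quotas}
        self.lock = threading.Lock()

    def try_enter(self, cls: str) -> bool:
        with self.lock:
            if self.in_flight[cls] >= self.quotas[cls]:
                return False  # shed within this class only; gold is unaffected
            self.in_flight[cls] += 1
            return True

    def leave(self, cls: str) -> None:
        with self.lock:
            self.in_flight[cls] -= 1

admission = ClassAdmission(QUOTAS)
if not admission.try_enter("bronze"):
    pass  # return 503 with Retry-After instead of queueing behind gold work
```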
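One way to express the brownout controller from step 6, with hypothetical feature names and thresholds. The hysteresis band (recover only when comfortably below the limits, one feature per tick) is what keeps flags from flapping.

```python
# Features ordered from "turn off first" to "turn off last"; names are placeholders.
BROWNOUT_STEPS = ["live_counters", "recommendations", "high_res_images"]

class BrownoutController:
    """Raise or lower the brownout level based on tail latency and queue depth."""

    def __init__(self, p99_limit_ms: float, queue_limit: int):
        self.p99_limit_ms = p99_limit_ms
        self.queue_limit = queue_limit
        self.level = 0  # number of optional features currently disabled

    def update(self, p99_ms: float, queue_depth: int) -> None:
        overloaded = p99_ms > self.p99_limit_ms or queue_depth > self.queue_limit
        # Restore only when comfortably below the limits (hysteresis).
        recovered = p99_ms < 0.8 * self.p99_limit_ms and queue_depth < 0.8 * self.queue_limit
        if overloaded and self.level < len(BROWNOUT_STEPS):
            self.level += 1          # dim one more feature
        elif recovered and self.level > 0:
            self.level -= 1          # restore one feature per tick to avoid oscillation

    def is_enabled(self, feature: str) -> bool:
        return feature not in BROWNOUT_STEPS[: self.level]

controller = BrownoutController(p99_limit_ms=400, queue_limit=500)
controller.update(p99_ms=620, queue_depth=350)   # P99 over the limit
print(controller.is_enabled("live_counters"))    # False: first feature dimmed
```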
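A sketch of the timeout-and-fallback idea from step 7 using Python's asyncio. The get_recommendations coroutine and the 150 ms budget are stand-ins for a real dependency call; the page still renders when the subcall misses its budget.

```python
import asyncio

async def get_recommendations(user_id: str) -> list:
    await asyncio.sleep(2)          # simulate a slow dependency
    return ["item-1", "item-2"]

async def product_page(user_id: str) -> dict:
    try:
        # Cancel the subcall if it cannot answer within the latency budget.
        recs = await asyncio.wait_for(get_recommendations(user_id), timeout=0.15)
    except asyncio.TimeoutError:
        recs = []                   # partial response: page renders without recommendations
    return {"user": user_id, "recommendations": recs}

print(asyncio.run(product_page("u42")))
```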
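Finally, a sketch of the client-side behavior from step 9: a small retry budget with exponential backoff and full jitter. The call_downstream callable is a placeholder for your real client; pair retries with idempotency keys so a retried write is not applied twice.

```python
import random
import time

def call_with_backoff(call_downstream, max_retries: int = 3, base_delay: float = 0.1):
    """Retry only on 429/503, within a small budget, backing off with full jitter."""
    for attempt in range(max_retries + 1):
        status = call_downstream()
        if status not in (429, 503):
            return status                         # success or non-retryable error
        if attempt == max_retries:
            break                                 # retry budget exhausted; surface the error
        # Full jitter: sleep a random amount up to the exponential cap.
        delay = random.uniform(0, base_delay * (2 ** attempt))
        time.sleep(delay)
    return status

# Example: a downstream that sheds the first two attempts, then recovers.
responses = iter([503, 429, 200])
print(call_with_backoff(lambda: next(responses)))  # 200 after two backoffs
```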
Real World Example
Prime Day-style traffic hits a retail site. The API gateway caps per-tenant concurrency and returns 429 to chatty clients. The product page serves cached recommendations and hides real-time inventory counters. The image service switches to medium resolution. Search suggestions are paused. Checkout, sign-in, and payment remain top priority and stay within SLO. Orders complete, some non-essential sugar is missing, and the system never tips into a meltdown.
Common Pitfalls
Shedding too late in the call chain. Rejecting after three downstream hops wastes work and still times out the user. Shed at the edge or at the first hop.
Using averages instead of tail metrics. Averages hide pain. Trigger on queue depth, concurrency, and P95 or P99.
All-or-nothing brownouts. A binary on or off creates a jarring user experience. Prefer progressive steps and per-feature flags.
Unfair policies across tenants. Global drops hurt small tenants while large tenants blast through. Use per-tenant quotas and per-route limits.
Retry storms from clients. Aggressive retries multiply load. Publish client rules with exponential backoff and jitter, plus Retry-After hints.
Over-relying on autoscaling. Scaling lags the spike. You still need admission control and brownouts to bridge the gap.
No visibility or poor messaging. Silent degradation confuses teams and users. Emit events and show subtle in-product hints when a feature is temporarily simplified.
Interview Tip
When asked how to keep latency under a flash crowd, say you will protect the critical path with priority classes, shed at the gateway using token bucket with conservative retry budgets, and downgrade optional features via a brownout controller tied to P99 and queue depth. Then sketch two or three concrete degradations for that product, for example hide social widgets, serve cached search suggestions, and reduce image sizes. Specifics beat theory in a system design interview.
Key Takeaways
- Load shedding rejects early to keep admitted work fast and predictable
- Graceful degradation keeps the product useful by simplifying non-essential features
- Priorities, quotas, and per-feature flags turn strategy into code
- Trigger policies from tail latency, queue depth, and concurrency, not averages
- Test routinely with game days to verify recovery and fairness
Comparison Table
| Approach | Primary trigger | What it does | Best use | Main risk |
|---|---|---|---|---|
| Graceful degradation | Rising latency or partial dependency failure | Serves simpler results or hides optional features | Keep core journeys usable during partial outages | Overly aggressive degradation can hide important signals or reduce conversion |
| Load shedding | Saturated concurrency or deep queues | Refuses new work with a quick error | Protect latency and stability during spikes | Poor fairness can starve small tenants |
| Backpressure | Downstream signals of overload | Slows or pauses upstream senders | Prevent cascades across microservices | Can propagate slowness system-wide if not bounded |
| Rate limiting | Policy or contract thresholds | Caps request rate per client or route | Multi-tenant fairness and abuse control | Static limits can underutilize capacity |
FAQs
Q1. What is the difference between graceful degradation and load shedding?
Graceful degradation keeps serving by simplifying non essential parts. Load shedding says no to some requests so that accepted ones meet SLO.
Q2. When should I prefer backpressure over load shedding?
Use backpressure when producers can naturally slow down, for example internal services or streams. Use shedding at the edge, or for untrusted clients that will not slow down.
Q3. Which algorithms help with load shedding?
Token bucket and leaky bucket for admission, per-class queues with head-based admission, and tail drop or random early drop for low-priority traffic.
Q4. How do I test my degradation plan?
Run synthetic spikes, inject latency in dependencies, and watch brownout levels and shed rate. Verify that P95 recovers quickly once load falls.
Q5. What status codes should clients see during shedding?
Use 429 for client rate limits and 503 with Retry-After for server overload. Keep bodies short and cacheable where safe.
Q6. Does autoscaling remove the need for these tactics?
No. Autoscaling reacts after the spike. Degradation and shedding protect the system during the spike and during dependency slowdowns.
Further Learning
- Level up your overload playbook with the course Grokking Scalable Systems for Interviews at DesignGurus.io. It covers admission control, queues, and capacity planning in depth. Enroll in Grokking Scalable Systems for Interviews
- If you want a structured interview toolkit that ties these ideas to whiteboard friendly patterns, study Grokking the System Design Interview. Start Grokking the System Design Interview