How do you implement graceful degradation and load shedding?

Graceful degradation keeps your product useful under stress by dimming or turning off non-critical features while protecting core flows and SLOs. Load shedding is the protective gate that refuses new work when the system is near saturation so that admitted requests finish fast. Used together, they keep latency predictable, prevent cascading failures across distributed systems, and make your system design interview answers sound practical and senior-level.

Why It Matters

Traffic is bursty, tail latency dominates user experience, and every dependency can fail at the worst time. Graceful degradation gives you a plan to serve something useful when parts of the stack are slow or unavailable. Load shedding protects the system by saying no early instead of timing out late. Together they improve reliability, maintain SLOs under overload, and reduce pager noise. Interviewers listen for this playbook because real systems win by protecting the most important work, not by trying to do everything during a spike.

How It Works Step by Step

1. Map critical journeys and optional features. List what must never break (checkout, sign in, video playback start from click) and what can degrade (recommendations, avatars, high-resolution images, live counters).

2. Choose overload signals. Pick fast, local signals that predict pain: queue depth per worker, concurrent requests, CPU saturation, memory pressure, and user-facing latency at P95 or P99. Avoid averages; a sliding-window percentile signal is sketched below.
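
To make "avoid averages" concrete, here is a minimal sketch of a sliding-window tail-latency signal; the window size and the choice of P99 as the controller input are illustrative assumptions, not a prescribed implementation.

```python
from collections import deque

class TailLatencyWindow:
    """Sliding window of recent request latencies, queried at a tail percentile."""

    def __init__(self, window: int = 1000):
        self.samples = deque(maxlen=window)   # keep only the most recent samples

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        index = min(len(ordered) - 1, int(len(ordered) * p))
        return ordered[index]

# Feed window.percentile(0.99), not the mean, to the brownout controller.
```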

3. Add admission control at the edge. Reject early at the CDN or API gateway using a token bucket or leaky bucket, and prefer fixed concurrency limits per route. Return a short 503 with Retry-After, or 429 for client-controlled pacing. A token-bucket sketch follows below.
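
As a rough illustration of edge admission control, the sketch below implements a token bucket in front of a hypothetical gateway handler; the rate, burst, and Retry-After values are made-up numbers to be tuned per route.

```python
import time

class TokenBucket:
    """Token-bucket admission: refuse new work when the bucket is empty."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec        # steady-state refill rate
        self.capacity = burst           # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_admit(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical per-route bucket and handler; requests are plain dicts in this sketch.
bucket = TokenBucket(rate_per_sec=200, burst=50)

def handle(request: dict) -> dict:
    if not bucket.try_admit():
        # Shed at the door: short response, client-controlled pacing via Retry-After.
        return {"status": 429, "headers": {"Retry-After": "1"}}
    # ... admitted: hand the request to the real route handler here ...
    return {"status": 200}
```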

4. Classify traffic into priority classes. Gold for core flows, silver for important but deferrable work, bronze for background or decorative traffic. Give each class its own quota and queue limits, and never allow bronze to starve gold; see the quota sketch below.
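
One possible way to enforce per-class quotas is a separate concurrency budget for each class. The sketch below is a simplified, single-process illustration; the quota numbers, route names, and classify helper are assumptions for the example.

```python
import threading

# Illustrative per-class concurrency quotas; tune per route and per capacity plan.
QUOTAS = {"gold": 400, "silver": 150, "bronze": 50}

class ClassifiedLimiter:
    """Each priority class gets its own concurrency budget so bronze cannot starve gold."""

    def __init__(self, quotas: dict[str, int]):
        self.sems = {cls: threading.BoundedSemaphore(n) for cls, n in quotas.items()}

    def admit(self, priority_class: str) -> bool:
        # Non-blocking acquire: if the class budget is exhausted, shed instead of queueing.
        return self.sems[priority_class].acquire(blocking=False)

    def release(self, priority_class: str) -> None:
        self.sems[priority_class].release()

def classify(request: dict) -> str:
    # Hypothetical classification: core flows are gold, background work is bronze.
    if request.get("route") in {"/checkout", "/signin", "/pay"}:
        return "gold"
    if request.get("background"):
        return "bronze"
    return "silver"

limiter = ClassifiedLimiter(QUOTAS)
```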

5. Implement graceful policies feature by feature. Lower video bitrate, shrink image sizes, hide non-essential widgets, collapse expensive joins, switch to cached or slightly stale data, reduce time windows or result counts. Make each policy a flag that can be flipped automatically, as in the sketch below.
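
A lightweight way to turn each policy into a flag is a table that maps every optional feature to a cheaper variant. The feature names and variants below are hypothetical placeholders.

```python
# Hypothetical per-feature policies: each optional feature maps to a cheaper variant.
DEGRADATION_POLICIES = {
    "recommendations": {"normal": "personalized", "degraded": "cached_top_sellers"},
    "images":          {"normal": "original",     "degraded": "medium_resolution"},
    "search_suggest":  {"normal": "live",         "degraded": "paused"},
    "inventory_badge": {"normal": "realtime",     "degraded": "hidden"},
}

# Flags the brownout controller flips automatically; an empty set means the full experience.
degraded_flags: set[str] = set()

def variant(feature: str) -> str:
    """Return which variant of a feature to serve right now."""
    policy = DEGRADATION_POLICIES[feature]
    return policy["degraded"] if feature in degraded_flags else policy["normal"]
```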

6. Add a brownout controller. Tie feature flags to health metrics: as P99 rises or queue depth grows, progressively disable optional features. When the system recovers, restore features gradually to avoid oscillation. A minimal controller is sketched below.
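
A minimal brownout controller might look like the following sketch: it maps tail latency and queue depth to how many optional features to switch off, jumps up quickly under pressure, and steps back down one level at a time. The thresholds and feature ordering are illustrative assumptions.

```python
# Optional features ordered from most to least expendable, plus the shared flag set
# from the per-feature policy sketch above.
degraded_flags: set[str] = set()
BROWNOUT_STEPS = ["inventory_badge", "search_suggest", "recommendations", "images"]

def brownout_level(p99_ms: float, queue_depth: int) -> int:
    """Map health signals to how many optional features to switch off (0..len(steps))."""
    # Illustrative thresholds; derive real ones from your SLO and load tests.
    if p99_ms > 1200 or queue_depth > 500:
        return 4
    if p99_ms > 900 or queue_depth > 300:
        return 3
    if p99_ms > 600 or queue_depth > 150:
        return 2
    if p99_ms > 400 or queue_depth > 75:
        return 1
    return 0

current_level = 0

def reconcile(p99_ms: float, queue_depth: int) -> None:
    """Degrade immediately under pressure, restore one feature at a time on recovery."""
    global current_level
    target = brownout_level(p99_ms, queue_depth)
    if target > current_level:
        current_level = target
    elif target < current_level:
        current_level -= 1
    degraded_flags.clear()
    degraded_flags.update(BROWNOUT_STEPS[:current_level])
```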

7. Set strict timeouts, cancellation, and fallbacks. Use short server-side timeouts with request cancellation, return partial responses instead of blocking on slow subcalls, and provide safe defaults when a dependency is unavailable (see the sketch below).
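
Here is one way to express a short timeout with a safe fallback using asyncio; the 200 ms budget and the fetch_recommendations stub are assumptions for the example.

```python
import asyncio

async def fetch_recommendations(user_id: str) -> list[str]:
    # Placeholder for a real downstream call that is currently slow.
    await asyncio.sleep(5)
    return ["item-1", "item-2"]

async def recommendations_with_fallback(user_id: str) -> list[str]:
    try:
        # Short server-side timeout; asyncio cancels the subcall when it fires.
        return await asyncio.wait_for(fetch_recommendations(user_id), timeout=0.2)
    except asyncio.TimeoutError:
        # Safe default: ship a partial response instead of blocking the page.
        return []

# asyncio.run(recommendations_with_fallback("u-42"))  # returns [] after ~200 ms
```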

8. Shed load where it is cheapest. Discard at the door rather than mid-pipeline. Use head-based admission, drop the oldest items in low-priority queues, or apply random early drop to bronze traffic, as sketched below. Log a compact event for later analysis.
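
The sketch below shows a bounded low-priority queue that applies random early drop as it fills and silently evicts the oldest entry once full; the 100-item capacity and the 50 percent early-drop threshold are arbitrary example values.

```python
import random
from collections import deque

class BronzeQueue:
    """Bounded queue for low-priority work: random early drop, then drop-oldest."""

    def __init__(self, maxlen: int = 100):
        self.items = deque(maxlen=maxlen)   # a full deque evicts its oldest item on append
        self.maxlen = maxlen
        self.dropped = 0

    def offer(self, item) -> bool:
        fill = len(self.items) / self.maxlen
        # Random early drop: the fuller the queue, the more likely we refuse new work.
        if fill > 0.5 and random.random() < (fill - 0.5) * 2:
            self.dropped += 1               # compact counter for later analysis
            return False
        self.items.append(item)             # at maxlen this drops the oldest entry
        return True
```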

9. Propagate backpressure across services. When a downstream is hot, upstreams slow down: convert 429 or 503 responses into smaller retry budgets and exponential backoff with jitter, and carry idempotency keys to avoid duplicate effects (see the client sketch below).
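
On the client side, a small retry budget with full-jitter backoff and an idempotency key might look like this sketch; the send callable, the dict-shaped responses, and the retry limits are placeholders rather than a specific client library.

```python
import random
import time
import uuid

def call_with_backoff(send, payload, max_retries: int = 3, base: float = 0.1, cap: float = 2.0):
    """Retry 429/503 responses with exponential backoff and full jitter, bounded by a small budget."""
    idempotency_key = str(uuid.uuid4())   # lets the server de-duplicate replayed requests
    for attempt in range(max_retries + 1):
        response = send(payload, headers={"Idempotency-Key": idempotency_key})
        if response["status"] not in (429, 503):
            return response
        if attempt == max_retries:
            break
        # Honor a Retry-After hint if present, otherwise use full-jitter backoff.
        hinted = float(response.get("headers", {}).get("Retry-After", 0))
        backoff = random.uniform(0, min(cap, base * (2 ** attempt)))
        time.sleep(max(hinted, backoff))
    return response
```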

10. Plan for capacity and autoscaling, but do not rely on it. Autoscaling helps but reacts after the spike; degradation and shedding keep the system safe until new capacity arrives.

11. Observe and alert on user-centric metrics. Track P95 and P99 latency, error rate, queue depth, shed rate, brownout level, and SLO compliance. Alert on trends before hard limits, and export clear events when a policy triggers.

12. Test with game days and failure injection. Run synthetic spikes and dependency slowdowns, and verify that policies trigger, that gold requests stay fast, and that recovery is smooth.

Real World Example

Prime Day-style traffic hits a retail site. The API gateway caps per-tenant concurrency and returns 429 to chatty clients. The product page serves cached recommendations and hides real-time inventory counters. The image service switches to medium resolution, and search suggestions are paused. Checkout, sign-in, and payment remain top priority and stay within SLO. Orders complete, some non-essential polish is missing, and the system never tips into a meltdown.

Common Pitfalls

Shedding too late in the call chain. Rejecting after three downstream hops wastes work and still times out the user. Shed at the edge or at the first hop.

Using averages instead of tail metrics. Averages hide pain. Trigger on queue depth, concurrency, and P95 or P99.

All-or-nothing brownouts. Binary on or off creates a jarring user experience. Prefer progressive steps and per-feature flags.

Unfair policies across tenants. Global drops hurt small tenants while large tenants blast through. Use per-tenant quotas and per-route limits.

Retry storms from clients. Aggressive retries multiply load. Publish client rules with exponential backoff and jitter, plus Retry-After hints.

Over-relying on autoscaling. Scaling lags the spike. You still need admission control and brownouts to bridge the gap.

No visibility or poor messaging. Silent degradation confuses teams and users. Emit events and show subtle in-product hints when a feature is temporarily simplified.

Interview Tip

When asked how to keep latency under a flash crowd, say you will protect the critical path with priority classes, shed at the gateway using a token bucket with conservative retry budgets, and downgrade optional features via a brownout controller tied to P99 and queue depth. Then sketch two or three concrete degradations for that product, for example hide social widgets, serve cached search suggestions, and reduce image sizes. Specifics beat theory in a system design interview.

Key Takeaways

  • Load shedding rejects early to keep admitted work fast and predictable

  • Graceful degradation keeps the product useful by simplifying non-essential features

  • Priorities, quotas, and per-feature flags turn strategy into code

  • Trigger policies from tail latency, queue depth, and concurrency, not averages

  • Test routinely with game days to verify recovery and fairness

Table of Comparison

| Approach | Primary trigger | What it does | Best use | Main risk |
| --- | --- | --- | --- | --- |
| Graceful degradation | Rising latency or partial dependency failure | Serves simpler results or hides optional features | Keep core journeys usable during partial outages | Too aggressive a policy can hide important signals or reduce conversion |
| Load shedding | Saturated concurrency or deep queues | Refuses new work with a quick error | Protect latency and stability during spikes | Poor fairness can starve small tenants |
| Backpressure | Downstream signals of overload | Slows or pauses upstream senders | Prevent cascades across microservices | Can propagate slowness system-wide if not bounded |
| Rate limiting | Policy or contract thresholds | Caps request rate per client or route | Multi-tenant fairness and abuse control | Static limits can underutilize capacity |

FAQs

Q1. What is the difference between graceful degradation and load shedding?

Graceful degradation keeps serving by simplifying non-essential parts. Load shedding says no to some requests so that accepted ones meet the SLO.

Q2. When should I prefer backpressure over load shedding?

Use backpressure when producers can naturally slow down, for example internal services or streams. Use shedding at the edge or for untrusted clients that will not slow down.

Q3. Which algorithms help with load shedding?

Token bucket and leaky bucket for admission control, per-class queues with head-based admission, and tail drop or random early drop for low-priority traffic.

Q4. How do I test my degradation plan?

Run synthetic spikes, inject latency in dependencies, and watch brownout levels and shed rate. Verify that P95 recovers quickly once load falls.

Q5. What status codes should clients see during shedding?

Use 429 for client rate limits and 503 with Retry-After for server overload. Keep bodies short and cacheable where safe.

Q6. Does autoscaling remove the need for these tactics?

No. Autoscaling reacts after the spike. Degradation and shedding protect the system during the spike and during dependency slowdowns.
