How do you implement graceful shutdown for long‑running requests?
Graceful shutdown is the practice of letting a service finish the work it has already started without accepting new work, then exiting cleanly. It protects user experience and data integrity during deploys, autoscaling, failures, and maintenance. In practice this means catching a termination signal, draining traffic, finishing in-flight work within a deadline, and leaving the system in a known good state.
Why It Matters
A naive kill can drop requests, corrupt data, and lose money. Long-running requests raise the risk because they can outlive short grace periods. In a system design interview, you will be asked how your scalable architecture behaves during restarts and rolling deploys. A clear shutdown plan demonstrates mastery of reliability, correctness under partial failure, and operational readiness across distributed systems.
How It Works Step by Step
1) Catch the signal and flip a flag
The process receives SIGTERM or an equivalent control signal. Immediately set a global shutting-down flag. This flag controls every code path that admits new work.
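A minimal sketch, assuming a Go service (the article itself is language-agnostic): catch SIGTERM and flip a process-wide flag that admission paths can check.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// shuttingDown is read by readiness probes and every path that admits new work.
var shuttingDown atomic.Bool

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	<-sigs                   // block until the platform asks us to stop
	shuttingDown.Store(true) // flip the flag; admission paths consult it
	fmt.Println("termination signal received, starting drain")
	// ...drain traffic and finish in-flight work (later steps)...
}
```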
2) Advertise not ready but stay alive
Set readiness to false so load balancers and service discovery stop routing new requests. Keep liveness true so the platform does not restart you mid-drain.
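A hedged sketch of split probe endpoints that reuse the shutting-down flag from the previous step; the /readyz and /healthz paths are illustrative choices, not platform requirements.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

var shuttingDown atomic.Bool

func main() {
	mux := http.NewServeMux()

	// Liveness: the process is healthy enough to keep running, even mid-drain.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: flips to 503 once draining so the load balancer and service
	// discovery stop routing new requests to this instance.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	http.ListenAndServe(":8080", mux)
}
```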
3) Stop accepting new work
Reject new connections early. For HTTP keep the listener up for existing connections but do not accept new ones. For gRPC and similar protocols stop new streams and allow current streams to complete.
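One hedged way to gate admission in Go: a middleware that rejects new requests with a retriable status once the flag is set, while requests already inside handlers run to completion. The /work handler, header values, and sleep are illustrative stand-ins.

```go
package main

import (
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

var shuttingDown atomic.Bool

// refuseWhenDraining rejects new requests during drain with a retriable status
// and asks clients to close the keep-alive connection.
func refuseWhenDraining(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			w.Header().Set("Connection", "close")
			w.Header().Set("Retry-After", "5")
			http.Error(w, "shutting down, retry elsewhere", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // stand-in for a long-running request
		io.WriteString(w, "done\n")
	})
	http.ListenAndServe(":8080", refuseWhenDraining(mux))
}
```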
4) Drain at the edge
Tell the load balancer to drain. With connection draining, the balancer stops sending new traffic and waits for existing connections to finish. Set the drain time larger than a typical long request but bounded by your rollout needs.
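The balancer side is configuration (deregistration or drain timeouts), but the application usually cooperates by waiting for the edge to notice the readiness flip before it closes the listener. A minimal Go sketch, with the 10-second delay and 60-second grace window as assumed values to tune:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func drain(srv *http.Server) {
	// Give the load balancer and readiness probes time to observe "not ready"
	// before we stop accepting connections. Tune to your probe interval.
	time.Sleep(10 * time.Second)

	// Shutdown stops accepting new connections and waits for in-flight
	// requests, bounded by a grace window larger than a typical long request.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("drain did not finish cleanly: %v", err)
	}
}

func main() {
	srv := &http.Server{Addr: ":8080"}
	go drain(srv) // in a real service this runs after the SIGTERM handler fires
	log.Println(srv.ListenAndServe())
}
```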
5) Extend server timeouts thoughtfully
Increase server write and idle timeouts during the drain window to avoid aborting slow responses. Pair this with a hard per-request deadline to avoid unbounded waits.
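In Go's net/http, for instance, server timeouts are fixed when the server is constructed, so one practical reading of this step is to size the write and idle timeouts for the drain window up front and pair them with a per-request handler deadline. The durations below are assumptions:

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	// ...register handlers on mux...

	srv := &http.Server{
		Addr: ":8080",
		// Hard per-request deadline so no single response can wait forever.
		Handler: http.TimeoutHandler(mux, 90*time.Second, "request deadline exceeded"),
		// Write/idle timeouts sized for the drain window so slow but
		// legitimate responses are not aborted mid-drain.
		WriteTimeout:      2 * time.Minute,
		IdleTimeout:       2 * time.Minute,
		ReadHeaderTimeout: 10 * time.Second,
	}
	srv.ListenAndServe()
}
```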
6) Finish in-flight work with deadlines
Attach a context carrying the remaining shutdown budget to every in-flight operation. If the deadline expires, cancel in a controlled way and return a retriable error to the caller.
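A hedged sketch of carrying the remaining shutdown budget via a context deadline; processOrder and ErrRetryable are hypothetical names for your own operation and error mapping:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrRetryable is a hypothetical sentinel the caller maps to a retriable response.
var ErrRetryable = errors.New("cancelled during shutdown, safe to retry")

func processOrder(ctx context.Context, id string) error {
	select {
	case <-time.After(3 * time.Second): // stand-in for the real long-running work
		return nil
	case <-ctx.Done(): // shutdown budget exhausted: cancel in a controlled way
		return fmt.Errorf("order %s: %w", id, ErrRetryable)
	}
}

func main() {
	// Remaining grace budget at the moment shutdown began (assumed value).
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	if err := processOrder(ctx, "42"); errors.Is(err, ErrRetryable) {
		fmt.Println("returning retriable error to caller:", err)
	}
}
```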
7) Make writes idempotent and resumable
Use request ids, idempotency keys, or natural primary keys so a retried write does not double charge or duplicate rows. For multi step updates either stage changes or rely on atomic upserts.
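A sketch of one idempotent-write pattern, assuming PostgreSQL, the lib/pq driver, and a charges table with a unique idempotency_key column; the schema, connection string, and names are illustrative:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed choice of Postgres driver
)

// chargeOnce inserts a charge keyed by the client-supplied idempotency key.
// A retried request with the same key becomes a no-op instead of a double charge.
func chargeOnce(ctx context.Context, db *sql.DB, key string, cents int64) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO charges (idempotency_key, amount_cents)
		VALUES ($1, $2)
		ON CONFLICT (idempotency_key) DO NOTHING`, key, cents)
	return err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/billing?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Retrying with the same key after a mid-shutdown cancel is safe.
	log.Println(chargeOnce(context.Background(), db, "req-123", 499))
}
```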
8) Checkpoint background jobs
For workers that consume a queue, checkpoint progress and re-queue unfinished jobs before exit. Use visibility timeouts or leases so another worker can pick up the work safely.
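A hedged sketch of a queue worker that checkpoints between chunks and re-queues on shutdown; the Queue, Store, and Job types are hypothetical stand-ins for your broker and durable store:

```go
package main

import "context"

// Hypothetical abstractions over a durable queue and a checkpoint store.
type Job struct {
	ID     string
	Chunks []string
}

type Queue interface {
	Requeue(ctx context.Context, j Job) error
}

type Store interface {
	SaveCheckpoint(ctx context.Context, jobID string, chunk int) error
}

// run processes one chunk at a time so there is always a recent checkpoint.
// When the shutdown context fires, it re-queues the job for another worker.
func run(ctx context.Context, q Queue, s Store, j Job) error {
	for i, chunk := range j.Chunks {
		select {
		case <-ctx.Done():
			// Stop between chunks and hand the job back; the same job ID acts
			// as the idempotency key so progress resumes from the checkpoint.
			return q.Requeue(context.Background(), j)
		default:
		}
		process(chunk) // stand-in for the real work on this chunk
		if err := s.SaveCheckpoint(ctx, j.ID, i); err != nil {
			return err
		}
	}
	return nil
}

func process(chunk string) {}

func main() {}
```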
9) Handle streaming and websockets
For long lived streams send a friendly close, flush buffers, and close with a code that indicates going away. For server sent events end the stream after a final heartbeat. Encourage clients to reconnect with backoff.
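For WebSockets, one hedged example using the gorilla/websocket package (an assumed library choice; any library with close-control frames works): send a 1001 Going Away close frame with a short deadline before closing the connection.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// closeGoingAway tells the client we are shutting down so it can reconnect
// elsewhere with backoff, instead of seeing an abrupt connection reset.
func closeGoingAway(conn *websocket.Conn) {
	msg := websocket.FormatCloseMessage(websocket.CloseGoingAway, "server shutting down")
	_ = conn.WriteControl(websocket.CloseMessage, msg, time.Now().Add(2*time.Second))
	_ = conn.Close()
}

func handler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade:", err)
		return
	}
	// ...serve the stream; on shutdown, flush any buffered frames first...
	closeGoingAway(conn)
}

func main() {
	http.HandleFunc("/ws", handler)
	log.Println(http.ListenAndServe(":8080", nil))
}
```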
10) Run finalizers
Flush logs and metrics, close pools, persist caches if needed, and write a shutdown metric that records duration and cause. Keep finalizers short and deterministic.
11) Enforce a hard limit
After the grace period, exit. The platform may send SIGKILL, which the process cannot intercept. This ensures stuck drains do not block deploys forever.
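A small backstop sketch in Go, assuming a sixty-second grace window: arm a timer when the drain starts so a stuck drain cannot outlive the grace period, and force-close connections if the deadline hits.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	go srv.ListenAndServe()

	// In a real service this section runs from the SIGTERM handler.
	grace := 60 * time.Second

	// Backstop: exit even if finalizers or the drain below get stuck.
	time.AfterFunc(grace+5*time.Second, func() {
		log.Println("drain exceeded grace period, forcing exit")
		os.Exit(1)
	})

	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		srv.Close() // deadline hit: force-close remaining connections
	}
}
```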
12) Verify with chaos and load
Practice the entire flow in staging with traffic replay. Simulate mid request shutdowns and ensure correctness properties hold under stress.
Real World Example
Consider a video processing service similar to what a large streaming platform would run. Each request transcodes a clip that can take minutes. The service runs on a container platform. During a rolling deploy the platform sends SIGTERM. The service sets readiness to false so the load balancer stops sending new jobs, but it keeps liveness true. Current transcode tasks continue with a context deadline equal to the remaining grace window. Each job updates progress in a durable store every few seconds. If the deadline expires before completion, the worker checkpoints progress and re-queues the job with the same idempotency key. Another instance resumes the job from the last checkpoint. Observability shows that at p99 the drain finishes within the configured window. Users never see partial outputs, and deploys complete predictably.
Common Pitfalls or Trade-offs
1. Ignoring signal handling: Processes that do not listen for termination signals are abruptly killed, causing data loss.
2. No separation between readiness and liveness: If liveness checks fail early, orchestrators restart services before requests are finished.
3. Non-idempotent writes: When retried, they cause duplicated records or billing issues.
4. Long grace periods: They delay deployments and scaling operations unnecessarily.
5. Dropping streaming connections: For services handling WebSockets or streams, closing connections abruptly triggers reconnection storms and user errors.
6. No observability around shutdown time: Without metrics or logs, it’s impossible to detect slow drains or incomplete cleanups.
Interview Tip
If asked how your service rolls during deploys, start with the control plane story, then zoom into data safety. Say that SIGTERM flips a shutting-down flag, readiness goes false, the edge drains, per-request contexts carry the remaining budget, writes are idempotent, workers checkpoint and re-queue, and a hard limit guarantees progress. Offer one numeric default, such as a sixty-second grace window with a p99 drain time of forty-five seconds.
Key Takeaways
- Graceful shutdown protects user experience and data integrity during deploys and scale events.
- The pattern is catch the signal, stop admission, drain at the edge, finish in-flight work with deadlines, then exit.
- Idempotent writes and resumable jobs convert cancellation into safe retry.
- Separate readiness from liveness so the platform does not restart you mid-drain.
- Measure drain duration and error rates to tune grace windows.
Table of Comparison
| Approach | What Happens | Pros | Cons | Best Use Case |
|---|---|---|---|---|
| Graceful Shutdown | Completes in-flight requests before exit | Reliable, prevents data loss | Adds complexity, requires signal handling | Production-grade APIs, critical services |
| Immediate Termination | Kills all requests instantly | Fastest stop | Data loss, bad UX | Non-critical batch jobs |
| Load Balancer Drain Only | Stops routing new traffic but ignores app logic | Simple to configure | In-flight requests may fail | Stateless microservices with short requests |
| Queue-based Work Reassignment | Re-queues unfinished work | Guarantees eventual completion | Needs durable queues and checkpointing | Async background processing |
| Checkpoint and Resume | Saves partial progress and resumes later | Great for long-running tasks | Adds complexity in design | Streaming and video processing systems |
FAQs
Q1 What is a graceful shutdown for long-running requests?
It is a controlled stop where the service halts admission, drains traffic, lets current requests complete with a deadline, persists state, and then exits cleanly.
Q2 How long should the grace window be for long-running work?
Pick a window longer than your p99 request time plus a buffer. Start with sixty to one hundred twenty seconds for typical web services and increase for batch workers.
Q3 What is the difference between SIGTERM and SIGKILL?
SIGTERM asks the process to exit and can be handled to start a drain. SIGKILL cannot be caught and forces an immediate stop after the grace window expires.
Q4 How should I handle websockets during shutdown?
Stop admitting new connections, send a final message, flush buffers, and close with a code that indicates going away. Clients should reconnect with backoff.
Q5 Should the service return errors during drain?
Prefer not to. Mark readiness false so callers never reach you. If a request arrives anyway, respond with a retriable code and instruct the client to retry.
Q6 How do I test graceful shutdown behavior?
Run a load test, start long operations, then send a termination signal mid request. Verify no data loss, that retries are idempotent, and that drain duration stays within the window.
Further Learning
- Master platform-friendly draining patterns in Grokking Scalable Systems for Interviews and learn how to tune grace windows in real rollout scenarios.
- Build strong foundations for readiness, liveness, idempotency, and retry logic in Grokking System Design Fundamentals with hands-on design exercises.