How do you implement graceful shutdown for long‑running requests?
Graceful shutdown is the practice of letting a service finish the work it has already started without accepting new work, then exiting cleanly. It protects user experience and data integrity during deploys, autoscaling, failures, and maintenance. In practice this means catching a termination signal, draining traffic, finishing in-flight work within a deadline, and leaving the system in a known good state.
Why It Matters
A naive kill can drop requests, corrupt data, and lose money. Long-running requests raise the risk because they can outlive short grace periods. In a system design interview, you will be asked how your scalable architecture behaves during restarts and rolling deploys. A clear shutdown plan demonstrates mastery of reliability, correctness under partial failure, and operational readiness across distributed systems.
How It Works Step by Step
1) Catch the signal and flip a flag
The process receives SIGTERM or an equivalent control signal. Immediately set a global shutting-down flag. This flag controls every code path that admits new work.
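A minimal sketch, assuming a Go service (the article itself is language-agnostic): catch SIGTERM and flip a process-wide flag that admission paths can check.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// shuttingDown is read by readiness probes and every path that admits new work.
var shuttingDown atomic.Bool

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	<-sigs                   // block until the platform asks us to stop
	shuttingDown.Store(true) // flip the flag; admission paths consult it
	fmt.Println("termination signal received, starting drain")
	// ...drain traffic and finish in-flight work (later steps)...
}
```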
2) Advertise not ready but stay alive
Set readiness to false so load balancers and service discovery stop routing new requests. Keep liveness true so the platform does not restart you mid-drain.
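A hedged sketch of split probe endpoints that reuse the shutting-down flag from the previous step; the /readyz and /healthz paths are illustrative choices, not platform requirements.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

var shuttingDown atomic.Bool

func main() {
	mux := http.NewServeMux()

	// Liveness: the process is healthy enough to keep running, even mid-drain.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: flips to 503 once draining so the load balancer and service
	// discovery stop routing new requests to this instance.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	http.ListenAndServe(":8080", mux)
}
```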
3) Stop accepting new work
Reject new connections early. For HTTP keep the listener up for existing connections but do not accept new ones. For gRPC and similar protocols stop new streams and allow current streams to complete.
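One hedged way to gate admission in Go: a middleware that rejects new requests with a retriable status once the flag is set, while requests already inside handlers run to completion. The /work handler, header values, and sleep are illustrative stand-ins.

```go
package main

import (
	"io"
	"net/http"
	"sync/atomic"
	"time"
)

var shuttingDown atomic.Bool

// refuseWhenDraining rejects new requests during drain with a retriable status
// and asks clients to close the keep-alive connection.
func refuseWhenDraining(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			w.Header().Set("Connection", "close")
			w.Header().Set("Retry-After", "5")
			http.Error(w, "shutting down, retry elsewhere", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // stand-in for a long-running request
		io.WriteString(w, "done\n")
	})
	http.ListenAndServe(":8080", refuseWhenDraining(mux))
}
```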
4) Drain at the edge
Tell the load balancer to drain. With connection draining, the balancer stops sending new traffic and waits for existing connections to finish. Set the drain time larger than a typical long request but bounded by your rollout needs.
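The balancer side is configuration (deregistration or drain timeouts), but the application usually cooperates by waiting for the edge to notice the readiness flip before it closes the listener. A minimal Go sketch, with the 10-second delay and 60-second grace window as assumed values to tune:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func drain(srv *http.Server) {
	// Give the load balancer and readiness probes time to observe "not ready"
	// before we stop accepting connections. Tune to your probe interval.
	time.Sleep(10 * time.Second)

	// Shutdown stops accepting new connections and waits for in-flight
	// requests, bounded by a grace window larger than a typical long request.
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("drain did not finish cleanly: %v", err)
	}
}

func main() {
	srv := &http.Server{Addr: ":8080"}
	go drain(srv) // in a real service this runs after the SIGTERM handler fires
	log.Println(srv.ListenAndServe())
}
```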
5) Extend server timeouts thoughtfully
Increase server write and idle timeouts during the drain window to avoid aborting slow responses. Pair this with a hard per-request deadline to avoid unbounded waits.
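In Go's net/http, for instance, server timeouts are fixed when the server is constructed, so one practical reading of this step is to size the write and idle timeouts for the drain window up front and pair them with a per-request handler deadline. The durations below are assumptions:

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux()
	// ...register handlers on mux...

	srv := &http.Server{
		Addr: ":8080",
		// Hard per-request deadline so no single response can wait forever.
		Handler: http.TimeoutHandler(mux, 90*time.Second, "request deadline exceeded"),
		// Write/idle timeouts sized for the drain window so slow but
		// legitimate responses are not aborted mid-drain.
		WriteTimeout:      2 * time.Minute,
		IdleTimeout:       2 * time.Minute,
		ReadHeaderTimeout: 10 * time.Second,
	}
	srv.ListenAndServe()
}
```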
6) Finish in-flight work with deadlines
Attach a context carrying the remaining shutdown budget to every in-flight operation. If the deadline expires, cancel in a controlled way and return a retriable error to the caller.
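A hedged sketch of carrying the remaining shutdown budget via a context deadline; processOrder and ErrRetryable are hypothetical names for your own operation and error mapping:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrRetryable is a hypothetical sentinel the caller maps to a retriable response.
var ErrRetryable = errors.New("cancelled during shutdown, safe to retry")

func processOrder(ctx context.Context, id string) error {
	select {
	case <-time.After(3 * time.Second): // stand-in for the real long-running work
		return nil
	case <-ctx.Done(): // shutdown budget exhausted: cancel in a controlled way
		return fmt.Errorf("order %s: %w", id, ErrRetryable)
	}
}

func main() {
	// Remaining grace budget at the moment shutdown began (assumed value).
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	if err := processOrder(ctx, "42"); errors.Is(err, ErrRetryable) {
		fmt.Println("returning retriable error to caller:", err)
	}
}
```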
7) Make writes idempotent and resumable
Use request ids, idempotency keys, or natural primary keys so a retried write does not double charge or duplicate rows. For multi step updates either stage changes or rely on atomic upserts.
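A sketch of one idempotent-write pattern, assuming PostgreSQL, the lib/pq driver, and a charges table with a unique idempotency_key column; the schema, connection string, and names are illustrative:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed choice of Postgres driver
)

// chargeOnce inserts a charge keyed by the client-supplied idempotency key.
// A retried request with the same key becomes a no-op instead of a double charge.
func chargeOnce(ctx context.Context, db *sql.DB, key string, cents int64) error {
	_, err := db.ExecContext(ctx, `
		INSERT INTO charges (idempotency_key, amount_cents)
		VALUES ($1, $2)
		ON CONFLICT (idempotency_key) DO NOTHING`, key, cents)
	return err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/billing?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Retrying with the same key after a mid-shutdown cancel is safe.
	log.Println(chargeOnce(context.Background(), db, "req-123", 499))
}
```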
8) Checkpoint background jobs
For workers that consume a queue, checkpoint progress and re-queue unfinished jobs before exit. Use visibility timeouts or leases so another worker can pick up the work safely.
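A hedged sketch of a queue worker that checkpoints between chunks and re-queues on shutdown; the Queue, Store, and Job types are hypothetical stand-ins for your broker and durable store:

```go
package main

import "context"

// Hypothetical abstractions over a durable queue and a checkpoint store.
type Job struct {
	ID     string
	Chunks []string
}

type Queue interface {
	Requeue(ctx context.Context, j Job) error
}

type Store interface {
	SaveCheckpoint(ctx context.Context, jobID string, chunk int) error
}

// run processes one chunk at a time so there is always a recent checkpoint.
// When the shutdown context fires, it re-queues the job for another worker.
func run(ctx context.Context, q Queue, s Store, j Job) error {
	for i, chunk := range j.Chunks {
		select {
		case <-ctx.Done():
			// Stop between chunks and hand the job back; the same job ID acts
			// as the idempotency key so progress resumes from the checkpoint.
			return q.Requeue(context.Background(), j)
		default:
		}
		process(chunk) // stand-in for the real work on this chunk
		if err := s.SaveCheckpoint(ctx, j.ID, i); err != nil {
			return err
		}
	}
	return nil
}

func process(chunk string) {}

func main() {}
```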
9) Handle streaming and websockets
For long lived streams send a friendly close, flush buffers, and close with a code that indicates going away. For server sent events end the stream after a final heartbeat. Encourage clients to reconnect with backoff.
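For WebSockets, one hedged example using the gorilla/websocket package (an assumed library choice; any library with close-control frames works): send a 1001 Going Away close frame with a short deadline before closing the connection.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// closeGoingAway tells the client we are shutting down so it can reconnect
// elsewhere with backoff, instead of seeing an abrupt connection reset.
func closeGoingAway(conn *websocket.Conn) {
	msg := websocket.FormatCloseMessage(websocket.CloseGoingAway, "server shutting down")
	_ = conn.WriteControl(websocket.CloseMessage, msg, time.Now().Add(2*time.Second))
	_ = conn.Close()
}

func handler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade:", err)
		return
	}
	// ...serve the stream; on shutdown, flush any buffered frames first...
	closeGoingAway(conn)
}

func main() {
	http.HandleFunc("/ws", handler)
	log.Println(http.ListenAndServe(":8080", nil))
}
```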
10) Run finalizers
Flush logs and metrics, close pools, persist caches if needed, and write a shutdown metric that records duration and cause. Keep finalizers short and deterministic.
11) Enforce a hard limit
After the grace period, exit. The platform may send SIGKILL, which the process cannot intercept. This ensures stuck drains do not block deploys forever.
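A small backstop sketch in Go, assuming a sixty-second grace window: arm a timer when the drain starts so a stuck drain cannot outlive the grace period, and force-close connections if the deadline hits.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	go srv.ListenAndServe()

	// In a real service this section runs from the SIGTERM handler.
	grace := 60 * time.Second

	// Backstop: exit even if finalizers or the drain below get stuck.
	time.AfterFunc(grace+5*time.Second, func() {
		log.Println("drain exceeded grace period, forcing exit")
		os.Exit(1)
	})

	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		srv.Close() // deadline hit: force-close remaining connections
	}
}
```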
12) Verify with chaos and load
Practice the entire flow in staging with traffic replay. Simulate mid request shutdowns and ensure correctness properties hold under stress.
Real World Example
Consider a video processing service similar to what a large streaming platform would run. Each request transcodes a clip that can take minutes. The service runs on a container platform. During a rolling deploy the platform sends SIGTERM. The service sets readiness to false so the load balancer stops sending new jobs, but it keeps liveness true. Current transcode tasks continue with a context deadline equal to the remaining grace window. Each job updates progress in a durable store every few seconds. If the deadline expires before completion, the worker checkpoints progress and re-queues the job with the same idempotency key. Another instance resumes the job from the last checkpoint. Observability shows that at p99 the drain finishes within the configured window. Users never see partial outputs, and deploys complete predictably.
Common Pitfalls or Trade-offs
1. Ignoring signal handling: Processes that do not listen for termination signals are abruptly killed, causing data loss.
2. No separation between readiness and liveness: If liveness checks fail early, orchestrators restart services before requests are finished.
3. Non-idempotent writes: When retried, they cause duplicated records or billing issues.
4. Long grace periods: They delay deployments and scaling operations unnecessarily.
5. Dropping streaming connections: For services handling WebSockets or streams, closing connections abruptly triggers reconnection storms and user errors.
6. No observability around shutdown time: Without metrics or logs, it’s impossible to detect slow drains or incomplete cleanups.
Interview Tip
If asked how your service rolls during deploys, start with the control plane story, then zoom into data safety. Say that SIGTERM flips a shutting-down flag, readiness goes false, the edge drains, per-request contexts carry the remaining budget, writes are idempotent, workers checkpoint and re-queue, and a hard limit guarantees progress. Offer one numeric default, such as a sixty-second grace window with a p99 drain time of forty-five seconds.
Key Takeaways
- Graceful shutdown protects user experience and data integrity during deploys and scale events.
- The pattern is catch the signal, stop admission, drain at the edge, finish in-flight work with deadlines, then exit.
- Idempotent writes and resumable jobs convert cancellation into safe retry.
- Separate readiness from liveness so the platform does not restart you mid-drain.
- Measure drain duration and error rates to tune grace windows.
Table of Comparison
| Approach | What Happens | Pros | Cons | Best Use Case |
|---|---|---|---|---|
| Graceful Shutdown | Completes in-flight requests before exit | Reliable, prevents data loss | Adds complexity, requires signal handling | Production-grade APIs, critical services |
| Immediate Termination | Kills all requests instantly | Fastest stop | Data loss, bad UX | Non-critical batch jobs |
| Load Balancer Drain Only | Stops routing new traffic but ignores app logic | Simple to configure | In-flight requests may fail | Stateless microservices with short requests |
| Queue-based Work Reassignment | Re-queues unfinished work | Guarantees eventual completion | Needs durable queues and checkpointing | Async background processing |
| Checkpoint and Resume | Saves partial progress and resumes later | Great for long-running tasks | Adds complexity in design | Streaming and video processing systems |
FAQs
Q1 What is a graceful shutdown for long-running requests?
It is a controlled stop where the service halts admission, drains traffic, lets current requests complete with a deadline, persists state, and then exits cleanly.
Q2 How long should the grace window be for long-running work?
Pick a window longer than your p99 request time plus a buffer. Start with sixty to one hundred twenty seconds for typical web services and increase for batch workers.
Q3 What is the difference between SIGTERM and SIGKILL?
SIGTERM asks the process to exit and can be handled to start a drain. SIGKILL cannot be caught and forces an immediate stop after the grace window expires.
Q4 How should I handle websockets during shutdown?
Stop admitting new connections, send a final message, flush buffers, and close with a code that indicates going away. Clients should reconnect with backoff.
Q5 Should the service return errors during drain?
Prefer not to. Mark readiness false so callers never reach you. If a request arrives anyway, respond with a retriable code and instruct the client to retry.
Q6 How do I test graceful shutdown behavior?
Run a load test, start long operations, then send a termination signal mid request. Verify no data loss, that retries are idempotent, and that drain duration stays within the window.
Further Learning
- Master platform-friendly draining patterns in Grokking Scalable Systems for Interviews and learn how to tune grace windows in real rollout scenarios.
- Build strong foundations for readiness, liveness, idempotency, and retry logic in Grokking System Design Fundamentals with hands-on design exercises.