Graceful Shutdown¶
Graceful shutdown allows in-flight requests to complete before the gateway terminates, preventing request failures during deployments and restarts.
Overview¶
Clean Termination¶
Allow existing requests to finish rather than abruptly closing connections.
Zero-Downtime Deployments¶
Deploy updates without causing client-visible errors.
Configurable Grace Period¶
Control how long to wait for in-flight requests.
API Control¶
Trigger shutdown programmatically via the HTTP API.
Why Graceful Shutdown?¶
Without graceful shutdown:
- Abrupt termination: Active requests are immediately disconnected
- Client errors: In-flight requests return connection errors
- Data loss: Streaming responses may be truncated
- Deployment failures: Rolling updates cause visible errors
With graceful shutdown:
- Request completion: Active requests finish normally
- No client errors: Users don't see deployment-related failures
- Clean streaming: Streaming responses complete before shutdown
- Smooth deployments: Zero-downtime rolling updates
How It Works¶
Shutdown Sequence¶
- Shutdown signal received (SIGTERM, SIGINT, or API call)
- Stop accepting new requests - New connections are rejected with 503
- Drain in-flight requests - Existing requests continue processing
- Grace period timer starts - After shutdown-grace-period-secs, force shutdown
- Clean exit - Once all requests complete (or grace period expires)
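The sequence above can be modeled in a few lines of shell. This is a minimal sketch for illustration, not SMG's actual implementation: the names begin_shutdown, admit_request, drain, and in_flight_count are hypothetical, and in_flight_count stands in for whatever reports the number of active requests.

```bash
#!/usr/bin/env bash
# Illustrative model of the shutdown sequence; not SMG's actual implementation.

accepting=1   # 1 while new requests are admitted

# Steps 1-2: a shutdown signal stops admission of new requests.
begin_shutdown() { accepting=0; }
trap begin_shutdown TERM INT

# Prints the status a new request would get: 200 normally, 503 once draining.
admit_request() {
  if (( accepting )); then echo 200; else echo 503; fi
}

# Steps 3-5: poll in_flight_count (a hypothetical hook that prints the number
# of active requests) until it reaches 0 or the grace period runs out.
# Returns 0 on a clean drain, 1 if shutdown had to be forced.
drain() {
  local grace_secs=$1
  local deadline=$(( SECONDS + grace_secs ))
  while (( $(in_flight_count) > 0 )); do
    if (( SECONDS >= deadline )); then
      echo "grace period expired, forcing shutdown"
      return 1
    fi
    sleep 1
  done
  echo "all requests completed, shutting down"
}
```

The key design point is that the deadline is fixed once, when draining starts, so slow requests cannot extend the grace period. A real server would call drain with the configured grace period after begin_shutdown runs.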
Configuration¶
Parameters¶
| Parameter | Default | Description |
|---|---|---|
| --shutdown-grace-period-secs | 180 (3 min) | Time to wait for in-flight requests |
Recommended Configurations¶
Production Standard¶
Balanced grace period for typical workloads.
Use when: Standard production deployments
Batch Processing¶
Long grace period for long-running requests.
Use when: Batch inference, long-running generations
Critical Low-Latency¶
Minimal grace period for latency-sensitive systems.
Use when: Very short requests, rapid scaling
Triggering Shutdown¶
Via Signal¶
# Find the SMG process
pgrep -f smg
# Send SIGTERM for graceful shutdown
kill -TERM <pid>
# Or SIGINT (Ctrl+C in terminal)
kill -INT <pid>
Via API¶
Kubernetes Integration¶
Kubernetes sends SIGTERM by default when terminating pods. Configure terminationGracePeriodSeconds to exceed your SMG grace period:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: smg
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 210  # SMG grace + buffer
      containers:
        - name: smg
          args:
            - --shutdown-grace-period-secs=180
Kubernetes timeout
Kubernetes will force-kill the pod after terminationGracePeriodSeconds. Set this higher than --shutdown-grace-period-secs to ensure SMG has time to complete its graceful shutdown.
Sizing the Grace Period¶
Consider these factors when setting the grace period:
| Factor | Impact on Grace Period |
|---|---|
| Average request duration | Grace period should exceed typical request time |
| Longest expected request | Batch jobs may need longer grace periods |
| Streaming responses | Long streams need extended grace periods |
| Deployment frequency | Frequent deployments may need shorter periods |
| Scaling responsiveness | Autoscaling may need faster termination |
Calculation Guidelines¶
Example: If your average request is 30s, p99 is 60s, and max streaming is 120s:
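One plausible way to turn such numbers into a setting is to take the longest duration you want to protect and add a safety buffer. The rule below, the function name suggest_grace_period, and the 30-second default buffer are assumptions made for this sketch, not an official SMG formula. The 30s average does not drive the result because the p99 and the longest stream already exceed it.

```bash
#!/usr/bin/env bash
# suggest_grace_period P99_SECS MAX_STREAM_SECS [BUFFER_SECS]
# Prints max(p99, max stream) + buffer, in seconds.
# Both the rule and the 30s default buffer are assumptions of this sketch.
suggest_grace_period() {
  local p99=$1 max_stream=$2 buffer=${3:-30}
  local longest=$(( p99 > max_stream ? p99 : max_stream ))
  echo $(( longest + buffer ))
}

suggest_grace_period 60 120   # prints 150
```

Under these assumptions the example workload would run with --shutdown-grace-period-secs=150. If truncated streams are acceptable, sizing from the p99 alone gives a shorter value.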
Integration with Load Balancers¶
For zero-downtime deployments, coordinate with your load balancer:
Pre-Stop Hook (Kubernetes)¶
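Use a preStop hook that sleeps briefly so the pod is removed from the load balancer before shutdown begins. Kubernetes runs the hook before it sends SIGTERM, so the sleep gives the load balancer time to stop sending new traffic before SMG starts its graceful shutdown.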
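Kubernetes starts the terminationGracePeriodSeconds countdown before the preStop hook runs, so the pod's termination budget has to cover the hook's sleep as well as the SMG grace period. A rough sizing helper, as a sketch: the function name min_termination_grace and the 10-second sleep in the usage line are hypothetical, and the 30-second default buffer mirrors the 210 = 180 + 30 Deployment example in this document.

```bash
#!/usr/bin/env bash
# min_termination_grace SMG_GRACE_SECS PRESTOP_SLEEP_SECS [BUFFER_SECS]
# Prints a lower bound for terminationGracePeriodSeconds:
# preStop sleep + SMG grace period + buffer.
min_termination_grace() {
  local smg_grace=$1 prestop_sleep=$2 buffer=${3:-30}
  echo $(( prestop_sleep + smg_grace + buffer ))
}

# 180s SMG grace period with a hypothetical 10s preStop sleep
min_termination_grace 180 10   # prints 220
```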
Health Check Coordination¶
During shutdown, SMG's health endpoint can return unhealthy to signal load balancers:
# Health check during normal operation
curl http://gateway:3001/health
# Returns 200 OK
# During graceful shutdown
curl http://gateway:3001/health
# Returns 503 Service Unavailable
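A deployment script can use this behavior to confirm that an instance has started draining before it moves on. The helper below is a sketch under stated assumptions: the name wait_for_unhealthy, the one-second polling interval, and the 30-second default timeout are illustrative choices, and only the /health URL and the 503 status come from the example above.

```bash
#!/usr/bin/env bash
# wait_for_unhealthy URL [TIMEOUT_SECS]
# Polls URL about once per second until it answers 503.
# Returns 0 once 503 is seen, 1 if the timeout passes first.
wait_for_unhealthy() {
  local url=$1 timeout_secs=${2:-30}
  local deadline=$(( SECONDS + timeout_secs ))
  local code
  while (( SECONDS < deadline )); do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
    if [[ $code == 503 ]]; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage (not executed here):
#   wait_for_unhealthy http://gateway:3001/health 60 && echo "gateway is draining"
```

A connection failure is treated as "not yet 503" rather than as success, so a gateway that has already exited makes the helper time out instead of reporting a drain in progress.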
Monitoring¶
Shutdown Events¶
Watch logs for shutdown-related messages:
# Graceful shutdown initiated
[INFO] Received shutdown signal, starting graceful shutdown
[INFO] Stopping new request acceptance
[INFO] Waiting for 5 in-flight requests to complete
# Requests completing
[INFO] In-flight requests: 5 -> 4
[INFO] In-flight requests: 4 -> 3
...
# Clean exit
[INFO] All requests completed, shutting down
Metrics During Shutdown¶
| Metric | Observation |
|---|---|
| smg_requests_active | Should decrease towards 0 |
| smg_requests_total | New requests should stop |
| smg_shutdown_in_progress | 1 during graceful shutdown |
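To check these values from a script, the metrics output can be filtered with awk. This sketch assumes the metrics are exposed in the Prometheus text format, with no timestamps and no spaces inside label values. The helper name metric_value is hypothetical, and it reads from stdin because the metrics endpoint is not specified here.

```bash
#!/usr/bin/env bash
# metric_value NAME
# Reads Prometheus-style text on stdin and prints the value of NAME,
# summed across label sets. Exits non-zero if NAME does not appear.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                          # skip HELP and TYPE comment lines
    { key = $1; sub(/[{].*/, "", key) }    # strip any {label="..."} suffix
    key == name { sum += $2; found = 1 }
    END { if (found) print sum; else exit 1 }
  '
}

# Usage with captured metrics output:
printf 'smg_requests_active 3\nsmg_shutdown_in_progress 1\n' \
  | metric_value smg_requests_active   # prints 3
```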
Tuning Guidelines¶
| Symptom | Potential Adjustment |
|---|---|
| Requests failing during deployment | Increase --shutdown-grace-period-secs |
| Slow scaling down | Decrease --shutdown-grace-period-secs |
| Kubernetes force-killing pods | Increase terminationGracePeriodSeconds |
| Streaming responses truncated | Match grace period to max stream duration |