| 1 | +# Graceful shutdown |
| 2 | + |
| 3 | +Nova includes built-in graceful shutdown support for safe deployments. When the BEAM VM receives a `SIGTERM` signal (or the application is stopped), Nova will: |
| 4 | + |
| 5 | +1. Wait an optional delay for load balancers to stop routing traffic |
| 6 | +2. Suspend the Cowboy listener (stop accepting new connections) |
| 7 | +3. Wait for in-flight requests to complete |
| 8 | +4. Stop the listener |
| 9 | + |
| 10 | +This gives in-flight requests a chance to complete, up to the configured drain timeout, before the node exits. |
| 11 | + |
| 12 | +## Configuration |
| 13 | + |
| 14 | +The following parameters can be set under the `nova` key in your `sys.config`: |
| 15 | + |
| 16 | +| Key | Description | Default | |
| 17 | +|-----|-------------|---------| |
| 18 | +| `shutdown_delay` | Milliseconds to wait before suspending the listener. Gives load balancers time to remove the node from their routing pool. | `0` | |
| 19 | +| `shutdown_drain_timeout` | Maximum milliseconds to wait for active connections to finish after the listener is suspended. | `15000` | |
| 20 | + |
| 21 | +Example configuration: |
| 22 | + |
| 23 | +```erlang |
| 24 | +{nova, [ |
| 25 | + {bootstrap_application, my_app}, |
| 26 | + {shutdown_delay, 5000}, |
| 27 | + {shutdown_drain_timeout, 15000} |
| 28 | +]} |
| 29 | +``` |
| 30 | + |
| 31 | +## Kubernetes deployments |
| 32 | + |
| 33 | +When deploying Nova to Kubernetes, the shutdown sequence interacts with the pod lifecycle: |
| 34 | + |
| 35 | +1. A new pod starts and passes its readiness probe |
| 36 | +2. Kubernetes sends `SIGTERM` to the old pod |
| 37 | +3. **In parallel**: the pod is removed from Service endpoints and the BEAM receives the signal |
| 38 | + |
| 39 | +There is a propagation delay (typically 1-5 seconds) between Kubernetes sending `SIGTERM` and all load balancers/proxies updating their routing tables. During this window, new requests can still arrive at the shutting-down pod. |
| 40 | + |
| 41 | +### Recommended settings |
| 42 | + |
| 43 | +Set `shutdown_delay` to cover the propagation window: |
| 44 | + |
| 45 | +```erlang |
| 46 | +{nova, [ |
| 47 | + {bootstrap_application, my_app}, |
| 48 | + {shutdown_delay, 5000}, |
| 49 | + {shutdown_drain_timeout, 15000} |
| 50 | +]} |
| 51 | +``` |
| 52 | + |
| 53 | +Set `terminationGracePeriodSeconds` in your pod spec higher than the sum of `shutdown_delay` and `shutdown_drain_timeout` (5000 ms + 15000 ms = 20 seconds with the settings above) so that Kubernetes does not send `SIGKILL` before the drain completes: |
| 54 | + |
| 55 | +```yaml |
| 56 | +spec: |
| 57 | + terminationGracePeriodSeconds: 30 |
| 58 | + containers: |
| 59 | + - name: my_app |
| 60 | + # ... |
| 61 | +``` |
| 62 | + |
| 63 | +> #### Important |
| 64 | +> |
| 65 | +> Without `shutdown_delay`, you may see occasional 502 errors during rolling deployments because requests reach a pod that has already stopped accepting connections. |
| 66 | + |
| 67 | +### Health probes |
| 68 | + |
| 69 | +If you use readiness probes, your health endpoint should reflect the application state. During shutdown, the endpoint should return an error status so Kubernetes stops routing traffic sooner. The [nova_resilience](https://github.com/novaframework/nova_resilience) library provides a health gate that automatically returns 503 during shutdown. |
| 70 | + |
| 71 | +## How it works |
| 72 | + |
| 73 | +Nova implements graceful shutdown in `nova_app:prep_stop/1`, which is called by OTP before the supervision tree is terminated. The sequence is: |
| 74 | + |
| 75 | +1. **Delay** — Sleep for `shutdown_delay` milliseconds. During this time, the listener is still active and serving requests normally. This covers the load balancer propagation window. |
| 76 | +2. **Suspend** — Call `ranch:suspend_listener(nova_listener)` to stop accepting new TCP connections. Existing connections continue to be served. |
| 77 | +3. **Drain** — Poll `ranch:info/1` every 500ms until active connections reach zero or `shutdown_drain_timeout` is exceeded. |
| 78 | +4. **Stop** — Call `cowboy:stop_listener(nova_listener)` to fully shut down the listener. |
| 79 | + |
| 80 | +After `prep_stop` returns, OTP proceeds with the normal supervision tree shutdown. |
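
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nova's actual `prep_stop/1` code: the module name `drain_sketch` and the functions `graceful_stop/1`, `drain/3`, and `active_connections/1` are hypothetical, and the shape of the `ranch:info/1` result (assumed here to be a map in Ranch 2.x or a proplist in Ranch 1.x, both carrying `active_connections`) should be checked against the Ranch version in use.

```erlang
-module(drain_sketch).
-export([graceful_stop/1, drain/3]).

%% Hypothetical sketch of the delay / suspend / drain / stop sequence.
%% Ref is the listener name, e.g. nova_listener.
graceful_stop(Ref) ->
    Delay = application:get_env(nova, shutdown_delay, 0),
    Timeout = application:get_env(nova, shutdown_drain_timeout, 15000),
    %% 1. Delay: the listener keeps serving while load balancers catch up.
    timer:sleep(Delay),
    %% 2. Suspend: stop accepting new connections, keep existing ones.
    ok = ranch:suspend_listener(Ref),
    %% 3. Drain: poll every 500 ms until idle or the timeout elapses.
    _ = drain(fun() -> active_connections(Ref) end, Timeout, 500),
    %% 4. Stop: shut the listener down completely.
    cowboy:stop_listener(Ref).

%% Assumption: ranch:info/1 returns a map (Ranch 2.x) or a proplist
%% (Ranch 1.x) that contains an active_connections entry.
active_connections(Ref) ->
    case ranch:info(Ref) of
        #{active_connections := N} ->
            N;
        Props when is_list(Props) ->
            proplists:get_value(active_connections, Props, 0)
    end.

%% Poll CountFun until it returns 0 or TimeoutMs has elapsed.
%% Returns drained, or {timeout, Remaining} with the last observed count.
drain(CountFun, TimeoutMs, IntervalMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    drain_loop(CountFun, Deadline, IntervalMs).

drain_loop(CountFun, Deadline, IntervalMs) ->
    case CountFun() of
        0 ->
            drained;
        N ->
            Now = erlang:monotonic_time(millisecond),
            case Now >= Deadline of
                true ->
                    {timeout, N};
                false ->
                    %% Never sleep past the deadline.
                    timer:sleep(min(IntervalMs, Deadline - Now)),
                    drain_loop(CountFun, Deadline, IntervalMs)
            end
    end.
```

Passing the connection counter as a fun keeps the polling loop testable without a running listener: `drain_sketch:drain(fun() -> 0 end, 15000, 500)` returns `drained` on the first poll, while a counter that never reaches zero yields `{timeout, N}` once the deadline passes.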