Commit 664928b

docs: add graceful shutdown guide (#379)
1 parent 7e8b89e commit 664928b

3 files changed: +83 -0 lines changed

guides/configuration.md

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,8 @@ Following parameters should be defined under the `nova`-key in your *sys.config*
| `use_strict_routing` | If the routing module should work under the strict mode. Using strict mode will cause errors if non-deterministic paths are detected. This is a beta-function so use with caution. | `boolean()` | `false` |
| `bootstrap_application` | Define which application to bootstrap with Nova. This should be the name of your application. | `atom()` | *Will crash if not defined* |
| `cowboy_configuration` | If you need some additional configuration done to Cowboy this is the place. Check the `nova_sup` module to learn which keys can be defined. | `map()` | `#{}` |
| `shutdown_delay` | Milliseconds to wait before suspending the listener during shutdown. Useful for letting load balancers drain traffic. See the [Graceful shutdown](graceful-shutdown.md) guide. | `integer()` | `0` |
| `shutdown_drain_timeout` | Maximum milliseconds to wait for active connections to finish during shutdown. See the [Graceful shutdown](graceful-shutdown.md) guide. | `integer()` | `15000` |

## Application parameters

guides/graceful-shutdown.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# Graceful shutdown

Nova includes built-in graceful shutdown support for safe deployments. When the BEAM VM receives a `SIGTERM` signal (or the application is stopped), Nova will:

1. Wait an optional delay for load balancers to stop routing traffic
2. Suspend the Cowboy listener (stop accepting new connections)
3. Wait for in-flight requests to complete
4. Stop the listener

This gives active requests time to finish, up to the configured drain timeout, before the node exits.

## Configuration

The following parameters can be set under the `nova` key in your `sys.config`:

| Key | Description | Default |
|-----|-------------|---------|
| `shutdown_delay` | Milliseconds to wait before suspending the listener. Gives load balancers time to remove the node from their routing pool. | `0` |
| `shutdown_drain_timeout` | Maximum milliseconds to wait for active connections to finish after the listener is suspended. | `15000` |

Example configuration:

```erlang
{nova, [
    {bootstrap_application, my_app},
    %% Wait 5 s before suspending the listener
    {shutdown_delay, 5000},
    %% Then allow up to 15 s for active connections to finish
    {shutdown_drain_timeout, 15000}
]}
```
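As a rough illustration of how these keys can be read at runtime, the sketch below looks them up in the `nova` application environment with `application:get_env/3` and falls back to the documented defaults. The module name `shutdown_config_sketch` and the function `settings/0` are hypothetical and not part of Nova.

```erlang
-module(shutdown_config_sketch).
-export([settings/0]).

%% Hypothetical helper, not part of Nova: read the shutdown settings from the
%% `nova` application environment, using the documented defaults when unset.
-spec settings() -> #{delay := integer(), drain_timeout := integer()}.
settings() ->
    #{delay => application:get_env(nova, shutdown_delay, 0),
      drain_timeout => application:get_env(nova, shutdown_drain_timeout, 15000)}.
```

With the example configuration above loaded, `shutdown_config_sketch:settings()` would return a map with `delay` set to `5000` and `drain_timeout` set to `15000`.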

## Kubernetes deployments

When deploying Nova to Kubernetes, the shutdown sequence interacts with the pod lifecycle:

1. A new pod starts and passes its readiness probe
2. Kubernetes sends `SIGTERM` to the old pod
3. **In parallel**: the pod is removed from Service endpoints and the BEAM receives the signal

There is a propagation delay (typically 1-5 seconds) between Kubernetes sending `SIGTERM` and all load balancers/proxies updating their routing tables. During this window, new requests can still arrive at the shutting-down pod.

### Recommended settings

Set `shutdown_delay` to cover the propagation window:

```erlang
{nova, [
    {bootstrap_application, my_app},
    {shutdown_delay, 5000},
    {shutdown_drain_timeout, 15000}
]}
```

Set `terminationGracePeriodSeconds` in your pod spec higher than the sum of `shutdown_delay` and `shutdown_drain_timeout` (20 seconds with the values above) to avoid a `SIGKILL` before the drain completes:

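```yaml
spec:
  # Must exceed shutdown_delay + shutdown_drain_timeout (5 s + 15 s = 20 s here)
  terminationGracePeriodSeconds: 30
  containers:
    # Container names must be valid DNS labels, so no underscores
    - name: my-app
      # ...
```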
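The arithmetic behind this rule can be sketched as a small check. The module `grace_period_sketch` and its functions are hypothetical helpers for illustration only; they simply compare a grace period in seconds against the two Nova settings in milliseconds.

```erlang
-module(grace_period_sketch).
-export([min_grace_seconds/2, is_sufficient/3]).

%% Smallest whole number of seconds that strictly exceeds the combined
%% delay and drain budget (both given in milliseconds).
-spec min_grace_seconds(non_neg_integer(), non_neg_integer()) -> pos_integer().
min_grace_seconds(DelayMs, DrainTimeoutMs) ->
    ((DelayMs + DrainTimeoutMs) div 1000) + 1.

%% True when the grace period leaves room for the full delay plus drain.
-spec is_sufficient(non_neg_integer(), non_neg_integer(), non_neg_integer()) -> boolean().
is_sufficient(GraceSeconds, DelayMs, DrainTimeoutMs) ->
    GraceSeconds * 1000 > DelayMs + DrainTimeoutMs.
```

For the settings above, `min_grace_seconds(5000, 15000)` evaluates to `21`, so the 30 seconds in the pod spec leaves some headroom for the rest of the VM shutdown.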

> #### Important
>
> Without `shutdown_delay`, you may see occasional 502 errors during rolling deployments because requests reach a pod that has already stopped accepting connections.

### Health probes

If you use readiness probes, your health endpoint should reflect the application state. During shutdown, the endpoint should return an error status so Kubernetes stops routing traffic sooner. The [nova_resilience](https://github.com/novaframework/nova_resilience) library provides a health gate that automatically returns 503 during shutdown.
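
To make the idea concrete, here is a minimal sketch of a shutdown-aware health status. It is not how nova_resilience is implemented: the module `health_gate_sketch`, its functions, and the `persistent_term` flag are all assumptions chosen for illustration.

```erlang
-module(health_gate_sketch).
-export([mark_shutting_down/0, status/0]).

%% Hypothetical flag key; nova_resilience may track this state differently.
-define(FLAG, {?MODULE, shutting_down}).

%% Call at the very start of shutdown, before any delay, so that
%% readiness probes begin failing as early as possible.
-spec mark_shutting_down() -> ok.
mark_shutting_down() ->
    persistent_term:put(?FLAG, true).

%% HTTP status code a readiness endpoint could return.
-spec status() -> 200 | 503.
status() ->
    case persistent_term:get(?FLAG, false) of
        true -> 503;
        false -> 200
    end.
```

A readiness handler would return `health_gate_sketch:status()` as its response code, so the probe starts failing as soon as shutdown begins.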

## How it works

Nova implements graceful shutdown in `nova_app:prep_stop/1`, which is called by OTP before the supervision tree is terminated. The sequence is:

1. **Delay**: Sleep for `shutdown_delay` milliseconds. During this time, the listener is still active and serving requests normally. This covers the load balancer propagation window.
2. **Suspend**: Call `ranch:suspend_listener(nova_listener)` to stop accepting new TCP connections. Existing connections continue to be served.
3. **Drain**: Poll `ranch:info/1` every 500 ms until active connections reach zero or `shutdown_drain_timeout` elapses.
4. **Stop**: Call `cowboy:stop_listener(nova_listener)` to fully shut down the listener.

After `prep_stop` returns, OTP proceeds with the normal supervision tree shutdown.
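
The drain step can be sketched as a deadline-bounded polling loop. This is a minimal illustration, not Nova's actual code: the module `drain_sketch` and the `CountFun` argument are hypothetical, with `CountFun` standing in for however the active connection count is read (for example, from the map returned by `ranch:info/1`, assuming it exposes an `active_connections` key).

```erlang
-module(drain_sketch).
-export([drain/3]).

%% Poll CountFun until it reports zero active connections or TimeoutMs
%% elapses. CountFun is a hypothetical stand-in for reading the count, e.g.
%%   fun() -> maps:get(active_connections, ranch:info(nova_listener)) end
-spec drain(fun(() -> non_neg_integer()), non_neg_integer(), pos_integer()) ->
    ok | timeout.
drain(CountFun, TimeoutMs, IntervalMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    drain_loop(CountFun, Deadline, IntervalMs).

drain_loop(CountFun, Deadline, IntervalMs) ->
    case CountFun() of
        0 ->
            ok;
        _Active ->
            Now = erlang:monotonic_time(millisecond),
            case Now >= Deadline of
                true ->
                    timeout;
                false ->
                    %% Sleep for the interval, but never past the deadline.
                    timer:sleep(min(IntervalMs, Deadline - Now)),
                    drain_loop(CountFun, Deadline, IntervalMs)
            end
    end.
```

Calling `drain_sketch:drain(CountFun, 15000, 500)` would mirror the documented default timeout and polling interval. It returns `ok` once no connections remain and `timeout` if the deadline passes first; in either case the caller can go on to stop the listener.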

rebar.config

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@
<<"guides/handlers.md">>,
<<"guides/plugins.md">>,
<<"guides/pubsub.md">>,
<<"guides/graceful-shutdown.md">>,
<<"guides/building-releases.md">>,
<<"guides/books-and-links.md">>,
<<"guides/rebar3_nova.md">>]},
