Commit 12f462c

docs: add README, getting-started, adapters, and shutdown guides
Prepare for hex publishing with ex_doc guides covering:
- Quick start and K8s deployment
- Built-in and custom adapters
- Graceful shutdown lifecycle and priority ordering
1 parent 757ddd3 commit 12f462c

5 files changed

Lines changed: 483 additions & 6 deletions

README.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# nova_resilience

Production-grade resilience patterns for [Nova](https://github.com/novaframework/nova) web applications.

Bridges Nova and [Seki](https://github.com/Taure/seki) to provide dependency health checking, Kubernetes-ready probes, circuit breakers, bulkheads, and ordered graceful shutdown, all via declarative configuration.

## Quick start

Add to your deps:

```erlang
{deps, [
    nova,
    seki,
    nova_resilience
]}.
```

Add to your app's `applications`:

```erlang
{applications, [kernel, stdlib, nova, seki, nova_resilience]}.
```

Register health routes in your Nova config:

```erlang
{my_app, [
    {nova_apps, [nova_resilience]}
]}.
```

Configure dependencies:

```erlang
{nova_resilience, [
    {dependencies, [
        #{name => primary_db,
          type => database,
          adapter => pgo,
          pool => default,
          critical => true,
          shutdown_priority => 2}
    ]}
]}.
```

That's it. Your app now has `/health`, `/ready`, and `/live` endpoints, automatic startup gating, and ordered shutdown.

## What it does

### Startup

1. The app starts and nova_resilience provisions health checks for each dependency
2. `/ready` returns **503** until all critical dependencies are healthy
3. Kubernetes readiness probe detects this and holds traffic
4. Once all critical deps respond, `/ready` returns **200** and traffic flows
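
The gating behaviour in the list above can be pictured from the probe's side. The following is a minimal sketch, not nova_resilience code: the module name `ready_gate_sketch` and the `ProbeFun` callback are hypothetical, and the probe is injected so the loop can be exercised without a running server.

```erlang
-module(ready_gate_sketch).
-export([wait_until_ready/3]).

%% ProbeFun is a hypothetical callback: it receives the attempt number and
%% returns the HTTP status code observed on /ready.
%% Returns {ready, Attempt} on the first 200, or {error, not_ready} once
%% MaxAttempts probes have failed (similar to a probe's failureThreshold).
-spec wait_until_ready(fun((pos_integer()) -> integer()),
                       pos_integer(), non_neg_integer()) ->
    {ready, pos_integer()} | {error, not_ready}.
wait_until_ready(ProbeFun, MaxAttempts, IntervalMs) ->
    poll(ProbeFun, 1, MaxAttempts, IntervalMs).

poll(_ProbeFun, Attempt, MaxAttempts, _IntervalMs) when Attempt > MaxAttempts ->
    {error, not_ready};
poll(ProbeFun, Attempt, MaxAttempts, IntervalMs) ->
    case ProbeFun(Attempt) of
        200 ->
            {ready, Attempt};
        _Other ->
            %% 503 (or any other status) means traffic should still be held.
            timer:sleep(IntervalMs),
            poll(ProbeFun, Attempt + 1, MaxAttempts, IntervalMs)
    end.
```

In a real check, `ProbeFun` would issue `GET /ready` against the pod and return the status code.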

### Running

Execute calls through the resilience stack:

```erlang
case nova_resilience:call(primary_db, fun() ->
    pgo:query(<<"SELECT * FROM users WHERE id = $1">>, [Id])
end) of
    {ok, #{rows := Rows}} -> {json, #{users => Rows}};
    {error, circuit_open} -> {json, 503, #{}, #{error => <<"db unavailable">>}};
    {error, bulkhead_full} -> {json, 503, #{}, #{error => <<"overloaded">>}}
end.
```

### Shutdown

On SIGTERM (or application stop):

1. `/ready` immediately returns **503** (load balancer stops sending traffic)
2. Waits `shutdown_delay` so load balancers have time to notice the not-ready state
3. Tears down dependencies in `shutdown_priority` order
4. Nova drains HTTP connections and stops

No manual `prep_stop` calls are needed: shutdown is fully automatic.
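
To make the ordering in step 3 concrete, here is a small illustrative sketch of grouping dependencies by `shutdown_priority`. It is not the library's implementation: the module name `shutdown_order_sketch` is hypothetical, and it assumes what the getting-started guide states, namely that lower numbers shut down first and that the default priority is `10`.

```erlang
-module(shutdown_order_sketch).
-export([groups/1]).

%% Default priority as listed in the getting-started guide's options table.
-define(DEFAULT_PRIORITY, 10).

%% Groups dependency names by shutdown_priority and returns the groups
%% sorted so that the lowest priority number comes first.
-spec groups([map()]) -> [{integer(), [atom()]}].
groups(Deps) ->
    Grouped = lists:foldl(
        fun(#{name := Name} = Dep, Acc) ->
            Prio = maps:get(shutdown_priority, Dep, ?DEFAULT_PRIORITY),
            maps:update_with(Prio, fun(Names) -> Names ++ [Name] end, [Name], Acc)
        end,
        #{},
        Deps),
    lists:keysort(1, maps:to_list(Grouped)).
```

With a cache at priority `0` and a database at priority `2`, the cache group comes first and the database group second.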

## Health endpoints

| Endpoint | Purpose | Response |
|----------|---------|----------|
| `GET /health` | Full health report | `{"status":"healthy","dependencies":{...},"vm":{...}}` |
| `GET /ready` | Kubernetes readiness probe | 200 when ready, 503 when not |
| `GET /live` | Kubernetes liveness probe | 200 if process is responsive |

## Configuration

```erlang
{nova_resilience, [
    {dependencies, [...]},           %% List of dependency configs
    {health_check_interval, 10000},  %% ms between health checks
    {vm_checks, true},               %% Include BEAM VM in health report
    {gate_timeout, 30000},           %% Max ms to wait for deps on startup
    {shutdown_delay, 5000},          %% ms to wait after marking not-ready
    {shutdown_drain_timeout, 15000}, %% Max ms to drain per priority group
    {health_prefix, <<"">>}          %% Prefix for health routes
]}.
```

## Built-in adapters

| Type | Adapter | Auto health check |
|------|---------|-------------------|
| `database` | `pgo` (default) | `SELECT 1` via pgo |
| `database` | `kura` | `SELECT 1` via kura repo |
| `kafka` | `brod` | `brod:get_partitions_count/2` |
| any | custom module | Implement `nova_resilience_adapter` behaviour |

## License

Apache-2.0

guides/adapters.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# Adapters

Adapters provide built-in health checks and shutdown logic for known dependency types. You can use built-in adapters or write your own.

## Built-in adapters

### pgo (default for `database` type)

Health check runs `SELECT 1` against the pgo pool.

```erlang
#{name => primary_db,
  type => database,
  %% adapter => pgo is implicit
  pool => default}
```

The `pool` field is optional and defaults to pgo's default pool if omitted.

### kura

Health check runs `SELECT 1` through the kura repo layer.

```erlang
#{name => primary_db,
  type => database,
  adapter => kura,
  repo => my_repo}
```

The `repo` field is required. It names the kura repo module that implements the `kura_repo` behaviour.

### brod (default for `kafka` type)

Health check calls `brod:get_partitions_count/2` to verify broker connectivity.

```erlang
#{name => events,
  type => kafka,
  client => my_brod_client,
  topic => <<"events">>}
```

Both `client` and `topic` are required.

## Custom adapters

Implement the `nova_resilience_adapter` behaviour:

```erlang
-module(my_redis_adapter).
-behaviour(nova_resilience_adapter).

-export([health_check/1, wrap_call/2, shutdown/1]).

%% Healthy when Redis answers PING with PONG.
health_check(#{pool := Pool}) ->
    case eredis:q(Pool, [<<"PING">>]) of
        {ok, <<"PONG">>} -> ok;
        {error, Reason} -> {error, Reason}
    end.

%% Runs the caller's fun unchanged.
wrap_call(_Config, Fun) ->
    Fun().

%% Nothing to clean up for this adapter.
shutdown(_Config) ->
    ok.
```

Then reference it in your config:

```erlang
#{name => cache,
  type => custom,
  adapter => my_redis_adapter,
  pool => redis_pool,
  critical => false,
  shutdown_priority => 0}
```

## Overriding health checks

Any dependency can override the adapter's health check with a custom `{Module, Function}` tuple:

```erlang
#{name => primary_db,
  type => database,
  adapter => pgo,
  health_check => {my_app_health, deep_db_check}}
```

The function must return `ok | {error, Reason}`.
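
As a rough sketch of what such an override might look like, the module below normalises a query outcome to `ok | {error, Reason}`. Two things here are assumptions rather than documented behaviour: that the override is invoked with no arguments, and the particular query used (it reuses the `users` table from the examples). The `check/1` helper is a hypothetical name, split out so the result handling can be exercised without a database.

```erlang
-module(my_app_health).
-export([deep_db_check/0, check/1]).

%% Assumption: the {Module, Function} override is called with no arguments.
deep_db_check() ->
    check(fun() -> pgo:query(<<"SELECT 1 FROM users LIMIT 1">>) end).

%% Runs QueryFun and normalises its outcome to ok | {error, Reason}.
%% pgo:query/1 returns a result map on success or {error, Reason} on failure.
check(QueryFun) ->
    try QueryFun() of
        #{rows := _} -> ok;
        {error, Reason} -> {error, Reason};
        Other -> {error, {unexpected_result, Other}}
    catch
        Class:Reason -> {error, {Class, Reason}}
    end.
```

Catching exceptions keeps a crashing query from taking down whatever process runs the health check.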

## Runtime registration

Register dependencies at runtime for services discovered dynamically:

```erlang
nova_resilience:register_dependency(inventory_service, #{
    type => custom,
    adapter => my_http_adapter,
    url => "http://inventory:8080",
    critical => false,
    breaker => #{failure_threshold => 5, wait_duration => 30000}
}).

%% Then use it
nova_resilience:call(inventory_service, fun() ->
    httpc:request("http://inventory:8080/api/stock")
end).

%% Unregister when no longer needed
nova_resilience:unregister_dependency(inventory_service).
```
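
The `my_http_adapter` module referenced above is not defined in this guide. One possible shape is sketched below, following the same three callbacks as the Redis example. It assumes that `inets` is started, that the adapter receives the dependency map (including `url`) as its config, and that any 2xx or 3xx status counts as healthy. The exported `classify/1` helper is a hypothetical addition that keeps the status handling testable on its own.

```erlang
-module(my_http_adapter).
-behaviour(nova_resilience_adapter).

-export([health_check/1, wrap_call/2, shutdown/1, classify/1]).

%% Assumption: the config map carries the `url` key from the dependency.
health_check(#{url := Url}) ->
    classify(httpc:request(get, {Url, []}, [{timeout, 2000}], [])).

%% Translates an httpc:request/4 result into ok | {error, Reason}.
classify({ok, {{_Version, Status, _Phrase}, _Headers, _Body}})
  when Status >= 200, Status < 400 ->
    ok;
classify({ok, {{_Version, Status, _Phrase}, _Headers, _Body}}) ->
    {error, {http_status, Status}};
classify({error, Reason}) ->
    {error, Reason}.

%% Runs the caller's fun unchanged.
wrap_call(_Config, Fun) ->
    Fun().

%% httpc needs no per-dependency cleanup here.
shutdown(_Config) ->
    ok.
```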

guides/getting-started.md

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
# Getting Started

This guide walks through adding nova_resilience to an existing Nova application.

## Installation

Add `seki` and `nova_resilience` to your `rebar.config` deps:

```erlang
{deps, [
    nova,
    seki,
    nova_resilience
]}.
```

Add them to your `.app.src` applications list:

```erlang
{applications, [
    kernel, stdlib, nova, seki, nova_resilience
]}.
```

## Register health routes

Add `nova_resilience` to your app's `nova_apps` so the health endpoints get registered:

```erlang
%% In sys.config
{my_app, [
    {nova_apps, [nova_resilience]}
]}.
```

This gives you `/health`, `/ready`, and `/live` endpoints automatically.

## Configure dependencies

Add a `nova_resilience` section to your `sys.config`:

```erlang
{nova_resilience, [
    {dependencies, [
        #{name => primary_db,
          type => database,
          adapter => pgo,
          pool => default,
          critical => true,
          shutdown_priority => 2}
    ]}
]}.
```

### Required fields

- `name`: atom identifying the dependency
- `type`: `database`, `kafka`, or `custom`

### Optional fields

| Field | Default | Description |
|-------|---------|-------------|
| `adapter` | auto from type | `pgo`, `kura`, `brod`, or custom module |
| `critical` | `false` | If true, `/ready` returns 503 when this dep is unhealthy |
| `shutdown_priority` | `10` | Lower numbers shut down first |
| `breaker` | none | Circuit breaker options (map) |
| `bulkhead` | none | Concurrency limiter options (map) |
| `retry` | none | Retry options (map) |
| `default_timeout` | none | Default deadline in ms |
| `health_check` | auto from adapter | `{Module, Function}` tuple for custom health checks |
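
As an illustration of how several of these optional fields combine, the fragment below configures one dependency with a circuit breaker and a default deadline. The breaker keys are the ones used in the adapters guide's runtime registration example, and the numeric values are illustrative only. The `bulkhead` and `retry` maps are left out because their option keys are not listed here.

```erlang
{nova_resilience, [
    {dependencies, [
        #{name => primary_db,
          type => database,
          critical => true,          %% /ready returns 503 while unhealthy
          shutdown_priority => 2,    %% shuts down before the default of 10
          default_timeout => 5000,   %% ms, illustrative value
          breaker => #{failure_threshold => 5, wait_duration => 30000}}
    ]}
]}.
```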

## Using the resilience stack

Wrap calls to external dependencies:

```erlang
case nova_resilience:call(primary_db, fun() ->
    pgo:query(<<"SELECT * FROM users">>)
end) of
    {ok, Result} ->
        %% Result is whatever your fun returned
        handle_result(Result);
    {error, circuit_open} ->
        %% Dependency has too many failures, breaker tripped
        {json, 503, #{}, #{error => <<"service unavailable">>}};
    {error, bulkhead_full} ->
        %% Too many concurrent requests to this dependency
        {json, 503, #{}, #{error => <<"overloaded">>}};
    {error, deadline_exceeded} ->
        %% Request deadline expired
        {json, 504, #{}, #{error => <<"timeout">>}}
end.
```

Without a breaker or bulkhead configured, `call/2` still wraps the call with health tracking and telemetry.
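
If several controllers repeat the same `case`, the error mapping can be pulled into one helper. This is a sketch under assumptions: the module name `resilience_response` and the function `to_response/2` are hypothetical, and the catch-all clause returning 500 is an addition not described in this guide.

```erlang
-module(resilience_response).
-export([to_response/2]).

%% Maps the result of nova_resilience:call/2 onto a Nova controller return.
%% OnOk is applied to the wrapped fun's result on success.
to_response({ok, Result}, OnOk) ->
    OnOk(Result);
to_response({error, circuit_open}, _OnOk) ->
    {json, 503, #{}, #{error => <<"service unavailable">>}};
to_response({error, bulkhead_full}, _OnOk) ->
    {json, 503, #{}, #{error => <<"overloaded">>}};
to_response({error, deadline_exceeded}, _OnOk) ->
    {json, 504, #{}, #{error => <<"timeout">>}};
to_response({error, _Other}, _OnOk) ->
    %% Assumption: any other error is reported as a generic 500.
    {json, 500, #{}, #{error => <<"internal error">>}}.
```

A controller would then pass the result of `nova_resilience:call/2` as the first argument and its own success handler as the second.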

## Kubernetes deployment

### Pod spec

```yaml
containers:
  - name: my-app
    livenessProbe:
      httpGet:
        path: /live
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 2
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /ready
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
```

### How it works

1. Pod starts, nova_resilience checks all critical dependencies
2. Startup probe polls `/ready`, which returns 503 until deps are healthy
3. Once ready, Kubernetes routes traffic to the pod
4. On a rolling deploy, SIGTERM is sent and nova_resilience marks the pod not-ready
5. The readiness probe fails and Kubernetes stops routing traffic
6. nova_resilience drains, shuts down deps, and graceful termination completes

### Shutdown timing

Configure these to match your Kubernetes `terminationGracePeriodSeconds`:

```erlang
{nova_resilience, [
    {shutdown_delay, 5000},          %% Wait for LB to notice not-ready
    {shutdown_drain_timeout, 15000}  %% Max time to drain per dep group
]}.
```

Worst-case total shutdown time = `shutdown_delay` + (`shutdown_drain_timeout` * number of priority groups) + Nova's HTTP drain. Set `terminationGracePeriodSeconds` to at least this value.
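
The formula can be worked through with a small helper. This is only an arithmetic sketch: the module name `shutdown_budget` is hypothetical, and Nova's HTTP drain time is passed in as a parameter because its value is not given here.

```erlang
-module(shutdown_budget).
-export([total_ms/4, grace_period_seconds/4]).

%% Worst-case shutdown time in milliseconds:
%% shutdown_delay + shutdown_drain_timeout * priority groups + Nova HTTP drain.
total_ms(ShutdownDelay, DrainTimeout, PriorityGroups, NovaDrain) ->
    ShutdownDelay + DrainTimeout * PriorityGroups + NovaDrain.

%% The same budget rounded up to whole seconds, the unit used by
%% terminationGracePeriodSeconds.
grace_period_seconds(ShutdownDelay, DrainTimeout, PriorityGroups, NovaDrain) ->
    (total_ms(ShutdownDelay, DrainTimeout, PriorityGroups, NovaDrain) + 999) div 1000.
```

For example, with the defaults above, two priority groups, and an assumed 10000 ms Nova drain, the budget is 5000 + 15000 * 2 + 10000 = 45000 ms, so `terminationGracePeriodSeconds` would need to be at least 45.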
