This guide walks through adding nova_resilience to an existing Nova application.
Add nova_resilience to your rebar.config deps:
{deps, [
nova,
nova_resilience
]}.Add it to your .app.src applications list:
{applications, [
kernel, stdlib, nova, nova_resilience
]}.Add nova_resilience to your app's nova_apps so the health endpoints get registered:
%% In sys.config
{my_app, [
{nova_apps, [nova_resilience]}
]}.This gives you /health, /ready, and /live endpoints automatically.
To prefix the health routes (e.g. behind /internal):
{nova_resilience, [
{health_prefix, ~"/internal"}
]}.This registers /internal/health, /internal/ready, and /internal/live.
Add a nova_resilience section to your sys.config:
{nova_resilience, [
{dependencies, [
#{name => primary_db,
type => database,
adapter => pgo,
pool => default,
critical => true,
breaker => #{failure_threshold => 5, wait_duration => 30000},
bulkhead => #{max_concurrent => 25},
shutdown_priority => 2},
#{name => events,
type => kafka,
client => my_brod_client,
topic => ~"events",
critical => true,
breaker => #{failure_threshold => 3, wait_duration => 10000},
shutdown_priority => 1}
]}
]}.name— Atom identifying the dependency (must be unique)
| Field | Default | Description |
|---|---|---|
type |
custom |
database, kafka, or custom |
adapter |
auto from type | pgo, kura, brod, or custom module |
critical |
false |
If true, /ready returns 503 when this dep is unhealthy |
shutdown_priority |
10 |
Lower numbers shut down first |
breaker |
none | Circuit breaker options (map) |
bulkhead |
none | Concurrency limiter options (map) |
retry |
none | Retry options (map) |
default_timeout |
none | Default deadline in ms |
health_check |
auto from adapter | {Module, Function} tuple for custom health checks |
Wrap calls to external dependencies:
case nova_resilience:call(primary_db, fun() ->
pgo:query(~"SELECT * FROM users")
end) of
{ok, Result} ->
handle_result(Result);
{error, circuit_open} ->
{json, 503, #{}, #{error => ~"service unavailable"}};
{error, bulkhead_full} ->
{json, 503, #{}, #{error => ~"overloaded"}};
{error, deadline_exceeded} ->
{json, 504, #{}, #{error => ~"timeout"}}
end.Without a breaker or bulkhead configured, call/2 still wraps the call with telemetry and deadline tracking.
Override per-call settings with call/3:
nova_resilience:call(primary_db, Fun, #{
timeout => 5000, %% Override deadline for this call
retry => #{ %% Override retry for this call
max_attempts => 3,
base_delay => 100,
max_delay => 2000
}
}).Pass retry => false to disable retry for a single call.
Once running, verify your setup:
# Full health report
curl http://localhost:8080/health | jq .
# Readiness probe
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/ready
# Liveness probe
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/liveExample /health response:
{
"status": "healthy",
"dependencies": {
"primary_db": {
"status": "healthy",
"details": {},
"name": "primary_db",
"breaker": {"state": "closed", "failure_count": 0},
"bulkhead": {"current": 3, "max": 25, "available": 22}
}
},
"vm": {
"memory_mb": 64,
"process_count": 312,
"run_queue": 0,
"uptime_seconds": 3600,
"node": "my_app@hostname"
}
}containers:
- name: my-app
livenessProbe:
httpGet:
path: /live
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 2
periodSeconds: 5
startupProbe:
httpGet:
path: /ready
port: 8080
failureThreshold: 30
periodSeconds: 2- Pod starts, nova_resilience checks all critical dependencies
- Startup probe polls
/ready— returns 503 until deps are healthy - Once ready, Kubernetes routes traffic to the pod
- On rolling deploy, SIGTERM fires — nova_resilience marks not-ready, drains, shuts down deps
- Kubernetes stops routing traffic (readiness probe fails)
- Graceful termination completes
Total shutdown time = shutdown_delay + (shutdown_drain_timeout x priority groups) + Nova HTTP drain.
For defaults (5s delay, 15s drain, 1 priority group, 15s Nova drain):
total = 5 + 15 + 15 = 35 seconds
Set terminationGracePeriodSeconds: 45 to give headroom.
In development you may not have all dependencies running. Disable the startup gate to skip health check blocking:
%% dev.config
{nova_resilience, [
{gate_enabled, false},
{dependencies, [...]}
]}.With gate_enabled => false, /ready returns 200 immediately on startup regardless of dependency health.
To have /health return 503 when critical dependencies are unhealthy (useful for monitoring systems that scrape /health):
{nova_resilience, [
{health_severity, critical}
]}.With the default info severity, /health always returns 200 with the full report. With critical, it returns 503 when the system is unhealthy.
Register dependencies dynamically (e.g. discovered via service mesh):
ok = nova_resilience:register_dependency(inventory_api, #{
type => custom,
adapter => my_http_adapter,
url => "http://inventory:8080",
breaker => #{failure_threshold => 5, wait_duration => 30000}
}).
%% Use it
nova_resilience:call(inventory_api, fun() ->
httpc:request("http://inventory:8080/api/stock")
end).
%% Unregister when no longer needed
nova_resilience:unregister_dependency(inventory_api).- Circuit Breakers & Bulkheads — Protect against cascading failures
- Deadline Propagation — Manage request timeout budgets
- Adapters — Write custom adapters for your dependencies
- Graceful Shutdown — Ordered teardown and Kubernetes integration
- Telemetry — Monitoring and observability