Getting Started

This guide walks through adding nova_resilience to an existing Nova application.

Installation

Add nova_resilience to your rebar.config deps:

{deps, [
    nova,
    nova_resilience
]}.

Add it to the applications list in your .app.src:

{application, my_app, [
    %% ...other application keys...
    {applications, [
        kernel, stdlib, nova, nova_resilience
    ]}
]}.

Register health routes

Add nova_resilience to your app's nova_apps so the health endpoints get registered:

%% In sys.config (an entry in the top-level list)
{my_app, [
    {nova_apps, [nova_resilience]}
]}

This gives you /health, /ready, and /live endpoints automatically.

To serve the health routes under a prefix such as /internal, set health_prefix in the nova_resilience entry of sys.config:

{nova_resilience, [
    {health_prefix, ~"/internal"}
]}

This registers /internal/health, /internal/ready, and /internal/live.
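As a rough illustration of how the prefix combines with the route names, the hypothetical health_paths module below concatenates the configured binary with each route. It assumes an empty prefix means "no prefix" and is not taken from nova_resilience itself.

```erlang
-module(health_paths).
-export([paths/1]).

%% Hypothetical sketch: builds the three health route paths for a
%% health_prefix value by plain binary concatenation. An empty prefix
%% (assumed to mean "no prefix") yields the unprefixed routes.
paths(Prefix) when is_binary(Prefix) ->
    [<<Prefix/binary, Suffix/binary>>
     || Suffix <- [<<"/health">>, <<"/ready">>, <<"/live">>]].
```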

Configure dependencies

Add a nova_resilience section to your sys.config:

{nova_resilience, [
    {dependencies, [
        #{name => primary_db,
          type => database,
          adapter => pgo,
          pool => default,
          critical => true,
          breaker => #{failure_threshold => 5, wait_duration => 30000},
          bulkhead => #{max_concurrent => 25},
          shutdown_priority => 2},

        #{name => events,
          type => kafka,
          client => my_brod_client,
          topic => ~"events",
          critical => true,
          breaker => #{failure_threshold => 3, wait_duration => 10000},
          shutdown_priority => 1}
    ]}
]}

Required fields

name — Atom identifying the dependency (must be unique)

Optional fields

| Field | Default | Description |
|---|---|---|
| `type` | `custom` | `database`, `kafka`, or `custom` |
| `adapter` | auto from type | `pgo`, `kura`, `brod`, or a custom module |
| `critical` | `false` | If `true`, `/ready` returns 503 when this dependency is unhealthy |
| `shutdown_priority` | `10` | Lower numbers shut down first |
| `breaker` | none | Circuit breaker options (map) |
| `bulkhead` | none | Concurrency limiter options (map) |
| `retry` | none | Retry options (map) |
| `default_timeout` | none | Default deadline in ms |
| `health_check` | auto from adapter | `{Module, Function}` tuple for custom health checks |
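A minimal sketch of how the documented defaults fill in a dependency map, plus a check for the "must be unique" rule on name. The dep_config module and its functions are hypothetical, not part of nova_resilience. Fields whose default is "none" or derived from the type or adapter are simply left absent.

```erlang
-module(dep_config).
-export([with_defaults/1, unique_names/1]).

%% Hypothetical: merges the documented defaults under a dependency map.
%% A map without an atom name fails with function_clause, mirroring
%% name being the only required field.
with_defaults(#{name := Name} = Dep) when is_atom(Name) ->
    Defaults = #{type => custom,
                 critical => false,
                 shutdown_priority => 10},
    %% Keys set explicitly in Dep override the defaults.
    maps:merge(Defaults, Dep).

%% True when no two dependencies share a name.
unique_names(Deps) ->
    Names = [maps:get(name, D) || D <- Deps],
    length(Names) =:= length(lists:usort(Names)).
```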

Using the resilience stack

Wrap calls to external dependencies:

case nova_resilience:call(primary_db, fun() ->
    pgo:query(~"SELECT * FROM users")
end) of
    {ok, Result} ->
        handle_result(Result);
    {error, circuit_open} ->
        {json, 503, #{}, #{error => ~"service unavailable"}};
    {error, bulkhead_full} ->
        {json, 503, #{}, #{error => ~"overloaded"}};
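    {error, deadline_exceeded} ->
        {json, 504, #{}, #{error => ~"timeout"}};
    {error, _Reason} ->
        %% Any other {error, Reason} result, so the case cannot crash
        {json, 500, #{}, #{error => ~"internal error"}}
end.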

Without a breaker or bulkhead configured, call/2 still wraps the call with telemetry and deadline tracking.
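If several handlers share the same error handling, the mapping can be pulled into a helper. resilience_errors is a hypothetical module: the three reasons come from the example above, and the 500 fallback for any other reason is an assumption.

```erlang
-module(resilience_errors).
-export([to_response/1]).

%% Hypothetical helper: maps an error reason returned by
%% nova_resilience:call/2 to a Nova JSON response tuple.
to_response(circuit_open) ->
    {json, 503, #{}, #{error => <<"service unavailable">>}};
to_response(bulkhead_full) ->
    {json, 503, #{}, #{error => <<"overloaded">>}};
to_response(deadline_exceeded) ->
    {json, 504, #{}, #{error => <<"timeout">>}};
to_response(_Other) ->
    %% Assumed fallback for reasons not listed in the guide.
    {json, 500, #{}, #{error => <<"internal error">>}}.
```

In a controller, the error clauses then collapse to `{error, Reason} -> resilience_errors:to_response(Reason)`.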

Call options

Override per-call settings with call/3:

nova_resilience:call(primary_db, Fun, #{
    timeout => 5000,           %% Override deadline for this call
    retry => #{                %% Override retry for this call
        max_attempts => 3,
        base_delay => 100,
        max_delay => 2000
    }
}).

Pass retry => false to disable retry for a single call.
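This guide does not spell out how base_delay and max_delay combine. The sketch below makes two assumptions: capped exponential backoff without jitter, and max_attempts counting the first call. Under those assumptions, the waits between attempts would look like this. retry_backoff is a hypothetical module, and nova_resilience's own formula may differ (for example by adding jitter).

```erlang
-module(retry_backoff).
-export([delay/2, schedule/1]).

%% Assumed formula: wait before retry number Attempt (1-based) is
%% base_delay * 2^(Attempt - 1), capped at max_delay. Values in ms.
delay(Attempt, #{base_delay := Base, max_delay := Max}) when Attempt >= 1 ->
    min(Max, Base bsl (Attempt - 1)).

%% Assuming max_attempts includes the first call, there are
%% max_attempts - 1 waits in total.
schedule(#{max_attempts := N} = Opts) ->
    [delay(A, Opts) || A <- lists:seq(1, N - 1)].
```

With the options above (max_attempts 3, base_delay 100, max_delay 2000) this gives waits of 100 ms and 200 ms. The cap only takes effect from the sixth retry onward.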

Health endpoints

Once running, verify your setup:

# Full health report
curl http://localhost:8080/health | jq .

# Readiness probe
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/ready

# Liveness probe
curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/live

Example /health response:

{
  "status": "healthy",
  "dependencies": {
    "primary_db": {
      "status": "healthy",
      "details": {},
      "name": "primary_db",
      "breaker": {"state": "closed", "failure_count": 0},
      "bulkhead": {"current": 3, "max": 25, "available": 22}
    }
  },
  "vm": {
    "memory_mb": 64,
    "process_count": 312,
    "run_queue": 0,
    "uptime_seconds": 3600,
    "node": "my_app@hostname"
  }
}

Kubernetes deployment

Pod spec

containers:
  - name: my-app
    livenessProbe:
      httpGet:
        path: /live
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 2
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /ready
        port: 8080
      failureThreshold: 30
      periodSeconds: 2
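The startupProbe settings bound how long dependencies have to come up. Kubernetes kills and restarts the container after failureThreshold consecutive startup-probe failures, so the window is about failureThreshold × periodSeconds. A trivial, hypothetical helper makes the arithmetic explicit:

```erlang
-module(probe_budget).
-export([startup_seconds/2]).

%% Approximate time the startup probe allows before Kubernetes
%% kills the container: failureThreshold * periodSeconds.
startup_seconds(FailureThreshold, PeriodSeconds) ->
    FailureThreshold * PeriodSeconds.
```

For the spec above that is 30 × 2 = 60 seconds. Raise failureThreshold if critical dependencies can take longer to become healthy.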

Lifecycle

1. The pod starts and nova_resilience checks all critical dependencies.
2. The startup probe polls /ready, which returns 503 until those dependencies are healthy.
3. Once /ready returns 200, Kubernetes routes traffic to the pod.
4. During a rolling deploy, Kubernetes sends SIGTERM. nova_resilience marks the pod not ready, drains, and shuts down its dependencies.
5. The readiness probe now fails, so Kubernetes stops routing traffic to the pod.
6. Graceful termination completes.
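Lower shutdown_priority values shut down first, and dependencies without an explicit priority get 10. The following sketch assumes that dependencies sharing a priority form one group and orders the groups accordingly. shutdown_order is hypothetical, not the library's implementation.

```erlang
-module(shutdown_order).
-export([groups/1]).

%% Groups dependency maps by shutdown_priority (default 10) and
%% returns [{Priority, [Name]}] with the lowest priority first.
groups(Deps) ->
    Tagged = [{maps:get(shutdown_priority, D, 10), maps:get(name, D)}
              || D <- Deps],
    Grouped = lists:foldl(
                fun({P, Name}, Acc) ->
                        %% Append to the group, or start a new one.
                        maps:update_with(P, fun(Ns) -> Ns ++ [Name] end,
                                         [Name], Acc)
                end,
                #{}, Tagged),
    lists:keysort(1, maps:to_list(Grouped)).
```

For the example configuration earlier (events at priority 1, primary_db at 2) this yields `[{1, [events]}, {2, [primary_db]}]`, so the Kafka dependency shuts down before the database.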

Shutdown timing

Worst-case total shutdown time = shutdown_delay + (shutdown_drain_timeout × number of priority groups) + Nova HTTP drain.

For defaults (5s delay, 15s drain, 1 priority group, 15s Nova drain):

total = 5 + 15 + 15 = 35 seconds

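Kubernetes defaults terminationGracePeriodSeconds to 30 seconds, which is shorter than this. Set terminationGracePeriodSeconds: 45 in the pod spec to leave headroom.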
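The formula can be checked with a small helper. shutdown_budget is hypothetical, and all values are in seconds.

```erlang
-module(shutdown_budget).
-export([total_seconds/4, grace_ok/2]).

%% Worst-case shutdown time:
%% delay + drain timeout per priority group + Nova HTTP drain.
total_seconds(Delay, DrainTimeout, PriorityGroups, NovaDrain) ->
    Delay + DrainTimeout * PriorityGroups + NovaDrain.

%% True when terminationGracePeriodSeconds covers the worst case.
grace_ok(GracePeriod, Total) ->
    GracePeriod >= Total.
```

The example configuration earlier on this page uses two priorities (1 and 2). If the formula applies as written, its worst case is 5 + 15 × 2 + 15 = 50 seconds, which exceeds a 45-second grace period. Size terminationGracePeriodSeconds from your own number of priority groups.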

Development and testing

In development you may not have all dependencies running. Disable the startup gate to skip health check blocking:

%% dev.config
{nova_resilience, [
    {gate_enabled, false},
    {dependencies, [...]}
]}

With {gate_enabled, false}, /ready returns 200 immediately on startup, regardless of dependency health.

To have /health return 503 when critical dependencies are unhealthy (useful for monitoring systems that scrape /health):

{nova_resilience, [
    {health_severity, critical}
]}

With the default info severity, /health always returns 200 with the full report. With critical, it returns 503 when the system is unhealthy.
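The two status rules can be summarised in a few lines. status_codes is hypothetical, and dependencies are represented as maps with critical and status keys (an assumption). The sketch ignores the startup gate and the not-ready state during shutdown.

```erlang
-module(status_codes).
-export([ready/1, health/2]).

%% /ready: 503 if any critical dependency is not healthy, else 200.
%% critical defaults to false, as in the options table.
ready(Deps) ->
    Failing = fun(D) ->
                      maps:get(critical, D, false)
                          andalso maps:get(status, D) =/= healthy
              end,
    case lists:any(Failing, Deps) of
        true -> 503;
        false -> 200
    end.

%% /health: info severity always returns 200; critical severity
%% returns 503 under the same condition as /ready.
health(info, _Deps) -> 200;
health(critical, Deps) -> ready(Deps).
```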

Runtime registration

Dependencies can also be registered and unregistered while the application is running:

ok = nova_resilience:register_dependency(inventory_api, #{
    type => custom,
    adapter => my_http_adapter,
    url => "http://inventory:8080",
    breaker => #{failure_threshold => 5, wait_duration => 30000}
}).

%% Use it (httpc is part of inets, so the inets application must be running)
nova_resilience:call(inventory_api, fun() ->
    httpc:request("http://inventory:8080/api/stock")
end).

%% Unregister when no longer needed
nova_resilience:unregister_dependency(inventory_api).

Next steps

Circuit Breakers & Bulkheads — Protect against cascading failures
Deadline Propagation — Manage request timeout budgets
Adapters — Write custom adapters for your dependencies
Graceful Shutdown — Ordered teardown and Kubernetes integration
Telemetry — Monitoring and observability
