novaframework
diff --git a/README.md b/README.md
Lines changed: 117 additions & 0 deletions
diff --git a/guides/adapters.md b/guides/adapters.md
Lines changed: 113 additions & 0 deletions
diff --git a/guides/getting-started.md b/guides/getting-started.md
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,117 @@
+# nova_resilience
+
+Production-grade resilience patterns for [Nova](https://github.com/novaframework/nova) web applications.
+
+Bridges Nova and [Seki](https://github.com/Taure/seki) to provide dependency health checking, Kubernetes-ready probes, circuit breakers, bulkheads, and ordered graceful shutdown — all via declarative configuration.
+
+## Quick start
+
+Add to your deps:
+
+```erlang
+{deps, [
+    nova,
+    seki,
+    nova_resilience
+]}.
+```
+
+Add them to the `applications` list in your `.app.src`:
+
+```erlang
+{applications, [kernel, stdlib, nova, seki, nova_resilience]}.
+```
+
+Register health routes in your Nova config:
+
+```erlang
+{my_app, [
+    {nova_apps, [nova_resilience]}
+]}.
+```
+
+Configure dependencies:
+
+```erlang
+{nova_resilience, [
+    {dependencies, [
+        #{name => primary_db,
+          type => database,
+          adapter => pgo,
+          pool => default,
+          critical => true,
+          shutdown_priority => 2}
+    ]}
+]}.
+```
+
+That's it. Your app now has `/health`, `/ready`, and `/live` endpoints, automatic startup gating, and ordered shutdown.
+
+## What it does
+
+### Startup
+
+1. The app starts and nova_resilience provisions a health check for each dependency
+2. `/ready` returns **503** until all critical dependencies are healthy
+3. The Kubernetes readiness probe sees the 503 and keeps traffic away from the pod
+4. Once every critical dependency passes its health check, `/ready` returns **200** and traffic flows
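+
+As a sketch of how this gate behaves, the fragment below marks the database as critical and a cache as non-critical, so `/ready` waits only for the database. The `cache` entry, its `my_redis_adapter` module, and the `redis_pool` name are illustrative assumptions; `gate_timeout` uses the value shown in the Configuration section.
+
+```erlang
+{nova_resilience, [
+    {gate_timeout, 30000},            %% Max ms to wait for deps on startup
+    {dependencies, [
+        %% Critical: /ready stays 503 until this dependency is healthy
+        #{name => primary_db,
+          type => database,
+          adapter => pgo,
+          critical => true},
+        %% Non-critical (hypothetical adapter): does not hold /ready at 503
+        #{name => cache,
+          type => custom,
+          adapter => my_redis_adapter,
+          pool => redis_pool,
+          critical => false}
+    ]}
+]}.
+```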
+
+### Running
+
+Execute calls through the resilience stack:
+
+```erlang
+case nova_resilience:call(primary_db, fun() ->
+    pgo:query(<<"SELECT * FROM users WHERE id = $1">>, [Id])
+end) of
+    {ok, #{rows := Rows}} -> {json, #{users => Rows}};
+    {error, circuit_open} -> {json, 503, #{}, #{error => <<"db unavailable">>}};
+    {error, bulkhead_full} -> {json, 503, #{}, #{error => <<"overloaded">>}}
+end.
+```
+
+### Shutdown
+
+On SIGTERM (or application stop):
+
+1. `/ready` immediately returns **503** so the load balancer stops sending new traffic
+2. Waits `shutdown_delay` ms so load balancers and readiness probes can observe the not-ready state
+3. Tears down dependencies in `shutdown_priority` order (lower numbers first)
+4. Nova drains HTTP connections and stops
+
+No manual `prep_stop` calls needed — shutdown is fully automatic.
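+
+To illustrate step 3, here is a minimal sketch with two dependencies in different priority groups. Lower `shutdown_priority` values are torn down first, so the Kafka client here stops before the database. The `events` entry, its client name, and the priority values are illustrative assumptions.
+
+```erlang
+{nova_resilience, [
+    {shutdown_delay, 5000},           %% ms to wait after marking not-ready
+    {dependencies, [
+        %% Priority 1: torn down first
+        #{name => events,
+          type => kafka,
+          client => my_brod_client,
+          topic => <<"events">>,
+          shutdown_priority => 1},
+        %% Priority 2: torn down after the Kafka client
+        #{name => primary_db,
+          type => database,
+          adapter => pgo,
+          pool => default,
+          critical => true,
+          shutdown_priority => 2}
+    ]}
+]}.
+```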
+
+## Health endpoints
+
+| Endpoint | Purpose | Response |
+|----------|---------|----------|
+| `GET /health` | Full health report | `{"status":"healthy","dependencies":{...},"vm":{...}}` |
+| `GET /ready` | Kubernetes readiness probe | 200 when ready, 503 when not |
+| `GET /live` | Kubernetes liveness probe | 200 if process is responsive |
+
+## Configuration
+
+```erlang
+{nova_resilience, [
+    {dependencies, [...]},          %% List of dependency configs
+    {health_check_interval, 10000}, %% ms between health checks
+    {vm_checks, true},              %% Include BEAM VM in health report
+    {gate_timeout, 30000},          %% Max ms to wait for deps on startup
+    {shutdown_delay, 5000},         %% ms to wait after marking not-ready
+    {shutdown_drain_timeout, 15000},%% Max ms to drain per priority group
+    {health_prefix, <<"">>}         %% Prefix for health routes
+]}.
+```
+
+## Built-in adapters
+
+| Type | Adapter | Auto health check |
+|------|---------|-------------------|
+| `database` | `pgo` (default) | `SELECT 1` via pgo |
+| `database` | `kura` | `SELECT 1` via kura repo |
+| `kafka` | `brod` (default) | `brod:get_partitions_count/2` |
+| any | custom module | Implement `nova_resilience_adapter` behaviour |
+
+## License
+
+Apache-2.0
@@ -0,0 +1,113 @@
+# Adapters
+
+Adapters provide built-in health checks and shutdown logic for known dependency types. You can use built-in adapters or write your own.
+
+## Built-in adapters
+
+### pgo (default for `database` type)
+
+Health check runs `SELECT 1` against the pgo pool.
+
+```erlang
+#{name => primary_db,
+  type => database,
+  %% adapter => pgo is implicit
+  pool => default}
+```
+
+The `pool` field is optional and defaults to pgo's default pool if omitted.
+
+### kura
+
+Health check runs `SELECT 1` through the kura repo layer.
+
+```erlang
+#{name => primary_db,
+  type => database,
+  adapter => kura,
+  repo => my_repo}
+```
+
+The `repo` field is required. It names the kura repo module, which implements the `kura_repo` behaviour.
+
+### brod (default for `kafka` type)
+
+Health check calls `brod:get_partitions_count/2` to verify broker connectivity.
+
+```erlang
+#{name => events,
+  type => kafka,
+  client => my_brod_client,
+  topic => <<"events">>}
+```
+
+Both `client` and `topic` are required.
+
+## Custom adapters
+
+Implement the `nova_resilience_adapter` behaviour:
+
+```erlang
+-module(my_redis_adapter).
+-behaviour(nova_resilience_adapter).
+
+-export([health_check/1, wrap_call/2, shutdown/1]).
+
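+%% Any reply other than PONG is reported as unhealthy instead of crashing.
+health_check(#{pool := Pool}) ->
+    case eredis:q(Pool, [<<"PING">>]) of
+        {ok, <<"PONG">>} -> ok;
+        {ok, Other} -> {error, {unexpected_reply, Other}};
+        {error, Reason} -> {error, Reason}
+    end.
+
+%% No adapter-specific wrapping: run the fun as-is.
+wrap_call(_Config, Fun) ->
+    Fun().
+
+%% Nothing to release on shutdown.
+shutdown(_Config) ->
+    ok.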
+```
+
+Then reference it in your config:
+
+```erlang
+#{name => cache,
+  type => custom,
+  adapter => my_redis_adapter,
+  pool => redis_pool,
+  critical => false,
+  shutdown_priority => 0}
+```
+
+## Overriding health checks
+
+Any dependency can override the adapter's health check with a custom `{Module, Function}` tuple:
+
+```erlang
+#{name => primary_db,
+  type => database,
+  adapter => pgo,
+  health_check => {my_app_health, deep_db_check}}
+```
+
+The function must return `ok | {error, Reason}`.
+
+## Runtime registration
+
+Register dependencies at runtime for services discovered dynamically:
+
+```erlang
+nova_resilience:register_dependency(inventory_service, #{
+    type => custom,
+    adapter => my_http_adapter,
+    url => "http://inventory:8080",
+    critical => false,
+    breaker => #{failure_threshold => 5, wait_duration => 30000}
+}).
+
+%% Then use it
+nova_resilience:call(inventory_service, fun() ->
+    httpc:request("http://inventory:8080/api/stock")
+end).
+
+%% Unregister when no longer needed
+nova_resilience:unregister_dependency(inventory_service).
+```
@@ -0,0 +1,145 @@
+# Getting Started
+
+This guide walks through adding nova_resilience to an existing Nova application.
+
+## Installation
+
+Add `seki` and `nova_resilience` to your `rebar.config` deps:
+
+```erlang
+{deps, [
+    nova,
+    seki,
+    nova_resilience
+]}.
+```
+
+Add them to your `.app.src` applications list:
+
+```erlang
+{applications, [
+    kernel, stdlib, nova, seki, nova_resilience
+]}.
+```
+
+## Register health routes
+
+Add `nova_resilience` to your app's `nova_apps` so the health endpoints get registered:
+
+```erlang
+%% In sys.config
+{my_app, [
+    {nova_apps, [nova_resilience]}
+]}.
+```
+
+This gives you `/health`, `/ready`, and `/live` endpoints automatically.
+
+## Configure dependencies
+
+Add a `nova_resilience` section to your `sys.config`:
+
+```erlang
+{nova_resilience, [
+    {dependencies, [
+        #{name => primary_db,
+          type => database,
+          adapter => pgo,
+          pool => default,
+          critical => true,
+          shutdown_priority => 2}
+    ]}
+]}.
+```
+
+### Required fields
+
+- `name` — Atom identifying the dependency
+- `type` — `database`, `kafka`, or `custom`
+
+### Optional fields
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `adapter` | auto from type | `pgo`, `kura`, `brod`, or custom module |
+| `critical` | `false` | If true, `/ready` returns 503 when this dep is unhealthy |
+| `shutdown_priority` | `10` | Lower numbers shut down first |
+| `breaker` | none | Circuit breaker options (map) |
+| `bulkhead` | none | Concurrency limiter options (map) |
+| `retry` | none | Retry options (map) |
+| `default_timeout` | none | Default deadline in ms |
+| `health_check` | auto from adapter | `{Module, Function}` tuple for custom health checks |
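+
+A hedged example combining several of these optional fields. The `breaker` keys are the ones used in the adapters guide's runtime registration example; all values are illustrative, and `bulkhead` and `retry` are left out because their option keys are not listed in this guide.
+
+```erlang
+#{name => primary_db,
+  type => database,                   %% adapter omitted: chosen from type
+  critical => true,
+  shutdown_priority => 2,
+  default_timeout => 5000,            %% Deadline in ms (illustrative value)
+  breaker => #{failure_threshold => 5, wait_duration => 30000}}
+```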
+
+## Using the resilience stack
+
+Wrap calls to external dependencies:
+
+```erlang
+case nova_resilience:call(primary_db, fun() ->
+    pgo:query(<<"SELECT * FROM users">>)
+end) of
+    {ok, Result} ->
+        %% Result is whatever your fun returned
+        handle_result(Result);
+    {error, circuit_open} ->
+        %% Dependency has too many failures, breaker tripped
+        {json, 503, #{}, #{error => <<"service unavailable">>}};
+    {error, bulkhead_full} ->
+        %% Too many concurrent requests to this dependency
+        {json, 503, #{}, #{error => <<"overloaded">>}};
+    {error, deadline_exceeded} ->
+        %% Request deadline expired
+        {json, 504, #{}, #{error => <<"timeout">>}}
+end.
+```
+
+Without a breaker or bulkhead configured, `call/2` still wraps the call with health tracking and telemetry.
+
+## Kubernetes deployment
+
+### Pod spec
+
+```yaml
+containers:
+  - name: my-app
+    livenessProbe:
+      httpGet:
+        path: /live
+        port: 8080
+      initialDelaySeconds: 5
+      periodSeconds: 10
+    readinessProbe:
+      httpGet:
+        path: /ready
+        port: 8080
+      initialDelaySeconds: 2
+      periodSeconds: 5
+    startupProbe:
+      httpGet:
+        path: /ready
+        port: 8080
+      failureThreshold: 30
+      periodSeconds: 2
+```
+
+### How it works
+
+1. The pod starts and nova_resilience checks all critical dependencies
+2. The startup probe polls `/ready`, which returns 503 until those dependencies are healthy
+3. Once `/ready` returns 200, Kubernetes routes traffic to the pod
+4. On a rolling deploy, Kubernetes sends SIGTERM and nova_resilience marks the pod not-ready
+5. The readiness probe fails and Kubernetes stops routing traffic to the pod
+6. After `shutdown_delay`, dependencies are torn down in `shutdown_priority` order, Nova drains HTTP connections, and graceful termination completes
+
+### Shutdown timing
+
+Configure these to match your Kubernetes `terminationGracePeriodSeconds`:
+
+```erlang
+{nova_resilience, [
+    {shutdown_delay, 5000},          %% Wait for LB to notice not-ready
+    {shutdown_drain_timeout, 15000}  %% Max time to drain per dep group
+]}.
+```
+
+Worst-case total shutdown time = `shutdown_delay` + (`shutdown_drain_timeout` * number of priority groups) + Nova's HTTP drain. Set `terminationGracePeriodSeconds` above this total, otherwise Kubernetes sends SIGKILL before shutdown finishes.
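+
+As a worked example of this formula, assume the values above, two distinct `shutdown_priority` values (two groups), and a Nova HTTP drain of 5000 ms. The drain figure is an assumed placeholder, not a documented default.
+
+```erlang
+%% Worked example (illustrative numbers, not measured values):
+%%   shutdown_delay                        =  5000 ms
+%%   shutdown_drain_timeout * 2 groups     = 30000 ms
+%%   Nova HTTP drain (assumed)             =  5000 ms
+%%   worst-case total                      = 40000 ms
+%% => terminationGracePeriodSeconds should be above 40, for example 45.
+{nova_resilience, [
+    {shutdown_delay, 5000},
+    {shutdown_drain_timeout, 15000}
+]}.
+```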