
Commit 29b1f03

jamesmblair (James Blair) and claude authored
docs: add ADR-0009 for state backend health checks reporting degraded (#84)
Documents the decision that state connector health checks always report Degraded rather than Unhealthy, preventing container orchestrators from recycling the portal over downstream outages.

Co-authored-by: James Blair <jblair@codeforamerica.org>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4de42df commit 29b1f03

1 file changed

Lines changed: 29 additions & 0 deletions

@@ -0,0 +1,29 @@
# 9. State Backend Health Checks Report Degraded, Not Unhealthy
Date: 2026-03-17
## Status
Accepted
## Context
State connector plugins register health checks that verify connectivity to their backend systems (DC's SQL database, CO's CBMS API). Initially these checks reported `Unhealthy` when the backend was misconfigured or unreachable. In ASP.NET Core's health check framework, the overall `/health` endpoint status is the worst individual check status, so a single `Unhealthy` check makes the entire endpoint report `Unhealthy`, which can cause container orchestrators to kill and replace the container.

The portal can still serve requests (login, static pages, cached data) when a state backend is down. Recycling the container won't fix a downstream outage, and it can make the situation worse by taking down the parts of the portal that still work.
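
The "worst individual status wins" rule described above can be modeled in a few lines. This is a language-neutral sketch in Python, not ASP.NET Core code: the `Status` enum, the `overall_status` function, and the check names are illustrative assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    """Health statuses ordered so that a lower value is worse."""
    UNHEALTHY = 0
    DEGRADED = 1
    HEALTHY = 2


def overall_status(checks: dict[str, Status]) -> Status:
    """Return the worst status among all checks (Healthy if there are none)."""
    return min(checks.values(), default=Status.HEALTHY)


# Hypothetical check names, for illustration only.
before = {"portal": Status.HEALTHY, "state-connector": Status.UNHEALTHY}
after = {"portal": Status.HEALTHY, "state-connector": Status.DEGRADED}

print(overall_status(before).name)  # UNHEALTHY
print(overall_status(after).name)   # DEGRADED
```

With one connector reporting `Unhealthy`, the whole endpoint is `Unhealthy`; downgrading that one check to `Degraded` caps the overall result at `Degraded`.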
## Decision
State connector health checks always report `Degraded` — never `Unhealthy` — regardless of whether the issue is missing configuration or an unreachable backend. The structured JSON response from `/health` includes per-check descriptions and exception details, so monitoring and alerting can still distinguish between misconfiguration and connectivity failures.
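
As a rough illustration of this decision, the sketch below maps both failure modes to `Degraded` while keeping them distinguishable through the description and exception fields. It is a Python model, not the connectors' actual implementation: `CheckResult`, `check_backend`, the `probe` callback, and the description strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    status: str                      # "Healthy" or "Degraded"; never "Unhealthy"
    description: str
    exception: Optional[str] = None  # textual exception detail, if any


def check_backend(connection_string: Optional[str],
                  probe: Callable[[str], None]) -> CheckResult:
    """Hypothetical connector check: every failure mode maps to Degraded."""
    if not connection_string:
        # Missing configuration is reported as Degraded, with its own description.
        return CheckResult("Degraded", "Backend connection is not configured")
    try:
        probe(connection_string)  # raises if the backend cannot be reached
    except Exception as exc:
        # Connectivity failures are also Degraded; the exception is preserved.
        return CheckResult("Degraded", "Backend is unreachable", repr(exc))
    return CheckResult("Healthy", "Backend is reachable")
```

The status is the same in both failure branches, so the orchestrator sees no difference, while the description and exception still let an alert tell misconfiguration from an outage.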
### Alternatives considered
- **Report `Unhealthy` for connectivity failures, `Degraded` for missing config.** More semantically precise, but the operational consequence (container recycling) is undesirable in both cases.
- **Have connectors report `Unhealthy` but remap to `Degraded` at the portal level.** ASP.NET Core's overall status is the worst individual status with no built-in remapping. A portal-side wrapper could intercept results, but adds complexity for the same operational outcome.
- **Make the failure status configurable via appsettings.** Flexibility for future states, but adds a configuration knob that would likely be set once and forgotten — and increases maintenance burden for state partners. Can revisit if needed as more states onboard.
## Consequences
- The `/health` endpoint never returns `Unhealthy` due to a state backend issue, preventing aggressive container recycling.
- Monitoring must inspect the per-check details (description, exception) in the JSON response rather than relying solely on the top-level status to detect backend outages.
- The portal's own health (e.g., its database) can still report `Unhealthy` if needed in the future — this decision applies only to state connector checks.
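
The monitoring consequence above might look like the following sketch, which reads the per-check entries instead of the top-level status. The JSON shape (`checks`, `name`, `status`, `description`, `exception`) is an assumption made for illustration; the actual field names in the portal's `/health` response may differ.

```python
import json

# Hypothetical response body; the real field names may differ.
SAMPLE = """
{
  "status": "Degraded",
  "checks": [
    {"name": "portal-db", "status": "Healthy",
     "description": "ok", "exception": null},
    {"name": "state-connector", "status": "Degraded",
     "description": "Backend is unreachable", "exception": "TimeoutException"}
  ]
}
"""


def failing_checks(body: str) -> list[dict]:
    """Return every per-check entry whose status is not Healthy."""
    report = json.loads(body)
    return [c for c in report.get("checks", []) if c.get("status") != "Healthy"]


for check in failing_checks(SAMPLE):
    # Alert on the per-check detail, since the top-level status alone
    # cannot tell a backend outage apart from any other degradation.
    print(f"{check['name']}: {check['description']} ({check['exception']})")
```

An alerting rule built this way would key on the check name and description rather than waiting for an `Unhealthy` top-level status that state connector failures no longer produce.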
