
Health Model v190 delta

Antonios Voulvoulis edited this page Apr 17, 2026 · 1 revision

Health Model — v1.90 Delta (Additions Only)

This is NOT a full page replacement. These sections should be ADDED to the existing Health-Model.md page, which is already well-written. The existing 4-axis model, evaluation order, and module-level health derivation are correct and should be preserved.


Section to ADD after "Evaluation Order" (before module details)

Validator vs Watchdog (Boundary Clarification)

NFTBan has two independent monitoring systems. They answer different questions and must not be confused.

| System | Question it answers | Inputs | Outputs |
|---|---|---|---|
| Validator | Is the firewall protecting correctly? | Kernel state (nft -j list ruleset), systemd, config | PROTECTED / DEGRADED / DOWN / IDLE |
| Watchdog | Is the daemon running within resource bounds? | /proc, Go runtime, nftables (netlink) | Pressure scores, operating modes |
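As a rough illustration of the validator's kernel-state input, the sketch below summarizes the JSON that `nft -j list ruleset` emits by counting objects per kind. It is not NFTBan's validator logic. The sample document is trimmed, and its table, chain, and set names are invented for illustration; `summarize_ruleset` is a hypothetical helper name.

```python
import json
from collections import Counter

# Hypothetical, trimmed sample of `nft -j list ruleset` output.
# The table, chain, and set names are invented for illustration.
SAMPLE = """
{"nftables": [
  {"metainfo": {"json_schema_version": 1}},
  {"table": {"family": "inet", "name": "example", "handle": 1}},
  {"chain": {"family": "inet", "table": "example", "name": "input",
             "handle": 1, "type": "filter", "hook": "input",
             "prio": 0, "policy": "accept"}},
  {"set": {"family": "inet", "table": "example", "name": "blocked_v4",
           "type": "ipv4_addr", "handle": 2,
           "elem": ["192.0.2.10", "192.0.2.11"]}}
]}
"""


def summarize_ruleset(raw: str) -> Counter:
    """Count ruleset objects by kind (table, chain, set, rule, ...)."""
    doc = json.loads(raw)
    kinds = Counter()
    for obj in doc.get("nftables", []):
        # Each entry is a single-key object; the key names the object kind.
        for kind in obj:
            kinds[kind] += 1
    return kinds


summary = summarize_ruleset(SAMPLE)
print(dict(summary))
```

On a live host the same function could be fed the output of `subprocess.run(["nft", "-j", "list", "ruleset"], capture_output=True, text=True, check=True).stdout`, which normally requires root privileges.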

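Key distinction: The validator can report PROTECTED while the watchdog is in DEGRADED mode. This is not a contradiction: it means the firewall is enforcing correctly while the daemon is under resource pressure. Note that DEGRADED appears in both vocabularies, as a validator health state and as a watchdog operating mode, so always state which system reported it.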

The validator does not:

  • Monitor CPU, memory, or disk pressure
  • Track daemon RSS or goroutine count
  • Adjust daemon behavior (workers, collection intervals)
  • Read from the watchdog

The watchdog does not:

  • Evaluate kernel health or module states
  • Compute PROTECTED/DEGRADED/DOWN/IDLE
  • Interpret kernel counters or set populations
  • Override validator health conclusions

Both systems publish Prometheus metrics on the daemon /metrics endpoint. Validator health is exposed as nftban_schema_validation_status and nftban_health_status. Watchdog pressure is exposed as nftban_pressure_score{dim}, and the watchdog operating mode as nftban_operating_mode{mode}.
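The following sketch shows one way a consumer might separate the two metric families after scraping /metrics. Only the four metric names and the `dim` and `mode` label keys come from this page. The sample values, the label values (`cpu`, `memory`, `normal`, and so on), and the numeric encoding of the status metrics are assumptions made for illustration, as are the helper names `parse_metrics` and `split_by_system`. The parser handles only the simple lines shown here, not the full Prometheus text format.

```python
import re

# Hypothetical scrape of the daemon /metrics endpoint. Metric names are from
# the documentation; all values and label values are assumed.
SAMPLE_METRICS = """\
# validator
nftban_schema_validation_status 1
nftban_health_status 1
# watchdog
nftban_pressure_score{dim="cpu"} 0.15
nftban_pressure_score{dim="memory"} 0.72
nftban_operating_mode{mode="normal"} 0
nftban_operating_mode{mode="degraded"} 1
nftban_operating_mode{mode="survival"} 0
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

VALIDATOR_METRICS = {"nftban_schema_validation_status", "nftban_health_status"}


def parse_metrics(text):
    """Parse simple exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this minimal parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def split_by_system(samples):
    """Group samples into validator metrics and watchdog metrics."""
    validator, watchdog = {}, {}
    for name, labels, value in samples:
        if name in VALIDATOR_METRICS:
            validator[name] = value
        elif name == "nftban_pressure_score":
            watchdog[("pressure", labels.get("dim"))] = value
        elif name == "nftban_operating_mode":
            watchdog[("mode", labels.get("mode"))] = value
    return validator, watchdog


validator, watchdog = split_by_system(parse_metrics(SAMPLE_METRICS))
print(validator)
print(watchdog)
```

Keeping the two dictionaries separate mirrors the boundary described above: nothing in the watchdog group is used to derive a validator conclusion, and vice versa.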

For full watchdog documentation, see Watchdog & Profiles.


Section to ADD to "Limitations" at bottom of page

Health vs Pressure

The health model described on this page covers protection correctness. It does not cover runtime resource management. For pressure monitoring, operating modes (NORMAL/DEGRADED/SURVIVAL), server profiles, and adaptive behavior under load, see Watchdog & Profiles.

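Because the two systems are independent, a system can be in any combination of validator state and watchdog mode at the same time, for example: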

  • PROTECTED (validator) + NORMAL (watchdog) — healthy and running well
  • PROTECTED (validator) + DEGRADED (watchdog) — protecting but under pressure
  • DEGRADED (validator) + NORMAL (watchdog) — structure broken but resources OK
  • DOWN (validator) + SURVIVAL (watchdog) — both systems report problems
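The combinations above can be modeled as two independent values that are never merged into one. The sketch below is a minimal illustration, not NFTBan code: the enum and function names are hypothetical, and treating IDLE as "no validator problem" is an assumption made only for this example.

```python
from enum import Enum


class ValidatorState(Enum):
    PROTECTED = "PROTECTED"
    DEGRADED = "DEGRADED"
    DOWN = "DOWN"
    IDLE = "IDLE"


class WatchdogMode(Enum):
    NORMAL = "NORMAL"
    DEGRADED = "DEGRADED"
    SURVIVAL = "SURVIVAL"


def systems_needing_attention(validator, watchdog):
    """Return which monitoring system(s) report a problem.

    The two inputs are evaluated separately; neither overrides the other.
    Assumption: IDLE is not treated as a validator problem here.
    """
    flagged = []
    if validator in (ValidatorState.DEGRADED, ValidatorState.DOWN):
        flagged.append("validator")
    if watchdog is not WatchdogMode.NORMAL:
        flagged.append("watchdog")
    return flagged


# The four example combinations listed above.
examples = [
    (ValidatorState.PROTECTED, WatchdogMode.NORMAL),
    (ValidatorState.PROTECTED, WatchdogMode.DEGRADED),
    (ValidatorState.DEGRADED, WatchdogMode.NORMAL),
    (ValidatorState.DOWN, WatchdogMode.SURVIVAL),
]
for v, w in examples:
    print(f"{v.value} (validator) + {w.value} (watchdog):",
          systems_needing_attention(v, w))
```

Using separate enum types keeps ValidatorState.DEGRADED and WatchdogMode.DEGRADED distinct even though they share a name, which is the confusion this section warns against.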
