release: v1.96.0 — rebuild recovery bridge#456
Merged
Add internal/rebuild package with:
- OperationResult enum (SUCCESS, FAILED_RECOVERED, FAILED_DEGRADED, FAILED_FATAL)
- FailureClass enum (12 classes: PREVALIDATION_FAILED through RETRY_EXHAUSTED)
- ModuleRestoreResult enum (3-level verification: structure, wiring, activation)
- RetryDisposition enum + GetRetryDisposition() policy function
- RecoveryMarker struct with JSON persistence (read/write/clear)
- ModuleRestoreReport with per-module tracking
- Comprehensive tests for policy logic, marker lifecycle, serialization

Contract: V196_REBUILD_RECOVERY_CONTRACT.md
Invariants: INV-RR-001 through INV-RR-010

No behavior change; foundation types only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
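The marker lifecycle (read/write/clear) named in this commit could look roughly like the sketch below. This is not the code in internal/rebuild/marker.go: the struct fields other than retry_count, the function names, and the temp-file-then-rename write are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"os"
)

// RecoveryMarker is a hypothetical shape. Only retry_count is named in the
// PR text; the other fields are assumptions for this sketch.
type RecoveryMarker struct {
	FailureClass string `json:"failure_class"`
	RetryCount   int    `json:"retry_count"`
	Exhausted    bool   `json:"exhausted"`
}

// WriteMarker persists the marker by writing a temp file and renaming it
// over the target, so readers never observe a half-written file.
func WriteMarker(path string, m RecoveryMarker) error {
	data, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// ReadMarker returns (nil, nil) when no marker file exists.
func ReadMarker(path string) (*RecoveryMarker, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var m RecoveryMarker
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// ClearMarker removes the marker; a missing file is not an error.
func ClearMarker(path string) error {
	err := os.Remove(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil
	}
	return err
}
```

Treating "no marker" as a nil result rather than an error keeps the common case (nothing to recover) out of the error path.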
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Scanned files: None
…arkers

Wire explicit failure classes into every rebuild exit path:
- PREVALIDATION_FAILED (lines 1201, 1220): no marker written (INV-RR-005)
- APPLY_FAILED (line 1248): marker written, retryable
- DAEMON_RESTART_FAILED (line 1315): tracked for conditional retry
- MODULE_RESTORE_FAILED: per-module tracking (ddos, portscan, botguard)
- POSTVALIDATION_REGRESSION (line 1413): marker + rollback result + restored health
- ROLLBACK_FAILED (line 1418): marker + enhanced exit 3 recovery instructions
- SUCCESS (line 1438): stale marker cleared
- DEGRADED (line 1443): classified by root cause (module/daemon/hard-fail)

New file: nftban_rebuild_classify.sh, containing failure class constants, module restore tracking, and recovery marker read/write/clear helpers.

No retry activated. No systemd changes. Classification + markers only.
Preserves existing exit-code behavior (0/1/2/3).

Contract: V196_REBUILD_RECOVERY_CONTRACT.md §5-§7
Invariants: INV-RR-005 (prevalidation excluded), INV-RR-007 (module visible)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
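The marker side effects listed in this commit can be summarized as a lookup. The sketch below is in Go for illustration only (the actual wiring lives in shell, in cmd_firewall.sh); the type and function names are hypothetical, and exit paths whose marker behavior the commit message does not spell out map to an explicit "unspecified" value instead of a guess.

```go
package main

// MarkerAction is a hypothetical enum describing what happens to the
// recovery marker file on a given rebuild exit path.
type MarkerAction int

const (
	MarkerUnspecified MarkerAction = iota // the PR text does not state the behavior
	MarkerNone                            // no marker is written
	MarkerWrite                           // a recovery marker is persisted
	MarkerClear                           // a stale marker is removed
)

// MarkerActionFor maps an exit path to its marker side effect, covering
// only the cases the commit message states explicitly.
func MarkerActionFor(exitPath string) MarkerAction {
	switch exitPath {
	case "PREVALIDATION_FAILED":
		return MarkerNone // INV-RR-005: prevalidation is excluded
	case "APPLY_FAILED", "POSTVALIDATION_REGRESSION", "ROLLBACK_FAILED":
		return MarkerWrite
	case "SUCCESS":
		return MarkerClear
	default:
		return MarkerUnspecified
	}
}
```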
Two fixes for smoke prerequisite detection discovered during v1.95 lab validation:

1. Validator path resolution: smoke used exec.LookPath() (PATH-based) for nftban-validate, but the binary is installed at /usr/lib/nftban/bin/, which is not in PATH. Now uses constants.ValidatorBinPath (shared with evidence layer). T1/T2 truth tests now PASS instead of false-SKIP.
2. Module-enabled detection: smoke checked wrong config keys (BOTGUARD_ENABLED instead of HTTP_BOTGUARD_ENABLED, LOGINMON_ENABLED instead of NFTBAN_LOGIN_ALERT_ENABLED). Replaced ad-hoc config parsing with validator-backed detection (single validator call, cached via sync.Once, config-file fallback when validator binary is missing). Module gating now matches `nftban health --json` exactly.

Adds internal/constants/paths.go for single canonical binary path definition.

Verified on RHEL-family and Debian-family hosts:
- Before: 6/10 PASS, 4 SKIP (false SKIPs)
- After: 9-10/10 PASS, 0-1 SKIP (only genuinely disabled modules)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
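The "single validator call, cached via sync.Once, config-file fallback" pattern from fix 2 might be structured as follows. This is a sketch under assumptions: the ModuleDetector type, its constructor, and both injected loader functions are hypothetical, and the real validator output format is deliberately not modeled.

```go
package main

import (
	"errors"
	"sync"
)

// ErrValidatorMissing is a hypothetical sentinel for "validator binary not found".
var ErrValidatorMissing = errors.New("validator binary not found")

// ModuleDetector caches one detection pass. Both loaders are injected so the
// sketch makes no claim about the validator's real interface.
type ModuleDetector struct {
	once      sync.Once
	enabled   map[string]bool
	validator func() (map[string]bool, error) // assumed to wrap the validator binary
	fallback  func() map[string]bool          // assumed to parse the config file
}

// NewModuleDetector wires the two loaders into a detector.
func NewModuleDetector(validator func() (map[string]bool, error), fallback func() map[string]bool) *ModuleDetector {
	return &ModuleDetector{validator: validator, fallback: fallback}
}

// Enabled reports whether a module is enabled. State is loaded at most once,
// on first use, no matter how many modules are queried afterwards.
func (d *ModuleDetector) Enabled(module string) bool {
	d.once.Do(func() {
		state, err := d.validator()
		if err != nil {
			// Validator unavailable: fall back to config-file parsing.
			state = d.fallback()
		}
		d.enabled = state
	})
	return d.enabled[module]
}
```

Because every module query reads from the same cached map, all gating decisions in one smoke run see a consistent view of which modules are enabled.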
…ilures

Add retry wrapper around rebuild core:
- firewall_rebuild() becomes retry envelope
- _firewall_rebuild_core() is the original rebuild logic (renamed, unchanged)
- At most 1 immediate retry (INV-RR-006)
- Only retries eligible classes: APPLY_FAILED, DAEMON_RESTART_FAILED, MODULE_RESTORE_FAILED, MODULE_RESTORE_INCOMPLETE
- POSTVALIDATION_REGRESSION retried only if daemon-related (tightening #2)
- Never retries: PREVALIDATION_FAILED, ROLLBACK_FAILED, structural failures
- Updates marker retry_count on failed retry
- Marks exhausted after cap reached

Exit codes unchanged. No systemd changes. No deferred retry yet.

Contract: V196_REBUILD_RECOVERY_CONTRACT.md §5.2, §10 PR-03

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
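The retry envelope described above is implemented in shell; the Go sketch below only illustrates the policy. The eligibility rules are taken from the bullet list, while the Outcome type, the Retryable and RebuildWithRetry names, and the convention that an empty class means success are assumptions.

```go
package main

// FailureClass mirrors the class names used in the commit message.
type FailureClass string

const (
	PrevalidationFailed      FailureClass = "PREVALIDATION_FAILED"
	ApplyFailed              FailureClass = "APPLY_FAILED"
	DaemonRestartFailed      FailureClass = "DAEMON_RESTART_FAILED"
	ModuleRestoreFailed      FailureClass = "MODULE_RESTORE_FAILED"
	ModuleRestoreIncomplete  FailureClass = "MODULE_RESTORE_INCOMPLETE"
	PostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"
	RollbackFailed           FailureClass = "ROLLBACK_FAILED"
)

// maxImmediateRetries is the cap from INV-RR-006.
const maxImmediateRetries = 1

// Retryable reports whether a class is eligible for an immediate retry.
// daemonRelated only matters for POSTVALIDATION_REGRESSION.
func Retryable(class FailureClass, daemonRelated bool) bool {
	switch class {
	case ApplyFailed, DaemonRestartFailed, ModuleRestoreFailed, ModuleRestoreIncomplete:
		return true
	case PostvalidationRegression:
		return daemonRelated
	default:
		return false
	}
}

// Outcome is the result of one rebuild attempt; an empty Class means success
// (a convention chosen for this sketch).
type Outcome struct {
	Class         FailureClass
	DaemonRelated bool
}

// RebuildWithRetry runs core once and retries eligible failures at most
// maxImmediateRetries times. It returns the final outcome and attempt count.
func RebuildWithRetry(core func() Outcome) (Outcome, int) {
	attempts := 0
	for {
		out := core()
		attempts++
		if out.Class == "" || !Retryable(out.Class, out.DaemonRelated) || attempts > maxImmediateRetries {
			return out, attempts
		}
	}
}
```

Keeping the core as an injected function matches the envelope/core split in the commit: the original rebuild logic stays unchanged and only the wrapper knows about retries.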
Add systemd-driven deferred rebuild recovery:
- nftban-rebuild-recovery.timer: fires ONCE 60s after boot (not recurring)
- nftban-rebuild-recovery.service: oneshot, reads marker, attempts rebuild
- nftban_rebuild_recovery.sh: recovery script with full safety checks
- Exits immediately if no marker, exhausted, or non-retryable class
- Checks daemon availability before attempting recovery
- Clears marker on success
- Marks exhausted after cap (3 total = 1 immediate + 2 deferred)
- No boot loop: oneshot service, non-repeating timer

Polkit: added nftban-rebuild-recovery.service/.timer to operator whitelist in 10-nftban-systemd.rules.

Systemd hardening: full hardening applied (PrivateTmp, NoNewPrivileges, ProtectKernel*, RestrictAddressFamilies, etc.)

Contract: V196_REBUILD_RECOVERY_CONTRACT.md §10 PR-04
INV-RR-006: Retry bounded and persisted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
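The safety checks of the recovery script can be read as a small decision function plus a bookkeeping step. The sketch below is an illustration in Go, not the shell script itself. It assumes that retry_count counts all retries consumed so far (immediate and deferred) against the cap of 3; the Marker fields, the Action values, and the separate DaemonUnavailable result are hypothetical, and the commit message does not say what the script does when the daemon is down.

```go
package main

// maxTotalRetries is the cap from the commit message: 1 immediate + 2 deferred.
const maxTotalRetries = 3

// Marker is a hypothetical in-memory view of the recovery marker.
type Marker struct {
	FailureClass string
	RetryCount   int // retries consumed so far (assumed semantics)
	Exhausted    bool
}

// Action is the decision taken by the deferred recovery run.
type Action int

const (
	Skip              Action = iota // no marker, exhausted, or non-retryable class
	DaemonUnavailable               // daemon check failed; follow-up behavior is unspecified
	AttemptRebuild                  // all checks passed
)

// Decide applies the checks in the order listed in the commit message.
func Decide(m *Marker, retryable func(class string) bool, daemonUp bool) Action {
	switch {
	case m == nil:
		return Skip
	case m.Exhausted || m.RetryCount >= maxTotalRetries:
		return Skip
	case !retryable(m.FailureClass):
		return Skip
	case !daemonUp:
		return DaemonUnavailable
	default:
		return AttemptRebuild
	}
}

// RecordResult returns the marker to persist after an attempt: nil clears
// the marker on success; on failure the count is bumped and the marker is
// flagged exhausted once the cap is reached.
func RecordResult(m Marker, success bool) *Marker {
	if success {
		return nil
	}
	m.RetryCount++
	if m.RetryCount >= maxTotalRetries {
		m.Exhausted = true
	}
	return &m
}
```

Because the exhausted flag is persisted with the marker, a later boot sees it and skips, which is what prevents a boot loop even if the rebuild keeps failing.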
…fication

Add post-module-restore verification step between steps 8-12 and POST validation. Closes the silent daemon-dependent module restoration gap.

Verification checks (Level 1+2 per contract):
- DDoS: nft list chain ip nftban nftban_ddos_filter
- Portscan: nft list chain ip nftban nftban_portscan
- BotGuard: nft list chain ip nftban nftban_botguard

If a module reported RESTORE_OK but its chain is missing from kernel, result is downgraded to RESTORE_INCOMPLETE. This prevents false PROTECTED when module enable command returned 0 but the chain was not actually created (daemon dependency failure).

Level 3 (activation evidence) is not checked here: it requires traffic and produces WARNING only, not DEGRADED (per contract tightening #3).

Contract: V196_REBUILD_RECOVERY_CONTRACT.md §8
INV-RR-007: Module restore failure is surfaced, not silent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
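The downgrade rule (RESTORE_OK plus missing chain becomes RESTORE_INCOMPLETE) can be sketched as below. The real check runs in shell inside cmd_firewall.sh; the Go types, the VerifyModules and NftChainExists names, and passing the module-to-chain mapping as a parameter are assumptions for this illustration.

```go
package main

import "os/exec"

// RestoreResult uses the two result names that appear in the commit message.
type RestoreResult string

const (
	RestoreOK         RestoreResult = "RESTORE_OK"
	RestoreIncomplete RestoreResult = "RESTORE_INCOMPLETE"
)

// NftChainExists runs `nft list chain ip nftban <chain>` and treats a zero
// exit status as "chain present in the kernel".
func NftChainExists(chain string) bool {
	return exec.Command("nft", "list", "chain", "ip", "nftban", chain).Run() == nil
}

// VerifyModules downgrades RESTORE_OK to RESTORE_INCOMPLETE for every module
// whose expected chain is missing. Other results pass through unchanged.
// chains maps module name to chain name; chainExists is injected so the
// logic can be exercised without nft installed.
func VerifyModules(reported map[string]RestoreResult, chains map[string]string, chainExists func(chain string) bool) map[string]RestoreResult {
	verified := make(map[string]RestoreResult, len(reported))
	for module, result := range reported {
		if chain, known := chains[module]; known && result == RestoreOK && !chainExists(chain) {
			result = RestoreIncomplete
		}
		verified[module] = result
	}
	return verified
}
```

In production use the third argument would be NftChainExists; in tests it can be any function, which keeps the downgrade logic checkable on a machine without nftables.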
…state
Fixes from lab4 failure matrix testing:
1. Module verification chain names corrected to actual kernel names:
- DDoS: nftban_ddos_filter → ddos_protection
- Portscan: nftban_portscan → portscan_detection
- BotGuard: nftban_botguard → botguard_filter
2. Post-rebuild regression check now treats 'idle' as acceptable:
- idle = structurally equivalent to protected (all checks pass,
no traffic observed yet after flush+reload)
- protected → idle is NOT a regression, should not trigger rollback
- protected → degraded/down still triggers rollback (unchanged)
3. Success path now accepts both protected and idle as exit 0.
Discovered during v1.96 failure matrix testing on lab4.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
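The adjusted regression rule from fixes 2 and 3 can be expressed as a pair of predicates. This is a sketch in Go of logic that lives in shell; the Health type and the function names are hypothetical, and transitions that start from a pre-rebuild state other than protected or idle are outside what the commit message describes, so the sketch reports them as non-regressions by assumption.

```go
package main

// Health uses the state names from the commit message.
type Health string

const (
	Protected Health = "protected"
	Idle      Health = "idle"
	Degraded  Health = "degraded"
	Down      Health = "down"
)

// healthy treats idle as structurally equivalent to protected: all checks
// pass, but no traffic has been observed since the flush and reload.
func healthy(h Health) bool {
	return h == Protected || h == Idle
}

// IsRegression reports whether a rebuild made things worse and should
// trigger rollback. Only healthy-to-unhealthy transitions count here
// (assumption: other starting states are not covered by the source).
func IsRegression(pre, post Health) bool {
	return healthy(pre) && !healthy(post)
}

// IsSuccess reports whether the post-rebuild state maps to exit 0.
func IsSuccess(post Health) bool {
	return healthy(post)
}
```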
Constants in nftban_rebuild_classify.sh are used by sourcing scripts (cmd_firewall.sh), not within the file itself. ShellCheck correctly flags them as unused within scope. Add per-line SC2034 disable directives to document intent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Implements v1.96 rebuild recovery bridge — the missing recovery semantics between safe rebuild (v1.70+) and lifecycle canonization (v1.97+).
Core additions:
/var/lib/nftban/state/rebuild_recovery.json)

Includes hotfix:
- a15bcf80: smoke validator path resolution + module gating truth alignment (v1.95.1 patch, will deduplicate on rebase if merged separately)

Lab4-derived fix:
- 78f01a32: corrected chain names to actual kernel names (ddos_protection, portscan_detection) and treats idle as valid post-rebuild state (structurally equivalent to protected, no false rollback).

Commit Stack (7 commits)
- 6b669fb5
- a15bcf80
- 7f3b706e
- e618c4e9
- 789045d2
- a091e237
- 78f01a32

Contract
V196_REBUILD_RECOVERY_CONTRACT.md (locked 2026-04-17, 550 lines)
10 invariants (INV-RR-001 through INV-RR-010), 3 tightenings applied:
Lab4 Validation Matrix
Not fully destructive-tested on lab4
Files Changed
New files (9):
- internal/rebuild/types.go: OperationResult, FailureClass, ModuleRestoreResult enums
- internal/rebuild/policy.go: Retry policy constants + disposition logic
- internal/rebuild/marker.go: RecoveryMarker JSON persistence
- internal/rebuild/policy_test.go: Unit tests
- internal/rebuild/marker_test.go: Unit tests
- cli/lib/nftban/core/nftban_rebuild_classify.sh: Shell classification helpers
- cli/lib/nftban/core/nftban_rebuild_recovery.sh: Deferred retry script
- install/systemd/nftban-rebuild-recovery.service: Oneshot recovery service
- install/systemd/nftban-rebuild-recovery.timer: Boot-triggered timer (once, 60s delay)

Modified files (2):
- cli/lib/nftban/cli/cmd_firewall.sh: Classification wiring + retry wrapper + module verification
- packaging/polkit-1/rules.d/10-nftban-systemd.rules: Added recovery service to operator whitelist

Test plan
🤖 Generated with Claude Code