
feat: v1.98.0 Phase 1 — lifecycle bridge + auto-heal wiring + validate re-check #458

Merged
itcmsgr merged 8 commits into main from feat/v1.98-install-canonization
Apr 17, 2026

itcmsgr (Owner) commented Apr 17, 2026

Summary

v1.98 Phase 1: Architecture batch (PR-07 through PR-12). No default path change yet.

What this PR adds

  • PR-07: Lifecycle bridge — observational event emission at all installer phases (INV-I-004)
  • PR-08/09: Detect + FHS parity evidence (documented, not code changes)
  • PR-10: Health check → health fix auto-trigger wiring (ExecStartPost in nftban-health.service)
  • PR-11: Validate phase VALIDATE_1 → safe auto-fix → VALIDATE_2 flow (INV-I-010 through INV-I-013)
  • PR-12: Logging integration (via lifecycle bridge events)
  • DEB-PERM-001 documentation (existing permissions module already handles the fix)
  • VERSION bump to 1.98.0

Key invariants enforced

  • INV-I-004: Lifecycle is observational only — does NOT drive installer decisions
  • INV-I-010: Post-install safe auto-fix trigger (one-shot after initial validation)
  • INV-I-011: Safe auto-fix scope is allowlisted only (permissions/ownership, not authority/SSH)
  • INV-I-012: One-shot only — auto-fix runs at most once per install
  • INV-I-013: Re-validation mandatory — only post-fix result determines success

Auto-heal gap closed

  • Before: Health check ran periodically (timer) but health fix was manual-only. Permission drift between installs was never auto-corrected.
  • After: Health check triggers health fix via ExecStartPost. Installer validate phase runs fix → re-validate flow.
  • Verified on lab2: Simulated DEB permission drift → auto-heal chain corrected it automatically.

NOT in this PR (Phase 2, after G2 gate)

  • PR-13: Feature flag (Go installer default)
  • PR-14: install.sh bootstrap reduction
  • PR-15: Legacy script deletion

Phase 2 requires G2 parity gate on real hosts before proceeding.

Lab Validation

| Test | Host | Result |
|---|---|---|
| Installer builds | lab4 (AlmaLinux 9) | PASS |
| Lifecycle + rebuild tests | lab4 | 63 tests PASS |
| Detect parity (SSH, authority, distro) | lab4 + lab2 + monitor | PASS (3 hosts) |
| Custom SSH port (55000) | monitor | PASS |
| FHS permissions (11/12 match) | lab4 vs lab2 | PASS (DEB drift auto-healed) |
| UFW conflict simulation | lab2 | SSH lockout proved conflict detection is critical |
| Auto-heal chain (drift → fix → corrected) | lab2 | PASS |
| DEB permission auto-heal (/usr/sbin/nftban) | lab2 | PASS (root:root 755 → root:nftban 750) |

Contract

  • V198_INSTALL_CANONIZATION_CONTRACT.md (13 invariants, INV-I-001 through INV-I-013)
  • V198_PR08_PR09_PARITY_EVIDENCE.md (detect + FHS evidence)

Test plan

  • Installer binary builds on lab4
  • 50 lifecycle tests + 13 rebuild tests PASS
  • Lifecycle bridge emits events at all phases
  • Auto-heal chain works on lab2 (drift → fix → corrected)
  • Validate re-check flow: VALIDATE_1 → fix → VALIDATE_2
  • Pre-commit hooks pass
  • No default behavior change for users

🤖 Generated with Claude Code

itcmsgr and others added 5 commits April 17, 2026 11:46

Wire lifecycle event emission into existing installer phases:

- lifecycleBridge: observes installer decisions, emits lifecycle events
- observeDetect(): records authority + detection at DETECT completion
- observePlan(): records authority action from installer decision
- observeResult(): maps installer StateFile to lifecycle outcome
- v1.96 recovery marker read for last_operation truth

Integration points in runInstall():
- After phaseDetect: emit detect + plan observations
- On phase failure: emit result with failure outcome
- After phaseValidate: emit final result

INV-I-004 ENFORCED: Lifecycle is OBSERVATIONAL ONLY.
Bridge mirrors decisions — does NOT influence installer execution.
Installer logic remains the source of execution truth.

No behavior change. Additive lifecycle logging only.
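
To illustrate what "observational only" can look like in code, here is a minimal sketch of a bridge whose methods return nothing, so the installer has no value to branch on. The `Event` type, its fields, the sample authority value `"nftables"`, and the plan action `"keep-authority"` are all hypothetical; only the method names come from this commit message, and the real bridge may be shaped differently.

```go
package main

import "fmt"

// Event is a hypothetical lifecycle record; field names are illustrative only.
type Event struct {
	Phase  string
	Detail string
}

// lifecycleBridge only collects events. None of its methods return a value,
// so installer logic cannot depend on anything the bridge does (INV-I-004).
type lifecycleBridge struct {
	events []Event
}

func (b *lifecycleBridge) observeDetect(authority string) {
	b.events = append(b.events, Event{Phase: "DETECT", Detail: authority})
}

func (b *lifecycleBridge) observePlan(action string) {
	b.events = append(b.events, Event{Phase: "PLAN", Detail: action})
}

func (b *lifecycleBridge) observeResult(outcome string) {
	b.events = append(b.events, Event{Phase: "RESULT", Detail: outcome})
}

// runInstall is a stand-in for the installer flow. Every decision is made by
// the installer itself; the bridge is told about it afterwards.
func runInstall(b *lifecycleBridge, validateOK bool) string {
	authority := "nftables" // assumed sample value, decided by the installer
	b.observeDetect(authority)
	b.observePlan("keep-authority") // assumed sample action

	outcome := "COMMITTED"
	if !validateOK {
		outcome = "DEGRADED"
	}
	b.observeResult(outcome) // mirrors the result, does not change it
	return outcome
}

func main() {
	b := &lifecycleBridge{}
	outcome := runInstall(b, true)
	fmt.Println("outcome:", outcome, "events recorded:", len(b.events))
}
```

Removing every `observe*` call from `runInstall` would leave its return value unchanged, which is the property the invariant asks for.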

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md §4.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The /usr/sbin/nftban permission drift on DEB (root:root 755 instead of
root:nftban 750) is already handled by the existing permissions module:
  nftban_permissions_enforce_all() → perms_enforce_sbin()
  uses $PERMS_SBIN from NFTBAN_SBIN_DIR (distro-config based, not hardcoded)

Replace hardcoded fix with comment documenting the existing path.
The permissions module (nftban_permissions.sh:230) already:
- uses distro-aware path ($PERMS_SBIN)
- creates nftban group if missing
- sets root:nftban 0750 on /usr/sbin/nftban*

Verified on lab2 (Ubuntu 24.04): nftban health fix permissions
correctly fixes the drift via the existing module.
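
The existing fix lives in the shell permissions module, which is not reproduced here. As a rough illustration of the idempotent "enforce only if drifted" behavior, here is a Go sketch that corrects the mode bits of a temporary file. `enforceMode` and `demo` are hypothetical names, and the ownership half (root:nftban) is deliberately left out because changing ownership needs root.

```go
package main

import (
	"fmt"
	"os"
)

// wantMode is the mode this commit message names for /usr/sbin/nftban*.
const wantMode os.FileMode = 0o750

// enforceMode sets path to want when its permission bits differ and reports
// whether it changed anything. Ownership is not handled in this sketch.
func enforceMode(path string, want os.FileMode) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	if info.Mode().Perm() == want {
		return false, nil // already correct: no-op
	}
	if err := os.Chmod(path, want); err != nil {
		return false, err
	}
	return true, nil
}

// demo simulates the DEB drift (755) on a temp file and enforces twice.
func demo() (bool, bool, error) {
	f, err := os.CreateTemp("", "nftban-demo-*")
	if err != nil {
		return false, false, err
	}
	defer os.Remove(f.Name())
	f.Close()

	if err := os.Chmod(f.Name(), 0o755); err != nil { // simulated drift
		return false, false, err
	}
	first, err := enforceMode(f.Name(), wantMode)
	if err != nil {
		return false, false, err
	}
	second, err := enforceMode(f.Name(), wantMode)
	return first, second, err
}

func main() {
	first, second, err := demo()
	fmt.Println("first pass changed:", first, "second pass changed:", second, "err:", err)
}
```

The second pass changing nothing is what makes it safe to call the enforcer from more than one path.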

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add ExecStartPost to nftban-health.service that triggers
nftban-health-fix.service after each health check cycle.

This closes the auto-heal gap:
- Health CHECK runs periodically as User=nftban (timer)
- Health CHECK can fix services/nftables (polkit + CAP_NET_ADMIN)
- Health CHECK cannot fix root-owned file permissions
- Health FIX runs as root and CAN fix permissions/ownership
- Previously: health FIX was manual-only, never auto-triggered
- Now: health CHECK triggers health FIX on every cycle

The fix service is idempotent — if no permission issues exist,
it completes instantly with no changes. Uses --no-block to avoid
blocking the health check timer.

The `-` prefix on ExecStartPost makes it non-fatal if the fix
service fails or is already running.

Install/update path already calls RunPermissionsEnforce() in
phaseValidate, so this only affects the background periodic path.

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010
Evidence: V198_PR08_PR09_PARITY_EVIDENCE.md (DEB-PERM-001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-validate

Add VALIDATE_1 → safe auto-fix → VALIDATE_2 flow to phaseValidate:

If initial assertions fail:
  1. Log failed assertions (VALIDATE_1)
  2. Run 'nftban health fix all' (one-shot, INV-I-012)
  3. Re-run assertions (VALIDATE_2, INV-I-013)
  4. Only VALIDATE_2 result determines final outcome

This closes the operational gap where install could leave safe-fixable
drift (e.g. DEB /usr/sbin/nftban permissions) that would cause DEGRADED
when a single auto-fix pass would have corrected it.

The auto-fix runs at most ONCE per install (INV-I-012).
Re-validation is mandatory (INV-I-013).
Only allowlisted safe fixes are applied (INV-I-011).

If VALIDATE_2 still fails → DEGRADED (INV-I-008, no false success).

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010 through INV-I-013

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions Bot (Contributor) commented Apr 17, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

itcmsgr and others added 3 commits April 17, 2026 12:42

Blocker #1 (phases.go:295): installer validate called 'health fix all'
which runs 9 unbounded steps including disabling UFW/firewalld/fail2ban,
triggering rebuild, GeoIP download, and panel enable. Violates INV-I-011
(allowlist scope) and INV-I-006 (authority takeover).

Fix: Replace with 'permissions enforce' — bounded, safe, idempotent.
Only fixes ownership/mode on NFTBan-managed paths. Does not cross
authority boundaries or mutate external firewall state.
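
One way to make the INV-I-011 allowlist explicit is to gate every fix command through a lookup before it runs. The sketch below is an assumption about shape, not the code in phases.go; `safeFixes` and `isSafeFix` are hypothetical names.

```go
package main

import "fmt"

// safeFixes is a hypothetical allowlist. Per this commit message, the only
// fix the installer validate phase may run is "permissions enforce".
var safeFixes = map[string]bool{
	"permissions enforce": true,
}

// isSafeFix reports whether cmd is on the allowlist. Anything not listed,
// including "health fix all", is rejected by default.
func isSafeFix(cmd string) bool {
	return safeFixes[cmd]
}

func main() {
	for _, cmd := range []string{"permissions enforce", "health fix all"} {
		fmt.Printf("%-20s allowed=%v\n", cmd, isSafeFix(cmd))
	}
}
```

A default-deny lookup means a new fix step has to be added to the list deliberately before the installer can call it.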

Blocker #2 (nftban-health.service ExecStartPost): unconditionally
triggered 'nftban-health-fix.service' (which runs fix all) on every
health check timer cycle. Violates INV-I-012 (one-shot) and ships
unbounded root remediation to every host.

Fix: Remove ExecStartPost trigger. Root-level permission fixes now
run only during install/update (phaseValidate → permissions enforce)
or manual operator invocation. Document the rationale for future
bounded safe-fix target.

Audit: V198_FOUNDATION_BATCH_AUDIT.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Policy gate requires FHS spec version to match VERSION file.
Regenerated via build/generate-fhs-outputs.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ate module

Add bounded safe fix state machine as a testable module:

- validate.RunWithBoundedFix(): VALIDATE_1 → permissions enforce → VALIDATE_2
- Only calls 'permissions enforce' (INV-I-011), never 'health fix all'
- Fix runs at most once (INV-I-012)
- Only VALIDATE_2 result determines final outcome (INV-I-013)
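
The state machine described above can be sketched as follows. This is not the actual `validate.RunWithBoundedFix` signature, which this PR does not show; `runWithBoundedFix`, the `Outcome` type, and the callback parameters are hypothetical, and mapping a passing VALIDATE_1 to COMMITTED is an assumption.

```go
package main

import "fmt"

type Outcome string

const (
	Committed Outcome = "COMMITTED"
	Degraded  Outcome = "DEGRADED"
)

// runWithBoundedFix runs validate; on failure it calls fix exactly once and
// validates again. Only the second validation decides the outcome.
func runWithBoundedFix(validate func() bool, fix func() error) Outcome {
	if validate() { // VALIDATE_1
		return Committed // nothing to fix, fix is never called
	}
	// One-shot fix (INV-I-012). Its error is not the verdict; the
	// re-validation below is (INV-I-013).
	_ = fix()
	if validate() { // VALIDATE_2
		return Committed
	}
	return Degraded
}

// demo simulates permission drift that the fix may or may not clear.
func demo(fixWorks bool) (Outcome, int) {
	drift := true
	fixCalls := 0
	validate := func() bool { return !drift }
	fix := func() error {
		fixCalls++
		if fixWorks {
			drift = false
		}
		return nil
	}
	out := runWithBoundedFix(validate, fix)
	return out, fixCalls
}

func main() {
	out, calls := demo(true)
	fmt.Println("fix works:", out, "fix calls:", calls)
	out, calls = demo(false)
	fmt.Println("fix fails:", out, "fix calls:", calls)
}
```

Because there is no loop, the "at most once" bound holds by construction rather than by a counter that could be reset.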

NB-6 test cases (from V198_PR13_GO_DECISION.md §11):
- Test 1: V1 passes → no fix called → success
- Test 2: V1 fails → fix runs → V2 passes → COMMITTED
- Test 3: V1 fails → fix runs → V2 still fails → DEGRADED
- Test 4: permissions enforce called at most once
- Test 5: no destructive side-effects (no service disable, no package removal)

MockExecutor enhanced with:
- OnCommand(): register callbacks for simulating side-effects
- CommandCalled(): assert command was/wasn't executed
- CommandCallCount(): assert execution count bounds
- Callback firing in Run() for command simulation
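
For readers who have not seen the mock, here is a rough sketch of how helpers like these could fit together. Only the method names come from this commit message; the signatures, fields, and the `NewMockExecutor` constructor are assumptions, and the real `MockExecutor` may differ.

```go
package main

import "fmt"

// MockExecutor is a hypothetical re-creation: it records commands instead of
// running them and can fire a callback to simulate a side-effect.
type MockExecutor struct {
	calls     map[string]int
	callbacks map[string]func()
}

func NewMockExecutor() *MockExecutor {
	return &MockExecutor{calls: map[string]int{}, callbacks: map[string]func(){}}
}

// OnCommand registers a callback that fires whenever cmd is run.
func (m *MockExecutor) OnCommand(cmd string, cb func()) { m.callbacks[cmd] = cb }

// Run records the call and fires the registered callback, if any.
func (m *MockExecutor) Run(cmd string) error {
	m.calls[cmd]++
	if cb, ok := m.callbacks[cmd]; ok {
		cb()
	}
	return nil
}

// CommandCalled reports whether cmd was run at least once.
func (m *MockExecutor) CommandCalled(cmd string) bool { return m.calls[cmd] > 0 }

// CommandCallCount returns how many times cmd was run.
func (m *MockExecutor) CommandCallCount(cmd string) int { return m.calls[cmd] }

// demo simulates a fix that clears drift, then returns what a test would
// assert on: remaining drift, fix call count, and whether fix-all ever ran.
func demo() (bool, int, bool) {
	drift := true
	m := NewMockExecutor()
	m.OnCommand("permissions enforce", func() { drift = false })
	_ = m.Run("permissions enforce")
	count := m.CommandCallCount("permissions enforce")
	fixAll := m.CommandCalled("health fix all")
	return drift, count, fixAll
}

func main() {
	drift, count, fixAll := demo()
	fmt.Println("drift:", drift, "enforce calls:", count, "fix-all called:", fixAll)
}
```

The callback is what lets a test make VALIDATE_2 pass after the fix, while the two query methods cover the "at most once" and "never fix all" assertions.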

Contract: V198_INSTALL_CANONIZATION_CONTRACT.md INV-I-010 through INV-I-013
Audit closure: NB-6 from V198_FOUNDATION_BATCH_AUDIT.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
itcmsgr merged commit bba638c into main on Apr 17, 2026
48 checks passed
itcmsgr deleted the feat/v1.98-install-canonization branch April 17, 2026 10:34