v1.100 PR-22B — lifecycle truth repair (dry-run honesty + history gating + authority alignment + panel consent) by itcmsgr · Pull Request #482 · itcmsgr/nftban

itcmsgr · 2026-04-19T21:02:43Z

Why this PR exists

A second independent audit of the install/update lifecycle found systemic issues beyond what PR-22A fixed for uninstall. The install path silently ignored --dry-run. The update path wrote three files under /var/lib/nftban/ during every "dry-run" and recorded successful previews as install_fail. The authority classifier had no Ambiguous state on the install side and implicitly auto-approved takeover on panel-managed hosts. Two different definitions of "nftban authoritative" coexisted. CI whitelisted writes under /var/lib/nftban/state/ and never snapshotted the history file.

This PR is boundary-repair ONLY. It does not advance uninstall mutation and does not begin PR-23 work. It also does not add install preview capability; instead it removes false dry-run semantics by refusing unsupported install dry-run invocations.

Depends on: #481 (PR-22A uninstall boundary repair).
Scope lock: 12 items from the repair contract seed — nothing else.

What changed (12-item mapping)

1. Dry-run honesty

Surface	Before	After
`--mode=install --dry-run`	Silently ignored — all 5 mutating phases executed	Refused with explicit error + non-zero exit
`update_dryrun.go` → `os.WriteFile(update_plan.json)`	Written every run	Removed — stdout only
`phaseDetect` → `sf.Transition(StateDetectComplete)` during update dry-run	Wrote `install_state`	In-memory only — `StateFile.DryRun=true` suppresses `WriteAtomic`
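
A minimal sketch of the in-memory-only guard described in the last row, assuming a state file that normally persists via temp file plus rename. `StateFile`, `DryRun`, `Transition`, `WriteAtomic` and `StateDetectComplete` are names from this PR; the `Path` field, the string value of the state, and the write implementation are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// InstallState is a hypothetical stand-in for the installer's state enum.
type InstallState string

// The string value is an assumption; only the constant name is from the PR.
const StateDetectComplete InstallState = "DETECT_COMPLETE"

// StateFile mirrors the names used in this PR; the fields are assumptions.
type StateFile struct {
	Path   string
	State  InstallState
	DryRun bool
}

// WriteAtomic persists the state via temp file + rename (assumed behaviour).
func (sf *StateFile) WriteAtomic() error {
	tmp := sf.Path + ".tmp"
	if err := os.WriteFile(tmp, []byte(sf.State), 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, sf.Path)
}

// Transition always updates the in-memory state, but only touches disk
// when DryRun is false.
func (sf *StateFile) Transition(next InstallState) error {
	sf.State = next
	if sf.DryRun {
		return nil
	}
	return sf.WriteAtomic()
}

func main() {
	dir, err := os.MkdirTemp("", "nftban-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	sf := &StateFile{Path: filepath.Join(dir, "install_state"), DryRun: true}
	if err := sf.Transition(StateDetectComplete); err != nil {
		panic(err)
	}
	_, statErr := os.Stat(sf.Path)
	fmt.Println("state:", sf.State, "file absent:", os.IsNotExist(statErr))
}
```

Putting the guard inside `Transition` rather than at each call site means shared phase functions such as `phaseDetect` need no dry-run awareness of their own.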

2. History gating

New state.IsApplyTerminal(s) helper — explicit allowlist of states representing a completed apply attempt (Committed / Degraded / FailedSSH / FailedAbort / FailedRender / FailedRebuild / FailedNoFirewall / FailedTakeover).
main.go gate: `if !cfg.dryRun && state.IsApplyTerminal(sf.State) { writeHistory(...) }`. The gate is structural, not mode-based: every dry-run and every non-terminal state is excluded by construction.
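
The allowlist and the gate can be sketched as follows. This is an illustration under assumptions: the constant names are derived from the state list above by prefixing `State`, the string values are invented, and `shouldWriteHistory` is a hypothetical helper standing in for the inline condition in main.go.

```go
package main

import "fmt"

// InstallState is a hypothetical stand-in for the installer's state enum.
type InstallState string

// Constant names follow the list in this PR; string values are assumptions.
const (
	StateDetectComplete   InstallState = "DETECT_COMPLETE"
	StateCommitted        InstallState = "COMMITTED"
	StateDegraded         InstallState = "DEGRADED"
	StateFailedSSH        InstallState = "FAILED_SSH"
	StateFailedAbort      InstallState = "FAILED_ABORT"
	StateFailedRender     InstallState = "FAILED_RENDER"
	StateFailedRebuild    InstallState = "FAILED_REBUILD"
	StateFailedNoFirewall InstallState = "FAILED_NO_FIREWALL"
	StateFailedTakeover   InstallState = "FAILED_TAKEOVER"
)

// IsApplyTerminal is an explicit allowlist: any state not named here,
// including states added later, is excluded from history by default.
func IsApplyTerminal(s InstallState) bool {
	switch s {
	case StateCommitted, StateDegraded,
		StateFailedSSH, StateFailedAbort, StateFailedRender,
		StateFailedRebuild, StateFailedNoFirewall, StateFailedTakeover:
		return true
	default:
		return false
	}
}

// shouldWriteHistory is a hypothetical wrapper around the gate condition.
func shouldWriteHistory(dryRun bool, s InstallState) bool {
	return !dryRun && IsApplyTerminal(s)
}

func main() {
	fmt.Println(shouldWriteHistory(false, StateCommitted))      // true
	fmt.Println(shouldWriteHistory(true, StateCommitted))       // false
	fmt.Println(shouldWriteHistory(false, StateDetectComplete)) // false
}
```

An allowlist fails closed: a newly introduced intermediate state cannot be recorded as install_fail by a catch-all default.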

3. Authority predicate + Ambiguous + panel consent

authority.IsNftbanAuthoritative(exec) — canonical shared predicate requiring table + chain + active daemon. update.Preflight P-1 now uses this single source of truth. Previously two weaker definitions existed.
authority.Decision gains Ambiguous. Orphan table or daemon-without-table routes here. phaseSwitch treats Ambiguous with the same emergency-SSH injection as Takeover/Fresh — never silent-continue.
authority.Classify now takes a panelAutoApprove bool parameter (wired from cfg.panelAutoTakeover). Panel detection alone no longer auto-approves takeover. Operators must pass --panel-auto-takeover or NFTBAN_PANEL_AUTO_TAKEOVER=1.
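
A rough sketch of the classification rules in this section, not the actual `authority.Classify`. The real function takes an executor and probes the host; here a hypothetical `Host` struct holds the probe results. The `Abort` decision is an assumed fifth state inferred from the test name `TestClassify_PanelDetected_NoFlag_Aborts`, and the `ForeignFirewall` and `TakeoverRequested` fields are invented for the example.

```go
package main

import "fmt"

// Decision uses UPPERCASE values, as described in this PR.
type Decision string

const (
	Fresh     Decision = "FRESH"
	Update    Decision = "UPDATE"
	Takeover  Decision = "TAKEOVER"
	Ambiguous Decision = "AMBIGUOUS"
	Abort     Decision = "ABORT" // assumed fifth state
)

// Host is a hypothetical snapshot of what the probes observed.
type Host struct {
	Table, Chain, Daemon bool // nftban table, nftban chain, active daemon
	ForeignFirewall      bool // another firewall manager owns the host
	PanelDetected        bool // a hosting panel was detected
	TakeoverRequested    bool // operator asked for takeover explicitly
}

// IsNftbanAuthoritative requires all three signals, per the shared predicate.
func IsNftbanAuthoritative(h Host) bool {
	return h.Table && h.Chain && h.Daemon
}

// Classify routes partial nftban evidence to Ambiguous and never lets
// panel detection alone approve a takeover.
func Classify(h Host, panelAutoApprove bool) Decision {
	switch {
	case IsNftbanAuthoritative(h):
		return Update
	case h.Table || h.Chain || h.Daemon:
		return Ambiguous // orphan table, daemon without table, table without chain
	case !h.ForeignFirewall:
		return Fresh
	case h.TakeoverRequested, h.PanelDetected && panelAutoApprove:
		return Takeover
	default:
		return Abort
	}
}

func main() {
	orphan := Host{Table: true}
	panel := Host{ForeignFirewall: true, PanelDetected: true}
	fmt.Println(Classify(orphan, false)) // AMBIGUOUS
	fmt.Println(Classify(panel, false))  // ABORT
	fmt.Println(Classify(panel, true))   // TAKEOVER
}
```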

4. Lifecycle bridge truthfulness

observeResult.DryRun now wired from cfg.dryRun (was hard-coded false).
observePlan / mapAuthority switches previously compared authority.Decision values (UPPERCASE) against lowercase literals, so every switch silently hit default and every consumer saw PreserveAuthority regardless of the real decision. They are now pinned to authority package constants. Ambiguous routes through ActionTakeAuthority with the new AuthorityUnknown owner.
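
The case-mismatch failure mode is easy to reproduce in isolation. The sketch below is illustrative only: `mapBuggy` and `mapFixed` are hypothetical names, the return values are plain strings standing in for the lifecycle package's action constants, and the `Decision` string values are assumptions.

```go
package main

import "fmt"

// Decision uses UPPERCASE values, as described in this PR.
type Decision string

const (
	Update    Decision = "UPDATE"
	Takeover  Decision = "TAKEOVER"
	Ambiguous Decision = "AMBIGUOUS"
)

// mapBuggy compares against lowercase literals. The code compiles, because
// an untyped string constant converts to Decision, but no case ever matches.
func mapBuggy(d Decision) string {
	switch d {
	case "takeover", "ambiguous":
		return "ActionTakeAuthority"
	default:
		return "PreserveAuthority"
	}
}

// mapFixed pins the cases to the package constants, so a renamed or
// re-cased value becomes a compile error instead of a silent default.
func mapFixed(d Decision) string {
	switch d {
	case Takeover, Ambiguous:
		return "ActionTakeAuthority"
	default:
		return "PreserveAuthority"
	}
}

func main() {
	for _, d := range []Decision{Update, Takeover, Ambiguous} {
		fmt.Printf("%-9s buggy=%s fixed=%s\n", d, mapBuggy(d), mapFixed(d))
	}
}
```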

5. Flag validation

Combination	Before	After
`--mode=install --dry-run`	Silently proceeds as real install	Refused
`--takeover --dry-run`	Both applied (meaningless)	Refused
`--rpm --deb`	Silently picks one	Refused
`--force-delete-operator-config` without `--purge`	Silently dropped	Refused
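
The four refusals in the table amount to a small validation pass that runs before any phase executes. A minimal sketch, assuming a hypothetical `config` struct and `validateFlags` helper; the error messages are invented and the real flags.go may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical subset of the parsed CLI flags.
type config struct {
	mode                      string
	dryRun, takeover          bool
	rpm, deb                  bool
	purge                     bool
	forceDeleteOperatorConfig bool
}

// validateFlags refuses contradictory combinations instead of silently
// picking an interpretation. Message wording is assumed.
func validateFlags(c config) error {
	switch {
	case c.mode == "install" && c.dryRun:
		return errors.New("--mode=install does not support --dry-run")
	case c.takeover && c.dryRun:
		return errors.New("--takeover cannot be combined with --dry-run")
	case c.rpm && c.deb:
		return errors.New("--rpm and --deb are mutually exclusive")
	case c.forceDeleteOperatorConfig && !c.purge:
		return errors.New("--force-delete-operator-config requires --purge")
	}
	return nil
}

func main() {
	if err := validateFlags(config{mode: "install", dryRun: true}); err != nil {
		// A real CLI would print to stderr and exit non-zero here.
		fmt.Println("refused:", err)
	}
}
```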

6. CI truth surfaces

ci-update-canonization.yml G3-U3: seeds update-history.json, snapshots install_state, hard-asserts byte-identical after dry-run. Removed || true soft diff handling. Widened G3-U5..U10 structural grep scope to include update_dryrun.go; extended pattern list (os.WriteFile, os.Create, os.MkdirAll, os.Rename, nft create, apt-get purge, dnf erase).
ci-install-canonization.yml NEW: G3-IN-REFUSE-DRY-RUN (install dry-run exits non-zero with refusal message; history untouched). G3-IN-FLAG-COMBOS (invalid combos rejected).
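
The G3-U3 idea (seed, snapshot, run the dry-run, snapshot again, fail on any difference) can be expressed in Go for illustration. This is not the workflow's implementation, which lives in CI YAML; `snapshot` and `changed` are hypothetical helpers, and the file names are placed in a temp directory rather than under /var/lib/nftban/.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// snapshot records a sha256 per path, or "absent" when the file is missing.
func snapshot(paths []string) (map[string]string, error) {
	out := make(map[string]string, len(paths))
	for _, p := range paths {
		data, err := os.ReadFile(p)
		switch {
		case errors.Is(err, fs.ErrNotExist):
			out[p] = "absent"
		case err != nil:
			return nil, err
		default:
			out[p] = fmt.Sprintf("%x", sha256.Sum256(data))
		}
	}
	return out, nil
}

// changed lists every path whose entry differs between the two snapshots.
func changed(before, after map[string]string) []string {
	var out []string
	for p, h := range before {
		if after[p] != h {
			out = append(out, p)
		}
	}
	for p := range after {
		if _, ok := before[p]; !ok {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	dir, err := os.MkdirTemp("", "purity-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	history := filepath.Join(dir, "update-history.json")
	state := filepath.Join(dir, "install_state")
	// Seed BEFORE the first snapshot so the seed never shows up as a diff.
	if err := os.WriteFile(history, []byte("[]"), 0o600); err != nil {
		panic(err)
	}

	paths := []string{history, state}
	before, err := snapshot(paths)
	if err != nil {
		panic(err)
	}
	// The dry-run under test would execute here.
	after, err := snapshot(paths)
	if err != nil {
		panic(err)
	}
	fmt.Println("changed paths:", changed(before, after))
}
```

Recording "absent" explicitly lets the same comparison catch a dry-run that creates install_state where none existed.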

7. Purity tests + reusable audit harness

internal/installer/audit/harness.go NEW: reusable PurityHarness with AssertNoExecutorWrites / AssertNoDirectoryCreations / AssertNoMutationCommands / AssertNoStateDirEntries + AssertAllPurity convenience method. Self-tests prove it catches each class.
state.IsApplyTerminal allowlist tests + DryRun Transition regression test.
authority.Classify ambiguous-on-orphan-table + symmetric daemon-without-table + table-without-chain + predicate-requires-all-three tests.
uninstall.Plan render ↔ JSON equivalence grid (audit item 11).
history_test.go write-gate predicate unit tests.

Falsifiability proof

Could the audit's install/update findings pass this CI? No.

Previous bug	Gate that would now catch it
`--mode=install --dry-run` silently proceeds	`G3-IN-REFUSE-DRY-RUN`
`update_dryrun.go` writes `update_plan.json`	`G3-U5..U10` extended `os.WriteFile(` grep
`phaseDetect` → `Transition` writes install_state during update dry-run	`G3-U3` hard assertion on `install_state` sha256
`writeHistory` records update dry-run as `install_fail`	`G3-U3` hard assertion on `update-history.json` sha256
Orphan nftban table classifies as `Update`	`TestClassify_Ambiguous_OrphanTable`
Panel silently auto-approves takeover	`TestClassify_PanelDetected_NoFlag_Aborts`
Two "nftban authoritative" predicates drift	`update.Preflight` now imports `authority.IsNftbanAuthoritative` — single source

Explicit non-goals

This PR does not add uninstall mutation, does not add install preview capability, does not change purge/remove semantics, does not redesign installer history generally (only the write-gate predicate), does not tighten PriorRecordUsable semantics (deferred), does not add an exec-trace CI gate (deferred), and does not begin PR-23 work.

Acceptance criteria (repair contract §6 extended)

All dry-run paths are observational or explicitly refused
No dry-run writes history or install_state by default
Authority ambiguity is surfaced conservatively (Ambiguous + emergency-SSH in phaseSwitch)
Silent panel takeover removed; explicit opt-in required
Canonical authority predicate; no duplicate definitions
Reusable purity harness exists + proves itself in self-tests
CI would fail on the exact bugs the audit found

Test plan

ci-install-canonization matrix green (ubuntu-24.04 + almalinux-9)
ci-update-canonization matrix green (including new hard-asserts)
ci-uninstall-canonization matrix green (regression-safe)
Runtime Truth matrix green
Build & Test green (all unit tests including new harness self-tests)
Classifier tests pass with new 5-state enum
Flag-combo CI refusals assert correctly

🤖 Generated with Claude Code

…y gating, authority alignment, panel consent

Extended audit found systemic issues beyond uninstall that would invite the exact class of false confidence PR-22A closed, but for install and update paths.

This PR is boundary-repair ONLY: it does not add uninstall mutation, does not add install preview capability, and does not begin PR-23 work. PR-22B does not add install preview capability; it removes false dry-run semantics by refusing unsupported install dry-run invocations.

Scope (12 items from the repair contract):

## 1. Dry-run honesty

- flags.go: refuse `--mode=install --dry-run` with an explicit error. An honest install dry-run orchestrator is deferred; until it lands, the flag combination has no truthful meaning.
- state.StateFile.DryRun field + guard in Transition(): when DryRun is true, in-memory fields are updated but no file write occurs. main.go wires sf.DryRun = cfg.dryRun at construction. Any shared phase function (phaseDetect in particular) that was persisting install_state during update dry-run is now in-memory-only.
- update_dryrun.go: removed os.WriteFile(update_plan.json). Plan renders to stdout only; no default filesystem persistence.

## 2. History gating (terminal-state allowlist)

- state.IsApplyTerminal(s) helper: explicit allowlist of states representing a completed apply attempt. Catch-all default mapped StateDetectComplete / StateUninstallPlanning / intermediate states to install_fail; now they are excluded from history writes entirely.
- main.go: writeHistory gated on `!cfg.dryRun && IsApplyTerminal(sf.State)`. Removes the PR-22A mode-name heuristic (never mode-based, always structural).

## 3. Authority predicate + Ambiguous + panel consent

- authority.IsNftbanAuthoritative(exec): canonical shared predicate requiring table + chain + active daemon. update.Preflight P-1 now uses this single source of truth; no two different definitions of "nftban authoritative."
- authority.Decision gains `Ambiguous`: orphan table or daemon-without-table routes here instead of silently classifying as Update or Fresh.
- authority.Classify signature adds panelAutoApprove bool. Panel detection alone no longer auto-approves takeover; operators who want the old behaviour must pass --panel-auto-takeover explicitly (flag + NFTBAN_PANEL_AUTO_TAKEOVER=1 env mirror).
- phases.go phaseSwitch treats Ambiguous with the same pre-switch emergency-SSH injection as Takeover/Fresh. Never silent-continue, never skip safety paths.

## 4. Lifecycle bridge truthfulness

- lifecycle_bridge observeResult: DryRun is wired from cfg.dryRun (was hard-coded false).
- observePlan / mapAuthority: switches now compare against authority.Decision constants. Previous version compared UPPERCASE enum values against lowercase literals, so every switch silently hit default and every lifecycle consumer saw PreserveAuthority regardless of the real decision.
- internal/lifecycle: new AuthorityUnknown owner for the Ambiguous case.

## 5. Flag validation (reject contradictory combinations)

- refuse --mode=install --dry-run
- refuse --takeover with --dry-run
- reject --rpm with --deb
- reject --force-delete-operator-config without --purge (both for install paths and the uninstall early-return block)

## 6. CI truth surfaces

- ci-update-canonization.yml G3-U3: snapshot + hard-assert /var/lib/nftban/update-history.json and /var/lib/nftban/state/install_state byte-identical after dry-run. Removed `|| true` soft diff handling. Widened G3-U5..U10 structural grep to include update_dryrun.go and extended the pattern list (os.WriteFile, os.Create, os.MkdirAll, os.Rename, nft create, apt-get purge, dnf erase).
- ci-install-canonization.yml NEW: G3-IN-REFUSE-DRY-RUN gate asserts `--mode=install --dry-run` exits non-zero with an explicit refusal message; G3-IN-FLAG-COMBOS asserts invalid combos are rejected; no history pollution from refused runs.

## 7. Forbidden-side-effect tests + reusable audit harness

- internal/installer/audit/harness.go: reusable PurityHarness with AssertNoExecutorWrites / AssertNoDirectoryCreations / AssertNoMutationCommands / AssertNoStateDirEntries. Self-tests in harness_test.go prove it catches each class.
- state machine_test.go: IsApplyTerminal allowlist + DryRun-Transition regression tests.
- authority classify_test.go: Ambiguous on orphan-table, symmetric on daemon-without-table, table-without-chain routed to Ambiguous, predicate requires all three (table+chain+daemon).
- uninstall_test.go: render ↔ JSON equivalence grid (audit item 11).
- history_test.go: write-gate predicate unit tests (dry-run skips, non-terminal skips, apply-terminal writes).

Explicit non-goals:

- no uninstall mutation added
- no install preview capability added (PR-22B only removes false dry-run semantics by refusing)
- no purge/remove semantics change
- no validator contract change
- no generic history redesign beyond the terminal-state gate
- no PR-23 work begun

Depends on: #481 (PR-22A uninstall boundary repair)
Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19)
Refs: internal/installer/uninstall/contract.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-19T21:03:29Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Package	Version	Score
actions/actions/checkout	34e114876b0b11c390a56381ad16ebd13914f8d5	🟢 5.7

Details

Check	Score	Reason
Maintained	⚠️ 0	0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
Binary-Artifacts	🟢 10	no binaries found in the repo
Code-Review	🟢 10	all changesets reviewed
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Fuzzing	⚠️ 0	project is not fuzzed
Packaging	⚠️ -1	packaging workflow not detected
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Pinned-Dependencies	🟢 3	dependency not pinned by hash detected -- score normalized to 3
Security-Policy	🟢 9	security policy file detected
Branch-Protection	🟢 5	branch protection is not maximal on development and all release branches
SAST	🟢 8	SAST tool detected but not run on all commits

actions/actions/setup-go	d35c59abb061a4a6fb18e82ac0862c26744d6ab5	🟢 5.7

Details

Check	Score	Reason
Maintained	🟢 6	7 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 6
Code-Review	🟢 10	all changesets reviewed
Binary-Artifacts	🟢 10	no binaries found in the repo
Packaging	⚠️ -1	packaging workflow not detected
Dangerous-Workflow	🟢 10	no dangerous workflow patterns detected
CII-Best-Practices	⚠️ 0	no effort to earn an OpenSSF best practices badge detected
Token-Permissions	⚠️ 0	detected GitHub workflow tokens with excessive permissions
Pinned-Dependencies	⚠️ 0	dependency not pinned by hash detected -- score normalized to 0
Fuzzing	⚠️ 0	project is not fuzzed
License	🟢 10	license file detected
Signed-Releases	⚠️ -1	no releases found
Security-Policy	🟢 9	security policy file detected
Branch-Protection	⚠️ 0	branch protection not enabled on development/release branches
SAST	🟢 10	SAST tool is run on all commits

Scanned Files

.github/workflows/ci-install-canonization.yml

Two CI failures from the first PR-22B push:

1. go vet failure in internal/installer/audit/harness_test.go: my recordingT mock could not satisfy testing.TB (unexported method). Switch the harness method signatures from *testing.T to testing.TB so any TB implementation works; rewrite self-tests to use t.Run's return value (expectInnerFail helper) to observe subtest failures without implementing TB externally.
2. G3-U3 FAIL because the history-file seed happened AFTER the before snapshot, so the seeded file appeared as "new" in after. Move the seed BEFORE snapshot so both before and after include it; use size-only snapshot (drop mtime/%T@) so stat-touch no-ops don't trigger the hard diff assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ist gap

Re-audit of PR-22A+PR-22B stack returned GO WITH CONDITIONS with three must-fix blockers before merge. All three are closed here, plus the ApplyWhitelist entry needed for the new IsNftbanAuthoritative probe.

N-1: refuse --repair --dry-run
------------------------------
Previously the validation block was skipped entirely for repair mode, so --dry-run was silently accepted while phaseSwitch continued to mutate kernel + services. sf.DryRun=true suppresses state-file writes but not kernel mutation. flags.go now refuses the combination with an explicit error before the validation block.

N-2: adopt audit.PurityHarness + add update dry-run purity test
---------------------------------------------------------------
uninstall_dryrun_test.go had its own inline forbidden-command list and inline state-dir check, parallel code to audit.PurityHarness. Replace with a single AssertAllPurity call. No more list-drift risk.

cmd/nftban-installer/update_dryrun_test.go NEW: invokes runUpdateDryRun under MockExecutor + temp stateDir and asserts AssertAllPurity. Covers both preflight-pass (happy path) and preflight-fail branches. Closes the audit gap where update dry-run was defended only at CI filesystem-snapshot granularity.

Harness redesign for testability: Check* methods return []string of violation messages; Assert* methods call them and report via t.Errorf. Self-tests now exercise Check* directly (no testing.TB mock needed). Previous attempt used a recordingT mock that could not satisfy testing.TB due to its unexported private() method.

N-3: doc comment typo in state/machine.go
-----------------------------------------
The package-level IsApplyTerminal func doc said "alias for IsApplyTerminal" (self-referential). Reworded to "alias for the (InstallState) IsApplyTerminal method."

Bonus: ApplyWhitelist gap uncovered by re-audit CI
--------------------------------------------------
update.IsNftbanAuthoritative added a "nft list chain ip nftban input" probe (required daemon+chain+table predicate). Preflight runs this on every apply entry. The contract-audit harness in update_apply_test.go rejected the command as non-whitelisted. Add "nft list chain ip nftban input" to ApplyWhitelist: read-only, part of preflight P-1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
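
The Check*/Assert* split described under N-2 can be sketched as below. Only the method-name pattern and the `PurityHarness` name come from the commit message; the `Commands` field, the prefix list (borrowed from the CI grep patterns) and the matching logic are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// PurityHarness holds what a mock executor recorded (assumed shape).
type PurityHarness struct {
	Commands []string
}

// mutationPrefixes is an assumed list, borrowed from the CI grep patterns.
var mutationPrefixes = []string{"nft create", "apt-get purge", "dnf erase"}

// CheckNoMutationCommands returns one message per violation. Because it
// does not take a testing.TB, self-tests can call it directly.
func (h *PurityHarness) CheckNoMutationCommands() []string {
	var violations []string
	for _, cmd := range h.Commands {
		for _, p := range mutationPrefixes {
			if strings.HasPrefix(cmd, p) {
				violations = append(violations, "mutation command executed: "+cmd)
			}
		}
	}
	return violations
}

// AssertNoMutationCommands is the thin reporting wrapper used by real tests.
func (h *PurityHarness) AssertNoMutationCommands(t testing.TB) {
	t.Helper()
	for _, v := range h.CheckNoMutationCommands() {
		t.Errorf("%s", v)
	}
}

func main() {
	h := &PurityHarness{Commands: []string{"nft list ruleset", "dnf erase nftban"}}
	for _, v := range h.CheckNoMutationCommands() {
		fmt.Println(v)
	}
}
```

Keeping the detection logic in a pure function sidesteps the `testing.TB` mocking problem entirely: the interface has an unexported method, so it cannot be implemented outside the testing package.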

…3 blockers + standing lifecycle-truth rule

Post-PR-22B hygiene per approved plan. One tight commit, no code changes.

CHANGELOG.md, new [Unreleased] section:

- summary of PR-22A + PR-22B structural repair
- data-integrity note on the lifecycle-bridge authority-mapping bug: pre-PR-22B `observePlan`/`mapAuthority` switches silently hit default arms because of UPPERCASE-vs-lowercase comparison. Between v1.98 and the merge of PR-22B (#482), any lifecycle-telemetry consumer saw `PreserveAuthority`/`AuthorityNone` regardless of real decision. Kernel behavior + install_state + update-history unaffected; only the lifecycle bridge's external reporting surface was wrong. Forensic interpretation of pre-PR-22B lifecycle output should treat the authority decision as "unknown," not "preserve."

internal/installer/uninstall/contract.md, two new sections:

1. Standing lifecycle-truth rule: codifies the merge-discipline constraint that no new lifecycle code may bypass the shared authority predicate, history gate, or dry-run contract. Enumerates the five concrete requirements that every new lifecycle PR must respect, and points at the CI gates that should catch bypass attempts.
2. Pre-PR-23 blockers: explicit table of the six follow-up PRs that must land before PR-23 (uninstall mutation) can start:
   (1) prior-authority record hardening
   (2) external-firewall detection unification
   (3) kernel/service snapshot CI gate
   (4) exec-trace CI gate
   (5) auto-elevate shim removal gate
   (6) payload integrity minimum checks
   Plus the Phase 3 gating rule: verification audit after items 1-6 land, with three focused questions, no exploratory scope.

No code changes. No behavior changes. Institutional-memory commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

itcmsgr and others added 2 commits April 20, 2026 00:05

itcmsgr merged commit bd7ea75 into main Apr 19, 2026
60 checks passed

itcmsgr deleted the fix/v1.100-pr-22b-lifecycle-truth-repair branch April 19, 2026 21:22

itcmsgr mentioned this pull request Apr 19, 2026

chore(v1.100 Phase 1): lifecycle-bridge data-integrity note + pre-PR-23 blockers + standing lifecycle-truth rule #483

Merged

2 tasks

itcmsgr merged 3 commits into main from fix/v1.100-pr-22b-lifecycle-truth-repair
