Conversation
…y gating, authority alignment, panel consent Extended audit found systemic issues beyond uninstall that would invite the exact class of false confidence PR-22A closed, but for install and update paths. This PR is boundary-repair ONLY — does not add uninstall mutation, does not add install preview capability, does not begin PR-23 work. PR-22B does not add install preview capability; it removes false dry-run semantics by refusing unsupported install dry-run invocations. Scope (12 items from the repair contract): ## 1. Dry-run honesty - flags.go: refuse `--mode=install --dry-run` with an explicit error. An honest install dry-run orchestrator is deferred — until it lands, the flag combination has no truthful meaning. - state.StateFile.DryRun field + guard in Transition(): when DryRun is true, in-memory fields are updated but no file write occurs. main.go wires sf.DryRun = cfg.dryRun at construction. Any shared phase function (phaseDetect in particular) that was persisting install_state during update dry-run is now in-memory-only. - update_dryrun.go: removed os.WriteFile(update_plan.json). Plan renders to stdout only; no default filesystem persistence. ## 2. History gating (terminal-state allowlist) - state.IsApplyTerminal(s) helper — explicit allowlist of states representing a completed apply attempt. Catch-all default mapped StateDetectComplete / StateUninstallPlanning / intermediate states to install_fail; now they are excluded from history writes entirely. - main.go: writeHistory gated on `!cfg.dryRun && IsApplyTerminal(sf.State)`. Removes the PR-22A mode-name heuristic (never mode-based, always structural). ## 3. Authority predicate + Ambiguous + panel consent - authority.IsNftbanAuthoritative(exec) — canonical shared predicate requiring table + chain + active daemon. update.Preflight P-1 now uses this single source of truth; no two different definitions of "nftban authoritative." - authority.Decision gains `Ambiguous` — orphan table or daemon-without-table routes here instead of silently classifying as Update or Fresh. - authority.Classify signature adds panelAutoApprove bool. Panel detection alone no longer auto-approves takeover — operators who want the old behaviour must pass --panel-auto-takeover explicitly (flag + NFTBAN_PANEL_AUTO_TAKEOVER=1 env mirror). - phases.go phaseSwitch treats Ambiguous with the same pre-switch emergency-SSH injection as Takeover/Fresh. Never silent-continue, never skip safety paths. ## 4. Lifecycle bridge truthfulness - lifecycle_bridge observeResult: DryRun is wired from cfg.dryRun (was hard-coded false). - observePlan / mapAuthority: switches now compare against authority.Decision constants. Previous version compared UPPERCASE enum values against lowercase literals — every switch silently hit default, so every lifecycle consumer saw PreserveAuthority regardless of the real decision. - internal/lifecycle: new AuthorityUnknown owner for the Ambiguous case. ## 5. Flag validation (reject contradictory combinations) - refuse --mode=install --dry-run - refuse --takeover with --dry-run - reject --rpm with --deb - reject --force-delete-operator-config without --purge (both for install paths and the uninstall early-return block) ## 6. CI truth surfaces - ci-update-canonization.yml G3-U3: snapshot + hard-assert /var/lib/nftban/update-history.json and /var/lib/nftban/state/ install_state byte-identical after dry-run. Removed || true soft diff handling. Widened G3-U5..U10 structural grep to include update_dryrun.go and extended the pattern list (os.WriteFile, os.Create, os.MkdirAll, os.Rename, nft create, apt-get purge, dnf erase). - ci-install-canonization.yml NEW: G3-IN-REFUSE-DRY-RUN gate asserts `--mode=install --dry-run` exits non-zero with an explicit refusal message; G3-IN-FLAG-COMBOS asserts invalid combos are rejected; no history pollution from refused runs. ## 7. Forbidden-side-effect tests + reusable audit harness - internal/installer/audit/harness.go: reusable PurityHarness with AssertNoExecutorWrites / AssertNoDirectoryCreations / AssertNoMutationCommands / AssertNoStateDirEntries. Self-tests in harness_test.go prove it catches each class. - state machine_test.go: IsApplyTerminal allowlist + DryRun-Transition regression tests. - authority classify_test.go: Ambiguous on orphan-table, symmetric on daemon-without-table, table-without-chain routed to Ambiguous, predicate requires all three (table+chain+daemon). - uninstall_test.go: render ↔ JSON equivalence grid (audit item 11). - history_test.go: write-gate predicate unit tests (dry-run skips, non-terminal skips, apply-terminal writes). Explicit non-goals: - no uninstall mutation added - no install preview capability added (PR-22B only removes false dry-run semantics by refusing) - no purge/remove semantics change - no validator contract change - no generic history redesign beyond the terminal-state gate - no PR-23 work begun Depends on: #481 (PR-22A uninstall boundary repair) Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19) Refs: internal/installer/uninstall/contract.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files
|
Two CI failures from the first PR-22B push: 1. go vet failure in internal/installer/audit/harness_test.go — my recordingT mock could not satisfy testing.TB (unexported method). Switch the harness method signatures from *testing.T to testing.TB so any TB implementation works; rewrite self-tests to use t.Run's return value (expectInnerFail helper) to observe subtest failures without implementing TB externally. 2. G3-U3 FAIL because the history-file seed happened AFTER the before snapshot, so the seeded file appeared as "new" in after. Move the seed BEFORE snapshot so both before and after include it; use size-only snapshot (drop mtime/%T@) so stat-touch no-ops don't trigger the hard diff assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ist gap Re-audit of PR-22A+PR-22B stack returned GO WITH CONDITIONS with three must-fix blockers before merge. All three are closed here, plus the ApplyWhitelist entry needed for the new IsNftbanAuthoritative probe. N-1 — refuse --repair --dry-run -------------------------------- Previously the validation block was skipped entirely for repair mode, so --dry-run was silently accepted while phaseSwitch continued to mutate kernel + services. sf.DryRun=true suppresses state-file writes but not kernel mutation. flags.go now refuses the combination with an explicit error before the validation block. N-2 — adopt audit.PurityHarness + add update dry-run purity test ---------------------------------------------------------------- uninstall_dryrun_test.go had its own inline forbidden-command list and inline state-dir check — parallel code to audit.PurityHarness. Replace with a single AssertAllPurity call. No more list-drift risk. cmd/nftban-installer/update_dryrun_test.go NEW — invokes runUpdateDryRun under MockExecutor + temp stateDir and asserts AssertAllPurity. Covers both preflight-pass (happy path) and preflight-fail branches. Closes the audit gap where update dry-run was defended only at CI filesystem-snapshot granularity. Harness redesign for testability: Check* methods return []string of violation messages; Assert* methods call them and report via t.Errorf. Self-tests now exercise Check* directly (no testing.TB mock needed). Previous attempt used a recordingT mock that could not satisfy testing.TB due to its unexported private() method. N-3 — doc comment typo in state/machine.go ------------------------------------------ The package-level IsApplyTerminal func doc said "alias for IsApplyTerminal" (self-referential). Reworded to "alias for the (InstallState) IsApplyTerminal method." Bonus — ApplyWhitelist gap uncovered by re-audit CI --------------------------------------------------- update.IsNftbanAuthoritative added a "nft list chain ip nftban input" probe (required daemon+chain+table predicate). Preflight runs this on every apply entry. The contract-audit harness in update_apply_test.go rejected the command as non-whitelisted. Add "nft list chain ip nftban input" to ApplyWhitelist — read-only, part of preflight P-1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
itcmsgr
added a commit
that referenced
this pull request
Apr 19, 2026
…3 blockers + standing lifecycle-truth rule Post-PR-22B hygiene per approved plan. One tight commit, no code changes. CHANGELOG.md — new [Unreleased] section: - summary of PR-22A + PR-22B structural repair - data-integrity note on the lifecycle-bridge authority-mapping bug: pre-PR-22B `observePlan`/`mapAuthority` switches silently hit default arms because of UPPERCASE-vs-lowercase comparison. Between v1.98 and the merge of PR-22B (#482), any lifecycle-telemetry consumer saw `PreserveAuthority`/`AuthorityNone` regardless of real decision. Kernel behavior + install_state + update-history unaffected — only the lifecycle bridge's external reporting surface. Forensic interpretation of pre-PR-22B lifecycle output should treat the authority decision as "unknown," not "preserve." internal/installer/uninstall/contract.md — two new sections: 1. Standing lifecycle-truth rule: codifies the merge-discipline constraint — no new lifecycle code may bypass the shared authority predicate, history gate, or dry-run contract. Enumerates the five concrete requirements that every new lifecycle PR must respect, and points at the CI gates that should catch bypass attempts. 2. Pre-PR-23 blockers: explicit table of the six follow-up PRs that must land before PR-23 (uninstall mutation) can start: (1) prior-authority record hardening (2) external-firewall detection unification (3) kernel/service snapshot CI gate (4) exec-trace CI gate (5) auto-elevate shim removal gate (6) payload integrity minimum checks Plus the Phase 3 gating rule: verification audit after items 1-6 land, with three focused questions, no exploratory scope. No code changes. No behavior changes. Institutional-memory commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
itcmsgr
added a commit
that referenced
this pull request
Apr 19, 2026
…23 blockers + standing lifecycle-truth rule (#483) Post-PR-22B hygiene per approved plan. One tight commit, no code changes. CHANGELOG.md — new [Unreleased] section: - summary of PR-22A + PR-22B structural repair - data-integrity note on the lifecycle-bridge authority-mapping bug: pre-PR-22B `observePlan`/`mapAuthority` switches silently hit default arms because of UPPERCASE-vs-lowercase comparison. Between v1.98 and the merge of PR-22B (#482), any lifecycle-telemetry consumer saw `PreserveAuthority`/`AuthorityNone` regardless of real decision. Kernel behavior + install_state + update-history unaffected — only the lifecycle bridge's external reporting surface. Forensic interpretation of pre-PR-22B lifecycle output should treat the authority decision as "unknown," not "preserve." internal/installer/uninstall/contract.md — two new sections: 1. Standing lifecycle-truth rule: codifies the merge-discipline constraint — no new lifecycle code may bypass the shared authority predicate, history gate, or dry-run contract. Enumerates the five concrete requirements that every new lifecycle PR must respect, and points at the CI gates that should catch bypass attempts. 2. Pre-PR-23 blockers: explicit table of the six follow-up PRs that must land before PR-23 (uninstall mutation) can start: (1) prior-authority record hardening (2) external-firewall detection unification (3) kernel/service snapshot CI gate (4) exec-trace CI gate (5) auto-elevate shim removal gate (6) payload integrity minimum checks Plus the Phase 3 gating rule: verification audit after items 1-6 land, with three focused questions, no exploratory scope. No code changes. No behavior changes. Institutional-memory commit. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why this PR exists
Second independent audit of the install/update lifecycle found systemic issues beyond what PR-22A fixed for uninstall. The install path silently ignored
--dry-run. The update path wrote three files under/var/lib/nftban/during every "dry-run" and recorded successful previews asinstall_fail. The authority classifier had noAmbiguousstate on the install side and implicitly auto-approved takeover on panel-managed hosts. Two different definitions of "nftban authoritative" coexisted. CI whitelisted writes under/var/lib/nftban/state/and never snapshot the history file.This PR is boundary-repair ONLY. It does not advance uninstall mutation, does not add install preview capability, does not begin PR-23 work.
Depends on: #481 (PR-22A uninstall boundary repair).
Scope lock: 12 items from the repair contract seed — nothing else.
What changed (12-item mapping)
1. Dry-run honesty
--mode=install --dry-runupdate_dryrun.go→os.WriteFile(update_plan.json)phaseDetect→sf.Transition(StateDetectComplete)during update dry-runinstall_stateStateFile.DryRun=truesuppressesWriteAtomic2. History gating
state.IsApplyTerminal(s)helper — explicit allowlist of states representing a completed apply attempt (Committed / Degraded / FailedSSH / FailedAbort / FailedRender / FailedRebuild / FailedNoFirewall / FailedTakeover).main.gogate:if !cfg.dryRun && state.IsApplyTerminal(sf.State) { writeHistory(...) }. Structural, not mode-based. Every dry-run + every non-terminal state is excluded by construction.3. Authority predicate + Ambiguous + panel consent
authority.IsNftbanAuthoritative(exec)— canonical shared predicate requiring table + chain + active daemon.update.PreflightP-1 now uses this single source of truth. Previously two weaker definitions existed.authority.DecisiongainsAmbiguous. Orphan table or daemon-without-table routes here.phaseSwitchtreatsAmbiguouswith the same emergency-SSH injection asTakeover/Fresh— never silent-continue.authority.Classifynow takes apanelAutoApprove boolparameter (wired fromcfg.panelAutoTakeover). Panel detection alone no longer auto-approves takeover. Operators must pass--panel-auto-takeoverorNFTBAN_PANEL_AUTO_TAKEOVER=1.4. Lifecycle bridge truthfulness
observeResult.DryRunnow wired fromcfg.dryRun(was hard-codedfalse).observePlan/mapAuthorityswitches: comparedauthority.Decisionvalues (UPPERCASE) against lowercase literals — every switch silently hitdefault, so every consumer sawPreserveAuthorityregardless of the real decision. Now pinned toauthoritypackage constants.Ambiguousroutes throughActionTakeAuthority+ newAuthorityUnknownowner.5. Flag validation
--mode=install --dry-run--takeover --dry-run--rpm --deb--force-delete-operator-configwithout--purge6. CI truth surfaces
ci-update-canonization.yml G3-U3: seedsupdate-history.json, snapshotsinstall_state, hard-asserts byte-identical after dry-run. Removed|| truesoft diff handling. WidenedG3-U5..U10structural grep scope to includeupdate_dryrun.go; extended pattern list (os.WriteFile,os.Create,os.MkdirAll,os.Rename,nft create,apt-get purge,dnf erase).ci-install-canonization.ymlNEW:G3-IN-REFUSE-DRY-RUN(install dry-run exits non-zero with refusal message; history untouched).G3-IN-FLAG-COMBOS(invalid combos rejected).7. Purity tests + reusable audit harness
internal/installer/audit/harness.goNEW: reusablePurityHarnesswithAssertNoExecutorWrites/AssertNoDirectoryCreations/AssertNoMutationCommands/AssertNoStateDirEntries+AssertAllPurityconvenience method. Self-tests prove it catches each class.state.IsApplyTerminalallowlist tests +DryRunTransition regression test.authority.Classifyambiguous-on-orphan-table + symmetric daemon-without-table + table-without-chain + predicate-requires-all-three tests.uninstall.Planrender ↔ JSON equivalence grid (audit item 11).history_test.gowrite-gate predicate unit tests.Falsifiability proof
Could the audit's install/update findings pass this CI? No.
--mode=install --dry-runsilently proceedsG3-IN-REFUSE-DRY-RUNupdate_dryrun.gowritesupdate_plan.jsonG3-U5..U10extendedos.WriteFile(grepphaseDetect→Transitionwrites install_state during update dry-runG3-U3hard assertion oninstall_statesha256writeHistoryrecords update dry-run asinstall_failG3-U3hard assertion onupdate-history.jsonsha256UpdateTestClassify_Ambiguous_OrphanTableTestClassify_PanelDetected_NoFlag_Abortsupdate.Preflightnow importsauthority.IsNftbanAuthoritative— single sourceExplicit non-goals
This PR does not add uninstall mutation, does not add install preview capability, does not change purge/remove semantics, does not redesign installer history generally (only the write-gate predicate), does not tighten
PriorRecordUsablesemantics (deferred), does not add an exec-trace CI gate (deferred), and does not begin PR-23 work.Acceptance criteria (repair contract §6 extended)
Ambiguous+ emergency-SSH inphaseSwitch)Test plan
ci-install-canonizationmatrix green (ubuntu-24.04 + almalinux-9)ci-update-canonizationmatrix green (including new hard-asserts)ci-uninstall-canonizationmatrix green (regression-safe)Runtime Truthmatrix greenBuild & Testgreen (all unit tests including new harness self-tests)🤖 Generated with Claude Code