Conversation
Independent audit of merged PR #480 found three concrete violations of the PR-22 contract Stop Condition ("no writes under /var/lib/nftban/"): 1. os.WriteFile(<stateDir>/uninstall_plan.json, …) on every dry-run 2. sf.Transition(StateUninstallPlanning, …) persisted install_state 3. unconditional writeHistory() recorded a successful plan preview as "install_fail" in update-history.json — silently poisoning every operator dashboard that alerts on install_fail Plus one classifier false-negative: 4. partial nftban (table OR daemon, not both) WITHOUT external firewall fell through to AuthorityNone — PR-23 release logic would skip kernel cleanup of the orphan table Plus CI blind spot: 5. G3-UN-NO-MUTATION grep missed os.WriteFile/Create/MkdirAll/Rename; snapshot step did not cover /var/lib/nftban/ at all This PR is BOUNDARY REPAIR ONLY. It does not add uninstall mutation, does not change purge/remove semantics, does not expand prior-authority logic, does not begin PR-23. Scope locked to R1-R5 from the repair contract seed. R1 (uninstall_dryrun.go): remove os.WriteFile + sf.Transition. Plan renders to stdout only. Option B — no installer-state persistence during dry-run. R2 (main.go): guard writeHistory with cfg.mode != "uninstall" so a successful plan preview is not recorded as install_fail. R3 (authority.go): explicit case nftbanPartial && !extPresent → AuthorityAmbiguous with diagnostic note; tighten the AuthorityNone case with !nftbanPartial. Regression test added (TestClassify_PartialNFTBan_NoExternal_IsAmbiguous) plus symmetric daemon-up-no-table case. R4 (uninstall_dryrun_test.go): new falsifiable purity test calls runUninstallDryRun with MockExecutor + real tempdir and asserts: - zero executor writes (WriteFileAtomic) - zero directory creates - zero mutation commands (nft add/flush/delete, systemctl lifecycle, ufw/firewall-cmd, package-manager removal) - zero files on disk under temp stateDir Two tests: no-authority host and ambiguous host. R5 (ci-uninstall-canonization.yml): - extend grep patterns: os.WriteFile/Create/MkdirAll/Mkdir/Rename, sf.Transition, nft create, apt-get purge, dnf erase - widen E2E snapshot to include /var/lib/nftban/ (was only /etc/nftban + /usr/lib/nftban + /usr/sbin/nftban*) - new G3-UN-HISTORY-PURITY gate: seeds realistic update-history.json, runs dry-run twice (explicit + implicit), asserts byte-identical hash — catches the exact class that escaped PR #480 - run the new orchestrator purity test in CI Acceptance criteria (all pass by construction): - uninstall dry-run no longer writes misleading plan/history artifacts under /var/lib/nftban/ - successful uninstall dry-run does not create install_fail - partial nftban without external returns AuthorityAmbiguous - purity test exists and would fail on regression - CI would fail on the exact write-pattern class that escaped in PR #480 (os.WriteFile + history pollution + snapshot blind spot) - no uninstall mutation capability added - scope remains boundary repair only Depends on: 547aa08 (PR #480) Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19) Refs: internal/installer/uninstall/contract.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
The G3-UN-NO-MUTATION grep extension added in PR-22A matched historical references in the doc comment that explains what was removed from the orchestrator. The grep is text-based so it cannot distinguish between code and comments. Reword the comment to describe removed behavior in prose rather than literal Go identifiers. Zero behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
itcmsgr
added a commit
that referenced
this pull request
Apr 19, 2026
…y gating, authority alignment, panel consent Extended audit found systemic issues beyond uninstall that would invite the exact class of false confidence PR-22A closed, but for install and update paths. This PR is boundary-repair ONLY — does not add uninstall mutation, does not add install preview capability, does not begin PR-23 work. PR-22B does not add install preview capability; it removes false dry-run semantics by refusing unsupported install dry-run invocations. Scope (12 items from the repair contract): ## 1. Dry-run honesty - flags.go: refuse `--mode=install --dry-run` with an explicit error. An honest install dry-run orchestrator is deferred — until it lands, the flag combination has no truthful meaning. - state.StateFile.DryRun field + guard in Transition(): when DryRun is true, in-memory fields are updated but no file write occurs. main.go wires sf.DryRun = cfg.dryRun at construction. Any shared phase function (phaseDetect in particular) that was persisting install_state during update dry-run is now in-memory-only. - update_dryrun.go: removed os.WriteFile(update_plan.json). Plan renders to stdout only; no default filesystem persistence. ## 2. History gating (terminal-state allowlist) - state.IsApplyTerminal(s) helper — explicit allowlist of states representing a completed apply attempt. Catch-all default mapped StateDetectComplete / StateUninstallPlanning / intermediate states to install_fail; now they are excluded from history writes entirely. - main.go: writeHistory gated on `!cfg.dryRun && IsApplyTerminal(sf.State)`. Removes the PR-22A mode-name heuristic (never mode-based, always structural). ## 3. Authority predicate + Ambiguous + panel consent - authority.IsNftbanAuthoritative(exec) — canonical shared predicate requiring table + chain + active daemon. update.Preflight P-1 now uses this single source of truth; no two different definitions of "nftban authoritative." - authority.Decision gains `Ambiguous` — orphan table or daemon-without-table routes here instead of silently classifying as Update or Fresh. - authority.Classify signature adds panelAutoApprove bool. Panel detection alone no longer auto-approves takeover — operators who want the old behaviour must pass --panel-auto-takeover explicitly (flag + NFTBAN_PANEL_AUTO_TAKEOVER=1 env mirror). - phases.go phaseSwitch treats Ambiguous with the same pre-switch emergency-SSH injection as Takeover/Fresh. Never silent-continue, never skip safety paths. ## 4. Lifecycle bridge truthfulness - lifecycle_bridge observeResult: DryRun is wired from cfg.dryRun (was hard-coded false). - observePlan / mapAuthority: switches now compare against authority.Decision constants. Previous version compared UPPERCASE enum values against lowercase literals — every switch silently hit default, so every lifecycle consumer saw PreserveAuthority regardless of the real decision. - internal/lifecycle: new AuthorityUnknown owner for the Ambiguous case. ## 5. Flag validation (reject contradictory combinations) - refuse --mode=install --dry-run - refuse --takeover with --dry-run - reject --rpm with --deb - reject --force-delete-operator-config without --purge (both for install paths and the uninstall early-return block) ## 6. CI truth surfaces - ci-update-canonization.yml G3-U3: snapshot + hard-assert /var/lib/nftban/update-history.json and /var/lib/nftban/state/ install_state byte-identical after dry-run. Removed || true soft diff handling. Widened G3-U5..U10 structural grep to include update_dryrun.go and extended the pattern list (os.WriteFile, os.Create, os.MkdirAll, os.Rename, nft create, apt-get purge, dnf erase). - ci-install-canonization.yml NEW: G3-IN-REFUSE-DRY-RUN gate asserts `--mode=install --dry-run` exits non-zero with an explicit refusal message; G3-IN-FLAG-COMBOS asserts invalid combos are rejected; no history pollution from refused runs. ## 7. Forbidden-side-effect tests + reusable audit harness - internal/installer/audit/harness.go: reusable PurityHarness with AssertNoExecutorWrites / AssertNoDirectoryCreations / AssertNoMutationCommands / AssertNoStateDirEntries. Self-tests in harness_test.go prove it catches each class. - state machine_test.go: IsApplyTerminal allowlist + DryRun-Transition regression tests. - authority classify_test.go: Ambiguous on orphan-table, symmetric on daemon-without-table, table-without-chain routed to Ambiguous, predicate requires all three (table+chain+daemon). - uninstall_test.go: render ↔ JSON equivalence grid (audit item 11). - history_test.go: write-gate predicate unit tests (dry-run skips, non-terminal skips, apply-terminal writes). Explicit non-goals: - no uninstall mutation added - no install preview capability added (PR-22B only removes false dry-run semantics by refusing) - no purge/remove semantics change - no validator contract change - no generic history redesign beyond the terminal-state gate - no PR-23 work begun Depends on: #481 (PR-22A uninstall boundary repair) Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19) Refs: internal/installer/uninstall/contract.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
14 tasks
itcmsgr
added a commit
that referenced
this pull request
Apr 19, 2026
…ry gating + authority alignment + panel consent (#482) Closes the systemic install/update audit findings. Scope-locked boundary repair: - Dry-run honesty: `--mode=install --dry-run` refused explicitly; `--repair --dry-run` refused; `StateFile.DryRun` suppresses `Transition` writes; `update_dryrun.go` no longer writes `update_plan.json`. - History gating: new `state.IsApplyTerminal(s)` explicit allowlist; `writeHistory` gated `!cfg.dryRun && IsApplyTerminal(sf.State)` — structural, not mode-heuristic. - Authority: canonical `IsNftbanAuthoritative(exec)` (table + chain + daemon) — used by both `authority.Classify` and `update.Preflight` P-1; new `Ambiguous` variant for orphan-table/daemon-down; `phaseSwitch` routes `Ambiguous` through emergency-SSH injection. - Panel consent: silent panel auto-approve replaced with `--panel-auto-takeover` default-off flag + `NFTBAN_PANEL_AUTO_TAKEOVER` env mirror. - Lifecycle bridge: real `DryRun` wired from `cfg.dryRun`; `observePlan`/`mapAuthority` switches pinned to `authority.Decision` constants (fixes pre-existing UPPERCASE-vs-lowercase bug that silently mapped every consumer to `PreserveAuthority` from v1.98). - Flag validation: `--takeover+--dry-run`, `--rpm+--deb`, `--force-delete-operator-config` without `--purge` all rejected. - CI: new `ci-install-canonization.yml` (G3-IN-REFUSE-DRY-RUN + G3-IN-FLAG-COMBOS); tightened `ci-update-canonization.yml` G3-U3 with history + state hard assertions, no `|| true`; extended G3-U5..U10 grep scope + patterns. - Reusable `audit.PurityHarness` at `internal/installer/audit/harness.go`, adopted by uninstall + new update purity tests. Three re-audit blockers closed in commit 966a175: refuse `--repair --dry-run`; port uninstall test to `PurityHarness` + add update purity test; fix `IsApplyTerminal` alias doc typo; plus `nft list chain ip nftban input` added to `ApplyWhitelist` for the tightened predicate. Pre-PR-23 blockers (not in scope; tracked as follow-up PRs): 1. Prior-authority record hardening (timestamp + installer version + active_at_install semantics) 2. External-firewall detection unification 3. Kernel/service snapshot CI gate 4. Exec-trace CI gate 5. Auto-elevate uninstall shim removal gate 6. Payload integrity minimum checks Depends on: #481 (PR-22A) Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why this PR exists
Independent audit of merged PR #480 (commit
547aa089) found that PR-22's observational scaffold shipped with three concrete contract violations plus a classifier false-negative plus a CI blind spot. This PR is a boundary-repair PR, not a feature PR. It does not advance uninstall mutation.Scope: R1–R5 from the repair contract seed.
Depends on:
547aa089(PR #480).A. Before/after behavior — filesystem writes during dry-run
Option B chosen from the repair contract: installer-state persistence during uninstall dry-run is removed entirely. A read-only planning run has no reason to mutate install_state.
B. Classifier bug proof — partial-nftban false-negative
Why it matters: PR-23's release logic will read `CurrentAuthority`. If the classifier says `AuthorityNone` while the kernel still holds an orphan `ip nftban` table, release skips kernel cleanup and the table persists. This is exactly the silent-takeover-setup risk the PR-22 contract promised to prevent.
Regression tests added:
Existing `TestClassify_PartialNFTBan_WithExternal_IsAmbiguous` already covered the with-external case.
Code change: `authority.go` now has an explicit `case nftbanPartial && !extPresent: AuthorityAmbiguous`. The none-case is tightened to `!nftbanPresent && !nftbanPartial && !extPresent` so the pattern is exhaustive by construction.
C. Purity proof — falsifiable tests that catch the previous failure class
New unit test: `cmd/nftban-installer/uninstall_dryrun_test.go`
New CI jobs (on top of existing `G3-UN-NO-MUTATION` + `G3-UN-PLAN-RENDERS`):
Falsifiability check — could PR #480's bug have passed this CI? No.
D. Scope proof — what this PR does NOT do
This PR does not add uninstall mutation, does not flush or remove nftables tables, does not disable services or timers, does not delete files from `/etc/nftban/`, does not restore external firewalls, does not change purge/remove semantics, does not expand prior-authority restore logic, does not redesign installer history generally, does not tighten `PriorRecordUsable` semantics (audit item 9 — deferred to PR-23/24), does not add flag-combination validation (audit item 10 — deferred), does not touch the auto-elevation block in `flags.go` (documented in `contract.md` as PR-23 gate), and does not begin PR-23 work in any form.
If any proposed change is not required to close R1–R5, it is out of scope for this PR.
Acceptance criteria (repair contract §6)
Merge standard
Per the repair contract §9: this PR is a repair gate for PR-23. PR-23 must not start until this PR is merged and verified. A green result means only this:
It does not authorize mutation work by itself.
Test plan
🤖 Generated with Claude Code