Skip to content

feat(v1.100 PR-23): uninstall mutation phase 1 — authority release core#491

Merged
itcmsgr merged 3 commits intomainfrom
feat/v1.100-pr-23-uninstall-mutation-authority-release
Apr 20, 2026
Merged

feat(v1.100 PR-23): uninstall mutation phase 1 — authority release core#491
itcmsgr merged 3 commits intomainfrom
feat/v1.100-pr-23-uninstall-mutation-authority-release

Conversation

@itcmsgr
Copy link
Copy Markdown
Owner

@itcmsgr itcmsgr commented Apr 20, 2026

First bounded uninstall mutation path. Replaces the PR-22 auto-elevate shim with explicit consent (`--dry-run` XOR `--confirm-mutation`) and adds the kernel + service authority-release sequence. Per locked contract seed: authority release ONLY.

Scope (locked)

  • ✅ Kernel: flush + delete `ip nftban` / `ip6 nftban` tables
  • ✅ Service: stop + disable + mask `nftband.service`
  • ✅ Safety: emergency SSH injection before mutation, removal after
  • ✅ End-state validation before emergency teardown

Not in scope:

  • ❌ NO filesystem artifact deletion
  • ❌ NO `.conf.local` read/write
  • ❌ NO external firewall restoration (PR-24)
  • ❌ NO purge vs remove matrix (PR-25)
  • ❌ NO uninstall-history schema (Option A — tracked as follow-up)

Consent model

Flag Behavior
`--mode=uninstall --dry-run` Observational plan
`--mode=uninstall --confirm-mutation` Authority release core
`--mode=uninstall` (neither) REFUSED
`--mode=uninstall --dry-run --confirm-mutation` REFUSED

The G3-UN-SHIM-LOCK gate (PR-P2-5) enforced the shim-removal + mutation-landing coupling: `shim_present=0` + `mutation_present=1` → PASS post-PR-23 state.

Preflight refusal table (correction 1 — ambiguity split)

`Classify.State` `Classify.Ambiguity` Apply behavior
`AuthorityNFTBan` `AmbiguityNone` Proceed
`AuthorityAmbiguous` `AmbiguityOrphanNFTBan` Proceed (recoverable — clean up orphan nftban)
`AuthorityAmbiguous` `AmbiguityConflictExternal` Refuse — operator must resolve
`AuthorityExternal` Refuse — nothing to release
`AuthorityNone` Refuse — nothing to release

Mutation sequence (10 ordered steps)

  1. Inject emergency SSH safety table
  2. Stop `nftband.service`
  3. Flush `ip nftban`
  4. Flush `ip6 nftban` (skip if absent)
  5. Delete `ip nftban`
  6. Delete `ip6 nftban` (skip if absent)
  7. Disable `nftband.service`
  8. Mask `nftband.service`
  9. Validate end-state (authority released + service masked + emergency SSH still present — correction 2)
  10. Remove emergency SSH (warn-only on failure)

Failure mapping (explicit, not implicit)

Step failed Terminal state
1 (inject) `StateFailedNoFirewall`
2 (stop) log warn, continue
3–6 (flush/delete) `StateUninstallFailedRelease`
7–8 (disable/mask) `StateDegraded`
9 (validation) `StateUninstallFailedRelease`
10 (emergency remove) `StateUninstallReleased` + warn log
all pass `StateUninstallReleased`

History representation (Option A locked 2026-04-20)

Uninstall events are intentionally excluded from `update-history.json`. The install-centric status vocabulary cannot truthfully represent uninstall outcomes; rather than lie or over-engineer, skip entirely. Forensic trail: `/var/log/nftban/installer.log`. Dedicated uninstall-history schema tracked as explicit follow-up in contract.md.

Tests

  • `apply_test.go` (7 tests): HappyPath, EmergencyInjectFail_NoMutation, FlushFail_MidFlight_FailedRelease, ServiceMaskFail_Degraded (positive control), OrphanNFTBan_NoExternal_RunsFullSequence, IPv6Absent_SkipsGracefully, Validation_RequiresEmergencyStillPresentAtStep9. One test (Step9_IP4TableStillPresent) marked `t.Skip` pending deeper mock-hook infrastructure; covered by real-host evidence.
  • `uninstall_test.go`: AmbiguityKind populated assertions on every classifier ambiguity test + new `TestClassify_Invariant_NonAmbiguousHasNoAmbiguityKind`.
  • `machine_test.go`: `StateUninstallReleased` + `StateUninstallFailedRelease` mappings in IsApplyTerminal + ExitCode.

Real-host evidence (MERGE BLOCKER per reviewer checklist §9 — SATISFIED)

Binary provenance: frozen-commit CI artifact, `sha256=dba88f4a2fa58d09e2a67c1f53db96e94e4b757c007d02776774b1f666c1cdb3`, identical on both hosts.

lab2 — Ubuntu 24.04 / DEB (2026-04-20)

Criterion Result
`ip nftban` absent after apply
`ip6 nftban` absent after apply
emergency artifact absent after apply
`nftband.service` inactive
`nftband.service` masked (LoadState + UnitFileState)
apply exit code 0
SSH continuity (sshd pid unchanged)
terminal state `UNINSTALL_RELEASED`
reinstall round-trip (`nftban firewall rebuild` → classifier re-reads `AuthorityNFTBan`)

lab2 also exercised the PR-P2-2A Path B classifier fix (merged as #492): Ubuntu's stock `inet filter` nftables baseline was misclassified as iptables before the fix. Corroboration rule now requires iptables.service active OR iptables-save rules ≥3 before classifying iptables; ghost table alone is informational.

lab4 — AlmaLinux 9.7 / RPM (2026-04-20)

Criterion Result
`ip nftban` absent after apply
`ip6 nftban` absent after apply
emergency artifact absent after apply
`nftband.service` inactive
`nftband.service` masked (LoadState + UnitFileState)
apply exit code 0
SSH continuity (sshd pid 811021 unchanged)
terminal state `UNINSTALL_RELEASED`
wall clock 857ms (lab2 comparable)

Alma ships no Ubuntu-style `inet filter` baseline, so no classifier stress — clean-host path was exercised with zero regression. Step-by-step mutation sequence was identical to lab2. No material behavior difference between the two distro families.

Coverage caveat (explicit, for reviewer honesty)

Real-host coverage does not include a naturally occurring `AmbiguityOrphanNFTBan` host state (ip nftban table present without the nftband.service running, and no external firewall). That branch is covered by unit tests (`TestApply_OrphanNFTBan_NoExternal_RunsFullSequence`) and conservative preflight logic, but not by live-host execution evidence in this PR. lab2 and lab4 were both in pure `AuthorityNFTBan` state during the run.

Extended evidence (deferred, non-blocking)

  • monitor — not required for merge; may run post-merge as extended evidence.
  • srv1 — not required for merge; deliberately deferred as the live-BotGuard host where real traffic makes PR-23 scope-risky regardless.
  • lab4 reinstall round-trip — not required; lab2 round-trip is sufficient representative proof of lifecycle coherence.

Follow-up items (tracked, not blocking merge)

  1. Uninstall-history schema decision — define a truthful uninstall-event record (separate file, schema extension, or dedicated CLI). Preferred before PR-24.
  2. Mock-hook infrastructure for step-9 residual test — current `MockExecutor.NftDeleteTable` unconditionally mutates; deeper hook needed to simulate "delete returned ok but table persists" edge. Covered by real-host evidence for now.

Test plan

  • `Build & Test` green — all existing tests + 7 new apply tests pass
  • `ci-uninstall-canonization` matrix green:
    • G3-UN-SHIM-LOCK: shim_present=0 + mutation_present=1 → PASS post-PR-23
    • G3-UN-NO-MUTATION: apply.go excluded; observational surface still clean
    • G3-UN-PLAN-RENDERS: dry-run unchanged, still passes
    • G3-UN-HISTORY-PURITY: history file untouched by any uninstall invocation
    • G3-EXEC-TRACE: wraps only dry-run path, still passes
    • G3-KS-SNAPSHOT: wraps only dry-run path, still passes
  • `ci-install-canonization` + `ci-update-canonization` green (no touch)
  • lab2 real-host evidence: install → uninstall → verify kernel/service released → reinstall OK
  • lab4 real-host evidence: same

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 20, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

itcmsgr and others added 3 commits April 20, 2026 12:11
First bounded uninstall mutation path. Replaces the PR-22 auto-elevate
shim with explicit consent (--dry-run XOR --confirm-mutation) and adds
the kernel + service authority-release sequence. Per locked contract
seed: authority release ONLY, no filesystem deletion, no external
firewall restoration, no purge matrix.

## Scope (locked 2026-04-20; approved corrections 2026-04-20)

- ✅ Kernel: flush + delete ip nftban / ip6 nftban tables
- ✅ Service: stop + disable + mask nftband.service
- ✅ Safety: emergency SSH injection before mutation, removal after
- ✅ End-state validation before emergency teardown

**Not in scope** (scope-lock enforced):
- ❌ NO filesystem artifact deletion
- ❌ NO .conf.local read/write
- ❌ NO external firewall restoration (PR-24)
- ❌ NO purge vs remove matrix (PR-25)
- ❌ NO post-verify gate beyond step 9 (PR-26)
- ❌ NO uninstall-history schema (tracked as follow-up — Option A locked)

## Consent model

`--mode=uninstall` now requires exactly ONE of:

  --dry-run          : observational plan (unchanged from PR-22)
  --confirm-mutation : authority release core (PR-23 new)

Passing neither is refused. Passing both is refused. This replaces
the PR-22 auto-elevate shim that PR-P2-5 (G3-UN-SHIM-LOCK) was
designed to force into removal at PR-23 time — the gate's
shim_present flips 0 at the same commit mutation_present flips 1.

## Classifier — new AmbiguityKind field (purely additive)

`ClassifyResult.Ambiguity` sub-classifies AuthorityAmbiguous so Apply
can distinguish recoverable from blocking ambiguity without re-
deriving host signals:

  AmbiguityNone              : State != Ambiguous (invariant)
  AmbiguityOrphanNFTBan      : partial nftban + NO external
                              → RECOVERABLE — Apply proceeds via
                                emergency-SSH path to clean up
  AmbiguityConflictExternal  : full nftban + external, OR
                              partial nftban + external, OR
                              multiple externals active
                              → BLOCKING — Apply refuses

Consumers read the classifier output rather than re-probing (prevents
the divergent-interpretation drift that the audits kept flagging).

## Mutation sequence (10 explicit steps in uninstall/apply.go)

 1. Inject emergency SSH safety table (switchop.InjectEmergencySSH)
 2. Stop nftband.service (prevent it fighting kernel mutation)
 3. Flush ip nftban
 4. Flush ip6 nftban (skip if absent)
 5. Delete ip nftban
 6. Delete ip6 nftban (skip if absent)
 7. Disable nftband.service
 8. Mask nftband.service (prevent accidental restart)
 9. Validate end-state:
      - no ip/ip6 nftban table
      - nftband.service not active
      - emergency SSH table STILL PRESENT (correction 2 — step 10
        is what removes it, not step 9)
10. Remove emergency SSH (warn-only on failure)

## Failure mapping (explicit, not implicit)

| Step | Kernel state after | Terminal state |
|------|-------------------|----------------|
|  1   | untouched         | StateFailedNoFirewall |
|  2   | untouched         | log warn, continue |
| 3-6  | partial cleanup   | StateUninstallFailedRelease |
| 7-8  | kernel released   | StateDegraded |
|  9   | end-state wrong   | StateUninstallFailedRelease |
| 10   | released + warn   | StateUninstallReleased |
| all  | AuthorityNone     | StateUninstallReleased |

Every failure branch has a declared post-state; no silent partial
success; no silent fallback.

## New state machine terminals

- StateUninstallReleased      : success; ExitCode → ExitCommitted (0)
- StateUninstallFailedRelease : mutation started but incomplete;
                                ExitCode → ExitFailed (2)

Both IsApplyTerminal() return true. History write guard in main.go
ALSO excludes cfg.mode=="uninstall" so these states never land in
update-history.json (Option A — see below).

## History representation (Option A locked 2026-04-20)

update-history.json has install-centric status vocabulary
(success / install_fail / verify_fail). None truthfully represents
uninstall success. Rather than lie (record as "success") or
over-engineer (add schema values / new file) inside PR-23, skip
entirely:

  main.go writeHistory guard now reads:
    if !cfg.dryRun && cfg.mode != "uninstall" && IsApplyTerminal(state)

Forensic trail for uninstall events is /var/log/nftban/installer.log
(structured step-by-step audit).

A dedicated uninstall-history schema is tracked as an explicit
follow-up item in contract.md. Not a PR-24 blocker; a truthfulness
gap until resolved.

## Files

- internal/installer/uninstall/authority.go     — AmbiguityKind + field
- internal/installer/uninstall/apply.go (NEW)   — Apply orchestrator
- internal/installer/uninstall/apply_test.go(N) — 7 tests (happy +
                                                  emergency-inject-fail +
                                                  flush-mid-fail +
                                                  mask-fail positive
                                                  control + orphan-
                                                  cleanup +
                                                  ipv6-absent-skip +
                                                  step-9 invariant;
                                                  step-9-residual marked
                                                  Skip pending mock-hook
                                                  infrastructure)
- internal/installer/uninstall/uninstall_test.go — AmbiguityKind tests
- internal/installer/state/machine.go           — new terminals
- internal/installer/state/machine_test.go      — terminal mappings
- cmd/nftban-installer/flags.go                 — shim removal +
                                                  --confirm-mutation
- cmd/nftban-installer/main.go                  — routing + history gate
- cmd/nftban-installer/uninstall_apply.go (NEW) — dispatcher
- internal/installer/uninstall/contract.md      — PR-23 contract doc
- .github/workflows/ci-uninstall-canonization.yml
                                                — G3-UN-NO-MUTATION
                                                  narrowed to exclude
                                                  apply.go (legitimate
                                                  mutation surface)

## CI gate interaction

- G3-UN-SHIM-LOCK: shim_present=0 + mutation_present=1 → PASS (post-
  PR-23 state). This is the coupling PR-P2-5 was designed to enforce.
- G3-UN-NO-MUTATION: scope narrowed with `grep -vE '/apply\.go$'` so
  the dry-run observational surface remains mutation-free while
  apply.go is exempt from the forbidden-pattern list.
- G3-EXEC-TRACE: unchanged; wraps only the dry-run invocation, not the
  new confirm-mutation path.
- G3-KS-SNAPSHOT: unchanged; dry-run-scoped.
- G3-UN-PLAN-RENDERS: unchanged; covers only the dry-run render.

The mutation-path gates (real-host evidence of correct authority
release) are documented in the PR body as a merge blocker per
reviewer checklist §9 — to be captured on lab2 + lab4 before merge.

Refs: V1100_LIFECYCLE_COMPLETION_CONTRACT.md §13 (frozen 2026-04-19)
      internal/installer/uninstall/contract.md §"PR-23 authority
      release — landed contract"
Authorization: locked PR-23 contract seed + reviewer checklist
(2026-04-20) with approved corrections (AmbiguityKind split +
step-9 emergency-still-present invariant + Option A history)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…injection hook

Root cause of 5 failing apply tests: MockExecutor.Run("nft","-f",tmpfile)
returns exit 0 but doesn't mutate the NftTables map, so Apply's step-9
validation (which checks NftTableExists("inet", "nftban_install_emergency")
per correction 2) correctly reported "emergency SSH unexpectedly missing"
and returned UNINSTALL_FAILED_RELEASE.

The production code is right; the mock needs kernel fidelity for what
`nft -f <rules-file>` does on a real host (creates the table).

Fix: new hookEmergencySSHInject helper registers an OnCommand callback
on the "nft -f /tmp/.nftban-emergency-ssh.nft" key that sets
NftTables["inet:nftban_install_emergency"]=true when Run fires. Paired
with MockExecutor.NftDeleteTable which already removes the entry when
switchop.RemoveEmergencySSH runs at step 10.

Applied to seedAuthoritativeHost + the two tests that build mocks
directly (TestApply_OrphanNFTBan_NoExternal + TestApply_IPv6Absent).

TestApply_EmergencyInjectFail_NoMutation unaffected — the forced
exit-1 result still makes switchop.InjectEmergencySSH return an error,
so Apply's EmergencyInjected stays false and the test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sent-refusal gates

The uninstall canonization workflow had a gate asserting that
`--mode=uninstall` without --dry-run auto-elevates to dry-run and exits 0.
That was the correct PR-22 behaviour but is a contract violation under
PR-23 — the auto-elevate shim was removed and the correct post-PR-23
behaviour is an explicit refusal demanding --dry-run OR --confirm-mutation.

Replace the single obsolete gate with two PR-23 gates:

- G3-UN-CONSENT-REQUIRED: bare `--mode=uninstall` must exit non-zero
  with a message mentioning the two-flag choice, touch NOTHING on disk.
- G3-UN-CONSENT-BOTH-FLAGS: `--mode=uninstall --dry-run --confirm-mutation`
  must also refuse with a "mutually exclusive" message.

Together these enforce the consent model at the CLI surface:
  neither → refused
  dry-run → observational
  confirm-mutation → apply
  both → refused

No code changes — CI gate update only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@itcmsgr itcmsgr force-pushed the feat/v1.100-pr-23-uninstall-mutation-authority-release branch from 6a20ea8 to 1d5b2ad Compare April 20, 2026 09:11
@itcmsgr itcmsgr merged commit 78c4248 into main Apr 20, 2026
60 checks passed
@itcmsgr itcmsgr deleted the feat/v1.100-pr-23-uninstall-mutation-authority-release branch April 20, 2026 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant