
feat(v1.99 PR-18): update apply orchestration (G3-U5..U10) — CONTRACT-ENFORCED #475

Merged
itcmsgr merged 5 commits into main from feat/v1.99-pr18-update-apply
Apr 19, 2026

@itcmsgr itcmsgr commented Apr 19, 2026

0. Executive Summary

This PR introduces the update apply phase as a strictly orchestration-only layer that delegates all state mutation, validation, recovery, and authority enforcement to the existing rebuild/lifecycle authority.

No new apply logic is introduced.

This PR enforces that:

  • Update apply is not an execution engine
  • Update apply is not a second authority path
  • Update apply is only a controlled entrypoint into rebuild

1. Core Contract (Hard Gate)

Invariant:

Update apply may only orchestrate payload application through the existing rebuild/lifecycle authority.

Explicitly:

  • Apply MUST NOT implement its own:

    • configuration mutation
    • rendering logic
    • authority takeover
    • validation logic
    • recovery/rollback logic
    • service convergence logic
  • Apply MUST:

    • call existing lifecycle/rebuild entrypoints
    • rely on existing validator gates
    • rely on existing recovery/rollback mechanisms
    • converge state through lifecycle authority only

2. Enforced Invariants

| ID | Invariant | Enforcement |
|---|---|---|
| INV-U-001 | No custom apply logic outside rebuild | Call graph + diff audit |
| INV-U-002 | No partial switch without recovery | Mandatory rebuild recovery path |
| INV-U-003 | No hidden authority expansion | Authority checks + post-state validation |

Violation of any invariant = automatic NO-GO


3. Scope (G3-U5..U10)

G3-U5 — .conf.local Byte Preservation

  • Apply MUST NOT modify .conf.local
  • Verified via pre/post byte hash equality; rebuild path only reads, never mutates
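
A minimal sketch of the pre/post byte-hash check, assuming SHA-256 as the digest and a throwaway file standing in for the real .conf.local. The names fileDigest, preservedAcross, and demo are hypothetical and are not taken from the PR's test suite.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// fileDigest returns the SHA-256 digest of the file's raw bytes.
func fileDigest(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(data)
	return sum[:], nil
}

// preservedAcross hashes path, runs apply, hashes again, and reports
// whether the two digests are identical.
func preservedAcross(path string, apply func() error) (bool, error) {
	before, err := fileDigest(path)
	if err != nil {
		return false, err
	}
	if err := apply(); err != nil {
		return false, err
	}
	after, err := fileDigest(path)
	if err != nil {
		return false, err
	}
	return bytes.Equal(before, after), nil
}

// demo runs the check twice against a temporary file: once with an apply
// step that leaves the file alone, once with one that appends a byte.
func demo() (afterNoop, afterWrite bool, err error) {
	dir, err := os.MkdirTemp("", "conf-local-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	// Hypothetical file name; only the .conf.local suffix comes from the PR.
	path := filepath.Join(dir, "example.conf.local")
	if err := os.WriteFile(path, []byte("KEY=value\n"), 0o600); err != nil {
		return false, false, err
	}
	afterNoop, err = preservedAcross(path, func() error { return nil })
	if err != nil {
		return false, false, err
	}
	afterWrite, err = preservedAcross(path, func() error {
		f, err := os.OpenFile(path, os.O_APPEND|os.O_WRONLY, 0o600)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = f.WriteString("#")
		return err
	})
	return afterNoop, afterWrite, err
}

func main() {
	afterNoop, afterWrite, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("preserved after no-op apply:", afterNoop)
	fmt.Println("preserved after mutating apply:", afterWrite)
}
```

Comparing digests rather than modification times catches a rewrite that restores the same timestamp but changes content.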

G3-U6 — Base Config Safety

  • No unsafe mutation of base config
  • All config transitions only via existing template → render → rebuild pipeline

G3-U7 — Rebuild Recovery Integration (INV-U-002)

  • Any failure during apply MUST trigger existing rebuild recovery/rollback
  • Never leave partial state active

G3-U8 — Post-Update Validator Gate

  • After apply: nftban-validate MUST run; invalid state MUST block success; recovery MUST be triggered on failure

G3-U9 — Service-State Convergence

  • Final state MUST converge to lifecycle authority truth: nftables active, nftband state aligned, kernel == validator == CLI

G3-U10 — Authority Safety (INV-U-003)

  • Apply MUST NOT expand authority scope, remove/disable external firewalls outside allowed model, or introduce new takeover paths

4. Non-Goals (Strictly Out of Scope)

  • ❌ Any update-specific apply engine
  • ❌ Direct config file mutation
  • ❌ Alternative rendering pipeline
  • ❌ Independent validation logic
  • ❌ Custom rollback/recovery logic
  • ❌ Authority takeover or firewall removal logic
  • ❌ Service manipulation outside lifecycle authority

If any of the above becomes necessary → STOP and split into new design PR


5. Call Graph (Critical Proof)

update_apply
   ↓
lifecycle entrypoint (existing)
   ↓
rebuild pipeline (existing)
   ↓
validator (existing)
   ↓
recovery (existing, if needed)

There are NO parallel or side paths.

Concrete binding (today):

  • runUpdateApply → exec.Run("nftban", "firewall", "rebuild") → firewall_rebuild (cli/lib/nftban/cli/cmd_firewall.sh:1070)
  • → exec.Run("nftban-validate", "--json") → validator gate (cmd/nftban-validate)
  • Recovery: delegated to firewall_rebuild's own _rebuild_* family (nftban_rebuild_recovery.sh) — PR-18 adds NO new recovery code

6. Execution Flow

  1. Pre-check (read-only) — update.Preflight
  2. Trigger lifecycle/rebuild entrypoint — nftban firewall rebuild
  3. Rebuild executes: render → apply to kernel → enforce authority
  4. Validator gate runs — nftban-validate --json
  5. If failure: recovery/rollback via rebuild (not owned by apply)
  6. If success: service state converges to lifecycle truth
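
The flow above can be sketched as a thin sequencer. This is an illustration under stated assumptions, not the code in update_apply.go: the Runner interface, traceRunner, updateApply, and the preflight exit code of 1 are all hypothetical. Only the two command lines come from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Runner is a hypothetical stand-in for the installer's executor: it runs a
// command and returns the process exit code.
type Runner interface {
	Run(name string, args ...string) int
}

// updateApply sequences preflight, rebuild, and validator. It stops at the
// first failing phase and returns that phase's name and exit code unchanged.
// It never retries, rolls back, or inspects command output.
func updateApply(r Runner, preflight func() error) (phase string, rc int) {
	if err := preflight(); err != nil {
		return "preflight", 1 // assumed exit code for a failed preflight
	}
	if rc := r.Run("nftban", "firewall", "rebuild"); rc != 0 {
		return "rebuild", rc
	}
	if rc := r.Run("nftban-validate", "--json"); rc != 0 {
		return "validator", rc
	}
	return "done", 0
}

// traceRunner records every command line and returns scripted exit codes
// keyed by command name (missing keys mean exit 0).
type traceRunner struct {
	exits map[string]int
	trace []string
}

func (t *traceRunner) Run(name string, args ...string) int {
	t.trace = append(t.trace, strings.Join(append([]string{name}, args...), " "))
	return t.exits[name]
}

func okPreflight() error  { return nil }
func badPreflight() error { return errors.New("preflight failed") }

func main() {
	r := &traceRunner{exits: map[string]int{"nftban-validate": 2}}
	phase, rc := updateApply(r, okPreflight)
	fmt.Println(phase, rc, r.trace)
}
```

Because each phase returns immediately on failure, the recorded trace shows which commands ran on each branch, which is what makes a call-path audit possible.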

7. Merge Evidence Requirements (Hard Checklist)

PR cannot merge without ALL of the following:

A. Structural Proof

  • Call graph shows no apply-owned mutation path
  • No new renderer/config writer introduced
  • No new authority manipulation logic

B. .conf.local Preservation (G3-U5)

  • Byte hash identical pre/post apply

C. Base Config Safety (G3-U6)

  • No direct edits outside rebuild pipeline

D. Recovery Integrity (G3-U7)

  • Forced failure test triggers rollback via rebuild
  • No partial state remains active

E. Validator Gate (G3-U8)

  • Invalid state blocks success
  • Validator result matches kernel truth

F. Service Convergence (G3-U9)

  • nftables/nftband consistent post-apply
  • No drift between kernel / validator / CLI

G. Authority Safety (G3-U10)

  • No unintended firewall removal
  • No authority expansion detected

8. CI Gates (Blocking)

  • G3-U5 → .conf.local preservation test
  • G3-U6 → config mutation audit
  • G3-U7 → recovery failure simulation
  • G3-U8 → validator enforcement
  • G3-U9 → convergence validation
  • G3-U10 → authority safety check

All must be green and enforced (blocking)


9. Failure Model

| Failure Type | Expected Behavior |
|---|---|
| Rebuild fails | rollback via lifecycle recovery |
| Validator fails | block + rollback |
| Partial apply | must not exist |
| Authority conflict | must be detected and blocked |

10. Risk Assessment

| Risk | Mitigation |
|---|---|
| Hidden apply logic creep | invariant enforcement + diff audit |
| Partial state activation | recovery integration (INV-U-002) |
| Authority drift | validator + authority checks |
| CI false positives | validator as source of truth |

11. Definition of Done

PR-18 is DONE only if:

  • Update apply is provably orchestration-only
  • Rebuild remains the single mutation authority
  • Validator is the single truth gate
  • Recovery is always enforced
  • No invariant is weakened or bypassed

12. Final Statement

This PR establishes that:

Update apply is not a system that changes state —
it is a controlled mechanism to request the existing system to do so safely.

Any deviation from this model invalidates the PR.


13. Stop Condition

If at any point implementation requires:

  • new apply logic
  • bypass of rebuild
  • direct mutation
  • custom recovery

→ STOP PR-18 immediately and escalate to design/audit.


Implementation plan (commit-by-commit)

  1. Contract + proof skeleton: internal/installer/update/apply_contract.md + call-path audit test (this PR's seed commit already landed)
  2. Minimal orchestration: runUpdateApply that invokes nftban firewall rebuild + nftban-validate and nothing else
  3. Recovery/validator integration: delegate apply failures to rebuild's own recovery; validator gate blocks bad post-state
  4. Convergence + authority proofs + CI gates G3-U5..U10

Status: Draft — standard v1.99 pattern, un-drafts after CI green. Each commit keeps the PR diff auditable.

🤖 Generated with Claude Code

This commit establishes the PR-18 contract BEFORE any implementation code
lands. Per user directive, PR-18 is judged by call-path purity, not by
"it works in happy path."

Pinned sentence (repeated in PR body + contract file + package doc):

  "PR-18 is orchestration-only: update apply may invoke the existing
   rebuild/lifecycle authority, but may not implement any independent
   apply, mutation, recovery, validation, or authority-taking behavior."

Contents:
  - Call graph (runUpdateApply → firewall_rebuild → validator → recovery)
  - Canonical entry points PR-18 orchestrates (never reimplements)
  - 8 forbidden patterns (automatic NO-GO)
  - Explicit stop condition: new apply/mutation/recovery/authority
    logic → STOP and split to new PR

No code, no behaviour change. This is the contract layer; step 1
(tests/proof skeleton) lands in the next commit, step 2 (minimal
orchestration wiring) after that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 19, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

itcmsgr and others added 4 commits April 19, 2026 19:42
…purity)

Step 1 of PR-18 per user's implementation ordering. These tests land
BEFORE runUpdateApply itself so the invariants are enforced from the
first implementation commit onward.

Contract surface:
  - applyWhitelist: the complete set of commands runUpdateApply may invoke.
    Preflight probes (read-only) + canonical rebuild entry
    (`nftban firewall rebuild`) + validator gate (`nftban-validate --json`)
    + read-only post-state inspection (nft list, systemctl is-active).
    Adding to this list requires a corresponding apply_contract.md update.

  - applyForbiddenSubstrings: categories that must never appear in
    recorded commands — direct kernel mutation (nft add/flush/delete),
    service lifecycle (systemctl stop/disable/mask), package removal,
    external firewall touches (ufw, iptables).

  - applyForbiddenWritePaths: file-system destinations apply may never
    write — /etc/nftban/**, /usr/lib/nftban/**, /usr/sbin/nftban*.
    Plus explicit G3-U5 rule: .conf.local byte-preservation.

Harness functions:
  - auditRecordedCommands(cmds) — flags whitelist violations + forbidden
    substrings. Returns list of violation strings (empty = clean).
  - auditWrittenFiles(paths) — flags write-path violations + .conf.local
    touches.

Self-tests:
  - happy-path command trace produces zero violations
  - direct kernel mutation (nft add table) detected
  - unknown commands (curl) rejected
  - .conf.local write detected (G3-U5)
  - /etc/nftban write detected
  - /usr/sbin/nftban write detected

When step 2 lands runUpdateApply, its MockExecutor trace will be piped
through auditRecordedCommands + auditWrittenFiles to prove INV-U-001/002/003
mechanically, not just by reading the code.
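
A rough sketch of how such an audit harness might look. The lists here are illustrative subsets drawn from the categories named above, and the matching rules (exact or prefix match for the whitelist, substring match for forbidden patterns) are assumptions; the actual contents and logic of the harness may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative subsets only; the authoritative lists belong to the contract.
var whitelist = []string{
	"nftban firewall rebuild",
	"nftban-validate --json",
	"nft list",
	"systemctl is-active",
}

var forbiddenSubstrings = []string{
	"nft add", "nft flush", "nft delete",
	"systemctl stop", "systemctl disable", "systemctl mask",
	"ufw", "iptables",
}

var forbiddenWritePrefixes = []string{
	"/etc/nftban/", "/usr/lib/nftban/", "/usr/sbin/nftban",
}

// auditCommands returns one violation string per broken rule.
// An empty result means the trace is clean.
func auditCommands(cmds []string) []string {
	var violations []string
	for _, cmd := range cmds {
		allowed := false
		for _, w := range whitelist {
			if cmd == w || strings.HasPrefix(cmd, w+" ") {
				allowed = true
				break
			}
		}
		if !allowed {
			violations = append(violations, "not whitelisted: "+cmd)
		}
		for _, f := range forbiddenSubstrings {
			if strings.Contains(cmd, f) {
				violations = append(violations, "forbidden ("+f+"): "+cmd)
			}
		}
	}
	return violations
}

// auditWrites flags writes under protected prefixes and any .conf.local touch.
func auditWrites(paths []string) []string {
	var violations []string
	for _, p := range paths {
		if strings.HasSuffix(p, ".conf.local") {
			violations = append(violations, ".conf.local touched: "+p)
		}
		for _, prefix := range forbiddenWritePrefixes {
			if strings.HasPrefix(p, prefix) {
				violations = append(violations, "forbidden write path: "+p)
			}
		}
	}
	return violations
}

func main() {
	trace := []string{"nftban firewall rebuild", "nft add table ip nftban"}
	for _, v := range auditCommands(trace) {
		fmt.Println(v)
	}
}
```

Running both a whitelist and a forbidden-substring pass is deliberately redundant: a command mistakenly added to the whitelist would still be caught if it contains a forbidden pattern.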

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
runUpdateApply is a thin sequencer over the canonical rebuild +
validator entrypoints. No custom apply, no retry, no rollback, no
recovery, no authority decisions. Validator is the truth gate —
rebuild success + validator failure = apply failure.

Files:
  - cmd/nftban-installer/update_apply.go (NEW) — runUpdateApply + read-only
    postStateInspection + truncate helper. Exit-code contract explicit:
    each phase's failure propagates without merging so monitoring can
    distinguish rebuild-fail from validator-fail by exit + log phase.
  - cmd/nftban-installer/update_apply_test.go (NEW) — T1 happy path,
    T2 preflight-fail blocks rebuild, T3 rebuild-fail blocks validator,
    T4 G3-U8 truth gate (validator fail overrides rebuild success),
    T5 call-path purity on ALL branches, T6 G3-U5 .conf.local untouched.
  - cmd/nftban-installer/main.go — narrow dispatch: --mode=upgrade
    without --rpm/--deb routes to runUpdateApply. Package postinst
    paths (packaging/deb/postinst, RPM spec) always pass --rpm/--deb,
    so they continue through runInstall as today.
  - internal/installer/update/apply_contract.go (NEW) — exported
    ApplyWhitelist + ApplyForbidden* + AuditRecordedCommands +
    AuditWrittenFiles. The main-package tests pipe their MockExecutor
    traces through these so the exact same rules apply to both
    surfaces.
  - internal/installer/update/apply_contract_test.go — rewritten as
    self-tests for the exported harness.

Canonical entry points invoked (whitelist-enforced):
  - update.Preflight (read-only, PR-16/PR-17)
  - "nftban firewall rebuild" (mutation path — delegates to v1.96
    firewall_rebuild in cli/lib/nftban/cli/cmd_firewall.sh:1070)
  - "nftban-validate --json" (truth gate)
  - exec.NftTableExists (read-only kernel inspection)
  - exec.ServiceActive (read-only service inspection)

Forbidden by construction (AuditRecordedCommands / AuditWrittenFiles):
  - direct kernel mutation (nft add/flush/delete)
  - service lifecycle (systemctl stop/disable/mask)
  - package removal (apt-get/dnf remove)
  - external firewall touches (ufw, iptables)
  - writes to /etc/nftban/**, /usr/lib/nftban/**, /usr/sbin/nftban*
  - any write to *.conf.local (G3-U5)

INV-U-001/002/003 enforced mechanically — not just by reading code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 3 (recovery/validator proofs):
  - cmd/nftban-installer/update_apply_test.go:
    - T7 TestUpdateApply_RebuildFail_NoRetryNoRecovery — rebuild is
      invoked EXACTLY once; no recovery-flavored command appears in
      the trace after rebuild fails (firewall_rebuild owns retry per
      v1.96; apply must NOT add another retry layer)
    - T8 TestUpdateApply_DoesNotReinterpretValidatorOutput — when
      validator exits 2 but returns JSON body with status="protected",
      apply must honour the EXIT CODE and fail. Locks the truth-gate
      discipline against the common regression class where a later
      change "helpfully" inspects the JSON body to override the exit

Step 4 (CI gates G3-U5..U10):
  - .github/workflows/ci-update-canonization.yml:
    - New step "structural call-path audit of update_apply.go" — greps
      the runUpdateApply source for 13 forbidden patterns (direct nft
      mutation, service lifecycle, package removal, external firewall
      touch, /etc/nftban/** writes, /usr/lib/nftban/** writes,
      .conf.local touches). Fails fast BEFORE runtime tests, so a
      reviewer sees the violation in the first CI failure.
    - New step "unit tests for runUpdateApply call-path purity" — runs
      go test ./internal/installer/update/... ./cmd/nftban-installer/...
      which exercises every runUpdateApply trace through
      AuditRecordedCommands + AuditWrittenFiles under happy,
      rebuild-fail, validator-fail, no-retry, no-reinterpretation,
      and conf.local byte-preservation branches.

Together these close G3-U5 (.conf.local byte-preservation), G3-U6
(base-config safety via forbidden-write-path audit), G3-U7 (rebuild
recovery integration — delegation, not duplication), G3-U8
(post-update validator gate — truth discipline), G3-U9 (service-state
convergence via postStateInspection read-only checks), G3-U10
(authority safety via forbidden systemctl/ufw/iptables patterns).

Every sub-gate is mechanical: if apply ever grows a direct mutation
surface, CI catches it in the structural grep before the unit tests
even run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review response to PR #475. All changes are local to PR-18; no new
enum, no taxonomy redesign, no scope drift.

Blocker #1 — state/exit contradiction on validator failure:
  Validator-fail branch previously hard-coded sf.Transition(StateDegraded)
  regardless of validator's actual exit code. A validator exit 2 (DOWN)
  was persisted as DEGRADED, so sf.State.ExitCode() returned 1 while the
  process exit was 2 — silent truth split.

  Fix: new local helper stateForValidatorExit(rc) in update_apply.go:
      rc == 1  →  StateDegraded      (post-state valid-enough for degraded)
      rc >= 2  →  StateFailedRebuild (stronger failure not collapsed)

  Depends ONLY on validator's process exit code. No JSON parsing. No new
  InstallState enum value. State.ExitCode() now matches the returned
  process exit by construction.
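
  A minimal sketch of that mapping. The InstallState type and its ExitCode
  values are simplified stand-ins (only the 1 and 2 codes are stated in this
  message), and sending every code other than 1 to the stronger failure is an
  assumption for inputs the mapping above does not cover.

```go
package main

import "fmt"

// InstallState is a simplified stand-in for the installer's state enum.
type InstallState int

const (
	StateDegraded InstallState = iota + 1
	StateFailedRebuild
)

// ExitCode returns the process exit code associated with a state.
// Degraded=1 and FailedRebuild=2 follow the values quoted in this message.
func (s InstallState) ExitCode() int {
	if s == StateDegraded {
		return 1
	}
	return 2
}

// stateForValidatorExit maps a non-zero validator exit code to a state.
// It looks only at the exit code and never at the validator's JSON output.
// Callers are expected to invoke it only when rc != 0.
func stateForValidatorExit(rc int) InstallState {
	if rc == 1 {
		return StateDegraded
	}
	// rc >= 2: keep the stronger failure instead of collapsing to degraded.
	return StateFailedRebuild
}

func main() {
	for _, rc := range []int{1, 2, 3, 127} {
		s := stateForValidatorExit(rc)
		fmt.Printf("validator rc=%d -> state=%d (state exit code %d)\n", rc, s, s.ExitCode())
	}
}
```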

  Tests:
    T9  validator rc=1 → StateDegraded,     state↔exit both = 1
    T10 validator rc=2 → StateFailedRebuild, state↔exit both = 2
    T11 stateForValidatorExit exhaustive mapping (rc 1/2/3/127)

Blocker #2 — contract audit missed preflight-fail branch:
  T5 TestUpdateApply_CallPathPurity_AllBranches only covered
  happy/rebuild-fail/validator-fail. A non-whitelisted command or
  forbidden write slipping into the preflight-fail path was unaudited.

  Fix: add {"preflight-fail", delete ip:nftban mock} to T5's branches.
  Now every runUpdateApply exit path pipes through AuditRecordedCommands
  + AuditWrittenFiles.

Minor #1 — post-state inspection success-path-only documented:
  update_apply.go header now explicitly states that step 4
  (postStateInspection) runs on success path only and that earlier
  failure branches return before it. Closes the contract/docs drift
  the reviewer flagged.

Minor #2 — truncate byte/rune mismatch:
  Previous implementation used len(s) + s[:n] on a string, risking
  cutting multi-byte UTF-8 characters mid-codepoint. Now converts to
  []rune before slicing.
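
  A sketch of a rune-safe truncate along these lines. The signature and the
  handling of n <= 0 are assumptions; the helper in update_apply.go may
  differ, for example by appending an ellipsis.

```go
package main

import "fmt"

// truncate returns at most n runes of s. Slicing the []rune form instead of
// the string itself guarantees a multi-byte character is never cut in half.
func truncate(s string, n int) string {
	if n <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

func main() {
	// "é" occupies two bytes in UTF-8, so the byte slice "héllo"[:2] would
	// end in the middle of it; the rune-based version keeps it whole.
	fmt.Println(truncate("héllo", 2))
}
```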

Minor #3 — structural guardrail broadened:
  ApplyForbiddenSubstrings + CI grep previously banned only
  systemctl stop/disable/mask. Now bans the full service-lifecycle
  surface: stop/start/restart/reload/enable/disable/mask/unmask.
  Future-proofs against "helpful" service manipulation creep even if
  the specific verb changes.
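
  One way to express the broadened ban, sketched here as a single regular
  expression rather than the substring list this message describes. The
  name touchesServiceLifecycle is hypothetical. The pattern only matches a
  verb that directly follows systemctl, so a command with flags between the
  two would need additional handling.

```go
package main

import (
	"fmt"
	"regexp"
)

// lifecycleVerb matches systemctl followed directly by any of the eight
// service-lifecycle verbs listed in this message.
var lifecycleVerb = regexp.MustCompile(
	`\bsystemctl\s+(stop|start|restart|reload|enable|disable|mask|unmask)\b`)

// touchesServiceLifecycle reports whether cmd contains a banned invocation.
func touchesServiceLifecycle(cmd string) bool {
	return lifecycleVerb.MatchString(cmd)
}

func main() {
	for _, cmd := range []string{
		"systemctl is-active nftband",
		"systemctl restart nftband",
	} {
		fmt.Printf("%-32s banned=%v\n", cmd, touchesServiceLifecycle(cmd))
	}
}
```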

Test-infra: tests now use t.TempDir() instead of /var/lib/nftban/state
  for sf.StateDir so Transition's WriteAtomic doesn't fail under
  unprivileged CI runners.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@itcmsgr itcmsgr marked this pull request as ready for review April 19, 2026 17:14
@itcmsgr itcmsgr merged commit 22f8fb4 into main Apr 19, 2026
54 checks passed
@itcmsgr itcmsgr deleted the feat/v1.99-pr18-update-apply branch April 19, 2026 17:14