
feat(v1.100 PR-P2-4): exec-trace CI gate (G3-EXEC-TRACE) — process-spawning purity #488

Merged
itcmsgr merged 1 commit into main from fix/v1.100-pr-p2-4-exec-trace-ci
Apr 20, 2026

Conversation

@itcmsgr
Owner

@itcmsgr itcmsgr commented Apr 20, 2026

Pre-PR-23 assurance blocker #4 (one of 4 remaining). Adds a process-spawn-level truth check to every dry-run CI path: the dry-run binary is wrapped under `strace -f -e trace=execve`, every process spawn is captured, and the gate fails if any forbidden mutation binary was invoked, regardless of whether the call was statically visible in the source (grep-based) or dynamically constructed (only visible at runtime).

Why

The existing gates cover different layers of purity:

| Layer | Catches |
|---|---|
| `G3-UN-NO-MUTATION` structural grep | Source code with forbidden patterns |
| `G3-U3` / filesystem snapshot | Writes under `/var/lib/nftban/` or `/etc/nftban/` |
| `G3-KS-SNAPSHOT` (PR-P2-3) | Kernel nft table / service state changes |
| `G3-EXEC-TRACE` (this PR) | ANY forbidden binary spawning at all, even dynamically constructed invocations that grep cannot see |

The classes are additive. A regression that somehow avoided filesystem AND kernel-state changes but still forked a forbidden binary now fails at the execve syscall boundary.

Scope (locked per authorization 2026-04-20)

  • ✅ strace-based wrap of dry-run paths
  • ✅ fail on forbidden spawned mutators
  • ✅ minimal implementation — one shell helper, focused regex list
  • ✅ graceful degrade when strace unavailable (warning, not silent skip)

Not in scope:

  • ❌ NO shell refactor
  • ❌ NO command-wrapper redesign
  • ❌ NO broad syscall observability platform
  • ❌ NO inference beyond "spawned or not spawned"

Implementation

NEW: `scripts/ci-exec-trace-assert.sh <command args...>`

Wraps its argv under strace, propagates the wrapped command's exit code unchanged, and additionally fails if any FORBIDDEN pattern matches in the trace. Contract:

  • read-only on the system (only syscall it observes is `execve`)
  • exit 0 if wrapped command succeeded AND no forbidden spawns
  • exit != 0 if EITHER the wrapped command failed OR forbidden spawns detected
  • graceful degrade: if strace unavailable, wrapped command runs with a CI warning
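The contract above can be sketched in `sh` as follows. This is a minimal illustration, not the PR's `scripts/ci-exec-trace-assert.sh`: the function names `exec_trace_assert` and `scan_trace`, the `EXEC_TRACE_STRACE` override and the nft-only placeholder regex are all assumptions. It leans on documented strace behaviour: `-o` writes the trace to a file, and strace exits with the traced command's status, so a tracing failure shows up as a non-zero exit rather than a silent pass.

```sh
#!/bin/sh
# Sketch of an exec-trace wrapper; names are hypothetical, not the
# PR's actual scripts/ci-exec-trace-assert.sh.

# Placeholder forbidden-spawn regex (nft mutation verbs only). A real
# gate would carry the full forbidden list.
if [ -z "${FORBIDDEN_EXECVE_RE:-}" ]; then
  FORBIDDEN_EXECVE_RE='execve\("([^"]*/)?nft", \[.*"(add|create|delete|flush)"'
fi

# scan_trace FILE -> 0 clean, 1 forbidden spawn found, 2 trace unreadable.
scan_trace() {
  local st=0
  # Print offending execve lines so the CI log shows what was spawned.
  grep -E -e "$FORBIDDEN_EXECVE_RE" "$1" || st=$?
  case $st in
    0) echo "::error::exec-trace: forbidden process spawn detected"; return 1 ;;
    1) return 0 ;;
    *) echo "::error::exec-trace: cannot read trace file $1"; return 2 ;;
  esac
}

# exec_trace_assert CMD [ARGS...]
# Returns CMD's own status if CMD failed; otherwise scan_trace's status.
exec_trace_assert() {
  local tracer trace rc=0 scan_rc=0
  tracer=${EXEC_TRACE_STRACE:-strace}
  if ! command -v "$tracer" >/dev/null 2>&1; then
    # Graceful degrade: run untraced, but say so loudly in the CI log.
    echo "::warning::exec-trace: '$tracer' not found; '$1' ran untraced"
    "$@" || rc=$?
    return "$rc"
  fi
  trace=$(mktemp) || return 2
  # -f follows forked children; -s 256 lifts strace's 32-char string
  # limit so argv elements are not truncated. strace exits with the
  # traced command's status (or non-zero if tracing itself failed).
  "$tracer" -f -s 256 -e trace=execve -o "$trace" "$@" || rc=$?
  scan_trace "$trace" || scan_rc=$?
  rm -f "$trace"
  if [ "$rc" -ne 0 ]; then
    return "$rc"
  fi
  return "$scan_rc"
}
```

Usage in a workflow step would look like `exec_trace_assert ./nftban-installer --mode=uninstall --dry-run` (binary path assumed). The `::warning::` and `::error::` prefixes are GitHub Actions workflow commands that surface the messages as run annotations. Returning the wrapped command's status first keeps its failure visible; the trace verdict only decides the outcome when the command itself succeeded.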

Wired into 3 gates

| Workflow | Gate | Role |
|---|---|---|
| `ci-install-canonization.yml` | `G3-IN-REFUSE-DRY-RUN` | Refusal must exit before spawning anything forbidden |
| `ci-update-canonization.yml` | `G3-U3` | Purity assertion layered on top of filesystem + KS snapshots |
| `ci-uninstall-canonization.yml` | `G3-UN-PLAN-RENDERS` | Uninstall is the most scope-sensitive mode, so this is the strongest PR-23 precondition |

Forbidden patterns (strace execve regex)

  • nft with mutation verbs: `add | create | delete | flush` (list/save allowed)
  • systemctl lifecycle verbs anywhere in argv: `start | stop | restart | reload | enable | disable | mask | unmask` (is-active/is-enabled/status/show allowed)
  • External firewall binaries — any invocation: `ufw`, `firewall-cmd`, `iptables-restore`, `ip6tables-restore`
  • CSF with destructive flags: `-e`, `-x`, `--enable`, `--disable`
  • Package-manager mutation: `apt-get remove|purge`, `dnf remove|erase`, `rpm -e`, `dpkg --remove|--purge`
  • User/group deletion: `userdel`, `groupdel`

Each pattern targets a specific execve shape. Match = gate failure.
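Written as extended regexes for `grep -E`, the list above might look like the sketch below. The exact expressions and function names are assumptions, not the gate's literal patterns. They assume strace's default execve line shape, `execve("/path/bin", ["argv0", "arg1", ...], 0x... /* N vars */) = 0`, and each verb must equal a whole quoted argv element, so `"is-active"` can never satisfy `"start"`. Two consequences of this approach: a failed PATH probe (`= -1 ENOENT`) still counts as a spawn attempt, which errs toward failing; and verbs hidden inside a script file (`nft -f file`) never appear in argv at all.

```sh
#!/bin/sh
# Illustrative ERE versions of the forbidden-spawn list (assumed, not the
# gate's literal patterns). Each line matches one strace execve shape:
#   execve("/path/bin", ["argv0", "arg1", ...], 0x... /* N vars */) = 0
forbidden_execve_patterns() {
  cat <<'EOF'
execve\("([^"]*/)?nft", \[.*"(add|create|delete|flush)"
execve\("([^"]*/)?systemctl", \[.*"(start|stop|restart|reload|enable|disable|mask|unmask)"
execve\("([^"]*/)?(ufw|firewall-cmd|iptables-restore|ip6tables-restore)",
execve\("([^"]*/)?csf", \[.*"(-e|-x|--enable|--disable)"
execve\("([^"]*/)?apt-get", \[.*"(remove|purge)"
execve\("([^"]*/)?dnf", \[.*"(remove|erase)"
execve\("([^"]*/)?rpm", \[.*"-e"
execve\("([^"]*/)?dpkg", \[.*"(--remove|--purge)"
execve\("([^"]*/)?(userdel|groupdel)",
EOF
}

# is_forbidden_execve LINE: true if LINE matches any forbidden shape.
is_forbidden_execve() {
  local re
  # Join the pattern lines into one alternation: pat1|pat2|...
  re=$(forbidden_execve_patterns | paste -s -d '|' -)
  printf '%s\n' "$1" | grep -Eq -e "$re"
}
```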

Also: tracking update

Marks blocker #3 (kernel/service snapshot CI gate, PR #487) as LANDED. Remaining pre-PR-23 blockers: 3 (this PR = P2-4, plus P2-5 auto-elevate shim removal gate + P2-6 payload integrity).

Test plan

  • `Build & Test` green — no Go code changes
  • `ci-install-canonization` matrix green — refuse-dry-run traced, no forbidden spawns
  • `ci-update-canonization` matrix green — dry-run traced, no forbidden spawns
  • `ci-uninstall-canonization` matrix green — dry-run traced, no forbidden spawns
  • strace installs cleanly on both runner types
  • Falsifiability: if a forbidden execve were deliberately introduced, the gate should fail (argued from regex specificity: each pattern targets a specific execve line shape)
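The falsifiability item can also be exercised directly instead of argued from the regexes. The sketch below is a hypothetical canary, not part of this PR: it puts a no-op stub named `nft` on a PATH containing nothing else, spawns `nft flush ruleset` through `sh -c` (the runtime-constructed case grep cannot see), and checks that the traced execve line is caught. The function name and return codes are assumptions; it returns 77, the automake convention for a skipped test, when strace is missing or ptrace is denied, so an environment that cannot trace is reported as skipped rather than passed.

```sh
#!/bin/sh
# Hypothetical falsifiability canary. Nothing real is mutated: the
# spawned "nft" is a no-op stub, and PATH contains only the stub dir,
# so the real nft binary can never be reached.

# canary_check -> 0  forbidden spawn seen (gate would fire, as it should)
#                 1  NOT seen (trace pipeline is blind)
#                 2  setup error
#                 77 skipped (no strace, or ptrace denied)
canary_check() {
  local dir rc
  command -v strace >/dev/null 2>&1 || return 77
  strace -o /dev/null true >/dev/null 2>&1 || return 77
  dir=$(mktemp -d) || return 2
  printf '#!/bin/sh\nexit 0\n' > "$dir/nft"
  chmod +x "$dir/nft"
  # Build the command at runtime via sh -c. The exit status is
  # irrelevant here; only the recorded trace matters.
  strace -f -s 256 -e trace=execve -o "$dir/trace" \
    env PATH="$dir" /bin/sh -c 'nft flush ruleset' >/dev/null 2>&1 || true
  if grep -Eq 'execve\("([^"]*/)?nft", \[.*"flush"' "$dir/trace"; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$dir"
  return "$rc"
}
```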

🤖 Generated with Claude Code

…awning purity

Pre-PR-23 assurance blocker #4 (one of 4 remaining). Adds a
process-spawn-level truth check to every dry-run CI path. Wraps the
dry-run binary under
`strace -f -e trace=execve`, captures every process spawn, and fails
the gate if any forbidden mutation binary was invoked — regardless of
whether the call was statically visible in the source (grep-based) or
dynamically constructed (only visible at runtime).

## Layered defense — what this adds on top of the existing gates

| Layer | Catches |
|---|---|
| `G3-UN-NO-MUTATION` structural grep | Source code with forbidden patterns |
| `G3-U3` / filesystem snapshot | Writes to `/var/lib/nftban/` or `/etc/nftban/` |
| `G3-KS-SNAPSHOT` (PR-P2-3) | Kernel nft table / service state changes |
| **`G3-EXEC-TRACE` (this PR)** | **ANY forbidden binary spawning, even when it leaves no change the other layers can observe (e.g. `nft flush` run against an already-empty table)** |

The classes complement each other. A regression that somehow avoided
filesystem AND kernel-state changes but still forked a forbidden
binary now fails at the execve syscall boundary.

## Implementation

- NEW: `scripts/ci-exec-trace-assert.sh <command args...>` — wraps
  its argv under strace, propagates exit code unchanged, fails if any
  FORBIDDEN pattern matches in the trace. Contract:
    * read-only on the system (only syscall it observes is execve)
    * exit 0 if wrapped command succeeded AND no forbidden spawns
    * exit != 0 if EITHER the wrapped command failed OR forbidden spawns detected
    * graceful degrade: if strace unavailable, wrapped command runs
      with a CI warning (never silently weakens)

- Added `strace` to the dependency install step of all 3 canonization
  workflows (apt-get for ubuntu-24.04, dnf for almalinux-9 container).

- Wired into 3 gates:
    * `ci-install-canonization.yml / G3-IN-REFUSE-DRY-RUN` — wraps the
      refuse-dry-run invocation; refusal must exit before spawning
      anything forbidden
    * `ci-update-canonization.yml / G3-U3` — wraps the update dry-run;
      purity assertion layered on top of filesystem + KS snapshots
    * `ci-uninstall-canonization.yml / G3-UN-PLAN-RENDERS` — wraps the
      uninstall dry-run; uninstall is the most scope-sensitive mode,
      so exec-trace purity here is the strongest PR-23 precondition

## Forbidden patterns (strace execve regex)

- nft with mutation verbs: `add|create|delete|flush` (list/save are
  read-only and allowed)
- systemctl lifecycle verbs anywhere in argv:
  `start|stop|restart|reload|enable|disable|mask|unmask`
  (is-active/is-enabled/status/show are read-only and allowed)
- External firewall binaries — any invocation: `ufw`, `firewall-cmd`,
  `iptables-restore`, `ip6tables-restore`
- CSF with destructive flags: `-e`, `-x`, `--enable`, `--disable`
- Package-manager mutation: `apt-get remove|purge`, `dnf remove|erase`,
  `rpm -e`, `dpkg --remove|--purge`
- User/group deletion: `userdel`, `groupdel`

Each pattern targets a specific execve shape. A match = gate failure.
The dry-run paths must be observational at the process-spawning level,
not just the Go-function-call level.

## Non-goals (scope-lock per authorization 2026-04-20)

- NO shell refactor
- NO command-wrapper redesign
- NO broad syscall observability platform
- NO inference beyond "spawned or not spawned"
- NO new protection surfaces beyond exec-trace on dry-run

## Also: tracking update

Marks blocker #3 (kernel/service snapshot CI gate, PR #487) as LANDED.
Remaining pre-PR-23 blockers: 3 (this PR = P2-4, plus P2-5 auto-elevate
shim removal gate + P2-6 payload integrity).

Refs: internal/installer/uninstall/contract.md §"Pre-PR-23 blockers"
Authorization: locked Phase 2 sequencing (2026-04-20)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@itcmsgr itcmsgr merged commit 4749fe1 into main Apr 20, 2026
60 checks passed
@itcmsgr itcmsgr deleted the fix/v1.100-pr-p2-4-exec-trace-ci branch April 20, 2026 07:11
itcmsgr added a commit that referenced this pull request Apr 20, 2026
Pre-PR-23 assurance blocker #5 (one of 2 remaining). Adds a CI gate that
fails when the uninstall auto-elevate shim and uninstall mutation
code coexist. This enforces that when PR-23 lands, the shim is
removed in the SAME PR — preventing the scaffold-era "safe by default"
UX from silently flipping meaning the moment real mutation lands.

## Rule

| shim_present | mutation_present | Result |
|:-:|:-:|---|
| 1 | 1 | **FAIL** — shim + mutation cannot coexist |
| 1 | 0 | PASS — PR-22/P2-x scaffold state |
| 0 | 1 | PASS — post-PR-23, shim correctly removed |
| 0 | 0 | PASS — trivially clean |
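The decision table reduces to a single rule: fail only when both inputs are 1. A minimal sketch in `sh`, assuming the two detectors report `0`/`1` as in the table's columns (the function name is hypothetical). Any other input is rejected as an error instead of defaulting to PASS, in keeping with a gate that should never silently weaken.

```sh
#!/bin/sh
# shim_lock_decision SHIM MUTATION (each "0" or "1").
# Prints the verdict; exit 0 = PASS, 1 = FAIL, 2 = bad input.
shim_lock_decision() {
  local v
  for v in "$1" "$2"; do
    case $v in
      0|1) ;;
      *) echo "ERROR: expected 0 or 1, got '$v'" >&2; return 2 ;;
    esac
  done
  if [ "$1" = 1 ] && [ "$2" = 1 ]; then
    echo "FAIL: shim + mutation cannot coexist"
    return 1
  fi
  echo "PASS"
}
```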

## Detection

**Shim detection** (`cmd/nftban-installer/flags.go`, grep for either):
  - "auto-elevated to --dry-run"
  - "NO MUTATION WILL OCCUR (v1.100 PR-22 scope)"

Two independent markers: if one is removed by a refactor, the other
still fires, so the gate doesn't silently stop working.
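A sketch of the shim detector using fixed-string matching (`grep -F`), so the parentheses in the second marker need no escaping. The function name and the 0/1 output convention are assumptions; the marker strings are the two listed above. An unreadable `flags.go` is reported as an error rather than as "shim absent", since treating a moved or renamed file as absence would make the gate go quiet.

```sh
#!/bin/sh
# shim_present FILE -> prints 1 if either marker occurs, else 0.
# Exit 2 (no output) if FILE cannot be read.
shim_present() {
  if [ ! -r "$1" ]; then
    echo "shim_present: cannot read $1" >&2
    return 2
  fi
  # Two independent fixed-string markers; either one is enough.
  if grep -Fq -e 'auto-elevated to --dry-run' \
              -e 'NO MUTATION WILL OCCUR (v1.100 PR-22 scope)' "$1"; then
    echo 1
  else
    echo 0
  fi
}
```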

**Mutation detection** (`internal/installer/uninstall/*.go` +
`cmd/nftban-installer/uninstall_dryrun.go`, Go only, excluding tests):
  - nft mutation verbs (add/create/delete/flush)
  - systemctl lifecycle verbs (start/stop/restart/reload/enable/
    disable/mask/unmask) via exec.Run or Service* methods
  - External firewall binaries (ufw, firewall-cmd, iptables-restore)
  - Filesystem writers (WriteFileAtomic, os.WriteFile, Create,
    Remove, RemoveAll, MkdirAll, Rename)
  - State persistence (sf.Transition)
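A sketch of the mutation detector over the listed paths. The regex fragments are illustrative guesses at the Go call shapes (for example `exec.Run("systemctl", "stop", ...)` or a hypothetical `ServiceStop(` method); the gate's actual detection logic is documented in `internal/installer/uninstall/contract.md` and may differ. Directories are scanned non-recursively to mirror `uninstall/*.go`, `_test.go` files are skipped, and matching is line-based, so a call split across several source lines would be missed.

```sh
#!/bin/sh
# Hypothetical mutation detector for the shim-lock gate. The regex
# fragments are guesses at Go call shapes, not the gate's real patterns.
MUTATION_RE='"nft",[^)]*"(add|create|delete|flush)"'
MUTATION_RE="$MUTATION_RE"'|"systemctl",[^)]*"(start|stop|restart|reload|enable|disable|mask|unmask)"'
MUTATION_RE="$MUTATION_RE"'|\.Service(Start|Stop|Restart|Reload|Enable|Disable|Mask|Unmask)\('
MUTATION_RE="$MUTATION_RE"'|"(ufw|firewall-cmd|iptables-restore)"'
MUTATION_RE="$MUTATION_RE"'|WriteFileAtomic\(|os\.(WriteFile|Create|Remove|RemoveAll|MkdirAll|Rename)\('
MUTATION_RE="$MUTATION_RE"'|sf\.Transition\('

# go_file_mutates FILE: true if a non-test .go file has a mutation shape.
go_file_mutates() {
  case $1 in
    *_test.go) return 1 ;;   # tests are out of scope
    *.go) grep -Eq -e "$MUTATION_RE" "$1" ;;
    *) return 1 ;;
  esac
}

# mutation_present PATH... -> prints 1 or 0; exit 2 if a path is missing.
# Directories are scanned non-recursively, mirroring uninstall/*.go.
mutation_present() {
  local path f
  for path in "$@"; do
    if [ -d "$path" ]; then
      for f in "$path"/*.go; do
        [ -f "$f" ] || continue   # an unmatched glob stays literal
        if go_file_mutates "$f"; then echo 1; return 0; fi
      done
    elif [ -f "$path" ]; then
      if go_file_mutates "$path"; then echo 1; return 0; fi
    else
      echo "mutation_present: no such path: $path" >&2
      return 2
    fi
  done
  echo 0
}
```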

## Scope lock

- NO code changes (no shim removal, no mutation added)
- NO CLI redesign
- Pure detection gate — fires if the coupling appears, silent otherwise

## Two acceptable shim remediations at PR-23 time

1. **Delete** the auto-elevate block entirely (`--mode=uninstall`
   mutates unless `--dry-run` is explicit)
2. **Convert** to explicit refusal requiring `--dry-run` or
   `--confirm-mutation` (no silent default either way)
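Remediation 2 amounts to a policy with no default branch. The installer itself is Go; the `sh` function below only illustrates the decision, and its name plus two details are assumptions not stated here: arguments other than the two flags are ignored, and passing both flags is refused as contradictory.

```sh
#!/bin/sh
# Illustration of remediation 2 ("explicit refusal"): with --mode=uninstall
# the caller must choose; there is no silent default. Flag names come from
# the text; treating both flags together as a refusal is an assumption.
uninstall_policy() {
  local arg dry=0 confirm=0
  for arg in "$@"; do
    case $arg in
      --dry-run) dry=1 ;;
      --confirm-mutation) confirm=1 ;;
    esac
  done
  case "$dry$confirm" in
    10) echo "dry-run" ;;
    01) echo "mutate" ;;
    11) echo "refused: --dry-run and --confirm-mutation conflict" >&2; return 1 ;;
    *)  echo "refused: --mode=uninstall needs --dry-run or --confirm-mutation" >&2; return 1 ;;
  esac
}
```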

## Also: tracking update

Marks blocker #4 (exec-trace CI gate, PR #488) as LANDED. Remaining
pre-PR-23 blockers: 2 (this PR = P2-5, plus P2-6 payload integrity).

Contract doc updated with the full decision table + detection logic
in internal/installer/uninstall/contract.md.

Refs: internal/installer/uninstall/contract.md §"Pre-PR-23 blockers"
      + §"G3-UN-SHIM-LOCK (PR-P2-5) — how the gate decides"
      + §"Audit C regression note" (where the two remediations were
      originally committed to)
Authorization: locked Phase 2 sequencing (2026-04-20)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
itcmsgr added a commit that referenced this pull request Apr 20, 2026
#489)
