feat(v1.100 PR-P2-3): kernel + service snapshot CI gate (G3-KS-SNAPSHOT) #487

Merged: itcmsgr merged 1 commit into main from fix/v1.100-pr-p2-3-kernel-service-snapshot-ci on Apr 20, 2026

@itcmsgr itcmsgr commented Apr 20, 2026

Pre-PR-23 assurance blocker #3 of 5 remaining. Adds system-level before/after truth checks to every dry-run CI path so a future regression that mutates nftables tables or firewall service states without touching tracked files is caught at gate time.

Scope (locked per authorization 2026-04-20)

  • ✅ Before/after `nft list tables` sorted diff — hard-assert equal
  • ✅ Before/after `systemctl is-active` for the 6 lifecycle-relevant units (nftband + 5 external firewalls) — hard-assert equal
  • ✅ No `|| true` on the diff checks
  • ✅ Covered paths: install dry-run refusal, update dry-run, uninstall dry-run

Not in scope:

  • ❌ NO code-path redesign
  • ❌ NO strace/exec tracing yet (deferred to PR-P2-4)
  • ❌ NO mutation behavior changes
  • ❌ NO new firewall-unit additions to the signal set

Implementation

NEW helper: `scripts/ci-snapshot-kernel-service.sh`

Reusable snapshot emitter. Single source of truth for what "kernel + service state" means across all three canonization gates. Contract:

  • purely read-only probes (`nft list tables`, `systemctl is-active`)
  • never invokes nft/systemctl with mutation verbs
  • never writes to the filesystem
  • exit 0 always — caller decides whether differences fail
  • degrades gracefully when nft or systemctl aren't available (e.g. almalinux-9 container without systemd) — both sides return the same placeholder, diff empty
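
The contract above could be met by a helper along these lines. This is a minimal sketch, not the contents of `scripts/ci-snapshot-kernel-service.sh`: the function name `emit_snapshot`, the section headers, and the placeholder strings are assumptions made for illustration. The unit list is copied from the Monitored units section.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a read-only snapshot emitter (names and output
# format are illustrative, not taken from the real helper).

UNITS="nftband.service ufw.service firewalld.service csf.service lfd.service iptables.service"

emit_snapshot() {
  local tables unit state

  echo "## nft tables"
  if command -v nft >/dev/null 2>&1; then
    # Read-only probe. A failing probe (e.g. missing privileges) collapses
    # to a fixed placeholder so both sides of a diff still agree.
    tables=$(nft list tables 2>/dev/null) || tables="nft: probe failed"
    printf '%s\n' "$tables" | LC_ALL=C sort
  else
    echo "nft: unavailable"
  fi

  echo "## services"
  for unit in $UNITS; do
    if command -v systemctl >/dev/null 2>&1; then
      # is-active exits non-zero for inactive or unknown units; the printed
      # word is the signal, so the exit status is deliberately not used.
      state=$(systemctl is-active "$unit" 2>/dev/null) || [ -n "$state" ] || state="unknown"
    else
      state="systemctl: unavailable"
    fi
    printf '%s=%s\n' "$unit" "$state"
  done
  return 0   # exit 0 always: the caller decides whether a difference fails
}

emit_snapshot
```

Sorting under `LC_ALL=C` keeps the table section byte-stable across hosts with different locales, which is what lets the caller compare two snapshots for exact equality.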

EXTENDED workflows (3 files)

| Workflow | Gate | Integration |
|---|---|---|
| `ci-install-canonization.yml` | `G3-IN-REFUSE-DRY-RUN` | Snapshot before refuse-dry-run, hard-diff after |
| `ci-update-canonization.yml` | `G3-U3` | Snapshot before dry-run, hard-diff after (layered on top of existing filesystem + history + install_state hard asserts) |
| `ci-uninstall-canonization.yml` | `G3-UN-PLAN-RENDERS` | Snapshot before dry-run, hard-diff after |
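
Each of the three gates follows the same before/after shape. A minimal sketch of that shape, assuming a hypothetical `assert_state_unchanged` function and a snapshot command passed by name (neither is taken from the workflow files):

```shell
#!/usr/bin/env bash
# Hypothetical gate step: snapshot, run the dry-run, snapshot again, and
# fail on any difference. The function and variable names are illustrative.

WRAPPED_RC=0   # exit status of the wrapped command, left for the caller

assert_state_unchanged() {
  local snapshot_cmd=$1 before after
  shift
  before=$("$snapshot_cmd")
  WRAPPED_RC=0
  "$@" || WRAPPED_RC=$?
  after=$("$snapshot_cmd")
  if [ "$before" != "$after" ]; then
    echo "FAIL: kernel/service state changed during: $*" >&2
    printf 'before:\n%s\nafter:\n%s\n' "$before" "$after" >&2
    return 1   # hard failure: nothing here swallows the result
  fi
  return 0
}
```

The wrapped command's own exit status is kept apart in `WRAPPED_RC` because a refusal path is expected to exit non-zero while still leaving state untouched. The dry-run command lines themselves are not shown in this PR, so they are left out of the sketch.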

Monitored units

Kept in lockstep with `internal/installer/extfw/detect.go` — CI and production code agree on "what counts as a firewall service":

```
nftband.service
ufw.service
firewalld.service
csf.service
lfd.service
iptables.service
```

Falsifiability proof

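Could a regression that adds or removes an nftables table, or changes the state of a monitored unit, pass this gate? No. Specifically:

| Hypothetical regression | Caught by |
|---|---|
| Dry-run calls `nft delete table ip nftban` | Kernel snapshot diff: the table drops out of the sorted `nft list tables` output |
| Dry-run calls `systemctl stop nftband` | Service snapshot diff: nftband.service state changes from active to inactive |
| Dry-run spawns a shell helper that touches /etc/csf via `csf -x` | Service snapshot diff: csf.service may deactivate mid-run (also caught by existing filesystem snapshot) |

One limit: `nft flush table ip nftban` empties the table but keeps its name, so it does not change `nft list tables`. Spawn-level detection of such calls belongs to the exec tracing deferred to PR-P2-4.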

Also: tracking update

Marks blocker #2 (external-firewall detection unification, PR #486 / `49d98fc1`) as LANDED in the contract blocker table. Remaining: 4 Phase 2 PRs before PR-23 (this PR, P2-3, plus P2-4 exec-trace CI, P2-5 auto-elevate shim removal gate, and P2-6 payload integrity).

Test plan

  • `Build & Test` green — no Go code changes so no regression risk
  • `ci-install-canonization` matrix green — snapshot wraps the refuse-dry-run
  • `ci-update-canonization` matrix green — snapshot wraps G3-U3
  • `ci-uninstall-canonization` matrix green — snapshot wraps G3-UN-PLAN-RENDERS
  • Snapshot is stable across repeated calls on the same host state (deterministic sort)

🤖 Generated with Claude Code

Pre-PR-23 assurance blocker #3 of 5 remaining. Adds system-level
before/after truth checks to every dry-run CI path so a future
regression that mutates nftables tables or firewall service states
without touching tracked files is caught at gate time.

## Scope (locked per authorization 2026-04-20)

- Before/after `nft list tables` sorted diff — hard-assert equal
- Before/after `systemctl is-active` for the 6 lifecycle-relevant
  units (nftband + 5 external firewalls) — hard-assert equal
- No `|| true` on the diff checks
- Covered paths: install dry-run refusal, update dry-run, uninstall
  dry-run (explicit + implicit)

## Implementation

- NEW: scripts/ci-snapshot-kernel-service.sh — reusable helper that
  emits a stable, sorted snapshot. Degrades gracefully (both sides
  return the same placeholder) when nft or systemctl aren't available
  (e.g. almalinux-9 container without systemd). Contract is:
    * purely read-only probes
    * never invokes nft/systemctl with mutation verbs
    * never writes to the filesystem
    * exit 0 always — caller decides whether differences fail
- EXTENDED: all 3 canonization workflows
    * ci-install-canonization.yml / G3-IN-REFUSE-DRY-RUN
    * ci-update-canonization.yml / G3-U3
    * ci-uninstall-canonization.yml / G3-UN-PLAN-RENDERS
  Each takes a snapshot before the dry-run invocation and hard-
  asserts byte-identical equality after.

## Monitored units (must match extfw.Detect's signal set)

  nftband.service
  ufw.service
  firewalld.service
  csf.service
  lfd.service
  iptables.service

Kept in lockstep with internal/installer/extfw/detect.go so CI and
production code agree on "what counts as a firewall service."

## Non-goals (scope-lock)

- NO code-path redesign
- NO strace/exec tracing yet (deferred to PR-P2-4)
- NO mutation behavior changes
- NO new firewall-unit additions to the signal set

## Also: tracking update

Marks blocker #2 (external-firewall detection unification, PR #486 /
49d98fc) as LANDED in the contract blocker table. Remaining: 4
Phase 2 PRs before PR-23.

Refs: internal/installer/uninstall/contract.md §"Pre-PR-23 blockers"
Authorization: locked Phase 2 sequencing (2026-04-20)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@itcmsgr itcmsgr merged commit c111602 into main Apr 20, 2026
60 checks passed
@itcmsgr itcmsgr deleted the fix/v1.100-pr-p2-3-kernel-service-snapshot-ci branch April 20, 2026 07:01
itcmsgr added a commit that referenced this pull request Apr 20, 2026
…awning purity

Pre-PR-23 assurance blocker #4 of 4 remaining. Adds process-spawn-level
truth check to every dry-run CI path. Wraps the dry-run binary under
`strace -f -e trace=execve`, captures every process spawn, and fails
the gate if any forbidden mutation binary was invoked — regardless of
whether the call was statically visible in the source (grep-based) or
dynamically constructed (only visible at runtime).

## Layered defense — what this adds on top of the existing gates

| Layer | Catches |
|---|---|
| `G3-UN-NO-MUTATION` structural grep | Source code with forbidden patterns |
| `G3-U3` / filesystem snapshot | Writes to `/var/lib/nftban/` or `/etc/nftban/` |
| `G3-KS-SNAPSHOT` (PR-P2-3) | Kernel nft table / service state changes |
| **`G3-EXEC-TRACE` (this PR)** | **Any forbidden binary being spawned, even if the spawn left no state change the other layers could observe (e.g. rm was executed but the file was already absent)** |

The classes complement each other. A regression that somehow avoided
filesystem AND kernel-state changes but still forked a forbidden
binary now fails at the execve syscall boundary.

## Implementation

- NEW: `scripts/ci-exec-trace-assert.sh <command args...>` — wraps
  its argv under strace, propagates exit code unchanged, fails if any
  FORBIDDEN pattern matches in the trace. Contract:
    * read-only on the system (only syscall it observes is execve)
    * exit 0 if wrapped command succeeded AND no forbidden spawns
    * exit != 0 if EITHER the wrapped command failed OR forbidden spawns detected
    * graceful degrade: if strace unavailable, wrapped command runs
      with a CI warning (never silently weakens)

- Added `strace` to the dependency install step of all 3 canonization
  workflows (apt-get for ubuntu-24.04, dnf for almalinux-9 container).

- Wired into 3 gates:
    * `ci-install-canonization.yml / G3-IN-REFUSE-DRY-RUN` — wraps the
      refuse-dry-run invocation; refusal must exit before spawning
      anything forbidden
    * `ci-update-canonization.yml / G3-U3` — wraps the update dry-run;
      purity assertion layered on top of filesystem + KS snapshots
    * `ci-uninstall-canonization.yml / G3-UN-PLAN-RENDERS` — wraps the
      uninstall dry-run; uninstall is the most scope-sensitive mode,
      so exec-trace purity here is the strongest PR-23 precondition
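
The wrapper contract described above can be sketched as follows. This is an illustration under stated assumptions, not the real `scripts/ci-exec-trace-assert.sh`: the function name `exec_trace_assert`, the temp-file handling, the warning text, and the one-line `FORBIDDEN_RE` stand-in are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an exec-trace wrapper. FORBIDDEN_RE is a tiny
# stand-in for the full pattern set.

FORBIDDEN_RE='execve\("([^"]*/)?(userdel|groupdel)"'

exec_trace_assert() {
  local rc=0 trace
  if ! command -v strace >/dev/null 2>&1; then
    # Degrade loudly: run the command untraced, but say so in the log.
    echo "WARNING: strace unavailable, exec-trace assertion skipped" >&2
    "$@" || rc=$?
    return "$rc"
  fi
  # Assumption: the trace goes to a temp file so it stays separate from the
  # wrapped command's own stdout and stderr.
  trace=$(mktemp)
  # -f follows child processes; strace exits with the traced command's status.
  strace -f -e trace=execve -o "$trace" "$@" || rc=$?
  if grep -E "$FORBIDDEN_RE" "$trace" >&2; then
    echo "FAIL: forbidden process spawn during: $*" >&2
    rm -f "$trace"
    return 1
  fi
  rm -f "$trace"
  return "$rc"
}
```

Returning the wrapped command's status when the trace is clean, and a non-zero status of its own when it is not, gives the two failure conditions in the contract without the caller having to parse any output.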

## Forbidden patterns (strace execve regex)

- nft with mutation verbs: `add|create|delete|flush` (list/save are
  read-only and allowed)
- systemctl lifecycle verbs anywhere in argv: `start|stop|restart|
  reload|enable|disable|mask|unmask` (is-active/is-enabled/status/show
  are read-only and allowed)
- External firewall binaries — any invocation: `ufw`, `firewall-cmd`,
  `iptables-restore`, `ip6tables-restore`
- CSF with destructive flags: `-e`, `-x`, `--enable`, `--disable`
- Package-manager mutation: `apt-get remove|purge`, `dnf remove|erase`,
  `rpm -e`, `dpkg --remove|--purge`
- User/group deletion: `userdel`, `groupdel`

Each pattern targets a specific execve shape. A match = gate failure.
The dry-run paths must be observational at the process-spawning level,
not just the Go-function-call level.
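
As an illustration of what "a specific execve shape" can look like, the list above might translate into extended regular expressions such as these. The expressions, the `FORBIDDEN_PATTERNS` variable, and the `find_forbidden_spawns` function are assumptions written for this sketch; the real patterns in `scripts/ci-exec-trace-assert.sh` may be shaped differently.

```shell
#!/usr/bin/env bash
# Hypothetical regex set for strace execve lines.
# Assumed line shape (strace -f -o FILE):
#   PID execve("/path/bin", ["bin", "arg", ...], 0x... /* N vars */) = 0

FORBIDDEN_PATTERNS=$(cat <<'EOF'
execve\("([^"]*/)?nft", \[.*"(add|create|delete|flush)"
execve\("([^"]*/)?systemctl", \[.*"(start|stop|restart|reload|enable|disable|mask|unmask)"
execve\("([^"]*/)?(ufw|firewall-cmd|iptables-restore|ip6tables-restore)"
execve\("([^"]*/)?csf", \[.*"(-e|-x|--enable|--disable)"
execve\("([^"]*/)?apt-get", \[.*"(remove|purge)"
execve\("([^"]*/)?dnf", \[.*"(remove|erase)"
execve\("([^"]*/)?rpm", \[.*"-e"
execve\("([^"]*/)?dpkg", \[.*"(--remove|--purge)"
execve\("([^"]*/)?(userdel|groupdel)"
EOF
)

# Reads a trace on stdin and prints the offending lines. Follows grep's exit
# convention: 0 when at least one forbidden spawn is found, 1 when clean.
find_forbidden_spawns() {
  grep -E -e "$FORBIDDEN_PATTERNS"
}
```

Anchoring on the quoted binary name and on whole quoted argv words keeps read-only calls such as `systemctl is-active` or `nft list tables` from matching, since `"is-active"` and `"list"` never appear as one of the quoted mutation verbs.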

## Non-goals (scope-lock per authorization 2026-04-20)

- NO shell refactor
- NO command-wrapper redesign
- NO broad syscall observability platform
- NO inference beyond "spawned or not spawned"
- NO new protection surfaces beyond exec-trace on dry-run

## Also: tracking update

Marks blocker #3 (kernel/service snapshot CI gate, PR #487) as LANDED.
Remaining pre-PR-23 blockers: 3 (this PR = P2-4, plus P2-5 auto-elevate
shim removal gate + P2-6 payload integrity).

Refs: internal/installer/uninstall/contract.md §"Pre-PR-23 blockers"
Authorization: locked Phase 2 sequencing (2026-04-20)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
itcmsgr added a commit that referenced this pull request Apr 20, 2026
…awning purity (#488)
