PR26.1: validate installed systemd payload before StateCommitted #528

Merged
itcmsgr merged 1 commit into main from feat/pr26.1-systemd-payload-validation
Apr 29, 2026

@itcmsgr itcmsgr commented Apr 29, 2026

Summary

Adds generic install validation for:

  • ExecStart path existence
  • timer/service pairing
  • payload inventory references
  • failed nftban unit detection
  • StateCommitted strict gating

No panel logic. No DirectAdmin logic. No authority classifier. No restore changes. No firewall mutation.

Invariants

| ID | Rule |
|---|---|
| SYSTEMD-EXECSTART-001 | Every local executable path named in an nftban systemd unit's ExecStart/ExecStartPre/ExecStartPost must exist on disk after payload staging. |
| SYSTEMD-TIMER-PAIR-001 | Every installed nftban-*.timer must activate an installed nftban service unit. When the Unit= directive is absent, the implicit target (the timer's basename plus .service) is inferred. |
| PAYLOAD-INVENTORY-001 | Every nftban-owned path referenced by an nftban systemd unit must belong to the staged payload inventory. System-binary prefixes (/bin/, /usr/bin/, etc.) are exempt. |
| FAILED-UNIT-POSTINSTALL-001 | After install/update apply, no nftban-* unit may be in the failed state. Fails closed if systemctl is unavailable or the failed-unit enumeration query errors. |
| STATECOMMITTED-STRICT-001 | The four invariants above are appended to the RunAssertions results; the existing AllPassed gate at cmd/nftban-installer/phases.go:353,378 then blocks StateCommitted automatically, so no phases.go change is required. |
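
To make the SYSTEMD-TIMER-PAIR-001 inference concrete, here is a minimal sketch of how a timer's target could be resolved and checked. It is not the code in systemd_payload.go: `timerTarget`, `orphanTimers`, and the `nftban-example*` unit names are hypothetical, and the parser is simplified (it does not track which section a `Unit=` line sits in).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// timerTarget returns the unit a timer activates: the explicit Unit= value
// when present, otherwise the timer's basename with a .service suffix.
// Simplification (assumption): section headers such as [Timer] are ignored.
func timerTarget(timerName, content string) string {
	for _, line := range strings.Split(content, "\n") {
		if v, ok := strings.CutPrefix(strings.TrimSpace(line), "Unit="); ok {
			if v = strings.TrimSpace(v); v != "" {
				return v
			}
		}
	}
	return strings.TrimSuffix(timerName, ".timer") + ".service"
}

// orphanTimers lists, in sorted order, the timers whose target unit is not
// among the installed units.
func orphanTimers(timers map[string]string, installed map[string]bool) []string {
	var orphans []string
	for name, content := range timers {
		if !installed[timerTarget(name, content)] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	timers := map[string]string{
		// No Unit= line: the target is inferred as nftban-metrics-exporter.service.
		"nftban-metrics-exporter.timer": "[Timer]\nOnCalendar=hourly\n",
		// Hypothetical timer with an explicit target.
		"nftban-example.timer": "[Timer]\nUnit=nftban-example-job.service\n",
	}
	installed := map[string]bool{"nftban-example-job.service": true}
	fmt.Println(orphanTimers(timers, installed))
}
```

With these inputs only nftban-metrics-exporter.timer is reported, which mirrors the orphaned-timer shape from the dns2 regression.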

Architecture

  • internal/installer/validate/systemd_payload.go — pure validator (no I/O), parser, types.
  • internal/installer/validate/systemd_payload_gather.go — host adapter; reads unit dirs via os.ReadDir and queries systemctl list-units --state=failed through the executor.
  • internal/installer/validate/assertions.go — four new assertions sharing one gather call so the unit dirs and systemctl are not walked four times.

Read-only. No nft*, no Service{Start,Stop,Mask,Unmask,Enable,Disable}, no WriteFileAtomic, no Rename, no Remove, no os.Write*.
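
The split between a pure validator and a host adapter, with one gather feeding all four assertions, can be sketched roughly as follows. This is an illustration under assumptions, not the code in assertions.go: `Snapshot`, `Result`, `emptyCheck`, and `runSystemdChecks` are hypothetical names, and the real gather reads unit directories and queries systemctl rather than returning canned data.

```go
package main

import "fmt"

// Snapshot is a hypothetical container for the facts gathered from the host.
type Snapshot struct {
	MissingExecPaths      []string
	OrphanTimers          []string
	PathsOutsideInventory []string
	FailedUnits           []string
}

// Result is a hypothetical assertion outcome.
type Result struct {
	ID     string
	Passed bool
}

// emptyCheck builds a pure assertion (no I/O) that passes when the list
// selected from the snapshot is empty.
func emptyCheck(id string, pick func(Snapshot) []string) func(Snapshot) Result {
	return func(s Snapshot) Result {
		return Result{ID: id, Passed: len(pick(s)) == 0}
	}
}

var checks = []func(Snapshot) Result{
	emptyCheck("SYSTEMD-EXECSTART-001", func(s Snapshot) []string { return s.MissingExecPaths }),
	emptyCheck("SYSTEMD-TIMER-PAIR-001", func(s Snapshot) []string { return s.OrphanTimers }),
	emptyCheck("PAYLOAD-INVENTORY-001", func(s Snapshot) []string { return s.PathsOutsideInventory }),
	emptyCheck("FAILED-UNIT-POSTINSTALL-001", func(s Snapshot) []string { return s.FailedUnits }),
}

// runSystemdChecks calls gather exactly once, then evaluates every assertion
// against the same snapshot.
func runSystemdChecks(gather func() Snapshot) []Result {
	snap := gather() // the only host access in this sketch
	results := make([]Result, 0, len(checks))
	for _, check := range checks {
		results = append(results, check(snap))
	}
	return results
}

func main() {
	calls := 0
	gather := func() Snapshot {
		calls++
		return Snapshot{OrphanTimers: []string{"nftban-metrics-exporter.timer"}}
	}
	for _, r := range runSystemdChecks(gather) {
		fmt.Printf("%-28s passed=%v\n", r.ID, r.Passed)
	}
	fmt.Println("gather calls:", calls)
}
```

Keeping the assertions as functions of a snapshot is also what makes them testable without a live systemd: a test only has to construct the snapshot.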

Bug coverage

The dns2 (2026-04-29) regressions that motivated this PR are now caught at install time:

  • nftban-unified-exporter.service ExecStart /usr/lib/nftban/exporters/nftban_unified_exporter.sh missing on disk → SYSTEMD-EXECSTART-001 + PAYLOAD-INVENTORY-001 fire.
  • nftban-metrics-exporter.timer orphaned (paired service absent) → SYSTEMD-TIMER-PAIR-001 fires.
  • nftban-core-geoip.service (status=2) / nftban-unified-exporter.service (203/EXEC) / nftban-maintenance.service (activating-loop) → FAILED-UNIT-POSTINSTALL-001 fires.

Three independent assertions catch the dns2 exporter bug. Any one of them is enough to make AllPassed return false, so the install drops to StateDegraded instead of falsely reporting StateCommitted.
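
The gating behaviour, including the fail-closed case for the failed-unit query, might look like the sketch below. The `State` values borrow the names used in this PR, but `allPassed`, `failedUnitResult`, `finalState`, and `errNoSystemctl` are hypothetical stand-ins; the real gate lives in cmd/nftban-installer/phases.go and is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

type Result struct {
	ID     string
	Passed bool
	Detail string
}

type State string

const (
	StateCommitted State = "StateCommitted"
	StateDegraded  State = "StateDegraded"
)

// errNoSystemctl is a hypothetical error used to simulate a failed query.
var errNoSystemctl = errors.New("systemctl unavailable")

// allPassed is a stand-in for the existing AllPassed gate: a single failing
// result is enough to return false.
func allPassed(results []Result) bool {
	for _, r := range results {
		if !r.Passed {
			return false
		}
	}
	return true
}

// failedUnitResult fails closed: a query error is reported as a failed
// assertion, never as "no failed units".
func failedUnitResult(failed []string, queryErr error) Result {
	const id = "FAILED-UNIT-POSTINSTALL-001"
	switch {
	case queryErr != nil:
		return Result{ID: id, Passed: false, Detail: "query error: " + queryErr.Error()}
	case len(failed) > 0:
		return Result{ID: id, Passed: false, Detail: fmt.Sprintf("failed units: %v", failed)}
	default:
		return Result{ID: id, Passed: true}
	}
}

// finalState maps the assertion results onto the install state.
func finalState(results []Result) State {
	if allPassed(results) {
		return StateCommitted
	}
	return StateDegraded
}

func main() {
	results := []Result{
		{ID: "SYSTEMD-EXECSTART-001", Passed: true},
		failedUnitResult(nil, errNoSystemctl),
	}
	fmt.Println(finalState(results))
}
```

In this sketch an unavailable systemctl yields StateDegraded even though no failed unit was observed, which is the fail-closed behaviour FAILED-UNIT-POSTINSTALL-001 describes.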

Tests

41 tests in ./internal/installer/validate/..., all green on lab4. Coverage:

  • (a) valid service+timer pair
  • (b) MissingExecStart_FilesystemAbsent (dns2 regression shape, structural name)
  • (c) TimerOrphan_NoServicePair (dns2 regression shape, structural name)
  • timer with implicit Unit= inference
  • (d) nftban-owned path outside inventory
  • (e) failed unit injected; non-nftban failed units ignored
  • (f) /bin/sh -c and /usr/bin/env shell-wrapper variants — embedded nftban paths detected
  • (g) system-binary exemption table: /bin/sh, /usr/bin/sh, /bin/bash, /usr/bin/bash, /bin/env, /usr/bin/env, /bin/systemctl, /usr/bin/systemctl, /usr/bin/journalctl (9 sub-tests)
  • system-binary exemption boundary: /bin/sh -c '/usr/lib/nftban/.../missing.sh' still fails
  • pure-shell positive: /bin/sh -c 'echo ok' passes
  • IsNftbanUnit naming matrix: 19 cases covering nftban/nftband × service/timer/socket/target/path/mount plus negatives
  • systemd Exec prefixes (-, +, !, combos) stripped
  • backslash continuation lines joined; multi-line ExecStart end-to-end
  • empty / malformed unit file (4 cases) — no panic
  • symlink shape: resolved-symlink positive + broken-symlink negative
  • FailedUnitQueryError_FailsClosed — fail-closed when systemctl unavailable or query errors
  • non-nftban third-party units ignored
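
Several of the cases above (Exec prefixes, backslash continuations, shell wrappers, the system-binary exemption) concern how an Exec line is reduced to the paths that need checking. The following is a minimal sketch of that reduction, assuming a simplified tokenizer; `joinContinuations`, `execCommands`, and `localPaths` are hypothetical names, and the exemption list contains only the two prefixes this PR names explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// exemptPrefixes holds only the prefixes named in the PR text; the real
// table may contain more.
var exemptPrefixes = []string{"/bin/", "/usr/bin/"}

// joinContinuations merges lines ending in a backslash with the next line.
func joinContinuations(content string) []string {
	var out []string
	var cur strings.Builder
	for _, line := range strings.Split(content, "\n") {
		trimmed := strings.TrimRight(line, " \t\r")
		if strings.HasSuffix(trimmed, "\\") {
			cur.WriteString(strings.TrimSuffix(trimmed, "\\") + " ")
			continue
		}
		cur.WriteString(trimmed)
		out = append(out, strings.TrimSpace(cur.String()))
		cur.Reset()
	}
	if cur.Len() > 0 {
		out = append(out, strings.TrimSpace(cur.String()))
	}
	return out
}

// execCommands returns the ExecStart/ExecStartPre/ExecStartPost command
// lines with the leading -, +, ! prefixes stripped (systemd defines further
// prefixes that this sketch does not handle).
func execCommands(content string) []string {
	var cmds []string
	for _, line := range joinContinuations(content) {
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "ExecStart", "ExecStartPre", "ExecStartPost":
			if cmd := strings.TrimLeft(strings.TrimSpace(value), "-+!"); cmd != "" {
				cmds = append(cmds, cmd)
			}
		}
	}
	return cmds
}

// localPaths returns absolute paths in cmd, including ones embedded in a
// quoted shell argument, that are not under an exempt prefix.
func localPaths(cmd string) []string {
	isSep := func(r rune) bool { return strings.ContainsRune(" \t'\";", r) }
	var paths []string
	for _, tok := range strings.FieldsFunc(cmd, isSep) {
		if !strings.HasPrefix(tok, "/") {
			continue
		}
		exempt := false
		for _, p := range exemptPrefixes {
			exempt = exempt || strings.HasPrefix(tok, p)
		}
		if !exempt {
			paths = append(paths, tok)
		}
	}
	return paths
}

func main() {
	unit := "[Service]\nExecStartPre=-/bin/sh -c \\\n  '/usr/lib/nftban/exporters/nftban_unified_exporter.sh'\nExecStart=+/usr/bin/env true\n"
	for _, cmd := range execCommands(unit) {
		fmt.Println(localPaths(cmd))
	}
}
```

The wrapper binary itself (/bin/sh, /usr/bin/env) is exempt, but a path embedded in its arguments is still returned, so a missing script behind a shell wrapper remains subject to the existence and inventory checks.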

Lab proof (lab4, RHEL/cPanel, go1.25.8)

Branch base: 7c9b409d (origin/main).

| Command | Result |
|---|---|
| go test -v ./internal/installer/validate/... | PASS (0.008s) |
| go vet ./internal/installer/validate/... ./cmd/nftban-installer/... | clean (exit 0) |
| go test ./... | PASS (64 packages, 0 FAIL) |
| staticcheck | unavailable on lab4 (documented; CI should run it) |

TMPDIR=/root/build-tmp was required on lab4 — both /tmp and /var/tmp are noexec under cPanel's /usr/tmpDSK mount.

Test plan

  • go test ./internal/installer/validate/... green on lab4
  • go vet ./internal/installer/validate/... ./cmd/nftban-installer/... clean on lab4
  • go test ./... green on lab4 (64 packages, 0 fail)
  • CI green on this PR (incl. staticcheck)
  • Auditor on-PR final review

🤖 Generated with Claude Code

PR26.1 — generic install-validation hardening. Adds four new
post-install assertions consumed by the existing AllPassed gate so
StateCommitted is blocked on any failure (no phases.go change):

  SYSTEMD-EXECSTART-001    every nftban unit's ExecStart/Pre/Post
                           local executable path exists on disk
  SYSTEMD-TIMER-PAIR-001   every nftban-*.timer has its target
                           service installed (explicit Unit= or
                           implicit basename.service)
  PAYLOAD-INVENTORY-001    nftban-owned paths referenced by
                           nftban units belong to the staged
                           payload inventory
  FAILED-UNIT-POSTINSTALL-001
                           no nftban-* unit is in failed state;
                           fails closed if systemctl enumeration
                           cannot complete

Generic by design — no panel logic, no DirectAdmin specifics, no
firewall-runtime mutation. Pure validator + host adapter; one
gather call feeds all four assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@itcmsgr itcmsgr merged commit cdf8f77 into main Apr 29, 2026
54 checks passed
@itcmsgr itcmsgr deleted the feat/pr26.1-systemd-payload-validation branch April 29, 2026 19:52