Skip to content

Commit 9ea00d8

Browse files
itcmsgrclaude
andauthored
fix(v1.100.2): SF-1 — nftban health prints "ERROR: Script failed" on released host (#504)
Side finding SF-1 from PR-24 7-day passive soak (Day 6, 2026-04-26 on lab4 / AuthorityNone): `nftban health` rendered a spurious "ERROR: Script failed" block on top of its own DOWN-status table. Root cause: 4 unprotected call sites under set -Eeuo pipefail. nftban_health_cmd_truth() correctly captures the validator's exit code (2) internally and renders a clean DOWN table, but the bare invocations at the case-branch / dispatcher / top-level main fire the ERR trap defined in lib/strict.sh before the function's exit code can propagate cleanly. Fix: add `cmd || return $?` (or `|| exit $?` at top level) at: - cli/lib/nftban/cli/cmd_health.sh:140 — case check|""|truth|axes - cli/lib/nftban/cli/cmd_health.sh:189 — case json|--json - cli/sbin/nftban:917 — main() dispatch into nftban_cmd_<X> - cli/sbin/nftban:1261 — top-level `main "$@"` invocation The pattern is already documented inside nftban_health_cmd_truth itself (lines 413-421) for the same reason on the inner validator call. Exit-code semantics preserved: a released host now exits 2 cleanly, matching the rendered DOWN status, with no spurious trap output. Verified by local bash simulation: old pattern fires TRAP-FIRED on validator exit 2; new pattern propagates exit 2 cleanly with no trap output. Out of scope (locked): - Lifecycle execution surfaces untouched (restore/, state/file.go, uninstall_apply.go, etc.) - SF-2 stale FAILURE_REASON retained in state file — separate fix - Repo hygiene Phase A — separate (1.100.3+) - Lifecycle completion lane (PR-25..PR-30) — explicitly OPEN Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent cc4cf10 commit 9ea00d8

3 files changed

Lines changed: 53 additions & 5 deletions

File tree

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1111
1212
---
1313

14+
## [Unreleased] - v1.100.2 SF-1 health CLI fix (released-host case)
15+
16+
Post-soak correctness fix derived directly from the PR-24 7-day passive soak. Side finding **SF-1** was logged on Day 6 (2026-04-26) on the released/no-tables host (lab4): `nftban health` printed a spurious `ERROR: Script failed` block on top of its own DOWN-status output.
17+
18+
### Root cause
19+
20+
The `nftban_health_cmd_truth` function correctly returns exit code 2 when the Go validator can't verify the kernel state on a released host (no nftban tables present). It also renders a clean DOWN status table before returning. However, three separate call sites under `set -Eeuo pipefail` did not protect the bare invocation from the ERR trap:
21+
22+
- `cli/lib/nftban/cli/cmd_health.sh:140` — case branch in `nftban_cmd_health` calling `nftban_health_cmd_truth "$json_mode"`
23+
- `cli/lib/nftban/cli/cmd_health.sh:189` — second case branch (`json|--json` subcommand)
24+
- `cli/sbin/nftban:917` — main() dispatch invoking `"nftban_cmd_${func_cmd}" "$@"` followed by a `return $?` that never reached because the trap had already fired
25+
- `cli/sbin/nftban:1261` — top-level `main "$@"` where the script's set -e fires when main returns non-zero
26+
27+
When the validator exited 2, each unprotected level fired the ERR trap (defined in `lib/strict.sh`) which printed `ERROR: Script failed` to stderr on top of the legitimate DOWN table — confusing operators and contradicting the in-table message.
28+
29+
### Fix
30+
31+
Added the `cmd || return $?` (or `|| exit $?` at top level) idiom at all 4 sites. The pattern was already documented inside `nftban_health_cmd_truth` itself (lines 413-421) for the same reason on the inner validator call. The exit code semantics are preserved: a released host now exits 2 cleanly, matching the rendered `DOWN` status, with no spurious trap output.
32+
33+
### Verification
34+
35+
Local bash simulation reproduces the broken behaviour with the old pattern (TRAP-FIRED printed) and confirms the new pattern propagates exit 2 with no trap output.
36+
37+
### Out of scope for 1.100.2 (per locked instruction)
38+
39+
- Lifecycle execution surfaces remain untouched (`internal/installer/restore/*`, `internal/installer/state/file.go`, `cmd/nftban-installer/uninstall_apply.go`, etc.)
40+
- SF-2 (stale `FAILURE_REASON` retained in state file after `COMMITTED`) — separate cleanup
41+
- Repo hygiene Phase A — separate (1.100.3+)
42+
- Lifecycle completion lane (PR-25..PR-30) — explicitly **OPEN**
43+
44+
---
45+
1446
## [Unreleased] - v1.100.1b.D GOTH docs/repo cleanup (closes the removal track)
1547

1648
Final phase of the GOTH/UI removal sequence (A → B → C1 → C2 → **D**). Cleans up the runtime-touching code paths and JSON registries that referenced the retired Web GUI surface, plus obsolete CI workflow steps that no longer have any consumer.

cli/lib/nftban/cli/cmd_health.sh

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -137,7 +137,13 @@ nftban_cmd_health() {
137137
# v1.83: Default is Go-backed protection truth (4-axis table).
138138
# This is the primary operator health surface — fast, read-only,
139139
# deterministic. No side effects, no environment scanning.
140-
nftban_health_cmd_truth "$json_mode"
140+
#
141+
# SF-1 (v1.100.2): the truth path returns 2 on a released/no-tables
142+
# host (validator exits 2, function returns 2 after rendering a
143+
# DOWN table). The `|| return $?` idiom keeps that exit code as
144+
# the dispatcher's exit while suppressing the ERR trap that would
145+
# otherwise print "ERROR: Script failed" on top of the table.
146+
nftban_health_cmd_truth "$json_mode" || return $?
141147
;;
142148

143149
# =================================================================
@@ -186,7 +192,8 @@ nftban_cmd_health() {
186192
json|--json)
187193
# v1.84: "nftban health json" outputs Go validator truth JSON.
188194
# For diagnostics JSON, use: nftban health diagnostics --json
189-
nftban_health_cmd_truth "true"
195+
# SF-1 (v1.100.2): see check/truth/axes branch above.
196+
nftban_health_cmd_truth "true" || return $?
190197
;;
191198

192199
# =================================================================

cli/sbin/nftban

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -914,8 +914,13 @@ main() {
914914
# Some modules define nftban_cmd_X, others just use case statements
915915
if declare -f "nftban_cmd_${func_cmd}" >/dev/null 2>&1; then
916916
shift
917-
"nftban_cmd_${func_cmd}" "$@"
918-
return $?
917+
# SF-1 (v1.100.2): use `|| return $?` so a legitimate non-zero
918+
# exit from a command module (e.g. `nftban health` returning 2
919+
# for DOWN on a released host) is propagated cleanly to the
920+
# caller instead of triggering the ERR trap, which would print
921+
# "ERROR: Script failed" on top of the command's own output.
922+
"nftban_cmd_${func_cmd}" "$@" || return $?
923+
return 0
919924
else
920925
# Module handles args directly (like cmd_whitelist_system.sh)
921926
return 0
@@ -1253,4 +1258,8 @@ main() {
12531258
}
12541259

12551260
# Execute
1256-
main "$@"
1261+
# SF-1 (v1.100.2): `|| exit $?` so a legitimate non-zero exit from a
1262+
# command (e.g. `nftban health` returning 2 for DOWN on a released
1263+
# host) propagates as the script's exit code without triggering the
1264+
# ERR trap that would print "ERROR: Script failed".
1265+
main "$@" || exit $?

0 commit comments

Comments
 (0)