Skip to content

Commit f7be0c4

Browse files
itcmsgrclaude
andcommitted
fix(v1.100 PR-26-code-C2): A.4 corrupt-manifest is hard refusal, not soft-skip (auditor verdict)
Auditor focused-audit on PR-26-code-C flagged a semantic risk in the A.4 corrupt-manifest branch: previously a corrupt / hash-mismatch / unknown-entry / parse-failure manifest was a soft-skip with an informational sentinel, and A.5 still ran. The auditor argued — correctly — that proceeding to start csf.service when restore evidence is on disk but cannot be trusted weakens the evidence chain. Locked rule (per auditor verdict): manifest absent → soft-skip warning, continue to A.5 [migration gap, kept] manifest incomplete → ErrCSFRestoreCronManifestCorrupt, stop before A.5 hash mismatch → ErrCSFRestoreCronManifestCorrupt, stop before A.5 target exists dirty → ErrCSFRestoreCronTargetExists, stop before A.5 manifest clean → restore exact files, then continue to A.5 Behavior delta (this commit only — C1 + C2 + CI gate semantics remain otherwise unchanged): - Manifest parse failure / schema mismatch / unknown-entry path → A.4 returns wrapped ErrCSFRestoreCronManifestCorrupt; A.5 does NOT run; the existing §32 step-3 failure path retains the safety net. - Per-entry sha256 mismatch → same hard refusal. - Operator-content collision (target /etc/cron.d/<name> already exists) → A.4 returns wrapped ErrCSFRestoreCronTargetExists; A.5 does NOT run. - Manifest absent (pre-PR-26 host) → unchanged: graceful soft-skip with operator warning, control falls through to A.5. - Manifest clean → unchanged: restore both files, fall through to A.5. Files changed: cmd/nftban-installer/restore_deps_csf.go - ErrCSFRestoreCronManifestCorrupt docstring rewritten: now documents hard-refusal semantics (was: informational soft-skip). Wording updated: "refusing before A.5 (operator must inspect)". - New typed sentinel ErrCSFRestoreCronTargetExists for the operator-content-collision case. Distinct from ErrCSFRestoreCronManifestCorrupt for cleaner classification: a collision is an evidence conflict, not a manifest-trust failure. - A.4 step rewritten: * manifestErr branch now returns the wrapped sentinel instead of falling through. * Per-entry sha256 verify failure now returns instead of skip. * Per-entry unauthorized-Path now returns instead of skip. * Per-entry target-exists collision now returns ErrCSFRestoreCronTargetExists instead of skip. * Per-entry WriteFileAtomic failure now returns instead of skip. * Chown failure remains soft (logged warning, content already restored — partial-restore is recoverable; the integrity chain is unaffected). cmd/nftban-installer/restore_deps_csf_test.go - Renamed + retargeted three tests to assert hard-refusal: PR26C2_A4_TargetExists_SkipsRestore → PR26C2_A4_TargetExists_HardRefuses_StopsBeforeA5 + asserts errors.Is(err, ErrCSFRestoreCronTargetExists) + asserts NOT mock.CommandCalled("systemctl","start",csf.service) PR26C2_A4_SHA256Mismatch_SoftSkip_A5StillRuns → PR26C2_A4_SHA256Mismatch_HardRefuses_StopsBeforeA5 + asserts errors.Is(err, ErrCSFRestoreCronManifestCorrupt) + asserts A.5 NOT called PR26C2_A4_SchemaMismatch_SoftSkip_A5StillRuns → PR26C2_A4_SchemaMismatch_HardRefuses_StopsBeforeA5 + asserts errors.Is(err, ErrCSFRestoreCronManifestCorrupt) + asserts A.5 NOT called PR26C2_A4_UnknownEntryPath_Rejected → PR26C2_A4_UnknownEntryPath_HardRefuses_StopsBeforeA5 + asserts errors.Is(err, ErrCSFRestoreCronManifestCorrupt) + asserts A.5 NOT called - 3 new tests pinning the kept-behavior branches: PR26C2_A4_HappyPath_ContinuesToA5 — clean restore continues PR26C2_A4_ManifestAbsent_ContinuesToA5 — migration soft-skip continues PR26C2_A4_ParseFailure_HardRefuses_StopsBeforeA5 — parse failure stops Push criteria (all met as of this commit): - manifest absent = migration soft-skip ✓ (test #10 above) - manifest corrupt/hash mismatch = typed refusal before A.5 ✓ (tests #4, #5, #8, #11) - target cron path broad writes = impossible ✓ (allow-list + writer scope) - writer-before-reader invariant = tested ✓ (C1's roundtrip + C2's HappyPath_RestoresBothFiles) - G4-RESTORE-CRON-MANIFEST-INTEGRITY = PASS (local replay clean) - go test ./... + race + vet = PASS on lab2 Verified on lab2 (Ubuntu 24.04, go1.22.2): - go build ./... clean - go test ./... (full repo) PASS - go test -race -count=1 cmd + restore + state + switchop PASS - go vet ./... clean - 11 TestCSFMutate_PR26C2_* tests all PASS (3 hard-refusal tests retargeted; 1 unchanged; 7 unchanged or new) - existing PR-25 / PR-26-code-A / PR-26-code-B tests all still pass - G4-RESTORE-CRON-MANIFEST-INTEGRITY local replay: FAIL=0 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 93e86e2 commit f7be0c4

2 files changed

Lines changed: 182 additions & 87 deletions

File tree

cmd/nftban-installer/restore_deps_csf.go

Lines changed: 72 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -141,15 +141,33 @@ var (
141141
// ErrCSFRestoreServiceStopFailed wraps a non-nil ServiceStop error.
142142
ErrCSFRestoreServiceStopFailed = errors.New("restore csf: A.6 ServiceStop(nftband.service) failed")
143143

144-
// ErrCSFRestoreCronManifestCorrupt is returned (informationally —
145-
// not as a fatal error from mutateToCSFTarget) when the §42.2
146-
// manifest is present but corrupt / schema-mismatch / lists an
147-
// unknown-entry path / has a sha256 mismatch. Per §42.2-D the
148-
// overall mutation does NOT abort; A.4 logs a warning, emits this
149-
// sentinel for observability, and falls through to A.5. Tests
150-
// assert this sentinel is exposed for assertion via errors.Is.
151-
// PR-26-code-C2 addition.
152-
ErrCSFRestoreCronManifestCorrupt = errors.New("restore csf: A.4 cron-backup manifest is corrupt or unrecognized — soft-skip per §42.2-D, A.5 still runs")
144+
// ErrCSFRestoreCronManifestCorrupt is returned by A.4 as a HARD
145+
// refusal when the §42.2 manifest is present but cannot be
146+
// trusted: parse failure, schema mismatch, unknown-entry path,
147+
// or per-entry sha256 mismatch. Per the auditor verdict on
148+
// PR-26-code-C: when NFTBan has restore evidence on disk but
149+
// cannot trust it, proceeding to A.5 (start csf.service) would
150+
// weaken the evidence chain. A.4 stops before A.5, the safety
151+
// net is retained by the existing Execute failure path, and the
152+
// operator must inspect.
153+
//
154+
// Manifest ABSENT (no manifest.json at all) is the migration-gap
155+
// case for pre-PR-26 hosts; that path remains a soft-skip and
156+
// continues to A.5. Only manifest-present-but-untrusted is hard.
157+
//
158+
// PR-26-code-C2 addition (semantics revised on auditor pass —
159+
// originally soft-skip, now hard-refusal).
160+
ErrCSFRestoreCronManifestCorrupt = errors.New("restore csf: A.4 cron-backup manifest is corrupt or unrecognized — refusing before A.5 (operator must inspect)")
161+
162+
// ErrCSFRestoreCronTargetExists is returned by A.4 as a HARD
163+
// refusal when a manifest entry's target /etc/cron.d/<name> is
164+
// already present on disk at restore time. The operator may have
165+
// re-created a different version of the cron file post-takeover;
166+
// A.4 must NOT overwrite operator content. Per the auditor verdict
167+
// this is treated as an evidence-conflict case — restoration is
168+
// stopped before A.5 and the operator must reconcile manually.
169+
// PR-26-code-C2 addition (semantics revised on auditor pass).
170+
ErrCSFRestoreCronTargetExists = errors.New("restore csf: A.4 target /etc/cron.d/<name> already present — refusing before A.5 (operator-content collision; manual reconcile required)")
153171

154172
// ErrCSFRestoreNftReleaseUnsafe is returned by A.7 whenever the
155173
// safety-net-safe predicate is either unavailable (nil — the
@@ -324,93 +342,90 @@ func mutateToCSFTarget(ctx context.Context, m *productionMutationDep) error {
324342
m.log.Info("restore csf: A.3 skip — %s present, no rename needed", csfBinary)
325343
}
326344

327-
// A.4: Cron restore from manifest (PR-26-code-C2 / §42.2 lock).
345+
// A.4: Cron restore from manifest (PR-26-code-C2 / §42.2 lock,
346+
// auditor-revised semantics — corrupt manifest is HARD refusal).
328347
//
329-
// Read switchop.ReadCronBackupManifest. Three branches:
348+
// Branches:
330349
//
331-
// - Manifest absent (pre-PR-26 host): graceful soft-skip with a
332-
// specific operator warning. A.4 does not act.
333-
// - Manifest present but corrupt / schema-mismatch / unknown-entry
334-
// / sha256-mismatch: refuse the cron restore with the typed
335-
// ErrCSFRestoreCronManifestCorrupt sentinel. Per §42.2-D the
336-
// overall mutation does NOT abort — A.5 still runs because
337-
// csf can function without cron (lfd will not auto-restart;
338-
// operator must inspect). A.4's failure is a P1 logged warning.
339-
// - Manifest present + integrity-clean: for each entry whose
340-
// target path is currently absent, restore the content via
341-
// exec.WriteFileAtomic (preserves mode) + exec.Chown
342-
// (preserves uid/gid). Targets that already exist are
343-
// skipped (operator may have re-created a different version
344-
// post-takeover; A.4 must not overwrite operator content).
350+
// - Manifest ABSENT (pre-PR-26 host): graceful soft-skip with
351+
// an operator warning. A.4 does not act; control falls
352+
// through to A.5. This is the migration-gap path.
353+
// - Manifest present but UNTRUSTED (parse failure / schema
354+
// mismatch / unknown-entry / per-entry sha256 mismatch):
355+
// return ErrCSFRestoreCronManifestCorrupt — A.5 does NOT
356+
// run, the existing §32 step-3 failure path retains the
357+
// safety net, and the operator must inspect. Proceeding to
358+
// start csf.service while restore evidence is on disk but
359+
// untrusted would weaken the evidence chain (auditor
360+
// verdict on PR-26-code-C).
361+
// - Target ALREADY exists on disk (operator-content collision):
362+
// return ErrCSFRestoreCronTargetExists. Same hard-refusal
363+
// semantics as corrupt-manifest — A.4 must not overwrite
364+
// operator content, and the surrounding evidence-conflict
365+
// warrants stopping before A.5.
366+
// - Manifest present + integrity-clean + targets absent: for
367+
// each entry, restore the content via exec.WriteFileAtomic
368+
// (preserves mode) + exec.Chown (preserves uid/gid). Then
369+
// fall through to A.5.
345370
//
346371
// Absolutely no:
347372
// - template regeneration (only restore-from-backup, never
348373
// synthesize content)
349374
// - writes outside the two §42.2-locked target paths
350375
// - DirectAdmin custombuild rewrites
351376
// - cron files that NFTBan did not back up itself
352-
a4SoftSkipWarn := func(reason string) {
353-
if m.log != nil {
354-
m.log.Warn("restore csf: A.4 soft-skip — %s. %s and %s NOT auto-restored. Operator must restore manually if needed.",
355-
reason, csfCronPath, lfdCronPath)
356-
}
357-
}
358-
359377
manifest, manifestPresent, manifestErr := switchop.ReadCronBackupManifest(m.exec, m.log)
360378
switch {
361379
case manifestErr != nil:
362-
// Present but corrupt / schema-mismatch / unknown-entry.
363380
if m.log != nil {
364-
m.log.Warn("restore csf: A.4 manifest is corrupt: %v", manifestErr)
381+
m.log.Error("restore csf: A.4 manifest untrusted: %v — refusing before A.5", manifestErr)
365382
}
366-
a4SoftSkipWarn("cron-backup manifest is corrupt or unrecognized")
367-
// Per §42.2-D: do NOT abort. Continue to A.5. The operator
368-
// receives the warning and the typed sentinel surfaces in
369-
// the dispatcher's evidence record only if higher-layer code
370-
// chooses to wrap it. PR-26-code-C does not abort A.5 on
371-
// cron failure.
372-
_ = ErrCSFRestoreCronManifestCorrupt // typed sentinel exposed for callers/tests; see A.4 corrupt-manifest tests
383+
return fmt.Errorf("%w: %v", ErrCSFRestoreCronManifestCorrupt, manifestErr)
373384

374385
case !manifestPresent:
375-
a4SoftSkipWarn("cron-backup manifest absent (pre-PR-26 host)")
386+
if m.log != nil {
387+
m.log.Warn("restore csf: A.4 soft-skip — cron-backup manifest absent (pre-PR-26 host); %s and %s NOT auto-restored",
388+
csfCronPath, lfdCronPath)
389+
}
376390

377391
default:
378392
a4Restored := 0
379-
a4Skipped := 0
380393
for _, entry := range manifest.Files {
381394
// Defense-in-depth: only the two §42.2-locked paths.
395+
// (The reader already rejects unknown-entry manifests;
396+
// this is belt-and-braces.)
382397
if entry.Path != csfCronPath && entry.Path != lfdCronPath {
383398
if m.log != nil {
384-
m.log.Warn("restore csf: A.4 ignoring manifest entry with unauthorized path %q", entry.Path)
399+
m.log.Error("restore csf: A.4 manifest entry has unauthorized path %q — refusing before A.5", entry.Path)
385400
}
386-
continue
401+
return fmt.Errorf("%w: unauthorized entry path %q", ErrCSFRestoreCronManifestCorrupt, entry.Path)
387402
}
388403
// Verify sha256 integrity against the on-disk backup.
404+
// Mismatch is HARD refusal — restore evidence on disk
405+
// is untrusted; do not start csf with bad evidence.
389406
content, vErr := switchop.VerifyCronBackupEntry(m.exec, entry)
390407
if vErr != nil {
391408
if m.log != nil {
392-
m.log.Warn("restore csf: A.4 sha256 verify failed for %s: %v — skipping this file", entry.Path, vErr)
409+
m.log.Error("restore csf: A.4 sha256 verify failed for %s: %v — refusing before A.5", entry.Path, vErr)
393410
}
394-
a4Skipped++
395-
continue
411+
return fmt.Errorf("%w: %v", ErrCSFRestoreCronManifestCorrupt, vErr)
396412
}
397-
// Skip if target already exists — operator may have
398-
// re-created a different version post-takeover.
413+
// Operator-content collision: target already exists.
414+
// HARD refuse — A.4 must not overwrite operator content
415+
// and the conflict warrants stopping before A.5.
399416
if m.exec.FileExists(entry.Path) {
400417
if m.log != nil {
401-
m.log.Warn("restore csf: A.4 target %s already present — skipping (operator content not overwritten)", entry.Path)
418+
m.log.Error("restore csf: A.4 target %s already present — refusing before A.5 (operator-content collision)", entry.Path)
402419
}
403-
a4Skipped++
404-
continue
420+
return fmt.Errorf("%w: %s", ErrCSFRestoreCronTargetExists, entry.Path)
405421
}
406422
// Restore: WriteFileAtomic preserves mode; Chown
407423
// applies uid/gid for fidelity.
408424
if err := m.exec.WriteFileAtomic(entry.Path, content, fileModeFromUint32(entry.Mode)); err != nil {
409425
if m.log != nil {
410-
m.log.Warn("restore csf: A.4 WriteFileAtomic(%s) failed: %v — skipping", entry.Path, err)
426+
m.log.Error("restore csf: A.4 WriteFileAtomic(%s) failed: %v — refusing before A.5", entry.Path, err)
411427
}
412-
a4Skipped++
413-
continue
428+
return fmt.Errorf("%w: WriteFileAtomic(%s): %v", ErrCSFRestoreCronManifestCorrupt, entry.Path, err)
414429
}
415430
if err := m.exec.Chown(entry.Path, entry.UID, entry.GID); err != nil {
416431
if m.log != nil {
@@ -425,8 +440,7 @@ func mutateToCSFTarget(ctx context.Context, m *productionMutationDep) error {
425440
}
426441
}
427442
if m.log != nil {
428-
m.log.Info("restore csf: A.4 manifest-restore complete (restored=%d, skipped=%d)",
429-
a4Restored, a4Skipped)
443+
m.log.Info("restore csf: A.4 manifest-restore complete (restored=%d)", a4Restored)
430444
}
431445
}
432446

0 commit comments

Comments
 (0)