Skip to content

Commit fff7c48

Browse files
itcmsgrclaude
andauthored
feat(v1.99 PR-17): package-install target detection + preflight deepening (#474)
Second PR in the v1.99 Update Engine Canonization track. Deepens G3-U4 with package-install target detection + two new preflight checks + a planning-only RecoveryPlan surface. No mutation, no apply logic. Architecture constraint (INV-U-001) remains intact: Update is still a BOUNDED TRIGGER into the rebuild/lifecycle pipeline. Apply work stays deferred to PR-18. New file internal/installer/update/target.go: - DetectInstallOrigin: probe rpm -q / dpkg -s / NFTBAN_SOURCE_DIR env to classify origin when operator didn't pass --rpm/--deb/--source - DetectPackageTarget: query rpm -q --queryformat '%{VERSION}' OR dpkg -s | grep ^Version: - Both functions READ-ONLY — no package manager transactions Extensions: - DetectVersions signature: +origin param; source tree wins over package query so explicit --source-dir is always honoured - Preflight: +origin param; new checks P-6 rebuild_recovery_available (terminal prior state + ip nftban + nft binary) and P-7 install_origin_coherent (declared vs detected match) - BuildRecoveryPlan: produces RecoveryPlan metadata (mechanism=rebuild, Available bool, Notes). Planning-only — no recovery execution - Plan.AttachRecovery + Plan.Recovery field: new planning surface apply (PR-18) will consume per INV-U-002 - Plan.Render: shows Recovery block with availability + mechanism Installer binary cmd/nftban-installer/update_dryrun.go: - auto-detects origin via update.DetectInstallOrigin when no flag passed (so package-install hosts without --rpm/--deb still get a correct plan) - attaches BuildRecoveryPlan output to the rendered plan Tests (new target_test.go + extended update_test.go): - 11 new unit tests covering DetectInstallOrigin (rpm/deb/source/unknown), DetectPackageTarget (rpm/deb/not-owned/unknown-origin), DetectVersions package-deb fallback + source-tree override, P-6/P-7 preflight paths, BuildRecoveryPlan happy/no-state/in-progress cases CI gate extensions: - G3-U4 now asserts all 7 preflight checks reported - New G3-U4-deepen step: Recovery block + Mechanism line must render Out of scope (explicit — per spec + user guidance): - No payload mutation - No apply logic - No rebuild switch changes - No shell update path deletion - No .conf.local write path changes - No config delivery changes - No rollback logic beyond planning/metadata Depends on: v1.98.2 tag (already shipped), PR-16 merged (ca48884). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ca48884 commit fff7c48

7 files changed

Lines changed: 725 additions & 41 deletions

File tree

.github/workflows/ci-update-canonization.yml

Lines changed: 32 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -228,22 +228,47 @@ jobs:
228228
echo "G3-U2 PASS — current=$CURRENT target=$TARGET"
229229
230230
# ------------------------------------------------------------------
231-
# G3-U4 — preflight enforcement: all five checks must appear in the
232-
# plan's Preflight block. We don't require every check to pass —
233-
# CI environments don't have nftband running — but every check
234-
# MUST be evaluated and reported.
231+
# G3-U4 — preflight enforcement: all seven checks must appear in the
232+
# plan's Preflight block (PR-16 P-1..P-5 + PR-17 P-6..P-7). We don't
233+
# require every check to pass — CI environments don't have nftband
234+
# running — but every check MUST be evaluated and reported.
235235
# ------------------------------------------------------------------
236-
- name: G3-U4 — preflight reports all 5 checks
236+
- name: G3-U4 — preflight reports all 7 checks
237237
shell: bash
238238
run: |
239239
set -Eeuo pipefail
240-
for name in authority_nftban service_nftband_active artifact_version_file dependency_nft state_no_stale_in_progress; do
240+
for name in \
241+
authority_nftban \
242+
service_nftband_active \
243+
artifact_version_file \
244+
dependency_nft \
245+
state_no_stale_in_progress \
246+
rebuild_recovery_available \
247+
install_origin_coherent; do
241248
grep -q "$name" /tmp/dryrun.out || {
242249
echo "::error::G3-U4 FAIL: preflight check '$name' missing from plan"
243250
exit 1
244251
}
245252
done
246-
echo "G3-U4 PASS — all 5 preflight checks reported"
253+
echo "G3-U4 PASS — all 7 preflight checks reported"
254+
255+
# ------------------------------------------------------------------
256+
# G3-U4-deepen (PR-17) — recovery planning metadata must be rendered
257+
# so apply (PR-18) has explicit input about rollback reachability.
258+
# ------------------------------------------------------------------
259+
- name: G3-U4-deepen — recovery plan metadata rendered
260+
shell: bash
261+
run: |
262+
set -Eeuo pipefail
263+
grep -q "Recovery plan" /tmp/dryrun.out || {
264+
echo "::error::G3-U4-deepen FAIL: plan has no Recovery block"
265+
exit 1
266+
}
267+
grep -q "Mechanism" /tmp/dryrun.out || {
268+
echo "::error::G3-U4-deepen FAIL: Recovery block missing Mechanism"
269+
exit 1
270+
}
271+
echo "G3-U4-deepen PASS — recovery planning metadata present"
247272
248273
summary:
249274
name: Update Canonization summary

cmd/nftban-installer/update_dryrun.go

Lines changed: 15 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -69,25 +69,30 @@ func runUpdateDryRun(ctx context.Context, exec executor.Executor, sf *state.Stat
6969
}
7070
log.PhaseEnd("Detect")
7171

72-
// 2. Update-specific preflight (P-1 through P-5).
72+
// 2. Install origin — declared flag wins; fall back to package-manager
73+
// probe (PR-17) so package installs that don't re-pass --rpm/--deb
74+
// still get a correct plan.
75+
origin := detectInstallOrigin(cfg)
76+
if origin == "" {
77+
origin = update.DetectInstallOrigin(exec, log)
78+
}
79+
80+
// 3. Update-specific preflight (P-1..P-7 — PR-16 + PR-17).
7381
log.Phase("Preflight")
74-
pre := update.Preflight(exec, log)
82+
pre := update.Preflight(exec, log, origin)
7583
log.PhaseEnd("Preflight")
7684

77-
// 3. Version detection. sourceDir is empty in the package-install case;
78-
// the update package treats that as non-fatal for PR-16 (package-install
79-
// target detection lands in PR-17).
80-
current, target, err := update.DetectVersions(exec, cfg.sourceDir, log)
85+
// 4. Version detection — source tree wins; package manager is the
86+
// fallback for package-install hosts (PR-17).
87+
current, target, err := update.DetectVersions(exec, cfg.sourceDir, origin, log)
8188
if err != nil {
8289
log.Error("update dry-run: version detection failed: %v", err)
8390
return state.ExitFailed
8491
}
8592

86-
// 4. Install origin — drives which apply path PR-18 will use.
87-
origin := detectInstallOrigin(cfg)
88-
89-
// 5. Assemble + render plan.
93+
// 5. Assemble plan + attach PR-17 recovery metadata (planning-only).
9094
plan := update.BuildPlan(pre, current, target, origin)
95+
plan.AttachRecovery(update.BuildRecoveryPlan(exec))
9196
plan.Render(os.Stdout)
9297

9398
// 6. Write a copy of the plan JSON to the state dir for audit/history

internal/installer/update/plan.go

Lines changed: 75 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -77,39 +77,78 @@ type Plan struct {
7777
// this abstract ("reuse rebuild pipeline"); PR-18 fills in concrete
7878
// steps.
7979
Actions []string `json:"actions,omitempty"`
80+
81+
// Recovery carries planning-only metadata about the rollback route that
82+
// PR-18's apply step will rely on. PR-17 populates this but does NOT
83+
// execute any recovery — that's PR-18 scope per INV-U-002.
84+
Recovery *RecoveryPlan `json:"recovery,omitempty"`
85+
}
86+
87+
// RecoveryPlan is PR-17 metadata describing the rollback path apply will
88+
// follow on failure. It is DESCRIPTIVE only; no state mutation in PR-17.
89+
type RecoveryPlan struct {
90+
// Available reports whether a rollback to the prior known-good state
91+
// is possible without operator intervention.
92+
Available bool `json:"available"`
93+
94+
// Mechanism is the rollback mechanism that would be used — currently
95+
// "rebuild" (validate-in-namespace + flush+load + v1.96 recovery), the
96+
// only canonized recovery surface today.
97+
Mechanism string `json:"mechanism"`
98+
99+
// Notes carries human-oriented context that helps an operator reason
100+
// about recovery without digging into code (e.g. "prior state file
101+
// present and terminal", "no prior install_state — fresh recovery").
102+
Notes []string `json:"notes,omitempty"`
80103
}
81104

82105
// PlanSchemaVersion is the current wire contract for Plan. Consumers should
83106
// reject plans with unexpected schema versions.
84107
const PlanSchemaVersion = "1.99.0"
85108

86109
// DetectVersions reads the current installed version from the FHS VERSION
87-
// file and the target version from the source-tree VERSION (for source
88-
// installs) or from the package metadata path.
110+
// file and the target version from:
89111
//
90-
// For PR-16 we support source-install detection; package-install target
91-
// detection is handled in PR-17.
92-
func DetectVersions(exec executor.Executor, sourceDir string, log *logging.Logger) (current, target string, err error) {
112+
// - <sourceDir>/VERSION when sourceDir is non-empty (source installs)
113+
// - the package manager (rpm -q / dpkg -s) when origin is "rpm"/"deb"
114+
// and sourceDir is empty (PR-17 scope)
115+
//
116+
// Missing target is non-fatal — the plan carries a warning downstream.
117+
// Missing current IS fatal — we must know where we are before planning
118+
// any transition (INV-U-002 rollbackability).
119+
func DetectVersions(exec executor.Executor, sourceDir, origin string, log *logging.Logger) (current, target string, err error) {
93120
current = readCurrentVersion(exec, log)
94121

95-
// Source install: target VERSION lives at <sourceDir>/VERSION.
122+
// Priority 1: source tree VERSION (takes precedence over package query
123+
// so a caller can test with a source tree even on a package-install host).
96124
if sourceDir != "" {
97125
tPath := filepath.Join(sourceDir, "VERSION")
98126
if data, rErr := exec.ReadFile(tPath); rErr == nil {
99127
target = strings.TrimSpace(string(data))
128+
log.Debug("update: target version from source tree %s = %s", tPath, target)
100129
} else {
101130
log.Debug("update: source VERSION not readable at %s: %v", tPath, rErr)
102131
}
103132
}
104133

134+
// Priority 2 (PR-17): package manager query. Only consulted when source
135+
// detection did not yield a result. We pass in `origin` rather than
136+
// auto-detecting inside this function so the caller stays authoritative
137+
// about the install-origin decision.
138+
if target == "" && (origin == "rpm" || origin == "deb") {
139+
if t, qErr := DetectPackageTarget(exec, origin, log); qErr != nil {
140+
log.Warn("update: package target query failed: %v", qErr)
141+
} else if t != "" {
142+
target = t
143+
log.Debug("update: target version from package manager (%s) = %s", origin, target)
144+
}
145+
}
146+
105147
if current == "" {
106148
return current, target, fmt.Errorf("update: cannot detect current version (no VERSION file)")
107149
}
108150
if target == "" {
109-
// PR-16: target detection for package installs lands in PR-17.
110-
// For now, missing target is non-fatal — the plan records it and
111-
// the preflight flags it as a warning.
112-
log.Info("update: target version not yet detected (package-install detection lands in PR-17)")
151+
log.Info("update: target version not detected (no source tree + no package ownership)")
113152
}
114153
return current, target, nil
115154
}
@@ -172,6 +211,20 @@ func (p *Plan) Render(w io.Writer) {
172211
fmt.Fprintln(w, "")
173212
}
174213

214+
if p.Recovery != nil {
215+
fmt.Fprintln(w, " Recovery plan (metadata only — apply lands in PR-18):")
216+
availMark := "available"
217+
if !p.Recovery.Available {
218+
availMark = "NOT available — operator intervention may be required"
219+
}
220+
fmt.Fprintf(w, " Rollback : %s\n", availMark)
221+
fmt.Fprintf(w, " Mechanism : %s\n", displayString(p.Recovery.Mechanism))
222+
for _, n := range p.Recovery.Notes {
223+
fmt.Fprintf(w, " · %s\n", n)
224+
}
225+
fmt.Fprintln(w, "")
226+
}
227+
175228
if len(p.Actions) > 0 {
176229
fmt.Fprintln(w, " Actions (dry-run — no mutation):")
177230
for _, a := range p.Actions {
@@ -209,11 +262,22 @@ func BuildPlan(pre *PreflightResult, current, target, origin string) *Plan {
209262
}
210263
if target == "" {
211264
p.Warnings = append(p.Warnings,
212-
"target version not detected (package-install target detection lands in PR-17)")
265+
"target version not detected (no source tree + no package ownership)")
266+
}
267+
if origin == "" {
268+
p.Warnings = append(p.Warnings,
269+
"install origin not detected — pass --source, --rpm, or --deb to disambiguate")
213270
}
214271
return p
215272
}
216273

274+
// AttachRecovery decorates the plan with the PR-17 recovery metadata.
275+
// Separate from BuildPlan so callers without an exec handle (tests) can
276+
// still build a plan without synthesizing recovery state.
277+
func (p *Plan) AttachRecovery(r *RecoveryPlan) {
278+
p.Recovery = r
279+
}
280+
217281
// displayVersion returns a placeholder for empty versions so the rendered
218282
// table doesn't print blank cells that look like a bug.
219283
func displayVersion(v string) string {

internal/installer/update/preflight.go

Lines changed: 111 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,10 @@ type PreflightResult struct {
6161
// Called by phaseDetect when cfg.mode == "upgrade".
6262
//
6363
// The function is READ-ONLY. It must never mutate host state.
64-
func Preflight(exec executor.Executor, log *logging.Logger) *PreflightResult {
64+
//
65+
// origin is the operator-declared install origin ("rpm", "deb", "source",
66+
// or ""). PR-17 uses it for the new P-7 coherence check.
67+
func Preflight(exec executor.Executor, log *logging.Logger, origin string) *PreflightResult {
6568
res := &PreflightResult{Passed: true}
6669

6770
// P-1 authority — the existing authority.Detect would give us a full
@@ -150,9 +153,116 @@ func Preflight(exec executor.Executor, log *logging.Logger) *PreflightResult {
150153
logCheck(log, c)
151154
}
152155

156+
// P-6 (PR-17) rebuild recovery available — the planning-only check
157+
// that tells apply (PR-18) whether a clean rollback is reachable.
158+
// We do NOT execute recovery here; we just report whether the
159+
// prerequisites exist: terminal prior state + ip nftban table + nft
160+
// binary. If any is missing, apply must treat rollback as
161+
// operator-assisted rather than automatic (INV-U-002).
162+
{
163+
ok := rebuildRecoveryAvailable(exec)
164+
c := PreflightCheck{
165+
Name: "rebuild_recovery_available",
166+
Passed: ok,
167+
Severity: "warning",
168+
}
169+
if !ok {
170+
c.Detail = "rebuild recovery may require operator intervention on failure — review before apply"
171+
}
172+
res.Checks = append(res.Checks, c)
173+
logCheck(log, c)
174+
}
175+
176+
// P-7 (PR-17) install origin coherent — the declared --rpm / --deb /
177+
// --source flag must match the host's actual install origin. A
178+
// mismatch (e.g. --rpm on a DEB host) is a hard warning: apply would
179+
// route through the wrong delivery model. PR-17 detects this early.
180+
{
181+
declared := origin
182+
detected := DetectInstallOrigin(exec, log)
183+
coherent := declared == "" || detected == "" || declared == detected
184+
c := PreflightCheck{
185+
Name: "install_origin_coherent",
186+
Passed: coherent,
187+
Severity: "warning",
188+
}
189+
if !coherent {
190+
c.Detail = "declared origin (" + declared + ") does not match detected origin (" + detected + ")"
191+
}
192+
res.Checks = append(res.Checks, c)
193+
logCheck(log, c)
194+
}
195+
153196
return res
154197
}
155198

199+
// rebuildRecoveryAvailable reports whether the prerequisites for automatic
200+
// rollback via rebuild recovery are present:
201+
//
202+
// - the prior install_state is terminal (COMMITTED / DEGRADED / FAILED_*)
203+
// - ip nftban table exists (authority held)
204+
// - nft binary is present (handled by P-4; we still check here so this
205+
// predicate is self-contained for callers outside the full preflight)
206+
//
207+
// PR-17 scope: this is metadata only. PR-18 will wire it into the apply
208+
// path via INV-U-002.
209+
func rebuildRecoveryAvailable(exec executor.Executor) bool {
210+
if !exec.NftTableExists("ip", "nftban") {
211+
return false
212+
}
213+
if r := exec.Run("sh", "-c", "command -v nft >/dev/null 2>&1"); r.ExitCode != 0 {
214+
return false
215+
}
216+
// Prior state must be readable OR absent (absent = fresh recovery is
217+
// fine). If present but non-terminal (in-progress), recovery is not
218+
// clean.
219+
const p = "/var/lib/nftban/state/install_state"
220+
if !exec.FileExists(p) {
221+
return true
222+
}
223+
data, err := exec.ReadFile(p)
224+
if err != nil {
225+
return false
226+
}
227+
s := trim(string(data))
228+
if s == "" {
229+
return true
230+
}
231+
// Same terminal-state list as isStaleInProgress but inverted.
232+
switch s {
233+
case "COMMITTED", "DEGRADED",
234+
"FAILED_SSH_UNKNOWN", "FAILED_AUTHORITY_ABORT", "FAILED_RENDER",
235+
"FAILED_REBUILD", "FAILED_NO_FIREWALL", "FAILED_TAKEOVER":
236+
return true
237+
}
238+
return false
239+
}
240+
241+
// BuildRecoveryPlan derives the PR-17 recovery metadata from exec state.
242+
// Planning-only: no mutation, no recovery execution.
243+
func BuildRecoveryPlan(exec executor.Executor) *RecoveryPlan {
244+
r := &RecoveryPlan{
245+
Mechanism: "rebuild",
246+
}
247+
r.Available = rebuildRecoveryAvailable(exec)
248+
const p = "/var/lib/nftban/state/install_state"
249+
if !exec.FileExists(p) {
250+
r.Notes = append(r.Notes,
251+
"no prior install_state — a fresh recovery is clean (no rollback target)")
252+
} else {
253+
data, err := exec.ReadFile(p)
254+
if err == nil {
255+
r.Notes = append(r.Notes,
256+
"prior install_state = "+trim(string(data)))
257+
}
258+
}
259+
if !r.Available {
260+
r.Notes = append(r.Notes,
261+
"preconditions for automatic rollback unmet — apply will require operator assist on failure")
262+
}
263+
return r
264+
}
265+
156266
// isStaleInProgress returns true if install_state exists AND is not a
157267
// terminal state (COMMITTED / DEGRADED / FAILED_*). We treat an in-progress
158268
// marker as a caution.

0 commit comments

Comments
 (0)