hotfix: backport #687 and #695 to v0.14.0 #702

Merged
julianknutsen merged 3 commits into release/v0.14.0-hotfix-base from hotfix/v0.14.0-copyfiles-fingerprint
Apr 14, 2026

@julianknutsen
Collaborator

Summary

Validation

  • git diff --check v0.14.0..HEAD
  • go test ./internal/runtime/... ./cmd/gc/...

Backported commits

  • 426e896c fix: exclude pre_start-staged CopyFiles from config fingerprint (#682) (#695)
  • 2df03d2b fix: route agent-session GC_BEADS to raw provider (#687)
  • 8d59e433 fix: strip lifecycle wrapper from ambient GC_BEADS in rawBeadsProvider

sjarmak and others added 3 commits April 14, 2026 04:06
#695)

## Summary

Fixes #682 — polecat config-drift false positive from non-deterministic
CopyFiles hash.

`cmd/gc.stageHookFiles` probes workDir-relative paths (`.claude/skills`,
`.gemini/settings.json`, `.codex/hooks.json`, etc.) and captures their
content hashes into `CopyEntry.ContentHash`. Those destinations are
populated by `pre_start` scripts such as `worktree-setup.sh --sync`, so
the hash computed at session startup differs from the hash the
reconciler re-computes after `pre_start` completes. `CoreFingerprint`
diverges, config-drift fires, and the polecat is drained within 1–2
cycles of starting. Every replacement polecat hits the same race,
producing the thrash loop described in #682 (same `config_revision`,
three distinct `CopyFiles` breakdown hashes across cycles 1/4/6).

## Fix

Add `SkipFingerprint bool` to `runtime.CopyEntry`. All four fingerprint
consumers (`runtime.hashCoreFields`, `runtime.CoreFingerprintBreakdown`,
`runtime.LogCoreFingerprintDrift`, and the dormant
`cmd/gc.canonicalConfigHash`) skip an entry only when both `cf.Probed`
and `cf.SkipFingerprint` are true.

The `Probed` precondition is deliberate: it guarantees that a future
caller who mistakenly sets `SkipFingerprint=true` on a config-derived
entry cannot silently mute a real config change.
`TestSkipFingerprintIgnoredOnConfigDerivedEntries` locks this contract
in.
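
A minimal sketch of the skip rule, for illustration only. The `CopyEntry` type below is a simplified stand-in that models just the fields named in this description, and `fingerprint` is a hypothetical reduction of the core hash to `CopyFiles` alone; the real `runtime.hashCoreFields` covers more fields and may encode them differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CopyEntry is a simplified stand-in for runtime.CopyEntry (assumption:
// only the fields mentioned in the PR description are modeled).
type CopyEntry struct {
	Src             string
	ContentHash     string
	Probed          bool
	SkipFingerprint bool
}

// skipped reports whether an entry is excluded from the fingerprint.
// Both flags must be set, so a config-derived entry (Probed == false)
// can never be muted by SkipFingerprint alone.
func skipped(cf CopyEntry) bool {
	return cf.Probed && cf.SkipFingerprint
}

// fingerprint is a hypothetical hash over the non-skipped entries.
func fingerprint(entries []CopyEntry) string {
	h := sha256.New()
	for _, cf := range entries {
		if skipped(cf) {
			continue
		}
		fmt.Fprintf(h, "%q|%q\n", cf.Src, cf.ContentHash)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Probed and skipped: a changing ContentHash does not move the fingerprint.
	before := []CopyEntry{{Src: ".claude/skills", ContentHash: "aaa", Probed: true, SkipFingerprint: true}}
	after := []CopyEntry{{Src: ".claude/skills", ContentHash: "bbb", Probed: true, SkipFingerprint: true}}
	fmt.Println(fingerprint(before) == fingerprint(after)) // true

	// Config-derived with the flag set by mistake: Src changes still drift.
	cfgA := []CopyEntry{{Src: "a.json", SkipFingerprint: true}}
	cfgB := []CopyEntry{{Src: "b.json", SkipFingerprint: true}}
	fmt.Println(fingerprint(cfgA) == fingerprint(cfgB)) // false
}
```

Putting the guard in one predicate keeps the four consumers from disagreeing about which entries count.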

`stageHookFiles` sets the flag on every workDir-based probed entry (the
7-provider hook loop + the `.claude/skills` stage). It also drops the
`runtime.HashPathContent()` call for those entries since the result is
now discarded — eliminating a recursive SHA-256 walk of `.claude/skills`
on every reconciler tick per polecat. The cityDir fallback
(`.gc/settings.json` / `hooks/claude.json`) is deliberately left
unmarked so real user edits still drive drain.
`TestStageHookFilesIncludesCanonicalClaudeHook` and
`TestStageHookFilesFallsBackToLegacyClaudeHook` both positively assert
`SkipFingerprint==false` on that branch.

## Test plan

- [x] `TestSkipFingerprintExcludesFromCoreHash` — skipped entries
produce the same fingerprint as a baseline Config regardless of
`ContentHash` (proves exclusion is total, not a `ContentHash==""`
shortcut).
- [x] `TestSkipFingerprintIgnoredOnConfigDerivedEntries` —
`Probed=false, SkipFingerprint=true` on a config-derived entry still
contributes to the fingerprint; changes to `Src` still drive drift.
Footgun guard.
- [x] `TestSkipFingerprintStableUnderFilesystemChurn` — regression test
for the exact #682 bug: probe a tmp skills dir, write a file to it,
re-probe. Without `SkipFingerprint` the hash drifts (locked in as a
precondition assertion so the test can't vacuously pass). With
`SkipFingerprint` the hash stays stable. A naive "skip when ContentHash
empty" implementation fails this test.
- [x] `TestLogCoreFingerprintDriftSkipsExcludedEntries` — diagnostic
output does not leak skipped entries.
- [x] `TestStageHookFilesIncludesCodexAndCopilotExecutableHooks`
extended: every workDir-based probed entry has `SkipFingerprint==true`
AND `ContentHash==""` (verifying the dropped `HashPathContent` call).
- [x] `TestStageHookFilesIncludesCanonicalClaudeHook` /
`TestStageHookFilesFallsBackToLegacyClaudeHook` extended: cityDir
fallback entries must NOT have `SkipFingerprint`.
- [x] `go build ./...` clean
- [x] `go vet ./...` clean
- [x] `make check` fmt-check + lint + vet clean
- [x] `go test ./internal/runtime/... ./cmd/gc/` PASS (two baseline
flakes verified failing identically on `origin/main`:
`TestHandleProviderReadinessReturnsNotInstalledWhenBinaryMissing` in
`internal/api`,
`TestControllerReloadsNamedSessionModeAndAppliesIdleTimeout` in `cmd/gc`
— neither is in this PR's code paths)

## Upgrade note

Live sessions persisted a `started_config_hash` computed with the old
(unstable) `CopyFiles` hashing. On the first reconciler tick after
upgrade, those hashes will not match the post-fix `CoreFingerprint`,
triggering one legitimate drift drain per live session. This is a
one-time cost: the same drain was already happening continuously
pre-fix, and the thrash loop stops after the single replacement cycle.

---------

Co-authored-by: sjarmak <sjarmak@users.noreply.github.com>
Review follow-up: `rawBeadsProvider()` trusted any non-empty ambient
`GC_BEADS`. A parent process already exporting the city-managed
`exec:<city>/.gc/system/bin/gc-beads-bd` wrapper would propagate it into
nested agent sessions, reintroducing the exit-2/empty-JSON crash
addressed by #647.

Normalize the well-known wrapper path back to "bd" while preserving
genuine user `exec:` overrides. Added two regression tests covering
the contaminated-parent case and a custom-exec passthrough case.
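
A rough sketch of the normalization described in this commit message. `normalizeAmbientBeads` is a hypothetical helper, not the actual body of `rawBeadsProvider()`, and matching on the wrapper's path suffix is an assumption about how the well-known path is recognized.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// wrapperSuffix is the city-managed lifecycle wrapper location, relative to
// the city directory, as named in the commit message.
const wrapperSuffix = "/.gc/system/bin/gc-beads-bd"

// normalizeAmbientBeads maps the lifecycle wrapper back to the raw "bd"
// provider and returns every other value unchanged, so genuine user
// exec: overrides pass through.
func normalizeAmbientBeads(ambient string) string {
	if !strings.HasPrefix(ambient, "exec:") {
		return ambient
	}
	target := filepath.ToSlash(strings.TrimPrefix(ambient, "exec:"))
	if strings.HasSuffix(target, wrapperSuffix) {
		return "bd"
	}
	return ambient
}

func main() {
	// Contaminated parent: the wrapper leaks in through the environment.
	fmt.Println(normalizeAmbientBeads("exec:/srv/city/.gc/system/bin/gc-beads-bd")) // bd
	// Custom exec override: passed through untouched.
	fmt.Println(normalizeAmbientBeads("exec:/usr/local/bin/my-beads")) // exec:/usr/local/bin/my-beads
}
```

The example paths `/srv/city` and `/usr/local/bin/my-beads` are made up for the demonstration.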

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot added the status/needs-triage label ("Inbox - we haven't looked at it yet") on Apr 14, 2026
julianknutsen merged commit 40bfad5 into release/v0.14.0-hotfix-base on Apr 14, 2026
14 of 18 checks passed
julianknutsen deleted the hotfix/v0.14.0-copyfiles-fingerprint branch on April 14, 2026 06:17