v1.33.0.0 feat: /sync-gbrain memory-stage batch-import refactor (D1-D8) + F6/F9 + signal cleanup #1432
Merged
…ll-file hash
bin/gstack-memory-ingest.ts: rewrite memory ingest around `gbrain import <dir>`
batch path. Replaces per-file gbrainPutPage loop (~470s of subprocess startup
per cold run) with prepare-then-batch:

  walkAllSources
    -> preparePages: mtime-skip + optional gitleaks (--scan-secrets) + parse
    -> writeStaged: mkdir -p per slug segment, hierarchical (D1)
    -> snapshot ~/.gbrain/sync-failures.jsonl byte offset
    -> runGbrainImport (async spawn) -> parseImportJson
    -> readNewFailures: read appended bytes, map back to source paths (D7)
    -> state.sessions[path] = {...} for files NOT in failed set
    -> saveStateAtomic (F6) + cleanupStagingDir

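The snapshot and readNewFailures steps (D7) can be sketched roughly as follows. This is a minimal illustration, not the code in bin/gstack-memory-ingest.ts: the `FailureRecord` shape (a `path` field per JSONL line), the `parseAppended` and `sourcesToRecord` helpers, and the staged-path-to-source-path map are all assumptions, since the PR does not show the sync-failures.jsonl schema.

```typescript
import { existsSync, readFileSync, statSync } from "node:fs";

// Hypothetical record shape; the real sync-failures.jsonl schema is not shown in the PR.
interface FailureRecord {
  path: string;
  error?: string;
}

/** Byte size of the failure log before the import runs (0 if the log does not exist yet). */
export function snapshotOffset(logPath: string): number {
  return existsSync(logPath) ? statSync(logPath).size : 0;
}

/** Parse only the bytes at or after `offset` as JSON lines; unparseable lines are skipped. */
export function parseAppended(bytes: Uint8Array, offset: number): FailureRecord[] {
  if (offset >= bytes.length) return [];
  const text = new TextDecoder().decode(bytes.subarray(offset));
  const records: FailureRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const rec: unknown = JSON.parse(line);
      if (typeof rec === "object" && rec !== null &&
          typeof (rec as { path?: unknown }).path === "string") {
        records.push(rec as FailureRecord);
      }
    } catch {
      // Partial or malformed line: ignore it rather than fail the whole run.
    }
  }
  return records;
}

/** Reads the whole log and slices it; a production version could seek to `offset` instead. */
export function readNewFailures(logPath: string, offset: number): FailureRecord[] {
  return existsSync(logPath) ? parseAppended(readFileSync(logPath), offset) : [];
}

/** Source paths that are safe to record in state: everything not reported as failed. */
export function sourcesToRecord(
  stagedToSource: Map<string, string>,
  failures: FailureRecord[],
): string[] {
  const failed = new Set(failures.map((f) => f.path));
  return [...stagedToSource]
    .filter(([stagedPath]) => !failed.has(stagedPath))
    .map(([, sourcePath]) => sourcePath);
}
```

Taking the offset before the import means entries from earlier runs are never re-attributed to the current batch; only files absent from the new entries get a state record, so failed files are retried on the next run.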
Architecture decisions:
  D1 hierarchical staging dir
  D2 cut over, deleted gbrainPutPage entirely
  D3 source-file gitleaks made opt-in via --scan-secrets (gstack-brain-sync
     owns the cross-machine boundary; per-file scan was a redundant ~470s tax)
  D4 OK/ERR verdict (no DEGRADED tri-state)
  D5 unified state schema (no separate skip-list)
  D6 trust gbrain content_hash idempotency (no skip_reason bookkeeping)
  D7 byte-offset snapshot of sync-failures.jsonl + per-source mapping
  F6 saveState uses tmp+rename atomic write
  F9 fileSha256 removes 1MB cap; full-file hash (no more silent tail-edit
     misses on long partial transcripts)
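For F6 and F9, a minimal sketch under stated assumptions: the function names follow the commit message, but the bodies, the temp-file naming, the 64 KiB chunk size, and the `loadState` helper are illustrative guesses rather than the PR's actual code.

```typescript
import { createHash } from "node:crypto";
import { closeSync, openSync, readFileSync, readSync, renameSync, writeFileSync } from "node:fs";

/** F9 sketch: hash the whole file in chunks, with no size cap. */
export function fileSha256(path: string): string {
  const hash = createHash("sha256");
  const chunk = Buffer.alloc(64 * 1024); // chunk size is an arbitrary choice
  const fd = openSync(path, "r");
  try {
    let bytesRead: number;
    // position = null reads sequentially from the current file position
    while ((bytesRead = readSync(fd, chunk, 0, chunk.length, null)) > 0) {
      hash.update(chunk.subarray(0, bytesRead));
    }
  } finally {
    closeSync(fd);
  }
  return hash.digest("hex");
}

/** F6 sketch: write to a sibling temp file, then rename over the target. */
export function saveStateAtomic(statePath: string, state: unknown): void {
  const tmpPath = `${statePath}.tmp-${process.pid}`; // hypothetical naming scheme
  writeFileSync(tmpPath, JSON.stringify(state, null, 2));
  // rename within one filesystem replaces the target in a single step,
  // so a reader sees either the old state or the new one, never a partial write.
  renameSync(tmpPath, statePath);
}

/** Hypothetical companion loader: falls back when the file is missing or corrupt. */
export function loadState<T>(statePath: string, fallback: T): T {
  try {
    return JSON.parse(readFileSync(statePath, "utf8")) as T;
  } catch {
    return fallback;
  }
}
```

With the cap removed, an edit past the first 1MB of a long transcript changes the hash, which is the tail-edit case F9 targets.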
Signal handling: installSignalForwarder propagates SIGTERM/SIGINT to the
gbrain child process AND synchronously cleans the staging dir before
process.exit. Pre-fix, orchestrator timeouts left gbrain processes
orphaned holding the PGLite write lock (observed: 15-hour-CPU-time
orphan still alive a day later).
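The signal-forwarding behaviour described above might look like this sketch. The ordering (kill child, synchronously clean staging, then exit) comes from the commit message; the `Killable` and `ForwarderDeps` types, the injectable `deps` parameter, and the 128+signal exit codes are assumptions added so the handler can be exercised without sending real signals.

```typescript
import { rmSync } from "node:fs";

// Minimal structural type; a Node ChildProcess satisfies it.
interface Killable {
  kill(signal: NodeJS.Signals): boolean;
}

// Hypothetical injection points so the handler is testable.
interface ForwarderDeps {
  cleanup: (dir: string) => void;
  exit: (code: number) => void;
}

const defaultDeps: ForwarderDeps = {
  cleanup: (dir) => rmSync(dir, { recursive: true, force: true }), // synchronous on purpose
  exit: (code) => process.exit(code),
};

// Conventional 128 + signal-number exit codes (an assumption, not from the PR).
const EXIT_CODES = { SIGINT: 130, SIGTERM: 143 } as const;
type ForwardedSignal = keyof typeof EXIT_CODES;

/** Build a handler that forwards the signal, cleans staging, then exits, in that order. */
export function makeSignalHandler(
  child: Killable,
  stagingDir: string,
  signal: ForwardedSignal,
  deps: ForwarderDeps = defaultDeps,
): () => void {
  return () => {
    child.kill(signal);       // 1. propagate to the gbrain child so it cannot be orphaned
    deps.cleanup(stagingDir); // 2. remove staging synchronously; code after exit never runs
    deps.exit(EXIT_CODES[signal]);
  };
}

/** Register handlers for SIGTERM and SIGINT; returns a function that unregisters them. */
export function installSignalForwarder(
  child: Killable,
  stagingDir: string,
  deps: ForwarderDeps = defaultDeps,
): () => void {
  const signals: ForwardedSignal[] = ["SIGTERM", "SIGINT"];
  const handlers = signals.map((sig) => [sig, makeSignalHandler(child, stagingDir, sig, deps)] as const);
  for (const [sig, handler] of handlers) process.on(sig, handler);
  return () => {
    for (const [sig, handler] of handlers) process.off(sig, handler);
  };
}
```

The cleanup has to be synchronous because `process.exit` does not wait for pending async work; an awaited removal could leave the staging dir behind.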
parseImportJson returns null on unparseable output (treated as ERR by
caller) instead of silently zeroing through.
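A sketch of the null-on-unparseable contract. The `ImportSummary` field names (`imported`, `skipped`, `errors`) and the `stageVerdict` helper are hypothetical, since the PR does not show the JSON that `gbrain import` emits; only the "return null, caller treats it as ERR" behaviour is taken from the commit message.

```typescript
// Hypothetical summary shape; the real gbrain import JSON fields are not shown in the PR.
interface ImportSummary {
  imported: number;
  skipped: number;
  errors: number;
}

/** Returns a validated summary, or null when stdout is not the expected JSON object. */
export function parseImportJson(stdout: string): ImportSummary | null {
  let raw: unknown;
  try {
    raw = JSON.parse(stdout.trim());
  } catch {
    return null; // unparseable output is a system-level failure, not "zero imported"
  }
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;
  const num = (key: string): number | null => {
    const value = rec[key];
    return typeof value === "number" && Number.isFinite(value) ? value : null;
  };
  const imported = num("imported");
  const skipped = num("skipped");
  const errors = num("errors");
  if (imported === null || skipped === null || errors === null) return null;
  return { imported, skipped, errors };
}

/** Caller side: a null summary marks the stage ERR; per-file errors alone do not. */
export function stageVerdict(summary: ImportSummary | null): "OK" | "ERR" {
  return summary === null ? "ERR" : "OK";
}
```

Returning `null` instead of a zero-filled object keeps "gbrain crashed before printing anything" distinguishable from "gbrain ran and imported nothing".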
gbrainAvailable() probes for the `import` subcommand instead of `put`.
Plan + review chain at /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gstack-gbrain-sync.ts: memory-stage parser now picks [memory-ingest] ERR
lines preferentially over the latest [memory-ingest] line, strips the
prefix and any leading 'ERR: ' for cleaner summary output, and surfaces
'(killed by signal / timeout)' when the child exits with status=null.
Matches D6's OK/ERR contract: per-file failures (FILE_TOO_LARGE etc.) show
in the summary count but only system-level failures (gbrain crash, process
kill, missing CLI) mark the stage ERR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
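The line-selection rule in this commit can be sketched as below. The function name `summarizeMemoryStage`, the return shape, and the choice to take the last ERR line when several exist are assumptions; the prefix stripping, the ERR preference, and the status=null message follow the commit message.

```typescript
const PREFIX = "[memory-ingest]";

interface StageSummary {
  verdict: "OK" | "ERR";
  summary: string;
}

/**
 * Pick the line to show for the memory stage.
 * `status` is the child's exit status; null means it was killed by a signal or timed out.
 */
export function summarizeMemoryStage(output: string, status: number | null): StageSummary {
  if (status === null) {
    return { verdict: "ERR", summary: "(killed by signal / timeout)" };
  }
  // Keep only tagged lines, with the prefix already stripped.
  const tagged = output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith(PREFIX))
    .map((line) => line.slice(PREFIX.length).trim());

  // Prefer an ERR line over whatever happened to be printed last.
  const errLines = tagged.filter((line) => line.startsWith("ERR"));
  const picked = errLines[errLines.length - 1] ?? tagged[tagged.length - 1] ?? "";

  return {
    verdict: status === 0 && errLines.length === 0 ? "OK" : "ERR",
    summary: picked.replace(/^ERR:\s*/, ""),
  };
}
```

Preferring the ERR line matters when the child prints a final progress line after the error; the latest line alone would hide the failure reason.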
test/gstack-memory-ingest.test.ts: 5 new tests for the batch-import
architecture:
1. D1 hierarchical staging slug round-trip — asserts staged file lives
in transcripts/claude-code/<dir>/*.md, not flat at staging root
2. Frontmatter injection — asserts title/type/tags written into the
staged page's YAML block
3. D7 sync-failures.jsonl exclusion — files listed as failed by
gbrain do NOT get state-recorded; one of two test sessions lands,
the other stays un-ingested for retry next run
4. Missing-`import`-subcommand error path — when gbrain only advertises
legacy `put`, memory-ingest exits 1 with [memory-ingest] ERR
5. --scan-secrets opt-in path — verifies a dirty-source file is
skipped via the secret-scan match when the flag is on, while a
clean session in the same run still gets staged
Replaces the prior put-per-file shim with an import-batch shim. The
shim fails loudly (exit 99) if the new code ever regresses to per-file
`gbrain put` calls.
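Tests 1 and 2 above exercise hierarchical staging (D1) and frontmatter injection. A rough sketch of what such a writer could look like, assuming a slash-separated slug maps one-to-one onto directories under the staging root; the `PageMeta` type, the helper names `stagedPath` and `withFrontmatter`, and the JSON-style quoting of YAML values are illustrative choices, not the PR's implementation.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical metadata shape, based on the fields the tests assert (title/type/tags).
interface PageMeta {
  title: string;
  type: string;
  tags: string[];
}

/** Map a slug like "transcripts/claude-code/<dir>/<name>" to a nested .md path. */
export function stagedPath(stagingDir: string, slug: string): string {
  const segments = slug.split("/").filter((s) => s !== "");
  if (segments.length === 0 || segments.some((s) => s === "." || s === "..")) {
    throw new Error(`unsafe slug: ${slug}`);
  }
  return join(stagingDir, ...segments) + ".md";
}

/** Prepend a YAML frontmatter block; values are JSON-quoted, which is also valid YAML here. */
export function withFrontmatter(meta: PageMeta, body: string): string {
  const tags = meta.tags.map((t) => JSON.stringify(t)).join(", ");
  return [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `type: ${JSON.stringify(meta.type)}`,
    `tags: [${tags}]`,
    "---",
    "",
    body,
  ].join("\n");
}

/** Create one directory level per slug segment (mkdir -p), then write the page. */
export function writeStaged(stagingDir: string, slug: string, meta: PageMeta, body: string): string {
  const target = stagedPath(stagingDir, slug);
  mkdirSync(dirname(target), { recursive: true });
  writeFileSync(target, withFrontmatter(meta, body));
  return target;
}
```

Keeping the directory layout identical to the slug lets the importer derive the slug from the path alone, which is what the design doc's "path-authoritative slug" note relies on.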
test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md: refresh
golden baselines to match the current generated SKILL.md content after
the v1.31.0.0 AskUserQuestion fallback-clause deletion. Goldens were
stale from that release; test was failing on origin/main before this
PR. Caught by the /ship test pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngelog
docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8 decisions (D1-D8), source-verified gbrain behaviors (content_hash idempotency, frontmatter parity, path-authoritative slug, per-file failure surface), measured performance vs plan target, F9 hash migration one-time cliff note, and follow-up TODOs.
CLAUDE.md: append `## GBrain Search Guidance` block from /sync-gbrain indicating this worktree's pin and how the agent should prefer gbrain search over Grep for semantic queries.
TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation (5,131 files takes >10min in gbrain when 501 takes 10s; likely N+1 SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import at the prepare-batch level for true no-op fast paths.
VERSION + package.json: bump to 1.33.0.0 (queue-aware via bin/gstack-next-version; skipped v1.32.0.0, which is claimed by sibling worktree garrytan/wellington / PR #1431).
CHANGELOG.md: v1.33.0.0 entry per the release-summary format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-file gitleaks scanning during memory ingest is now opt-in via --scan-secrets (or GSTACK_MEMORY_INGEST_SCAN_SECRETS=1). Update the user-facing reference doc so it stops claiming "every page passes through gitleaks." Also corrects the /gbrain-sync → /sync-gbrain command typo and the post-incident recovery section.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
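The opt-in described here amounts to a small predicate. A minimal sketch, assuming the flag is matched literally in argv and that only the value `1` enables the env var; the function name `scanSecretsEnabled` is hypothetical.

```typescript
/** True when per-file gitleaks scanning was requested by flag or environment variable. */
export function scanSecretsEnabled(
  argv: readonly string[],
  env: Record<string, string | undefined>,
): boolean {
  return argv.includes("--scan-secrets") || env["GSTACK_MEMORY_INGEST_SCAN_SECRETS"] === "1";
}
```

A call site would pass `process.argv.slice(2)` and `process.env`; with neither set, the prepare phase skips the per-file scan and relies on the gstack-brain-sync check at git push.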
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	bin/gstack-memory-ingest.ts
#	package.json
#	test/gstack-memory-ingest.test.ts
E2E Evals: ✅ PASS
0/0 tests passed | $0 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
Willardgmoore added a commit to Willardgmoore/gstack that referenced this pull request on May 12, 2026
Brings in:
- v1.33.2.0: setup guard against Conductor worktree pollution (garrytan#1446)
- v1.33.1.0: learnings token-OR query + task-shaped retrieval (garrytan#1442)
- v1.33.0.0: /sync-gbrain memory stage batch-import refactor (garrytan#1432)
VERSION stays at 1.34.0.0 (no collision; queue-aware check confirms).
CHANGELOG: our [1.34.0.0] entry at top, three new main entries below.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
/sync-gbrain memory stage no longer infinite-loops or silently throws away progress. v1.33.0.0 rewrites memory ingest around `gbrain import <dir>` (the batch path that has been in gbrain since v0.20). This makes the prepare phase ~60× faster, makes the signal handler actually kill the gbrain child and clean staging on SIGTERM, and makes per-file gitleaks scanning opt-in via `--scan-secrets`, because gstack-brain-sync already gates the real cross-machine secret boundary at git push.

Architecture:

5 commits, bisectable:
- refactor: batch-import architecture (D1-D8) + F6 atomic state + F9 full-file hash
- feat: orchestrator OK/ERR verdict parser
- test: batch-ingest writer regressions + refresh golden ship fixtures
- v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog
- docs: setup-gbrain/memory.md reflects opt-in per-file gitleaks (subagent)

Test Coverage
test/gstack-memory-ingest.test.ts grew from 17 → 21 tests (+5). New tests cover D1 hierarchical staging slug round-trip, frontmatter injection, D7 sync-failures exclusion, missing-import-subcommand error path, and `--scan-secrets` dirty-source skipping.

Coverage audit ran via subagent (see PR description history). 22 known gaps cluster around: SIGTERM handler stress tests, parseImportJson malformed-JSON path (now fixed to return null instead of zero-padding), readNewFailures byte-offset slicing (only happy path tested), and orchestrator ERR-line parser. All 4 commits' tests pass: 126 pass, 0 fail across 4 affected suites.
Pre-Landing Review
Adversarial review subagent found 14 issues. Fixed in commits before push:
- `finally` → staging dir leaks on SIGTERM. Fixed: handler now synchronously cleans the active staging dir before `process.exit`.
- `require("fs")` inside ESM module. Fixed: top-level imports.
- `parseImportJson` silently returns zeros on malformed JSON. Fixed: returns `null`, caller surfaces as `system_error`.
- `--scan-secrets` gitleaks path was untested. Fixed: new test with fake gitleaks shim.

Documented but not fixed in this PR (filed for follow-up):
- `detached: true` spawn).
- `bun bin/gstack-memory-ingest.ts` invocation bypasses orchestrator's lock.

Plan Completion
17/21 plan items DONE, 1 CHANGED (D3 made opt-in post-review), 2 deferred (F8 isolated benchmark harness, 24-path unit coverage went integration-only). Full audit ran via subagent against /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md, which captures the review chain: /investigate → /plan-eng-review → /codex review outside-voice → user perf review (D3 flip).

Numbers
Documentation
- setup-gbrain/memory.md: Rewrote the "What gets scanned for secrets" section to reflect v1.33.0.0's opt-in per-file gitleaks behavior. Names the actual cross-machine secret boundary (gstack-brain-sync git push). Documents the `--scan-secrets` flag + `GSTACK_MEMORY_INGEST_SCAN_SECRETS=1` env var.
- docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc (new).
- CLAUDE.md: appends `## GBrain Search Guidance` block (from /sync-gbrain).
- TODOS.md: P2 gbrain-side import perf, P3 prepare-batch cache.
- CHANGELOG.md: release-summary format v1.33.0.0 entry.

Audited and found current (no stale `gbrain put` / per-file references): README.md, ARCHITECTURE.md, CONTRIBUTING.md, USING_GBRAIN_WITH_GSTACK.md, sync-gbrain/SKILL.md, setup-gbrain/SKILL.md.

Test plan
- `bun test test/gstack-memory-ingest.test.ts test/gstack-memory-helpers.test.ts test/skill-e2e-memory-pipeline.test.ts test/host-config.test.ts`: 126 pass, 0 fail

🤖 Generated with Claude Code