
v1.33.0.0 feat: /sync-gbrain memory-stage batch-import refactor (D1-D8) + F6/F9 + signal cleanup #1432

Merged: garrytan merged 7 commits into main from garrytan/dublin-v1 on May 12, 2026

@garrytan garrytan commented May 11, 2026

Summary

/sync-gbrain memory stage no longer infinite-loops or silently throws away progress. v1.33.0.0 rewrites memory ingest around gbrain import <dir>, the batch path that has been in gbrain since v0.20. The prepare phase is ~60× faster, the signal handler now actually kills the gbrain child and cleans the staging dir on SIGTERM, and per-file gitleaks scanning becomes opt-in via --scan-secrets, because gstack-brain-sync already gates the real cross-machine secret boundary at git push.

Architecture:

walkAllSources
  -> preparePages (mtime-skip + optional gitleaks + parse + render frontmatter)
  -> writeStaged (hierarchical mkdir -p per slug, D1)
  -> snapshot ~/.gbrain/sync-failures.jsonl byte offset
  -> runGbrainImport (async spawn) -> parseImportJson
  -> readNewFailures (D7: byte-offset slice, map staged path -> source path)
  -> state.sessions[path] for files NOT in failed set
  -> saveStateAtomic (F6) + cleanupStagingDir
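
The D7 step (snapshot the byte offset, then read only what gbrain appended) can be sketched roughly as follows. This is a minimal illustration, not the code in bin/gstack-memory-ingest.ts: the function names mirror the pipeline above, but the bodies are assumptions, and the JSONL record shape (a `path` field holding the staged path) is hypothetical since the actual sync-failures.jsonl schema is not shown in this PR.

```typescript
import * as fs from "node:fs";

// Size of the failures log right now; 0 when the file does not exist yet.
function snapshotOffset(logPath: string): number {
  try {
    return fs.statSync(logPath).size;
  } catch {
    return 0;
  }
}

// Read only the bytes appended after `offset`, then map each failed staged
// path back to the source file it was rendered from.
function readNewFailures(
  logPath: string,
  offset: number,
  stagedToSource: Map<string, string>,
): Set<string> {
  const failed = new Set<string>();
  if (!fs.existsSync(logPath)) return failed;
  const size = fs.statSync(logPath).size;
  if (size <= offset) return failed;

  const buf = Buffer.alloc(size - offset);
  const fd = fs.openSync(logPath, "r");
  try {
    fs.readSync(fd, buf, 0, buf.length, offset);
  } finally {
    fs.closeSync(fd);
  }

  for (const line of buf.toString("utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      // ASSUMPTION: each record carries the staged path in a `path` field.
      const rec = JSON.parse(line) as { path?: unknown };
      if (typeof rec.path === "string") {
        const source = stagedToSource.get(rec.path);
        if (source !== undefined) failed.add(source);
      }
    } catch {
      // Skip a torn or malformed line rather than abort the whole run.
    }
  }
  return failed;
}
```

Source files in the returned set would be left out of state.sessions so they are retried on the next run; failures recorded before the snapshot are never re-read.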

5 commits, bisectable:

  1. refactor: batch-import architecture (D1-D8) + F6 atomic state + F9 full-file hash
  2. feat: orchestrator OK/ERR verdict parser
  3. test: batch-ingest writer regressions + refresh golden ship fixtures
  4. v1.33.0.0 docs: design doc, P2 perf TODOs, gbrain guidance block, changelog
  5. docs: setup-gbrain/memory.md reflects opt-in per-file gitleaks (subagent)

Test Coverage

test/gstack-memory-ingest.test.ts grew from 17 → 21 tests, including 5 new ones. New tests cover D1 hierarchical staging slug round-trip, frontmatter injection, D7 sync-failures exclusion, missing-import-subcommand error path, and --scan-secrets dirty-source skipping.

A coverage audit ran via subagent (see PR description history). The 22 known gaps cluster around SIGTERM handler stress tests, the parseImportJson malformed-JSON path (now fixed to return null instead of silently zeroing through), readNewFailures byte-offset slicing (only the happy path is tested), and the orchestrator ERR-line parser. Tests pass on all 4 commits: 126 pass, 0 fail across 4 affected suites.

Pre-Landing Review

Adversarial review subagent found 14 issues. Fixed in commits before push:

Documented but not fixed in this PR (filed for follow-up):

Plan Completion

17/21 plan items DONE, 1 CHANGED (D3 made opt-in post-review), 2 deferred (F8 isolated benchmark harness, 24-path unit coverage went integration-only). Full audit ran via subagent against /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md which captures the review chain: /investigate → /plan-eng-review → /codex review outside-voice → user perf review (D3 flip).

Numbers

| Metric | Before (v1.31.x) | After (v1.33) |
| --- | --- | --- |
| Cold run completes | no, 35-min loop + null exit | yes |
| Prepare phase time (5,135 files) | ~10-12 min | <10 sec |
| Per-file gitleaks scans | 1,841 mandatory | 0 by default, opt-in |
| State flushed on SIGTERM | no | yes (sync cleanup before exit) |
| Orphan gbrain after timeout | yes (15hr CPU drain observed) | no |
| FILE_TOO_LARGE blocks advance | yes | no, failed paths excluded via D7 |
| Tests in memory-ingest.test.ts | 17 | 21 (5 new) |

Documentation

  • setup-gbrain/memory.md — Rewrote the "What gets scanned for secrets" section to reflect v1.33.0.0's opt-in per-file gitleaks behavior. Names the actual cross-machine secret boundary (gstack-brain-sync git push). Documents --scan-secrets flag + GSTACK_MEMORY_INGEST_SCAN_SECRETS=1 env var.
  • docs/designs/SYNC_GBRAIN_BATCH_INGEST.md — full design doc (new).
  • CLAUDE.md — appends ## GBrain Search Guidance block (from /sync-gbrain).
  • TODOS.md — P2 gbrain-side import perf, P3 prepare-batch cache.
  • CHANGELOG.md — release-summary format v1.33.0.0 entry.

Audited and found current (no stale gbrain put / per-file references): README.md, ARCHITECTURE.md, CONTRIBUTING.md, USING_GBRAIN_WITH_GSTACK.md, sync-gbrain/SKILL.md, setup-gbrain/SKILL.md.
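
The opt-in switch documented for setup-gbrain/memory.md could be resolved along these lines. A minimal sketch: `shouldScanSecrets` is a hypothetical helper name, and treating only the literal value `1` as "on" is an assumption based on the `GSTACK_MEMORY_INGEST_SCAN_SECRETS=1` form quoted above.

```typescript
// Per-file gitleaks scanning is off unless the flag or the env var asks for it.
function shouldScanSecrets(
  argv: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
): boolean {
  return (
    argv.includes("--scan-secrets") ||
    env.GSTACK_MEMORY_INGEST_SCAN_SECRETS === "1"
  );
}
```

A caller would typically pass `process.argv.slice(2)` and `process.env`.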

Test plan

  • bun test test/gstack-memory-ingest.test.ts test/gstack-memory-helpers.test.ts test/skill-e2e-memory-pipeline.test.ts test/host-config.test.ts: 126 pass, 0 fail
  • Architecture decisions D1-D8 in plan all addressed with file:line evidence
  • /sync-gbrain manual smoke on real 5,135-file corpus: prepare phase <10s (was ~10min), no infinite loop, no orphan gbrain child
  • Bun build green on bin/gstack-memory-ingest.ts and bin/gstack-gbrain-sync.ts
  • Cold-run end-to-end through gbrain import: blocked by separate P2 gbrain-side perf issue (>10min on 5,131 files in gbrain itself). Local prepare/staging side verified working.

🤖 Generated with Claude Code



garrytan and others added 7 commits May 11, 2026 10:15
…ll-file hash

bin/gstack-memory-ingest.ts: rewrite memory ingest around `gbrain import <dir>`
batch path. Replaces per-file gbrainPutPage loop (~470s of subprocess startup
per cold run) with prepare-then-batch:

  walkAllSources
    -> preparePages: mtime-skip + optional gitleaks (--scan-secrets) + parse
    -> writeStaged: mkdir -p per slug segment, hierarchical (D1)
    -> snapshot ~/.gbrain/sync-failures.jsonl byte offset
    -> runGbrainImport (async spawn) -> parseImportJson
    -> readNewFailures: read appended bytes, map back to source paths (D7)
    -> state.sessions[path] = {...} for files NOT in failed set
    -> saveStateAtomic (F6) + cleanupStagingDir
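
A rough sketch of the D1 writeStaged step, assuming slugs are `/`-separated and each staged page gets a `.md` extension (the test description in this PR mentions transcripts/claude-code/<dir>/*.md). `slugFromStagedPath` is a hypothetical helper added only to show the round trip; neither body is taken from the actual source.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write one rendered page under stagingRoot, one directory per slug segment.
function writeStaged(stagingRoot: string, slug: string, body: string): string {
  const segments = slug.split("/").filter((s) => s.length > 0);
  if (segments.length === 0 || segments.some((s) => s === "." || s === "..")) {
    throw new Error(`unsafe slug: ${JSON.stringify(slug)}`);
  }
  const target = path.join(stagingRoot, ...segments) + ".md";
  // recursive: true has the same effect as `mkdir -p`.
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, body, "utf8");
  return target;
}

// Hypothetical inverse: recover the slug from where the page was staged.
function slugFromStagedPath(stagingRoot: string, stagedPath: string): string {
  const rel = path.relative(stagingRoot, stagedPath);
  return rel.replace(/\.md$/, "").split(path.sep).join("/");
}
```

Keeping the slug in the directory layout means the staged path alone identifies the page, which is what makes mapping a failed staged path back to its source possible later in the pipeline.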

Architecture decisions:
  D1 hierarchical staging dir
  D2 cut over, deleted gbrainPutPage entirely
  D3 source-file gitleaks made opt-in via --scan-secrets (gstack-brain-sync
     owns the cross-machine boundary; per-file scan was redundant ~470s tax)
  D4 OK/ERR verdict (no DEGRADED tri-state)
  D5 unified state schema (no separate skip-list)
  D6 trust gbrain content_hash idempotency (no skip_reason bookkeeping)
  D7 byte-offset snapshot of sync-failures.jsonl + per-source mapping
  F6 saveState uses tmp+rename atomic write
  F9 fileSha256 removes 1MB cap; full-file hash (no more silent tail-edit
     misses on long partial transcripts)
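
F6 and F9 can be illustrated together. This is a sketch under assumptions, not the shipped code: the 64 KiB chunk size and the `.tmp-<pid>` suffix are illustrative choices, and only the general techniques (hash every byte; write to a temp file and rename) come from the decisions listed above.

```typescript
import * as crypto from "node:crypto";
import * as fs from "node:fs";

// F9-style: hash the whole file, reading in fixed chunks so memory stays bounded.
function fileSha256(filePath: string): string {
  const hash = crypto.createHash("sha256");
  const chunk = Buffer.alloc(64 * 1024);
  const fd = fs.openSync(filePath, "r");
  try {
    let n: number;
    while ((n = fs.readSync(fd, chunk, 0, chunk.length, null)) > 0) {
      hash.update(chunk.subarray(0, n));
    }
  } finally {
    fs.closeSync(fd);
  }
  return hash.digest("hex");
}

// F6-style: write a temp file next to the target, then rename over it.
// A rename within one filesystem replaces the target atomically on POSIX,
// so a reader sees either the old state or the new one, never a torn file.
function saveStateAtomic(statePath: string, state: unknown): void {
  const tmp = `${statePath}.tmp-${process.pid}`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2), "utf8");
  fs.renameSync(tmp, statePath);
}
```

With a capped hash, two transcripts that differ only after the cap would hash identically; hashing the full file makes a tail edit change the digest.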

Signal handling: installSignalForwarder propagates SIGTERM/SIGINT to the
gbrain child process AND synchronously cleans the staging dir before
process.exit. Pre-fix, orchestrator timeouts left gbrain processes
orphaned holding the PGLite write lock (observed: 15-hour-CPU-time
orphan still alive a day later).
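
The forwarding behavior described above might look roughly like this. The name installSignalForwarder comes from this commit, but the signature, the `ChildLike`/`HostLike` interfaces (introduced here so the handler can be exercised without a real process), and the 128+signal exit codes are assumptions.

```typescript
import * as fs from "node:fs";

type ForwardedSignal = "SIGTERM" | "SIGINT";

// Minimal shapes so a fake child and a fake host can stand in during tests.
interface ChildLike {
  kill(signal: ForwardedSignal): boolean;
}
interface HostLike {
  on(event: ForwardedSignal, listener: () => void): unknown;
  exit(code: number): void;
}

function installSignalForwarder(
  child: ChildLike,
  stagingDir: string,
  host: HostLike = process,
): void {
  // ASSUMPTION: conventional shell exit codes (128 + signal number).
  const exitCodes: Record<ForwardedSignal, number> = { SIGTERM: 143, SIGINT: 130 };
  for (const signal of ["SIGTERM", "SIGINT"] as const) {
    host.on(signal, () => {
      try {
        child.kill(signal); // forward to the gbrain child so it cannot be orphaned
      } catch {
        // the child may already have exited
      }
      // Synchronous removal: it finishes before exit() is called.
      fs.rmSync(stagingDir, { recursive: true, force: true });
      host.exit(exitCodes[signal]);
    });
  }
}
```

The cleanup is synchronous on purpose: asynchronous work scheduled inside a signal handler is not guaranteed to run once process.exit is called.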

parseImportJson returns null on unparseable output (treated as ERR by
caller) instead of silently zeroing through.
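
A sketch of the null-on-unparseable contract. The `ImportSummary` fields (`imported`, `skipped`, `errors`) are hypothetical placeholders, since the JSON that `gbrain import` actually emits is not shown in this PR; only the "return null rather than zeros" behavior is taken from the text above.

```typescript
// HYPOTHETICAL shape: the real gbrain import output may use other field names.
interface ImportSummary {
  imported: number;
  skipped: number;
  errors: number;
}

// Returns null for anything that is not a well-formed summary, so the caller
// can treat it as ERR instead of reporting a run of zeros as success.
function parseImportJson(stdout: string): ImportSummary | null {
  let value: unknown;
  try {
    value = JSON.parse(stdout.trim());
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const rec = value as Record<string, unknown>;
  const { imported, skipped, errors } = rec;
  if (
    typeof imported !== "number" || !Number.isFinite(imported) ||
    typeof skipped !== "number" || !Number.isFinite(skipped) ||
    typeof errors !== "number" || !Number.isFinite(errors)
  ) {
    return null;
  }
  return { imported, skipped, errors };
}
```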

gbrainAvailable() probes for the `import` subcommand instead of `put`.

Plan + review chain at /Users/garrytan/.claude/plans/purrfect-tumbling-quiche.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gstack-gbrain-sync.ts: memory-stage parser now picks [memory-ingest] ERR
lines preferentially over the latest [memory-ingest] line, strips the
prefix and any leading 'ERR: ' for cleaner summary output, and surfaces
'(killed by signal / timeout)' when the child exits with status=null.

Matches D4's OK/ERR contract: per-file failures (FILE_TOO_LARGE etc.)
show in the summary count but only system-level failures (gbrain crash,
process kill, missing CLI) mark the stage ERR.
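
The parsing rule described in this commit can be sketched as a pure function. `parseMemoryStage` and `StageVerdict` are hypothetical names, and two details are assumptions: that the most recent ERR line wins when there are several, and that any non-zero or null exit status marks the stage ERR.

```typescript
interface StageVerdict {
  verdict: "OK" | "ERR";
  summary: string;
}

const MEMORY_INGEST_PREFIX = "[memory-ingest] ";

function parseMemoryStage(output: string, status: number | null): StageVerdict {
  const lines = output
    .split("\n")
    .filter((l) => l.startsWith(MEMORY_INGEST_PREFIX))
    .map((l) => l.slice(MEMORY_INGEST_PREFIX.length).trim());
  const errLines = lines.filter((l) => /^ERR\b/.test(l));

  // Prefer an ERR line over the latest ordinary line.
  const picked =
    errLines.length > 0 ? errLines[errLines.length - 1] : lines[lines.length - 1];
  let summary = (picked ?? "").replace(/^ERR:\s*/, "");

  // status === null means the child was killed by a signal (e.g. on timeout).
  if (status === null) {
    const note = "(killed by signal / timeout)";
    summary = summary ? `${summary} ${note}` : note;
  }

  const verdict = status === 0 && errLines.length === 0 ? "OK" : "ERR";
  return { verdict, summary };
}
```

Per-file failures reported inside an ordinary summary line leave the verdict at OK here; only an ERR line or an abnormal exit flips it.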

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test/gstack-memory-ingest.test.ts: 5 new tests for the batch-import
architecture:
  1. D1 hierarchical staging slug round-trip — asserts staged file lives
     in transcripts/claude-code/<dir>/*.md, not flat at staging root
  2. Frontmatter injection — asserts title/type/tags written into the
     staged page's YAML block
  3. D7 sync-failures.jsonl exclusion — files listed as failed by
     gbrain do NOT get state-recorded; one of two test sessions lands,
     the other stays un-ingested for retry next run
  4. Missing-`import`-subcommand error path — when gbrain only advertises
     legacy `put`, memory-ingest exits 1 with [memory-ingest] ERR
  5. --scan-secrets opt-in path — verifies a dirty-source file is
     skipped via the secret-scan match when the flag is on, while a
     clean session in the same run still gets staged
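
For the frontmatter-injection case (test 2), rendering a staged page might look like the following. A minimal sketch: `renderFrontmatter` and `PageMeta` are hypothetical names, and quoting values with JSON.stringify is an assumption that relies on JSON string literals being valid YAML double-quoted scalars for ordinary text.

```typescript
interface PageMeta {
  title: string;
  type: string;
  tags: string[];
}

// Prepend a YAML block carrying title/type/tags to the page body.
function renderFrontmatter(meta: PageMeta, body: string): string {
  const tags = meta.tags.map((t) => JSON.stringify(t)).join(", ");
  return [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `type: ${JSON.stringify(meta.type)}`,
    `tags: [${tags}]`,
    "---",
    "",
    body,
  ].join("\n");
}
```

Quoting every scalar avoids surprises when a title contains a colon or starts with a character that YAML would otherwise interpret.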

Replaces the prior put-per-file shim with an import-batch shim. The
shim fails loudly (exit 99) if the new code ever regresses to per-file
`gbrain put` calls.

test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md: refresh
golden baselines to match the current generated SKILL.md content after
the v1.31.0.0 AskUserQuestion fallback-clause deletion. Goldens were
stale from that release; test was failing on origin/main before this
PR. Caught by the /ship test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngelog

docs/designs/SYNC_GBRAIN_BATCH_INGEST.md: full design doc with the 8
decisions (D1-D8), source-verified gbrain behaviors (content_hash
idempotency, frontmatter parity, path-authoritative slug, per-file
failure surface), measured performance vs plan target, F9 hash
migration one-time cliff note, and follow-up TODOs.

CLAUDE.md: append `## GBrain Search Guidance` block from /sync-gbrain
indicating this worktree's pin and how the agent should prefer gbrain
search over Grep for semantic queries.

TODOS.md: P2 `gbrain import` perf-on-large-staging-dirs investigation
(5,131 files take >10min in gbrain while 501 take 10s; likely N+1
SQL or auto-link reconciliation). P3 cache-no-changes-since-last-import
at the prepare-batch level for true no-op fast paths.

VERSION + package.json: bump to 1.33.0.0 (queue-aware via
bin/gstack-next-version — skipped v1.32.0.0 which is claimed by
sibling worktree garrytan/wellington / PR #1431).

CHANGELOG.md: v1.33.0.0 entry per the release-summary format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-file gitleaks scanning during memory ingest is now opt-in via
--scan-secrets (or GSTACK_MEMORY_INGEST_SCAN_SECRETS=1). Update the
user-facing reference doc so it stops claiming "every page passes
through gitleaks." Also corrects the /gbrain-sync → /sync-gbrain
command typo and the post-incident recovery section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	bin/gstack-memory-ingest.ts
#	package.json
#	test/gstack-memory-ingest.test.ts
@github-actions

E2E Evals: ✅ PASS

0/0 tests passed | $0 total cost | 12 parallel runners

Suite Result Status Cost

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit d21ba06 into main May 12, 2026
23 checks passed
Willardgmoore added a commit to Willardgmoore/gstack that referenced this pull request May 12, 2026
Brings in:
- v1.33.2.0: setup guard against Conductor worktree pollution (garrytan#1446)
- v1.33.1.0: learnings token-OR query + task-shaped retrieval (garrytan#1442)
- v1.33.0.0: /sync-gbrain memory stage batch-import refactor (garrytan#1432)

VERSION stays at 1.34.0.0 (no collision — queue-aware check confirms).
CHANGELOG: our [1.34.0.0] entry at top, three new main entries below.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>