feat(memory): zero-spend ingest of Claude Code auto-memory into observations #2829
surfingdoggo wants to merge 4 commits into
Sibling to the thedotmack#2690 transcript backfill: import Claude Code auto-memory (~/.claude/projects/<encoded>/memory/*.md) into the observation store. Memory is ALREADY distilled, so this is mechanical store-direct (no Haiku). Each topic file body becomes one observation (narrative=body), reusing the existing storeObservation seam with content_hash dedup at store time. The MEMORY.md link-index is skipped.
Project is resolved from a sibling transcript cwd so recall merges with live capture, and orphaned memory (no transcript) is flagged rather than dropped, which is the whole point of memory-direct ingest. Provenance rides in subtitle + concepts because storeObservation has no metadata column. Each obs is backdated to the file mtime. The store orchestrator takes injected deps so it stays unit-testable without a worker.
Validated dry-run on observability (3 files, index skipped) and the --all sweep (56 dirs, 387 files, 21 orphaned).
Prolific ticket c470ccd2.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wires the memory ingest store path into the worker and CLI:
- MemoryIngestRoutes: POST /api/memory/ingest loops files server-side through getOrCreateManualSession + storeObservation (backdated via mtime) + Chroma sync, with a content_hash pre-check so stored-vs-deduped is reported accurately. Provenance persists in the metadata column (SessionStore variant writes it) plus lightweight concepts tags.
- memory/cli.ts runMemoryCommand: dry-run client-side, real ingest over HTTP. Default source = current repo memory dir, plus --source / --all / --require-cwd.
- Dispatch wired in both worker-service.ts (case memory + route reg) and npx-cli (case memory -> spawnBunWorkerCommand), mirroring transcript ingest.

Typecheck green; dry-run verified through the worker-service CLI dispatch.
…strator 13 tests: frontmatter parse (incl. no-block + unterminated), title derivation, cwd encoding, temp-fixture scan (index skip + cwd/project resolution + dry-run counts), buildMemoryObservation mapping (body->narrative, backdate, provenance), and ingestMemorySource with fake deps (empty-body skip, dedup reporting).
Covers the zero-spend memory-file ingest path (memory ingest --source/--all --dry-run --require-cwd), how it differs from transcript backfill (stores already-distilled prose directly, no generation), frontmatter mapping, and content-hash idempotency.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Greptile Summary
Adds
Confidence Score: 3/5
Safe to merge for single-project use; the --all sweep has a latent abort-on-unreadable-file problem that will surface in the wild.
The per-file store path defensively wraps each storeMemoryObservation call, but the scan/parse phase in readMemoryDir has no equivalent guard: a broken symlink, permission error, or mid-scan deletion throws out of scanMemorySource and aborts the entire --all run with a 500. Users who accumulated many projects (the primary --all audience) are the most likely to hit this. The rest of the change is clean additive work that mirrors established patterns correctly.
src/services/memory/ingest.ts (readMemoryDir file loop) and src/services/worker/http/routes/MemoryIngestRoutes.ts (source path handling and error classification).
Sequence Diagram
sequenceDiagram
participant User as User (npx)
participant CLI as npx-cli/index.ts
participant MemCLI as memory/cli.ts
participant Ingest as memory/ingest.ts
participant Route as MemoryIngestRoutes
participant DB as SQLite (SessionStore)
participant Chroma as ChromaSync
User->>CLI: claude-mem memory ingest [flags]
CLI->>MemCLI: runMemoryCommand(ingest, args)
alt dry-run
MemCLI->>Ingest: dryRunMemorySource(source, all)
Ingest->>Ingest: scanMemorySource
Ingest-->>MemCLI: MemoryDryRunReport
MemCLI-->>User: formatted report (no DB, no worker)
else real ingest
MemCLI->>Route: POST /api/memory/ingest
Route->>Ingest: ingestMemorySource(source, opts, deps)
Ingest->>Ingest: scanMemorySource / readMemoryDir
loop each memory file
Ingest->>DB: SELECT content_hash (dedup check)
alt not deduped
Ingest->>DB: storeObservation (backdated to mtime)
Ingest->>Chroma: syncObservation (fire-and-forget)
end
end
Ingest-->>Route: MemoryIngestReport
Route-->>MemCLI: JSON report
MemCLI-->>User: per-file log + summary line
end
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
src/services/memory/ingest.ts:511-531
**Unhandled errors in readMemoryDir abort the entire `--all` ingest**
`statSync` and `readFileSync` inside the file loop are not wrapped in a `try/catch`. A single unreadable file (broken symlink, permission error, file deleted between `readdirSync` and `stat`) will throw out of `readMemoryDir` and propagate through `scanMemorySource` all the way to `ingestMemorySource`, which does not guard around the `scanMemorySource` call. In the `--all` case this means one bad file in any project aborts the entire multi-project run rather than skipping the problematic entry.
The per-file `storeMemoryObservation` call (line 813) is already wrapped in `try/catch` to catch store failures, so the intent to be resilient per-file is clear — the same pattern needs to reach the parse/scan phase. The same gap exists in `scanMemorySource` (line 570): `readMemoryDir` is called without a guard, so one bad project dir also aborts an `--all` sweep.
### Issue 2 of 3
src/services/worker/http/routes/MemoryIngestRoutes.ts:80-89
**Unrestricted filesystem path accepted as `source`**
`handleIngest` accepts any filesystem path as `source` without validating that it falls within `~/.claude/projects/`. Because the worker has no per-route auth (the `requireLocalhost` middleware defined in `middleware.ts` is not wired here, consistent with other routes), any local process can POST `{ source: "/home/user/sensitive-docs", all: false }` and have arbitrary `.md` files imported into the memory database. The CLI passes only well-formed paths, but nothing prevents a rogue local process from making this call directly. Adding a check that `source` resolves under `claudeProjectsDir()` (or is explicitly opt-in via `all: true`) would close this.
This is consistent with how other routes behave (no pre-existing per-route path guard), so it is not a regression — but the `source` parameter is a new arbitrary-path entry point that other routes don't expose.
### Issue 3 of 3
src/services/worker/http/routes/MemoryIngestRoutes.ts:84-92
**User-input path errors surface as HTTP 500, not 400**
If `source` points to a non-existent or non-directory path, `scanMemorySource` throws a descriptive `Error` (e.g. `"memory ingest source not found: …"`). That error propagates through `ingestMemorySource` and is caught by `wrapHandler`, which calls `handleError` and returns a 500. The CLI then prints `Memory ingest failed: HTTP 500 …`, which is accurate but implies a server fault. A try/catch around `ingestMemorySource` that distinguishes user-input errors (source not found, not a directory) from internal failures would let the route return a 400 for those cases.
Reviews (1): Last reviewed commit: "docs(memory): document memory ingest CLI..."
  all: !!options.all,
  dirs: refs.length,
  found: 0,
  stored: 0,
  deduped: 0,
  skipped: 0,
  failed: 0,
  cwdUnresolvedDirs: refs.filter(r => !r.cwd).length,
  files: [],
};

for (const ref of refs) {
  for (const file of ref.files) {
    report.found++;
    const result: MemoryFileIngestResult = { project: ref.project, file: file.fileName, status: 'stored' };

    if (options.requireCwd && !ref.cwd) {
      result.status = 'skipped';
      result.reason = 'cwd-unresolved';
      report.skipped++;
      report.files.push(result);
Unhandled errors in readMemoryDir abort the entire `--all` ingest
`statSync` and `readFileSync` inside the file loop are not wrapped in a `try/catch`. A single unreadable file (broken symlink, permission error, file deleted between `readdirSync` and `stat`) will throw out of `readMemoryDir` and propagate through `scanMemorySource` all the way to `ingestMemorySource`, which does not guard around the `scanMemorySource` call. In the `--all` case this means one bad file in any project aborts the entire multi-project run rather than skipping the problematic entry.
The per-file `storeMemoryObservation` call (line 813) is already wrapped in `try/catch` to catch store failures, so the intent to be resilient per-file is clear; the same pattern needs to reach the parse/scan phase. The same gap exists in `scanMemorySource` (line 570): `readMemoryDir` is called without a guard, so one bad project dir also aborts an `--all` sweep.
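One way the per-file guard described in this comment could look is sketched below. It is a minimal illustration, not the PR's code: `readMemoryDirGuarded`, `ScannedFile`, and `ScanSkip` are hypothetical names, and the real `readMemoryDir` may return a different shape.

```typescript
import { mkdtempSync, readdirSync, readFileSync, statSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface ScannedFile {
  fileName: string;
  body: string;
  mtimeMs: number;
}

interface ScanSkip {
  fileName: string;
  reason: string;
}

// Hypothetical stand-in for readMemoryDir; not the PR's actual implementation.
function readMemoryDirGuarded(dir: string): { files: ScannedFile[]; skipped: ScanSkip[] } {
  const files: ScannedFile[] = [];
  const skipped: ScanSkip[] = [];
  for (const fileName of readdirSync(dir).sort()) {
    // Only topic files count; the MEMORY.md link index is ignored.
    if (!fileName.endsWith(".md") || fileName === "MEMORY.md") continue;
    const fullPath = join(dir, fileName);
    try {
      const stat = statSync(fullPath); // throws on a broken symlink or a deleted file
      if (!stat.isFile()) continue;
      files.push({ fileName, body: readFileSync(fullPath, "utf8"), mtimeMs: stat.mtimeMs });
    } catch (err) {
      // Record the failure and keep going instead of aborting the whole scan.
      skipped.push({ fileName, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { files, skipped };
}

// Usage: one readable topic file, one broken symlink, and the index file.
const demoDir = mkdtempSync(join(tmpdir(), "memory-scan-"));
writeFileSync(join(demoDir, "topic.md"), "Prefer bun over npm in this repo.\n");
writeFileSync(join(demoDir, "MEMORY.md"), "- [topic](topic.md)\n");
symlinkSync(join(demoDir, "missing-target.md"), join(demoDir, "broken.md"));
const scan = readMemoryDirGuarded(demoDir);
console.log(scan.files.map(f => f.fileName), scan.skipped.map(s => s.fileName));
```

Returning the skipped entries, rather than only logging them, would let the ingest report surface unreadable files alongside the stored and deduped counts. The same wrapping could be applied one level up, around each project directory in an `--all` sweep.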
private async handleIngest(req: Request, res: Response): Promise<void> {
  const body = (req.body ?? {}) as { source?: unknown; all?: unknown; requireCwd?: unknown };
  const source = typeof body.source === 'string' ? body.source.trim() : '';
  const all = body.all === true;
  if (!source && !all) {
    this.badRequest(res, 'source is required (or pass all=true)');
    return;
  }
  const requireCwd = body.requireCwd === true;
  const effectiveSource = source || claudeProjectsDir();
Unrestricted filesystem path accepted as `source`
`handleIngest` accepts any filesystem path as `source` without validating that it falls within `~/.claude/projects/`. Because the worker has no per-route auth (the `requireLocalhost` middleware defined in `middleware.ts` is not wired here, consistent with other routes), any local process can POST `{ source: "/home/user/sensitive-docs", all: false }` and have arbitrary `.md` files imported into the memory database. The CLI passes only well-formed paths, but nothing prevents a rogue local process from making this call directly. Adding a check that `source` resolves under `claudeProjectsDir()` (or is explicitly opt-in via `all: true`) would close this.
This is consistent with how other routes behave (no pre-existing per-route path guard), so it is not a regression, but the `source` parameter is a new arbitrary-path entry point that other routes don't expose.
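The containment check suggested here could be sketched roughly as follows. `isInsideRoot` and `assumedProjectsDir` are hypothetical helpers, and the sketch assumes `claudeProjectsDir()` resolves to `~/.claude/projects`, which the PR does not show.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join, relative, resolve, sep } from "node:path";

// Assumption: claudeProjectsDir() in the PR resolves to ~/.claude/projects.
function assumedProjectsDir(): string {
  return join(homedir(), ".claude", "projects");
}

// True when `candidate` is `root` itself or a path nested under it.
// This is a purely lexical check: it does not follow symlinks.
function isInsideRoot(candidate: string, root: string): boolean {
  const rel = relative(resolve(root), resolve(candidate));
  if (rel === "") return true;
  const escapes = rel === ".." || rel.startsWith(".." + sep);
  // An absolute result means the two paths share no common root (e.g. different drives).
  return !escapes && !isAbsolute(rel);
}

const projectsRoot = assumedProjectsDir();
console.log(isInsideRoot(join(projectsRoot, "my-project", "memory"), projectsRoot)); // true
console.log(isInsideRoot(join(projectsRoot, "..", "..", "sensitive-docs"), projectsRoot)); // false
```

Because the comparison is lexical, a symlink placed under the projects directory could still point elsewhere; resolving both paths with `fs.realpathSync` before comparing would be one way to tighten this.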
  if (!source && !all) {
    this.badRequest(res, 'source is required (or pass all=true)');
    return;
  }
  const requireCwd = body.requireCwd === true;
  const effectiveSource = source || claudeProjectsDir();

  logger.info('INGEST', 'Memory ingest starting', { source: effectiveSource, all, requireCwd });
  const report = await ingestMemorySource(effectiveSource, { all, requireCwd }, this.buildDeps());
User-input path errors surface as HTTP 500, not 400
If `source` points to a non-existent or non-directory path, `scanMemorySource` throws a descriptive `Error` (e.g. `"memory ingest source not found: …"`). That error propagates through `ingestMemorySource` and is caught by `wrapHandler`, which calls `handleError` and returns a 500. The CLI then prints `Memory ingest failed: HTTP 500 …`, which is accurate but implies a server fault. A try/catch around `ingestMemorySource` that distinguishes user-input errors (source not found, not a directory) from internal failures would let the route return a 400 for those cases.
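A rough sketch of the 400-versus-500 split follows. Everything named here is an assumption for illustration: `MemorySourceError`, `assertMemorySource`, `handleIngestSketch`, and the "not a directory" message wording are hypothetical, and the real route would go through Express and `wrapHandler` rather than returning a plain object.

```typescript
import { existsSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical error type marking failures caused by the caller's input.
class MemorySourceError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MemorySourceError";
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

// Validates the source up front so bad input is distinguishable from internal faults.
function assertMemorySource(source: string): void {
  if (!existsSync(source)) {
    throw new MemorySourceError(`memory ingest source not found: ${source}`);
  }
  if (!statSync(source).isDirectory()) {
    throw new MemorySourceError(`memory ingest source is not a directory: ${source}`);
  }
}

interface RouteResult {
  status: 200 | 400 | 500;
  body: { ok?: boolean; error?: string };
}

// `ingest` stands in for ingestMemorySource; it is injected so the sketch runs without a worker.
function handleIngestSketch(source: string, ingest: (source: string) => void): RouteResult {
  try {
    assertMemorySource(source);
    ingest(source);
    return { status: 200, body: { ok: true } };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Caller mistakes map to 400; anything else is treated as a server fault.
    return { status: err instanceof MemorySourceError ? 400 : 500, body: { error: message } };
  }
}

const missingSource = handleIngestSketch(join(tmpdir(), "no-such-memory-dir-7f3a91"), () => {});
const internalFault = handleIngestSketch(tmpdir(), () => {
  throw new Error("database is locked");
});
console.log(missingSource.status, internalFault.status);
```

Classifying by error type rather than by matching message text keeps the route stable if the wording of the scan errors changes later.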
What
Adds `claude-mem memory ingest`, which imports Claude Code's own "auto-memory" markdown (`~/.claude/projects/<encoded-cwd>/memory/*.md`) directly into the memory database as observations.
Why
Auto-memory is already distilled prose, the same kind of artifact the observation generator produces. So this stores each topic file directly through the existing observation seam (content-hash dedup + Chroma sync) and does not run the Haiku generation pipeline. Re-generating over already-distilled prose would be lossy, so it would spend model budget to get a worse result. The `MEMORY.md` link index is skipped.

This is the mechanical, zero-model-spend sibling to the transcript backfill (#2690): transcripts need generation, distilled memory does not.
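The store-direct path with content-hash dedup can be illustrated with a small sketch that uses injected dependencies, in the spirit of the orchestrator described above. The names (`ingestFiles`, `StoreDeps`, `hashBody`) and the choice of SHA-256 are assumptions; the PR does not show how `content_hash` is actually computed.

```typescript
import { createHash } from "node:crypto";

interface MemoryFile {
  fileName: string;
  body: string;
  mtimeMs: number;
}

interface StoredObservation {
  narrative: string;
  contentHash: string;
  createdAtEpochMs: number;
}

// Injected seams, so the loop can run against fakes without a worker or database.
interface StoreDeps {
  hasContentHash(hash: string): boolean;
  storeObservation(obs: StoredObservation): void;
}

interface IngestCounts {
  stored: number;
  deduped: number;
  skipped: number;
}

// Assumption: SHA-256 over the trimmed body; the real hash input may differ.
function hashBody(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

function ingestFiles(files: MemoryFile[], deps: StoreDeps): IngestCounts {
  const counts: IngestCounts = { stored: 0, deduped: 0, skipped: 0 };
  for (const file of files) {
    const narrative = file.body.trim();
    if (!narrative) {
      counts.skipped++; // empty body: nothing worth storing
      continue;
    }
    const contentHash = hashBody(narrative);
    if (deps.hasContentHash(contentHash)) {
      counts.deduped++; // already stored on an earlier run
      continue;
    }
    // No model call: the file body is stored as the narrative, backdated to the file mtime.
    deps.storeObservation({ narrative, contentHash, createdAtEpochMs: file.mtimeMs });
    counts.stored++;
  }
  return counts;
}

// Usage with an in-memory fake store.
const seenHashes = new Set<string>();
const fakeDeps: StoreDeps = {
  hasContentHash: (hash) => seenHashes.has(hash),
  storeObservation: (obs) => {
    seenHashes.add(obs.contentHash);
  },
};
const sampleFiles: MemoryFile[] = [
  { fileName: "build.md", body: "Use bun for tests.\n", mtimeMs: 1_700_000_000_000 },
  { fileName: "style.md", body: "Prefer named exports.\n", mtimeMs: 1_700_000_100_000 },
  { fileName: "empty.md", body: "   \n", mtimeMs: 1_700_000_200_000 },
];
const firstRun = ingestFiles(sampleFiles, fakeDeps);
const secondRun = ingestFiles(sampleFiles, fakeDeps);
console.log(firstRun, secondRun);
```

Running the same files twice shows the idempotency property: the second pass stores nothing new and reports the earlier files as deduped.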
How
- `src/services/memory/ingest.ts`: enumerate + parse (hand-rolled frontmatter parser, no YAML dep) + content-hash store orchestrator. Dry-run is DB-free and worker-free.
- `src/services/memory/cli.ts` + npx wiring: `memory ingest [--source <dir> | --all] [--dry-run] [--require-cwd]`. Dry-run runs client-side; the real store drives the worker over HTTP.
- `src/services/worker/http/routes/MemoryIngestRoutes.ts`: `POST /api/memory/ingest`.
- `usage/memory-ingest.mdx` + nav entry.

All changes are additive; no existing behavior changes.
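To make the "hand-rolled frontmatter parser, no YAML dep" idea concrete, here is a minimal sketch of such a parser. It is not the PR's parser: `parseFrontmatter` is a hypothetical name, only flat `key: value` lines are handled, and treating a missing or unterminated block as "no frontmatter, whole text is the body" is an assumption about the intended behavior.

```typescript
interface ParsedMemory {
  frontmatter: Record<string, string>;
  body: string;
}

// Parses a leading `---` ... `---` block of flat `key: value` lines.
// Assumption: a missing or unterminated block means the whole text is the body.
function parseFrontmatter(text: string): ParsedMemory {
  const lines = text.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") {
    return { frontmatter: {}, body: text.trim() };
  }
  const end = lines.findIndex((line, index) => index > 0 && line.trim() === "---");
  if (end === -1) {
    return { frontmatter: {}, body: text.trim() };
  }
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // not a key: value line
    const key = line.slice(0, colon).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(colon + 1).trim().replace(/^["']|["']$/g, "");
    if (key) frontmatter[key] = value;
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}

// Usage: the keys below are illustrative, not a documented auto-memory schema.
const parsed = parseFrontmatter('---\nname: build-notes\ndescription: "How we run tests"\n---\nUse bun for tests.\n');
console.log(parsed.frontmatter, parsed.body);
```

A parser this small cannot handle nested YAML, lists, or multi-line values, which is a reasonable trade for avoiding a dependency if the memory files only carry flat metadata.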
Testing
- `bun test tests/memory/memory-ingest.test.ts` -> 13 pass
- `tsc --noEmit` clean

We have been running this in our own claude-mem deployment (worker/SQLite) since Thursday 4 June. We used it to import an existing project's accumulated memory files, which made the transition painless: we did not have to start memory from scratch. It was quick and easy.

Note: `main` currently has pre-existing unrelated red CI.