
feat(memory): zero-spend ingest of Claude Code auto-memory into observations #2829

Open
surfingdoggo wants to merge 4 commits into thedotmack:main from surfingdoggo:pr/memory-ingest

Conversation

@surfingdoggo

What

Adds claude-mem memory ingest, which imports Claude Code's own "auto-memory"
markdown (~/.claude/projects/<encoded-cwd>/memory/*.md) directly into the
memory database as observations.

Why

Auto-memory is already distilled prose, the same kind of artifact the
observation generator produces. So this stores each topic file directly through
the existing observation seam (content-hash dedup + Chroma sync) and does not
run the Haiku generation pipeline. Re-generating over already-distilled prose
would be lossy and pay for negative value. The MEMORY.md link index is skipped.

This is the mechanical, zero-model-spend sibling to the transcript backfill
(#2690): transcripts need generation, distilled memory does not.
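
For readers unfamiliar with the content-hash dedup seam, here is a minimal sketch of the idea. It is not the code in this PR: `MemoryObservation`, `contentHash`, and `DedupStore` are hypothetical names, the hashed fields are an assumption, and an in-memory map stands in for the observation table.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the real observation schema is not shown in this PR.
interface MemoryObservation {
  project: string;
  title: string;
  narrative: string;
}

// Derive a stable hash from the fields that define "the same" observation.
// Which fields the real store hashes is an assumption here.
function contentHash(obs: MemoryObservation): string {
  return createHash("sha256")
    .update(JSON.stringify([obs.project, obs.title, obs.narrative]))
    .digest("hex");
}

// In-memory stand-in for a content_hash lookup against the database.
class DedupStore {
  private seen = new Map<string, MemoryObservation>();

  // Returns "stored" the first time a hash is seen and "deduped" on repeats.
  store(obs: MemoryObservation): "stored" | "deduped" {
    const hash = contentHash(obs);
    if (this.seen.has(hash)) return "deduped";
    this.seen.set(hash, obs);
    return "stored";
  }
}
```

Because the hash depends only on content, re-running an ingest over unchanged files would report them as deduped rather than storing them again.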

How

  • src/services/memory/ingest.ts: enumerate + parse (hand-rolled frontmatter
    parser, no YAML dep) + content-hash store orchestrator. Dry-run is DB-free and
    worker-free.
  • src/services/memory/cli.ts + npx wiring: memory ingest [--source <dir> | --all] [--dry-run] [--require-cwd]. Dry-run runs client-side; the real store
    drives the worker over HTTP.
  • src/services/worker/http/routes/MemoryIngestRoutes.ts: POST /api/memory/ingest.
  • Tests: frontmatter parsing, scan, mapping, ingest orchestrator (13 cases).
  • Docs: usage/memory-ingest.mdx + nav entry.

All changes are additive; no existing behavior changes.
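
As an illustration of what a hand-rolled frontmatter parser with no YAML dependency can look like, here is a rough sketch. It is not the parser in `ingest.ts`: `parseFrontmatter` and `ParsedMemoryFile` are hypothetical names, only flat `key: value` pairs are handled, and treating a missing or unterminated block as plain body text is an assumption about the intended behavior.

```typescript
// Hypothetical result shape for a parsed memory topic file.
interface ParsedMemoryFile {
  frontmatter: Record<string, string>;
  body: string;
}

// Parse a leading "---" delimited block of flat "key: value" lines.
function parseFrontmatter(text: string): ParsedMemoryFile {
  const lines = text.split(/\r?\n/);

  // No opening delimiter: the whole file is body.
  if (lines[0]?.trim() !== "---") return { frontmatter: {}, body: text };

  // Find the closing delimiter; if it is missing, treat the file as body only.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return { frontmatter: {}, body: text };

  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip lines without a key
    const key = line.slice(0, sep).trim();
    // Strip one optional surrounding quote character from each end.
    const value = line.slice(sep + 1).trim().replace(/^["']|["']$/g, "");
    if (key) frontmatter[key] = value;
  }

  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}
```

Nested YAML, lists, and multi-line values are deliberately out of scope in this sketch.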

Testing

  • bun test tests/memory/memory-ingest.test.ts -> 13 pass
  • tsc --noEmit clean

We have been running this in our own claude-mem deployment (worker/SQLite) since
Thursday 4 June. We used it to import an existing project's accumulated memory
files, which made the transition painless: we did not have to start memory from
scratch. It was quick and easy.

Note: main currently has pre-existing unrelated red CI.

SurfingDoggo and others added 4 commits June 7, 2026 22:51
Sibling to the thedotmack#2690 transcript backfill: import Claude Code auto-memory
(~/.claude/projects/<encoded>/memory/*.md) into the observation store.

Memory is ALREADY distilled, so this is mechanical store-direct (no Haiku).
Each topic file body becomes one observation (narrative=body), reusing the
existing storeObservation seam with content_hash dedup at store time. The
MEMORY.md link-index is skipped. Project is resolved from a sibling
transcript cwd so recall merges with live capture, and orphaned memory (no
transcript) is flagged rather than dropped, which is the whole point of
memory-direct ingest. Provenance rides in subtitle + concepts because
storeObservation has no metadata column. Each obs is backdated to the file
mtime. The store orchestrator takes injected deps so it stays unit-testable
without a worker.

Validated dry-run on observability (3 files, index skipped) and the --all
sweep (56 dirs, 387 files, 21 orphaned). Prolific ticket c470ccd2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Wires the memory ingest store path into the worker and CLI:
- MemoryIngestRoutes: POST /api/memory/ingest loops files server-side through
  getOrCreateManualSession + storeObservation (backdated via mtime) + Chroma
  sync, with a content_hash pre-check so stored-vs-deduped is reported
  accurately. Provenance persists in the metadata column (SessionStore variant
  writes it) plus lightweight concepts tags.
- memory/cli.ts runMemoryCommand: dry-run client-side, real ingest over HTTP.
  Default source = current repo memory dir, plus --source / --all / --require-cwd.
- Dispatch wired in both worker-service.ts (case memory + route reg) and
  npx-cli (case memory -> spawnBunWorkerCommand), mirroring transcript ingest.

Typecheck green; dry-run verified through the worker-service CLI dispatch.

…strator

13 tests: frontmatter parse (incl. no-block + unterminated), title derivation,
cwd encoding, temp-fixture scan (index skip + cwd/project resolution + dry-run
counts), buildMemoryObservation mapping (body->narrative, backdate, provenance),
and ingestMemorySource with fake deps (empty-body skip, dedup reporting).

Covers the zero-spend memory-file ingest path (memory ingest --source/--all
--dry-run --require-cwd), how it differs from transcript backfill (stores
already-distilled prose directly, no generation), frontmatter mapping, and
content-hash idempotency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Greptile Summary

Adds claude-mem memory ingest, a zero-model-spend command that imports Claude Code's auto-memory markdown files directly into the observation database, bypassing the Haiku generation pipeline since the files are already distilled prose. The implementation is fully additive and follows the existing transcript-ingest pattern (CLI → worker HTTP → SQLite store with content-hash dedup).

  • src/services/memory/ingest.ts: hand-rolled frontmatter parser, directory scanner, dry-run reporter, and async ingest orchestrator with per-file try/catch around store calls — but readMemoryDir lacks error handling around statSync/readFileSync, so a single unreadable file aborts an entire --all run.
  • src/services/worker/http/routes/MemoryIngestRoutes.ts: new POST /api/memory/ingest route that accepts an arbitrary source filesystem path from the request body without validating it stays within ~/.claude/projects/.
  • Tests (13 cases) cover the happy path well; error-path coverage for unreadable files and scan exceptions is missing.

Confidence Score: 3/5

Safe to merge for single-project use; the --all sweep has a latent abort-on-unreadable-file problem that will surface in the wild.

The per-file store path defensively wraps each storeMemoryObservation call, but the scan/parse phase in readMemoryDir has no equivalent guard: a broken symlink, permission error, or mid-scan deletion throws out of scanMemorySource and aborts the entire --all run with a 500. Users who accumulated many projects (the primary --all audience) are the most likely to hit this. The rest of the change is clean additive work that mirrors established patterns correctly.

src/services/memory/ingest.ts (readMemoryDir file loop) and src/services/worker/http/routes/MemoryIngestRoutes.ts (source path handling and error classification).

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/services/memory/ingest.ts | Core orchestrator: enumeration, frontmatter parsing, dry-run, and real ingest. The per-file store path is resilient (try/catch), but readMemoryDir has no error handling around statSync/readFileSync, so one unreadable file aborts the entire --all scan. |
| src/services/worker/http/routes/MemoryIngestRoutes.ts | Worker HTTP route for POST /api/memory/ingest; accepts an arbitrary source path from the request body without path-boundary validation and surfaces scan errors as 500 instead of 400. |
| src/services/memory/cli.ts | CLI entrypoint for memory ingest; correctly separates dry-run (client-side) from real ingest (worker HTTP), source resolution is clean, and error propagation is handled. |
| src/npx-cli/index.ts | Adds memory case to the npx CLI dispatcher; straightforward arg slicing, correct fallthrough to error for unknown subcommands. |
| src/npx-cli/commands/runtime.ts | Adds runMemoryIngestCommand wrapping spawnBunWorkerCommand; mirrors existing transcript ingest pattern exactly. |
| src/services/worker-service.ts | Registers MemoryIngestRoutes and adds a memory case to the CLI dispatcher in main(); additive, consistent with existing patterns. |
| tests/memory/memory-ingest.test.ts | 13 tests covering frontmatter parsing, title derivation, scan, dry-run, observation mapping, and ingest orchestration. Does not cover error paths (unreadable files, scan exceptions) that correspond to the P1 gap. |
| docs/public/usage/memory-ingest.mdx | New usage documentation; accurate, well-structured, and matches the implementation flags and behavior. |

Sequence Diagram

sequenceDiagram
    participant User as User (npx)
    participant CLI as npx-cli/index.ts
    participant MemCLI as memory/cli.ts
    participant Ingest as memory/ingest.ts
    participant Route as MemoryIngestRoutes
    participant DB as SQLite (SessionStore)
    participant Chroma as ChromaSync

    User->>CLI: claude-mem memory ingest [flags]
    CLI->>MemCLI: runMemoryCommand(ingest, args)

    alt dry-run
        MemCLI->>Ingest: dryRunMemorySource(source, all)
        Ingest->>Ingest: scanMemorySource
        Ingest-->>MemCLI: MemoryDryRunReport
        MemCLI-->>User: formatted report (no DB, no worker)
    else real ingest
        MemCLI->>Route: POST /api/memory/ingest
        Route->>Ingest: ingestMemorySource(source, opts, deps)
        Ingest->>Ingest: scanMemorySource / readMemoryDir
        loop each memory file
            Ingest->>DB: SELECT content_hash (dedup check)
            alt not deduped
                Ingest->>DB: storeObservation (backdated to mtime)
                Ingest->>Chroma: syncObservation (fire-and-forget)
            end
        end
        Ingest-->>Route: MemoryIngestReport
        Route-->>MemCLI: JSON report
        MemCLI-->>User: per-file log + summary line
    end
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/services/memory/ingest.ts:511-531
**Unhandled errors in readMemoryDir abort the entire `--all` ingest**

`statSync` and `readFileSync` inside the file loop are not wrapped in a `try/catch`. A single unreadable file (broken symlink, permission error, file deleted between `readdirSync` and `stat`) will throw out of `readMemoryDir` and propagate through `scanMemorySource` all the way to `ingestMemorySource`, which does not guard around the `scanMemorySource` call. In the `--all` case this means one bad file in any project aborts the entire multi-project run rather than skipping the problematic entry.

The per-file `storeMemoryObservation` call (line 813) is already wrapped in `try/catch` to catch store failures, so the intent to be resilient per-file is clear — the same pattern needs to reach the parse/scan phase. The same gap exists in `scanMemorySource` (line 570): `readMemoryDir` is called without a guard, so one bad project dir also aborts an `--all` sweep.

### Issue 2 of 3
src/services/worker/http/routes/MemoryIngestRoutes.ts:80-89
**Unrestricted filesystem path accepted as `source`**

`handleIngest` accepts any filesystem path as `source` without validating that it falls within `~/.claude/projects/`. Because the worker has no per-route auth (the `requireLocalhost` middleware defined in `middleware.ts` is not wired here, consistent with other routes), any local process can POST `{ source: "/home/user/sensitive-docs", all: false }` and have arbitrary `.md` files imported into the memory database. The CLI passes only well-formed paths, but nothing prevents a rogue local process from making this call directly. Adding a check that `source` resolves under `claudeProjectsDir()` (or is explicitly opt-in via `all: true`) would close this.

This is consistent with how other routes behave (no pre-existing per-route path guard), so it is not a regression — but the `source` parameter is a new arbitrary-path entry point that other routes don't expose.

### Issue 3 of 3
src/services/worker/http/routes/MemoryIngestRoutes.ts:84-92
**User-input path errors surface as HTTP 500, not 400**

If `source` points to a non-existent or non-directory path, `scanMemorySource` throws a descriptive `Error` (e.g. `"memory ingest source not found: …"`). That error propagates through `ingestMemorySource` and is caught by `wrapHandler`, which calls `handleError` and returns a 500. The CLI then prints `Memory ingest failed: HTTP 500 …`, which is accurate but implies a server fault. A try/catch around `ingestMemorySource` that distinguishes user-input errors (source not found, not a directory) from internal failures would let the route return a 400 for those cases.

Reviews (1): Last reviewed commit: "docs(memory): document memory ingest CLI..." | Re-trigger Greptile

Comment on lines +511 to +531
  all: !!options.all,
  dirs: refs.length,
  found: 0,
  stored: 0,
  deduped: 0,
  skipped: 0,
  failed: 0,
  cwdUnresolvedDirs: refs.filter(r => !r.cwd).length,
  files: [],
};

for (const ref of refs) {
  for (const file of ref.files) {
    report.found++;
    const result: MemoryFileIngestResult = { project: ref.project, file: file.fileName, status: 'stored' };

    if (options.requireCwd && !ref.cwd) {
      result.status = 'skipped';
      result.reason = 'cwd-unresolved';
      report.skipped++;
      report.files.push(result);


P1 Unhandled errors in readMemoryDir abort the entire --all ingest

statSync and readFileSync inside the file loop are not wrapped in a try/catch. A single unreadable file (broken symlink, permission error, file deleted between readdirSync and stat) will throw out of readMemoryDir and propagate through scanMemorySource all the way to ingestMemorySource, which does not guard around the scanMemorySource call. In the --all case this means one bad file in any project aborts the entire multi-project run rather than skipping the problematic entry.

The per-file storeMemoryObservation call (line 813) is already wrapped in try/catch to catch store failures, so the intent to be resilient per-file is clear — the same pattern needs to reach the parse/scan phase. The same gap exists in scanMemorySource (line 570): readMemoryDir is called without a guard, so one bad project dir also aborts an --all sweep.
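
One possible shape for the fix, sketched under assumptions: `readMemoryDirSafe`, `MemoryFile`, and `ScanFailure` are hypothetical names rather than the PR's actual types, and the real `readMemoryDir` may filter entries differently. The idea is to wrap each file's stat and read in its own try/catch and record the failure instead of throwing.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical shapes; the PR's real types are not shown in the excerpt.
interface MemoryFile {
  fileName: string;
  body: string;
  mtimeMs: number;
}

interface ScanFailure {
  fileName: string;
  reason: string;
}

// Read every topic file in a memory dir, skipping the MEMORY.md link index.
// A file that cannot be stat'ed or read is reported, not thrown.
function readMemoryDirSafe(dir: string): { files: MemoryFile[]; failures: ScanFailure[] } {
  const files: MemoryFile[] = [];
  const failures: ScanFailure[] = [];

  for (const fileName of fs.readdirSync(dir)) {
    if (!fileName.endsWith(".md") || fileName === "MEMORY.md") continue;
    const fullPath = path.join(dir, fileName);
    try {
      // statSync follows symlinks, so a broken symlink throws here.
      const stat = fs.statSync(fullPath);
      if (!stat.isFile()) continue;
      files.push({ fileName, body: fs.readFileSync(fullPath, "utf8"), mtimeMs: stat.mtimeMs });
    } catch (err) {
      failures.push({ fileName, reason: err instanceof Error ? err.message : String(err) });
    }
  }

  return { files, failures };
}
```

The caller could fold `failures` into the report's `failed` count. The `readdirSync` call itself would still need a guard at the `scanMemorySource` level so that one bad project directory does not abort an `--all` sweep.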


Comment on lines +80 to +89
private async handleIngest(req: Request, res: Response): Promise<void> {
  const body = (req.body ?? {}) as { source?: unknown; all?: unknown; requireCwd?: unknown };
  const source = typeof body.source === 'string' ? body.source.trim() : '';
  const all = body.all === true;
  if (!source && !all) {
    this.badRequest(res, 'source is required (or pass all=true)');
    return;
  }
  const requireCwd = body.requireCwd === true;
  const effectiveSource = source || claudeProjectsDir();


P2 Unrestricted filesystem path accepted as source

handleIngest accepts any filesystem path as source without validating that it falls within ~/.claude/projects/. Because the worker has no per-route auth (the requireLocalhost middleware defined in middleware.ts is not wired here, consistent with other routes), any local process can POST { source: "/home/user/sensitive-docs", all: false } and have arbitrary .md files imported into the memory database. The CLI passes only well-formed paths, but nothing prevents a rogue local process from making this call directly. Adding a check that source resolves under claudeProjectsDir() (or is explicitly opt-in via all: true) would close this.

This is consistent with how other routes behave (no pre-existing per-route path guard), so it is not a regression — but the source parameter is a new arbitrary-path entry point that other routes don't expose.
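
A containment check along these lines might close the gap. This is a sketch only: `isWithinDir` is a hypothetical helper, not something in the PR, and it compares paths lexically, so a symlink inside the projects directory that points elsewhere would still pass unless both sides are first resolved with `fs.realpathSync`.

```typescript
import * as path from "node:path";

// Returns true when `candidate` is `root` itself or a path beneath it.
// Purely lexical: symlinks are not resolved here.
function isWithinDir(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return true; // same directory
  // Anything that climbs out of root starts with ".." as a full path segment.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}
```

In the route, the handler could reject the request with `badRequest` when `isWithinDir(claudeProjectsDir(), source)` is false. Comparing via `path.relative` rather than a string prefix avoids accepting sibling directories such as `projects-evil`.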


Comment on lines +84 to +92
if (!source && !all) {
  this.badRequest(res, 'source is required (or pass all=true)');
  return;
}
const requireCwd = body.requireCwd === true;
const effectiveSource = source || claudeProjectsDir();

logger.info('INGEST', 'Memory ingest starting', { source: effectiveSource, all, requireCwd });
const report = await ingestMemorySource(effectiveSource, { all, requireCwd }, this.buildDeps());


P2 User-input path errors surface as HTTP 500, not 400

If source points to a non-existent or non-directory path, scanMemorySource throws a descriptive Error (e.g. "memory ingest source not found: …"). That error propagates through ingestMemorySource and is caught by wrapHandler, which calls handleError and returns a 500. The CLI then prints Memory ingest failed: HTTP 500 …, which is accurate but implies a server fault. A try/catch around ingestMemorySource that distinguishes user-input errors (source not found, not a directory) from internal failures would let the route return a 400 for those cases.
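
One way to make that distinction is a dedicated error type that the scan throws for bad input, which the route then maps to a status code. The names below (`MemorySourceError`, `classifyIngestFailure`, `IngestFailure`) are hypothetical; per the review text, the PR currently throws a plain `Error`.

```typescript
// Hypothetical error type for user-input problems (source missing, not a directory).
class MemorySourceError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "MemorySourceError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface IngestFailure {
  status: 400 | 500;
  message: string;
}

// Map a thrown value to an HTTP status: 400 for bad input, 500 otherwise.
function classifyIngestFailure(err: unknown): IngestFailure {
  const message = err instanceof Error ? err.message : String(err);
  return { status: err instanceof MemorySourceError ? 400 : 500, message };
}
```

A try/catch around `ingestMemorySource` in the handler could call `classifyIngestFailure` and answer with `badRequest` for the 400 case, rethrowing everything else so `wrapHandler` still produces the 500.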

