Skip to content

fix(corpus): camelCase params + obs_type key routing; clarify MCP tool descriptions#2728

Closed
Kirchlive wants to merge 2 commits into
thedotmack:mainfrom
Kirchlive:sprint-3-mcp-fixes
Closed

fix(corpus): camelCase params + obs_type key routing; clarify MCP tool descriptions#2728
Kirchlive wants to merge 2 commits into
thedotmack:mainfrom
Kirchlive:sprint-3-mcp-fixes

Conversation

@Kirchlive

Copy link
Copy Markdown

Summary

Two independent fixes in the MCP corpus pipeline plus 14 tool-description rewrites, all uncovered by a systematic smoke-test of all 21 MCP tools on a v13.3.0 install (worker runtime).

Fixes (commit 1)

Defect 1 — camelCase dateStart/dateEnd silently dropped

  • MCP tool surface (src/servers/mcp-server.ts) advertises camelCase.
  • Worker REST route (src/services/worker/http/routes/CorpusRoutes.ts) destructures only snake_case (date_start/date_end) from req.body.
  • Zod's .passthrough() keeps the unknown camelCase keys on the body but the handler never reads them — date filters are silently dropped, no error.
  • Fix: declare both casings on the Zod schema, prefer snake_case at destructure with a camelCase fallback.

Defect 2 — multi-type filter collapses to single type

  • src/services/worker/knowledge/CorpusBuilder.ts:42 set searchArgs.type = filter.types.join(',').
  • But type is the search-router discriminator (observations | sessions | prompts), not the observation-type filter.
  • WHERE type = 'bugfix,decision' matches zero rows; downstream array-hydrate masks the failure and returns only entries that survived a different filter pass — hence the user-visible "only the first type came back" symptom.
  • Fix: route the comma-joined types through searchArgs.obs_type (the correct observation-type filter key).

Docs (commit 2)

14 MCP tool-description rewrites to prevent first-use misuse:

  • All 8 server-beta-only tools prefixed with [Server-beta runtime only — DISABLED in default "worker" runtime.] plus a one-line pointer to the worker-runtime equivalent. The transport already returns this error at call-time; surfacing it in the description lets the LLM pick the right tool the first time.
  • search.obs_type now warns about the FTS5 type-token trap: query="bugfix" + obs_type="bugfix" returns 0 because the FTS5 index covers title/subtitle/narrative/text/facts/concepts but not the type column. Use one or the other for type-token queries.
  • prime_corpus / query_corpus / rebuild_corpus / reprime_corpus preconditions strengthened — explicit that query_corpus errors when unprimed, that rebuild does not reprime, and that responses are LLM-generative.
  • build_corpus description lists canonical types and emphasizes verifying stats.observation_count before priming.

Test plan

  • Reproduced both bugs against unpatched v13.3.0 worker:
    • build_corpus(types="bugfix,decision", dateStart="2026-05-01", dateEnd="2026-06-01", limit=50)3 obs (bugfix only, dates silently dropped, no error)
  • Applied both fixes to bundled worker-service.cjs via patch-on-bundle (same surgical-edit strategy as plugin-hook-perf-patch.v2)
  • Restarted worker process
  • Re-ran the same call → 14 obs (8 decision + 6 bugfix; filter object in response now contains date_start/date_end correctly)
  • All 38 unit tests of Sprint-2 infrastructure still pass on the patched install
  • All 21 MCP tools systematically smoke-tested post-patch: 13 functional, 8 expected server-beta runtime errors (correct + clearer error message)
  • Independent node --check syntax verification of patched bundles
  • node --check src/servers/mcp-server.ts clean (no TS-source diagnostics introduced)
  • Backups preserved (.sprint3-bak) and rollback path tested

Out of scope (filed separately)

  • Bug A: FTS5 type column not in the index, causing search(query="<typename>", obs_type="<same>") to AND-fold to 0 results. Workaround in this PR via search.obs_type description update; structural fix proposed in a separate issue with TS-source diff.
  • 598 stale pending_messages (worker consumer side); orthogonal to this PR.

🤖 Generated with Claude Code

@greptile-apps

greptile-apps Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR delivers two targeted bug fixes in the worker-runtime corpus pipeline plus description improvements for 13 MCP tools. Both functional fixes are correct and well-reasoned.

  • CorpusRoutes.ts: Adds dateStart/dateEnd to the Zod schema with a null-coalescing fallback (date_start ?? dateStart), so camelCase date filters sent from the MCP surface are no longer silently dropped.
  • CorpusBuilder.ts: Routes the comma-joined observation types through searchArgs.obs_type instead of searchArgs.type (the search-router discriminator), fixing multi-type filters collapsing to a single type.
  • mcp-server.ts: 8 server-beta tools gain a [Server-beta runtime only — DISABLED in default "worker" runtime.] prefix and 5 corpus tools receive clarified precondition/workflow descriptions; however, the search tool's obs_type description — claimed in the PR summary as updated to warn about the FTS5 type-token trap — is absent from the diff.

Confidence Score: 5/5

Safe to merge — both functional fixes are correct and the description updates are straightforward.

The two code changes are narrow and well-targeted: the Zod schema addition plus null-coalescing fallback is the minimal correct fix for the camelCase passthrough gap, and routing observation types through obs_type instead of the search-router discriminator type is the right call. The only gap is a missing search tool description update that the PR claims to include; that omission leaves an LLM-facing documentation gap but does not affect runtime correctness.

src/servers/mcp-server.ts — the search tool description update described in the PR summary as a workaround for the FTS5 type-token trap is not present in the diff.

Important Files Changed

Filename Overview
src/services/worker/knowledge/CorpusBuilder.ts Routes observation-type filter through obs_type instead of type (the search-router discriminator), fixing the multi-type filter collapsing to a single type. Change is targeted and correct.
src/services/worker/http/routes/CorpusRoutes.ts Adds dateStart/dateEnd to the Zod schema and uses null-coalescing to prefer date_start/date_end with camelCase fallback, correctly bridging the MCP camelCase surface to the internal snake_case filter.
src/servers/mcp-server.ts 13 tool-description rewrites (8 server-beta labels + 5 corpus tool improvements); the search tool description claimed in the PR summary as updated for the FTS5 trap warning is absent from the diff.

Sequence Diagram

sequenceDiagram
    participant LLM as LLM Caller
    participant MCP as MCP Server (mcp-server.ts)
    participant Route as CorpusRoutes.ts
    participant Builder as CorpusBuilder.ts
    participant Search as SearchOrchestrator

    LLM->>MCP: "build_corpus(dateStart, dateEnd, types=[bugfix,decision])"
    MCP->>Route: "POST /api/corpus {dateStart, dateEnd, types}"

    Note over Route: Zod schema now accepts both<br/>dateStart AND date_start<br/>(Fix #1 — camelCase passthrough)

    Route->>Route: "dateStart = body.date_start ?? body.dateStart"
    Route->>Builder: "build(name, desc, filter{date_start, date_end, types})"

    Note over Builder: Fix #2 — obs_type routing<br/>searchArgs.obs_type = types.join(',')<br/>(was searchArgs.type — router discriminator)

    Builder->>Search: "search({obs_type:"bugfix,decision", dateStart, dateEnd})"
    Search-->>Builder: results (all matching types + date range)
    Builder->>Builder: "hydrateObservationsByIds(ids, {type: filter.types})"
    Builder-->>Route: CorpusFile
    Route-->>MCP: metadata (stats, date_range, filter)
    MCP-->>LLM: corpus response
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/servers/mcp-server.ts:781-788
**`search` description FTS5 warning absent from diff**

The PR description explicitly states: *"Workaround in this PR via `search.obs_type` description update"* for the FTS5 type-token trap (`query="bugfix" + obs_type="bugfix"` → 0 results). The diff shows 13 description rewrites but the `search` tool — whose description ends just before the first hunk at line 524 — was not changed. LLM callers will still silently get 0 results when they combine a type-name query with the same `obs_type` value, with no hint in the tool description about this pitfall. The claimed workaround should be added to the `search` tool's `obs_type` param description.

Reviews (6): Last reviewed commit: "docs(mcp): clarify tool descriptions — r..." | Re-trigger Greptile

Comment thread src/servers/mcp-server.ts
{
name: 'build_corpus',
description: 'Build a knowledge corpus from filtered observations. Creates a queryable knowledge agent. Params: name (required), description, project, types (comma-separated), concepts (comma-separated), files (comma-separated), query, dateStart, dateEnd, limit',
description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The build_corpus description introduces a "canonical types" list but omits security_alert and security_note, which are accepted by the Zod schema in CorpusRoutes.ts (ALLOWED_CORPUS_TYPES). An LLM caller following this description would never discover those two types are valid, and any existing corpus built with security observation types would silently not be reproducible via this tool description alone.

Suggested change
description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).',
description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery,security_alert,security_note), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).',
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/servers/mcp-server.ts
Line: 784

Comment:
The `build_corpus` description introduces a "canonical types" list but omits `security_alert` and `security_note`, which are accepted by the Zod schema in `CorpusRoutes.ts` (`ALLOWED_CORPUS_TYPES`). An LLM caller following this description would never discover those two types are valid, and any existing corpus built with security observation types would silently not be reproducible via this tool description alone.

```suggestion
    description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery,security_alert,security_note), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).',
```

How can I resolve this? If you propose a fix, please make it concise.

Kirchlive added a commit to Kirchlive/claude-mem that referenced this pull request May 31, 2026
Summarizes all Sprint 1+2+3 modifications in this fork:
- 7 bug fixes (3 corpus-side + 4 daemon/hook reliability)
- UDS daemon pipeline (~7.8x hook-latency reduction)
- 14 MCP tool-description rewrites (runtime gating, FTS5 trap, preconditions)
- 5 skill cleanups (wowerpoint + version-bump auto-trigger disabled; mem-search/manual/knowledge-agent improved)
- 38/38 unit tests + 8/8 patcher tests
- Rollback path documented

Cross-references upstream PR thedotmack#2728 (corpus fixes + tool docs) and
issue thedotmack#2729 (Bug A FTS5 conjunction).
Kirchlive added 2 commits May 31, 2026 05:01
…outing

Two defects in build_corpus path silently narrowed result sets:

1. CorpusRoutes destructured only snake_case (date_start, date_end) from the
   request body, but the MCP tool surface advertises camelCase
   (dateStart, dateEnd). Zod's .passthrough() let the unknown keys through
   but the handler never read them — date filters were silently dropped.
   Fix: declare both casings in the Zod schema and read either at the
   destructure site.

2. CorpusBuilder set searchArgs.type = filter.types.join(',') — but `type`
   is the search-router discriminator (observations|sessions|prompts),
   NOT the observation-type filter. Passing 'bugfix,decision' to that key
   matched zero rows; a downstream array hydrate masked the failure and
   returned only entries that survived a different filter pass.
   Fix: route the joined types through searchArgs.obs_type (the correct
   observation-type filter key).
…conditions

14 tool-description rewrites to prevent first-use misuse:

* All 8 server-beta-only tools (observation_add, observation_record_event,
  observation_search, observation_context, observation_generation_status,
  memory_add, memory_search, memory_context) now prefix the description
  with [Server-beta runtime only — DISABLED in default "worker" runtime.]
  and append a one-line pointer to the worker-runtime equivalent. The
  transport already returns this error at call-time; surfacing it in the
  description lets Claude pick the right tool the first time.

* search.obs_type now warns about the FTS5 type-token trap: combining
  query='bugfix' with obs_type='bugfix' returns 0 results because the
  FTS5 index covers title/subtitle/narrative/text/facts/concepts but
  NOT the type column. Use one or the other for type-token queries.

* prime_corpus / query_corpus / rebuild_corpus / reprime_corpus
  preconditions strengthened — explicit that query_corpus errors when
  prime is missing, that rebuild doesn't reprime, and that responses
  are LLM-generative rather than deterministic lookups.

* build_corpus description rewritten to list canonical types and
  emphasize verifying stats.observation_count before priming.
@Kirchlive

Copy link
Copy Markdown
Author

Superseded by #2731 — both commits (camelCase fix + tool descriptions) folded into the combined Sprint 1-3 PR per maintainer-friendly bundling. Branch sprint-3-mcp-fixes preserved on the fork in case you want to compare.

@Kirchlive Kirchlive closed this May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant