fix(corpus): camelCase params + obs_type key routing; clarify MCP tool descriptions#2728
fix(corpus): camelCase params + obs_type key routing; clarify MCP tool descriptions#2728Kirchlive wants to merge 2 commits into
Conversation
Greptile SummaryThis PR delivers two targeted bug fixes in the worker-runtime corpus pipeline plus description improvements for 13 MCP tools. Both functional fixes are correct and well-reasoned.
Confidence Score: 5/5Safe to merge — both functional fixes are correct and the description updates are straightforward. The two code changes are narrow and well-targeted: the Zod schema addition plus null-coalescing fallback is the minimal correct fix for the camelCase passthrough gap, and routing observation types through
Important Files Changed
Sequence DiagramsequenceDiagram
participant LLM as LLM Caller
participant MCP as MCP Server (mcp-server.ts)
participant Route as CorpusRoutes.ts
participant Builder as CorpusBuilder.ts
participant Search as SearchOrchestrator
LLM->>MCP: "build_corpus(dateStart, dateEnd, types=[bugfix,decision])"
MCP->>Route: "POST /api/corpus {dateStart, dateEnd, types}"
Note over Route: Zod schema now accepts both<br/>dateStart AND date_start<br/>(Fix #1 — camelCase passthrough)
Route->>Route: "dateStart = body.date_start ?? body.dateStart"
Route->>Builder: "build(name, desc, filter{date_start, date_end, types})"
Note over Builder: Fix #2 — obs_type routing<br/>searchArgs.obs_type = types.join(',')<br/>(was searchArgs.type — router discriminator)
Builder->>Search: "search({obs_type:"bugfix,decision", dateStart, dateEnd})"
Search-->>Builder: results (all matching types + date range)
Builder->>Builder: "hydrateObservationsByIds(ids, {type: filter.types})"
Builder-->>Route: CorpusFile
Route-->>MCP: metadata (stats, date_range, filter)
MCP-->>LLM: corpus response
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
src/servers/mcp-server.ts:781-788
**`search` description FTS5 warning absent from diff**
The PR description explicitly states: *"Workaround in this PR via `search.obs_type` description update"* for the FTS5 type-token trap (`query="bugfix" + obs_type="bugfix"` → 0 results). The diff shows 13 description rewrites but the `search` tool — whose description ends just before the first hunk at line 524 — was not changed. LLM callers will still silently get 0 results when they combine a type-name query with the same `obs_type` value, with no hint in the tool description about this pitfall. The claimed workaround should be added to the `search` tool's `obs_type` param description.
Reviews (6): Last reviewed commit: "docs(mcp): clarify tool descriptions — r..." | Re-trigger Greptile |
| { | ||
| name: 'build_corpus', | ||
| description: 'Build a knowledge corpus from filtered observations. Creates a queryable knowledge agent. Params: name (required), description, project, types (comma-separated), concepts (comma-separated), files (comma-separated), query, dateStart, dateEnd, limit', | ||
| description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).', |
There was a problem hiding this comment.
The
build_corpus description introduces a "canonical types" list but omits security_alert and security_note, which are accepted by the Zod schema in CorpusRoutes.ts (ALLOWED_CORPUS_TYPES). An LLM caller following this description would never discover those two types are valid, and any existing corpus built with security observation types would silently not be reproducible via this tool description alone.
| description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).', | |
| description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery,security_alert,security_note), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).', |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/servers/mcp-server.ts
Line: 784
Comment:
The `build_corpus` description introduces a "canonical types" list but omits `security_alert` and `security_note`, which are accepted by the Zod schema in `CorpusRoutes.ts` (`ALLOWED_CORPUS_TYPES`). An LLM caller following this description would never discover those two types are valid, and any existing corpus built with security observation types would silently not be reproducible via this tool description alone.
```suggestion
description: 'Build a knowledge corpus snapshot for later prime_corpus + query_corpus. Verify `stats.observation_count` and `date_range` in the response before priming. Params: name (required, used as filename), description, project, types (comma-separated; canonical: bugfix,decision,feature,change,refactor,discovery,security_alert,security_note), concepts, files, query, dateStart (ISO), dateEnd (ISO), limit (default 500).',
```
How can I resolve this? If you propose a fix, please make it concise.Summarizes all Sprint 1+2+3 modifications in this fork: - 7 bug fixes (3 corpus-side + 4 daemon/hook reliability) - UDS daemon pipeline (~7.8x hook-latency reduction) - 14 MCP tool-description rewrites (runtime gating, FTS5 trap, preconditions) - 5 skill cleanups (wowerpoint + version-bump auto-trigger disabled; mem-search/manual/knowledge-agent improved) - 38/38 unit tests + 8/8 patcher tests - Rollback path documented Cross-references upstream PR thedotmack#2728 (corpus fixes + tool docs) and issue thedotmack#2729 (Bug A FTS5 conjunction).
…outing
Two defects in build_corpus path silently narrowed result sets:
1. CorpusRoutes destructured only snake_case (date_start, date_end) from the
request body, but the MCP tool surface advertises camelCase
(dateStart, dateEnd). Zod's .passthrough() let the unknown keys through
but the handler never read them — date filters were silently dropped.
Fix: declare both casings in the Zod schema and read either at the
destructure site.
2. CorpusBuilder set searchArgs.type = filter.types.join(',') — but `type`
is the search-router discriminator (observations|sessions|prompts),
NOT the observation-type filter. Passing 'bugfix,decision' to that key
matched zero rows; a downstream array hydrate masked the failure and
returned only entries that survived a different filter pass.
Fix: route the joined types through searchArgs.obs_type (the correct
observation-type filter key).
…conditions 14 tool-description rewrites to prevent first-use misuse: * All 8 server-beta-only tools (observation_add, observation_record_event, observation_search, observation_context, observation_generation_status, memory_add, memory_search, memory_context) now prefix the description with [Server-beta runtime only — DISABLED in default "worker" runtime.] and append a one-line pointer to the worker-runtime equivalent. The transport already returns this error at call-time; surfacing it in the description lets Claude pick the right tool the first time. * search.obs_type now warns about the FTS5 type-token trap: combining query='bugfix' with obs_type='bugfix' returns 0 results because the FTS5 index covers title/subtitle/narrative/text/facts/concepts but NOT the type column. Use one or the other for type-token queries. * prime_corpus / query_corpus / rebuild_corpus / reprime_corpus preconditions strengthened — explicit that query_corpus errors when prime is missing, that rebuild doesn't reprime, and that responses are LLM-generative rather than deterministic lookups. * build_corpus description rewritten to list canonical types and emphasize verifying stats.observation_count before priming.
a170302 to
cc58adf
Compare
|
Superseded by #2731 — both commits (camelCase fix + tool descriptions) folded into the combined Sprint 1-3 PR per maintainer-friendly bundling. Branch |
Summary
Two independent fixes in the MCP corpus pipeline plus 14 tool-description rewrites, all uncovered by a systematic smoke-test of all 21 MCP tools on a v13.3.0 install (worker runtime).
Fixes (commit 1)
Defect 1 — camelCase
dateStart/dateEndsilently droppedsrc/servers/mcp-server.ts) advertises camelCase.src/services/worker/http/routes/CorpusRoutes.ts) destructures only snake_case (date_start/date_end) fromreq.body..passthrough()keeps the unknown camelCase keys on the body but the handler never reads them — date filters are silently dropped, no error.Defect 2 — multi-type filter collapses to single type
src/services/worker/knowledge/CorpusBuilder.ts:42setsearchArgs.type = filter.types.join(',').typeis the search-router discriminator (observations|sessions|prompts), not the observation-type filter.WHERE type = 'bugfix,decision'matches zero rows; downstream array-hydrate masks the failure and returns only entries that survived a different filter pass — hence the user-visible "only the first type came back" symptom.searchArgs.obs_type(the correct observation-type filter key).Docs (commit 2)
14 MCP tool-description rewrites to prevent first-use misuse:
[Server-beta runtime only — DISABLED in default "worker" runtime.]plus a one-line pointer to the worker-runtime equivalent. The transport already returns this error at call-time; surfacing it in the description lets the LLM pick the right tool the first time.search.obs_typenow warns about the FTS5 type-token trap:query="bugfix" + obs_type="bugfix"returns 0 because the FTS5 index covers title/subtitle/narrative/text/facts/concepts but not thetypecolumn. Use one or the other for type-token queries.prime_corpus/query_corpus/rebuild_corpus/reprime_corpuspreconditions strengthened — explicit thatquery_corpuserrors when unprimed, that rebuild does not reprime, and that responses are LLM-generative.build_corpusdescription lists canonical types and emphasizes verifyingstats.observation_countbefore priming.Test plan
build_corpus(types="bugfix,decision", dateStart="2026-05-01", dateEnd="2026-06-01", limit=50)→ 3 obs (bugfix only, dates silently dropped, no error)plugin-hook-perf-patch.v2)filterobject in response now containsdate_start/date_endcorrectly)node --checksyntax verification of patched bundlesnode --check src/servers/mcp-server.tsclean (no TS-source diagnostics introduced).sprint3-bak) and rollback path testedOut of scope (filed separately)
typecolumn not in the index, causingsearch(query="<typename>", obs_type="<same>")to AND-fold to 0 results. Workaround in this PR viasearch.obs_typedescription update; structural fix proposed in a separate issue with TS-source diff.pending_messages(worker consumer side); orthogonal to this PR.🤖 Generated with Claude Code