Cross-model schema sanitizer + tool discovery layer (102-tool load fix, 9.3K→1.8K footprint) by drewburchfield · Pull Request #59 · drewburchfield/help-scout-mcp-server

drewburchfield · 2026-06-17T19:43:48Z

Summary

Cross-model compatibility + tool-surface footprint, in two independent-but-stacked pieces, proven by an eval across model families. Design: docs/superpowers/specs/2026-06-17-tool-discovery-layer-design.md. Closes NAS-1307, NAS-1305, NAS-1308; absorbs NAS-1302.

Why (proven by the eval, not assumed)

The "Gemini can't handle many tools" framing was wrong. Gemini rejected the old 102-tool surface with a schema 400 on a single object-level anyOf (structuredConversationFilter) — one bad schema kills the whole tools/list. The 102→55 consolidation's apparent "Gemini win" was luck (it deleted the offending tool). The real cross-model issue is schema dialect, and tool-definition tokens (55 schemas = 9.3K/request).

A. Cross-model schema sanitizer (NAS-1307)

Pure sanitizeJsonSchema() over every emitted inputSchema: strips object-level anyOf/oneOf/allOf (Gemini), adds additionalProperties:false (OpenAI strict), number→integer (matches Zod .int()), inline-derefs $defs. Plus JSON-string arg coercion at dispatch (weak models stringify arrays/objects → -32602). Absorbs NAS-1302.

B. Discovery layer (NAS-1305)

Default tools/list = ~10 tools (1.8K tokens, 0.9% of context): 7 core read tools + search_tools / get_tool_schema / call_tool. The other ~46 stay reachable:

Response-bootstrapped successor hints — hub tools append the logically-next tail tools' full sanitized schemas in _meta.suggestedTools (successor map grounded in the API entity graph; core tools filtered out), so the model's next call is correctly typed with no search step.
search_tools (keyword + 12-group Help Scout synonym map) + get_tool_schema as the escape hatch.

Discovery is the opinionated default (no mode flag); one escape hatch HELPSCOUT_EXPOSE_ALL_TOOLS=true returns the flat sanitized 55. The 55-case dispatch was refactored into dispatchTool() so call_tool re-enters cleanly.

Proof (evals/, NAS-1308)

Per-model LOAD check — does each surface load without a provider 400?

Surface	gemini-3-flash	gemini-3.1-pro-low	glm-4.7	gpt
control-102 (unsanitized)	FAIL (anyOf)	FAIL	LOAD	n/a (quota)
control-102 (sanitized)	LOAD	LOAD	LOAD	n/a
discovery-10	LOAD	LOAD	LOAD	n/a

→ the sanitizer fixes the load bug; discovery-10 loads on every reachable model.

Agentic tail-walk — can a model drive the 10-tool surface to a non-core tool? 24/24 (100%) across gemini-3-flash / gemini-3.1-pro-low / glm-4.7, avg ~3.4 turns (search → schema → call), with self-recovery from validation errors. gpt quota-blocked (untested; the additionalProperties fix targets its exact requirement).

Testing

type-check, lint, build, npm test (417 pass — only the pre-existing local-.env package-scripts test fails, passes in CI), MCPB validation (18). Eval harnesses live-verified against the proxy + Help Scout API.

Notes

Default surface is now 10 tools; full catalog reachable via meta-tools or HELPSCOUT_EXPOSE_ALL_TOOLS. Static mcp.json/manifest.json document the full 55 + 3 meta-tools. gpt cross-model load confirmation pending quota reset.

Pure sanitizeJsonSchema() pass over every emitted inputSchema (applied in listTools()): strips object-level anyOf/oneOf/allOf (Gemini 400), adds additionalProperties:false (OpenAI strict), converts number->integer (API has no floats; matches Zod .int()), inline-derefs $defs/$ref. Plus coerceJsonStringArgs() at callTool() dispatch so weak/non-Claude models that stringify array/object args don't 400 (-32602). Verified runtime: listTools() emits 0 combinators, additionalProperties on all 55 object schemas, all-integer. Live: sanitized 55-tool surface loads+calls on gemini-3-flash, gemini-3.1-pro-low, glm-4.7 (gpt quota-blocked this run). Absorbs NAS-1302. 13 new sanitizer tests; 397 pass. Proven earlier: stripping the anyOf made the full 102-tool surface load on Gemini, confirming the root cause was schema dialect, not tool count.

…ools Default tools/list = 7 core read tools + search_tools/get_tool_schema/call_tool (~1.8K tokens vs 9.3K). The other 46 tools stay reachable: search_tools (keyword scorer + 12-group Help Scout synonym map, camelCase-aware), get_tool_schema (returns sanitized schemas), call_tool (dispatches into the existing switch with arg coercion, rejects unknown names + meta-tool recursion). Refactored the 55-case switch into private dispatchTool() so call_tool re-enters cleanly; buildToolDefs/ allToolDefs() is the internal registry. HELPSCOUT_EXPOSE_ALL_TOOLS=true returns the flat sanitized 55 (single escape-hatch branch, no other dual-mode logic). Verified: default=10, expose-all=55; search('happiness report')->getHappinessReport, mailbox-synonym->getInbox, call_tool(getServerTime) returns live time. 13 new discovery tests; 410 pass. Static mcp.json/manifest document full catalog + meta.

…dTools) Hub tools append their logically-next TAIL tools' full sanitized schemas to result._meta.suggestedTools, so the model can call_tool them with correct args and no search step. SUCCESSOR_MAP grounded in the API entity graph; successorHintsFor() filters out core + meta tools (only purely-additive tail surfaces), caps at 3. Attached in dispatchTool so it rides along through call_tool re-entry; skipped for meta-tool results and under HELPSCOUT_EXPOSE_ALL_TOOLS; total (try/catch, never mutates payload). Verified live: searchConversations -> [getOriginalSource, getAttachment] (not the core getConversation/getThreads); getServerTime (terminal) -> none. 7 new tests; 417 pass.

…ity metric) evals/load-check.mjs runs a surface x model matrix asking 'does this tool surface load without a provider schema-400?' across Gemini/GLM/OpenAI. Proves end-to-end with production code: control-102 unsanitized FAILS on both Gemini (the anyOf 400), control-102 SANITIZED loads -> the sanitizer fixes the cross-model load bug; and discovery-10 loads on every reachable model. gpt currently n/a (quota cooldown, not a schema failure - detection distinguishes quota/config from real 400s).

…urface evals/tail-walk.mjs runs a real multi-turn tool-use loop: model gets the 10-tool discovery surface + a task needing a NON-core tool, and must reach it via search_tools -> get_tool_schema -> call_tool (executed live against the handler). Result: gemini-3-flash 8/8, gemini-3.1-pro-low 8/8, glm-4.7 8/8 = 24/24 (100%) reached the right tail tool with correct args, avg ~3.4 turns. gpt quota-blocked (n/a). Self-recovery observed on report date-range validation errors. Proves the discovery surface works across model families.

devin-ai-integration

Devin Review found 2 potential issues.

devin-ai-integration · 2026-06-17T19:53:05Z

+            arguments: {
+              type: 'object',
+              description: "The tool's input arguments object, matching its loaded input schema.",
+            },


🔴 Sanitizer adds additionalProperties: false to call_tool.arguments, blocking all inner tool arguments for strict-schema models

The call_tool meta-tool's arguments property is defined as { type: 'object', description: '...' } with no properties key (src/tools/index.ts:1500-1502), because it's an open passthrough bag for arbitrary inner-tool arguments. However, the sanitizer at src/utils/schema-sanitizer.ts:127-129 unconditionally adds additionalProperties: false to every node where type === 'object', even those without a properties map. After sanitization the advertised schema becomes { type: 'object', additionalProperties: false } — which per JSON Schema means "accept zero properties." Models or clients that enforce strict schema validation (e.g. OpenAI structured outputs, some Gemini configurations) would either refuse to populate arguments or send an empty object, making the entire discovery layer's ~48 non-core tools unreachable. The existing test at src/__tests__/discovery.test.ts:100-101 only asserts props.arguments.type === 'object' and does not check additionalProperties, so it doesn't catch this.

Suggested change

arguments: {

type: 'object',

description: "The tool's input arguments object, matching its loaded input schema.",

},

arguments: {

type: 'object',

additionalProperties: true,

description: "The tool's input arguments object, matching its loaded input schema.",

},

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-06-17T19:53:06Z

+    const coerced = coerceJsonStringArgs(innerArgs, tool.inputSchema ?? {}) as object;
+    return this.dispatchTool(name, coerced);


🚩 call_tool bypasses constraint validation and call history tracking

When a tool is invoked through the call_tool meta-tool, it enters via callToolMeta → dispatchTool → dispatchToolInner, which skips callTool()'s HelpScoutAPIConstraints.validateToolCall check and does not record the inner tool name in this.callHistory (src/tools/index.ts:1639). For example, searchConversations invoked via call_tool would skip inbox-ID format validation and date-format checks (src/utils/api-constraints.ts:82-97). The constraints are advisory (suggestions/guidance, not security gates), and the Help Scout API itself would reject malformed requests, so this is not a correctness bug. However, if the constraint system is extended with harder requirements in the future, this bypass path could become a real issue.

Was this helpful? React with 👍 or 👎 to provide feedback.

drewburchfield added 5 commits June 17, 2026 15:00

devin-ai-integration Bot reviewed Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cross-model schema sanitizer + tool discovery layer (102-tool load fix, 9.3K→1.8K footprint)#59

Cross-model schema sanitizer + tool discovery layer (102-tool load fix, 9.3K→1.8K footprint)#59
drewburchfield wants to merge 5 commits into
devfrom
feature/nas-1307-schema-sanitizer

drewburchfield commented Jun 17, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Jun 17, 2026

Uh oh!

devin-ai-integration Bot Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const coerced = coerceJsonStringArgs(innerArgs, tool.inputSchema ?? {}) as object;
		return this.dispatchTool(name, coerced);

Conversation

drewburchfield commented Jun 17, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why (proven by the eval, not assumed)

A. Cross-model schema sanitizer (NAS-1307)

B. Discovery layer (NAS-1305)

Proof (evals/, NAS-1308)

Testing

Notes

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

drewburchfield commented Jun 17, 2026 •

edited by devin-ai-integration Bot

Loading