mcp-schema-normalize

Bridge MCP tool schemas to llama.cpp's grammar-compatible subset.

Normalize MCP / OpenAI-format tool JSON schemas into the narrower subset llama.cpp's grammar converter accepts. Bridges the standards gap between MCP-mandated JSON Schema 2020-12 (SEP-1613) and what local grammar-constrained sampling backends actually compile.

If your MCP tool calls work fine against Anthropic / OpenAI hosted APIs but die with `Unable to generate parser for this template` or `Error resolving ref … anyOf not in {…}` when routed through llama.cpp (llama-server, llama-swap, Ollama, etc.), this library is for you.

What it fixes

These are documented permanent limitations of llama.cpp's json-schema-to-grammar.cpp, authoritatively listed in the grammars README maintained by the converter's implementer. The cited issues are closed — not because they were fixed, but because they were accepted as won't-fix or fell out of triage. This library is the gateway-side workaround for that documented gap.

| Failure mode | Upstream status | What this library does |
| --- | --- | --- |
| `anyOf` (or `oneOf`) beside `properties` / `type` / `required` / `additionalProperties` | Documented limitation (#7703: closed, covered by grammars/README.md) | Distribute siblings into each union branch, producing self-contained objects |
| `{"not": {}}` sentinel from `zod-to-json-schema` | Closed with a LibreChat-side patch as the resolution (#17574) | Drop empty-`not` keywords; preserve non-empty `not` schemas |
| Nested `$ref`s into `anyOf` nodes | Documented limitation (#8073: closed, still active in current builds) | Inline non-cyclic refs; preserve cyclic refs (llama.cpp handles cycles natively) |
| Schemas that expand past `MAX_REPETITION_THRESHOLD = 2000` | Closed without fix (#21228, user-side workaround posted) | Coarsen inlines that would blow the budget |
| llama-server silently falls back to unconstrained generation when grammar build fails | Closed as stale by bot (#19051: still observable) | Pre-flight size budget + telemetry to make the silent fallback visible |
| Dangling `$ref` (paths that don't exist), a common `zod-to-json-schema` artifact when singleton unions collapse | Upstream schema-generator bug | Replace with permissive `{}` so the request still completes. See the load-bearing caveat below. |
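
To make the first row concrete, here is a minimal sketch of what "distribute siblings into each union branch" can look like. It is an illustration only, not the library's implementation: `distribute_siblings` and `SIBLING_KEYS` are hypothetical names, and the merge rule (branch `properties` override shared ones, `required` lists are unioned) is a simplifying assumption.

```python
from copy import deepcopy

# Keywords that the table lists as problematic beside a union.
SIBLING_KEYS = ("type", "properties", "required", "additionalProperties")


def distribute_siblings(schema: dict, union_key: str = "anyOf") -> dict:
    """Push object-level siblings of a union into each of its branches.

    Hypothetical helper for illustration; not the library's API.
    """
    if union_key not in schema:
        return schema
    siblings = {k: schema[k] for k in SIBLING_KEYS if k in schema}
    if not siblings:
        return schema

    branches = []
    for branch in schema[union_key]:
        if not isinstance(branch, dict):
            # Boolean schemas are left untouched in this sketch.
            branches.append(branch)
            continue
        merged = deepcopy(siblings)
        for key, value in branch.items():
            if key == "properties":
                # Branch properties extend (and override) the shared ones.
                merged.setdefault("properties", {}).update(deepcopy(value))
            elif key == "required":
                # Union of required names, order-preserving, no duplicates.
                combined = merged.get("required", []) + list(value)
                merged["required"] = list(dict.fromkeys(combined))
            else:
                merged[key] = deepcopy(value)
        branches.append(merged)

    # Keep unrelated keywords (description, title, ...) next to the union.
    rest = {
        k: v for k, v in schema.items()
        if k not in SIBLING_KEYS and k != union_key
    }
    return {**rest, union_key: branches}


example = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
    "anyOf": [
        {"properties": {"a": {"type": "integer"}}, "required": ["a"]},
        {"properties": {"b": {"type": "string"}}, "required": ["b"]},
    ],
}
result = distribute_siblings(example)
```

After the call, `result` contains only an `anyOf` whose branches each carry their own `type`, `properties`, and `required`, so no branch depends on keywords outside the union.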

Install

This package is pure Python, and the core has zero runtime dependencies. The LiteLLM proxy hook lives behind an optional extra so consumers who only need the schema transforms don't pull in LiteLLM.

# Pure-core: just the schema transforms (normalize_schema, normalize_tools,
# resolve_pointer, build_ref_graph, find_ref_cycles). No third-party deps.
pip install mcp-schema-normalize

# Add the LiteLLM CustomLogger pre-call hook. Pulls litellm>=1.0.
# Quote the extra so shells such as zsh don't expand the brackets.
pip install 'mcp-schema-normalize[litellm]'

# Development (pytest, ruff).
pip install 'mcp-schema-normalize[dev]'

Equivalent uv invocations:

uv add mcp-schema-normalize                     # pure core
uv add 'mcp-schema-normalize[litellm]'          # + LiteLLM hook

Import the public API from the top-level package; integrations live under their own submodule path:

# Pure-core API — always available
from mcp_schema_normalize import normalize_schema, normalize_tools

# LiteLLM hook — only available with [litellm] extra installed
from mcp_schema_normalize.integrations.litellm import normalize_tool_schemas_handler

Quick start

Direct use (any framework, any backend)

from mcp_schema_normalize import normalize_tools

# Your OpenAI-format tool list as received from an MCP server
tools = [
    {
        "type": "function",
        "function": {
            "name": "paperclipUpdateIssue",
            "parameters": {
                # ... a JSON Schema 2020-12 tool definition with $ref, anyOf,
                # not:{} sentinels, etc. — whatever zod-to-json-schema emits
            },
        },
    },
]

normalized, telemetry = normalize_tools(tools)
# `normalized` is safe to forward to llama.cpp
# `telemetry` is a dict of counters you should log / alert on

LiteLLM proxy

Two steps: install the package into the proxy's Python environment, then register the hook in config.yaml.

Build a custom image that includes the package:

FROM ghcr.io/berriai/litellm:main-latest
RUN pip install --no-cache-dir 'mcp-schema-normalize[litellm]'

Register the hook in your config.yaml:

litellm_settings:
  callbacks:
    - "mcp_schema_normalize.integrations.litellm.normalize_tool_schemas_handler"
    # ... any other callbacks (after this one)

The hook rewrites every tool's function.parameters in-flight on chat-completion, responses, and other tool-carrying calls. It emits one INFO-level summary log per modified request, escalated to WARN if any lossy event fires. All telemetry counters land as structured extra= fields for log aggregators (Loki, Datadog, etc.) to index.
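
The INFO-versus-WARN escalation can be pictured with the standard `logging` module. The sketch below is an assumption about the general shape, not the hook's actual code: the logger name, `summary_level`, and `log_summary` are hypothetical, and the lossy counter names are taken from the telemetry reference in this README.

```python
import logging

logger = logging.getLogger("schema_normalize_demo")  # hypothetical logger name

# Counters the telemetry reference marks as lossy.
LOSSY_COUNTERS = (
    "refs_unresolved",
    "size_coarsenings",
    "max_inline_depth_reached",
    "empty_union_drops",
    "union_coexistence_skipped",
)


def summary_level(telemetry: dict) -> int:
    """Return WARNING if any lossy counter is non-zero, else INFO."""
    if any(telemetry.get(name, 0) > 0 for name in LOSSY_COUNTERS):
        return logging.WARNING
    return logging.INFO


def log_summary(telemetry: dict) -> None:
    """Emit one summary record with the counters attached as extra fields."""
    # Only non-zero counters are attached, to keep the record small.
    fields = {k: v for k, v in telemetry.items() if v}
    logger.log(summary_level(telemetry), "tool schemas normalized", extra=fields)


log_summary({"refs_inlined": 4, "refs_unresolved": 1})  # logged at WARNING
```

Fields passed through `extra=` become attributes on the log record, which is what lets a JSON formatter or log shipper index them individually.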

See docs/litellm.md for:

Running on a read-only / hardened LiteLLM container (volume-mount pattern)
Callback ordering against strip_invalid_tools, OTel, and other common callbacks
Troubleshooting (logs not appearing, hook not firing, etc.)

⚠️ When NOT to use this — load-bearing assumption

This library will make your request go through even when your MCP server emits broken schemas. The cost is that affected fields lose their type spec and the model may emit structurally wrong values (e.g. a number where the schema said string-or-null).

The most common case: zod-to-json-schema's singleton-union-collapse bug, where z.union([X, ...]) collapses to its sole concrete variant but the generated $ref strings still expect the pre-collapse anyOf envelope. The library detects these dangling refs and replaces them with {} (match-anything) so the request completes; the original schema is malformed and gets silently loosened.
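
A rough sketch of the detect-and-loosen step, under stated assumptions: only same-document JSON Pointer refs (`#/...`) are handled, and `pointer_exists` and `loosen_dangling_refs` are hypothetical names rather than the library's functions. The example schema imitates the collapse described above, where a ref still points into an `anyOf` that no longer exists.

```python
def pointer_exists(root: dict, ref: str) -> bool:
    """True if a same-document ref such as '#/$defs/Foo' resolves in root."""
    if ref == "#":
        return True
    if not ref.startswith("#/"):
        return False  # remote or malformed refs are out of scope for this sketch
    node = root
    for raw in ref[2:].split("/"):
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        elif isinstance(node, list) and token.isdigit() and int(token) < len(node):
            node = node[int(token)]
        else:
            return False
    return True


def loosen_dangling_refs(node, root: dict, counters: dict):
    """Return a copy of node with unresolvable $refs replaced by {}."""
    if isinstance(node, list):
        return [loosen_dangling_refs(item, root, counters) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and not pointer_exists(root, ref):
        counters["refs_unresolved"] = counters.get("refs_unresolved", 0) + 1
        return {}  # match-anything fallback: the type spec is lost here
    return {k: loosen_dangling_refs(v, root, counters) for k, v in node.items()}


schema = {
    "type": "object",
    "properties": {
        # Points into an anyOf envelope that the collapse removed.
        "status": {"$ref": "#/$defs/Status/anyOf/0"},
        "name": {"$ref": "#/$defs/Name"},
    },
    "$defs": {"Status": {"type": "string"}, "Name": {"type": "string"}},
}
counters = {}
loosened = loosen_dangling_refs(schema, schema, counters)
```

Here `status` ends up as `{}` while `name` keeps its ref, and the counter records one dangling ref.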

Telemetry surfaces every event but you must be watching for it. The library emits:

refs_unresolved counter — incremented per dangling ref
WARN-level per-ref log line — unresolvable $ref replaced with permissive {} fallback
WARN-level per-request summary log — escalated whenever any lossy counter is non-zero
Per-schema WARN-line rate limiting (default 10 per schema) so a runaway broken server can't flood logs; aggregate counter still reflects every event

If your observability stack doesn't alert on either the counter or the WARN log, you will not notice schemas are degrading silently. In that case set STRICT_UNRESOLVED_REFS = True to opt out of the fallback — dangling refs are then left in place, llama.cpp's grammar converter rejects the tool, and the failure surfaces as a 400 instead of a degraded response.

import mcp_schema_normalize
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = True  # fail loudly

Other lossy events the library also surfaces:

empty_union_drops — anyOf: [{"not": {}}] collapsed; siblings retained (strict loosening)
union_coexistence_skipped — anyOf and oneOf at the same level; we refuse to rewrite (correct handling needs allOf-wrapping; not yet implemented)
size_coarsenings — inline would blow SIZE_BUDGET = 1500; deepest inline coarsened to {"type": "object"}
max_inline_depth_reached — $ref chain exceeded MAX_INLINE_DEPTH = 5; tail coarsened to {"type": "object"}
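
As an illustration of the first item, the sketch below filters `{"not": {}}` branches out of a union and records when the union becomes empty. `filter_union` is a hypothetical helper, and the way it increments `not_drops` and `empty_union_drops` is an assumption about how such counters could be maintained, not a description of the library's internals.

```python
EMPTY_NOT = {"not": {}}


def filter_union(schema: dict, counters: dict, union_key: str = "anyOf") -> dict:
    """Drop {"not": {}} branches from a union; drop the union if none remain."""
    branches = schema.get(union_key)
    if not isinstance(branches, list):
        return schema
    kept = [b for b in branches if b != EMPTY_NOT]
    removed = len(branches) - len(kept)
    if removed == 0:
        return schema
    counters["not_drops"] = counters.get("not_drops", 0) + removed
    out = {k: v for k, v in schema.items() if k != union_key}
    if kept:
        out[union_key] = kept
    else:
        # Nothing left in the union: siblings survive, the union constraint is lost.
        counters["empty_union_drops"] = counters.get("empty_union_drops", 0) + 1
    return out


exact = {}
lossy = {}
kept_union = filter_union({"anyOf": [{"type": "string"}, {"not": {}}]}, exact)
dropped_union = filter_union({"type": "object", "anyOf": [{"not": {}}]}, lossy)
```

In the first call a real branch remains, so only `not_drops` moves. In the second call the union disappears and only `{"type": "object"}` is left, which is the loosening that `empty_union_drops` reports.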

Telemetry reference

normalize_schema() and normalize_tools() return (new_schema, telemetry) and (new_tools, telemetry) respectively. The telemetry dict's keys, what they mean, and when to alert:

| Counter | Meaning | Lossy? | Routine on… |
| --- | --- | --- | --- |
| `refs_inlined` | Number of `$ref`s successfully inlined | no | Schemas with shared types |
| `cycles_preserved` | Cyclic `$ref`s left in place for llama.cpp to handle | no | Recursive types (TreeNode-style) |
| `refs_unresolved` | Dangling `$ref`s replaced with `{}` | yes | Broken MCP servers |
| `size_coarsenings` | Inlines coarsened due to size budget | yes | Pathologically large schemas |
| `max_inline_depth_reached` | Inline chains hit the depth cap | yes | Deeply nested ref graphs |
| `anyof_rewrites` | `anyOf`-beside-siblings distributions performed | no | Well-typed MCP schemas |
| `oneof_rewrites` | `oneOf`-beside-siblings distributions performed | no | Same |
| `not_drops` | `{"not": {}}` sentinels removed | no | zod-emitted schemas |
| `empty_union_drops` | Unions that became empty after `not:{}` filtering | yes | zod bugs |
| `union_coexistence_skipped` | Skipped node had both `anyOf` and `oneOf` | yes | Unusual schemas |
A reasonable Grafana alert, assuming the counter is exported as a metric with a model label: sum(rate(refs_unresolved[5m])) by (model) > 0 pages whenever any tool schema starts emitting dangling refs.

Configuration

All knobs are module-level constants you can monkey-patch before use:

import mcp_schema_normalize

mcp_schema_normalize.SIZE_BUDGET = 1500              # llama.cpp threshold proxy
mcp_schema_normalize.MAX_INLINE_DEPTH = 5            # ref-chain depth cap
mcp_schema_normalize.MAX_PER_SCHEMA_REF_WARNINGS = 10  # per-schema log rate limit
mcp_schema_normalize.STRICT_UNRESOLVED_REFS = False  # True = no permissive fallback

Backends and frameworks

The library is structurally agnostic — it operates on JSON Schema. It's been tested with:

LiteLLM proxy → llama-swap → llama.cpp server (primary use case; first-class integration shipped)
Direct llama-server via OpenAI-compatible API (use the pure-core normalize_tools() in your own client)
Ollama (same llama.cpp grammar converter underneath; pure-core API applies)

Adding integrations for vLLM, TabbyAPI, or other proxies is a matter of writing a thin adapter that calls normalize_tools(). PRs welcome.
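
A thin adapter could look roughly like the sketch below. It is written against an injected callable so it stays independent of any particular proxy; `normalize_request` is a hypothetical name, and the only thing assumed about the normalizer is the `(new_tools, telemetry)` return shape documented for `normalize_tools()`.

```python
from typing import Callable

# Any callable with the documented shape: tools -> (new_tools, telemetry).
Normalizer = Callable[[list], tuple]


def normalize_request(payload: dict, normalize: Normalizer) -> tuple[dict, dict]:
    """Return a copy of an OpenAI-style request with its tools normalized."""
    tools = payload.get("tools")
    if not tools:
        return payload, {}  # nothing to rewrite
    new_tools, telemetry = normalize(tools)
    # Shallow copy so the caller's payload is not mutated.
    return {**payload, "tools": new_tools}, telemetry


def fake_normalize(tools: list) -> tuple:
    """Stand-in normalizer for the demo; a real adapter would pass normalize_tools."""
    return [dict(t, normalized=True) for t in tools], {"refs_inlined": 0}


request = {"model": "local", "tools": [{"type": "function"}]}
forwarded, telemetry = normalize_request(request, fake_normalize)
```

In a real integration the second argument would be `normalize_tools` imported from `mcp_schema_normalize`, and the returned telemetry would be logged before the request is forwarded.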

Status

0.1.0, alpha. API may change before 1.0. The pipeline and telemetry surface are stable in intent; specific field names and module constants may move based on user feedback.

Originating incident

This library was extracted from a real production incident — a paperclip MCP server emitting schemas that crashed Qwen3-Coder and Nemotron-Nano local backends with Unable to generate parser for this template. The investigation post-mortem (including "what we should have done differently") is in the LiteLLM repo it was extracted from; if you want the long-form story, ping me and I'll publish it as a blog post.

Contributing

See CONTRIBUTING.md. Bug reports especially welcome — the more broken MCP schemas we see in the wild, the better this library gets at handling them.

License

MIT.
