Describe the bug
The semantic-search service builds Pydantic filter/mutation models dynamically from schema leaves, using the raw schema field name as the Pydantic field key:
components/lif/semantic_search_service/core.py:129,134 (build_dynamic_filter_model) and :200,205 (build_dynamic_mutation_model): annotations[key] = …, where key comes from leaf.json_path.split(".").
So a queryable or mutable attribute whose name isn't a valid Python identifier, e.g. the CEDS iSO639-2LangCode (hyphen) or any name containing a space or dot, produces an invalid Pydantic field and breaks model construction at MCP-server startup. The service sanitizes enum values (to_value_enum_name) but not field names.
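To illustrate which names are affected, here is a minimal sketch that flags path segments Python would reject as identifiers. It uses only the standard library; `invalid_field_names` and the example paths are hypothetical and not taken from core.py.

```python
import keyword


def invalid_field_names(json_paths):
    """Return path segments that cannot be used as Python identifiers."""
    bad = []
    for path in json_paths:
        # Same naive split the issue describes for leaf.json_path.
        for segment in path.split("."):
            if not segment.isidentifier() or keyword.iskeyword(segment):
                if segment not in bad:
                    bad.append(segment)
    return bad


# Hypothetical paths; only the leaf name iSO639-2LangCode comes from the issue.
paths = ["Person.Language.iSO639-2LangCode", "Person.Name.firstName"]
print(invalid_field_names(paths))  # ['iSO639-2LangCode']
```

A check like this could run before model construction to report every offending name at once instead of failing on the first one.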
This is the same class of failure as #1011 (which crashed graphql-org1), in a different component. #1012 does not cover it — that fix is scoped to openapi_to_graphql. The same iSO639-2LangCode field that downs GraphQL would also down semantic-search if it's queryable.
Additional related hazard in the same code: leaf.json_path.split(".") is naive, so a field name that itself contains a . would mis-nest the model tree (silent structure corruption).
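The mis-nesting can be reproduced without Pydantic. The sketch below builds a plain nested dict from path segments. The `nest` helper and the `Course.v1.code` path are made up for illustration, and it is an assumption that the schema leaf could supply its segments as a list rather than a dotted string.

```python
def nest(paths_as_segments):
    """Build a nested dict tree from lists of path segments."""
    tree = {}
    for segments in paths_as_segments:
        node = tree
        for seg in segments[:-1]:
            node = node.setdefault(seg, {})
        node[segments[-1]] = "leaf"
    return tree


# Hypothetical leaf: a field literally named "v1.code" under "Course".
json_path = "Course.v1.code"

# Naive split invents an intermediate "v1" level that is not in the schema.
naive = nest([json_path.split(".")])
print(naive)  # {'Course': {'v1': {'code': 'leaf'}}}

# With explicit segments the dotted name stays a single field.
explicit = nest([["Course", "v1.code"]])
print(explicit)  # {'Course': {'v1.code': 'leaf'}}
```

Nothing raises in the naive case, which is why the corruption would go unnoticed.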
Severity: startup crash for the semantic-search MCP server (invalid-identifier case); silent corruption of the model tree (dotted-name case).
Fix direction: the durable fix is MDR write-time validation of names (#1014) so invalid names never reach any consumer. Defense-in-depth here: sanitize field names to valid Python identifiers when building the dynamic models (mirroring what openapi_to_graphql does with safe_identifier/safe_graphql_name), and de-duplicate any collisions that sanitizing introduces.
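A possible shape for the defense-in-depth step, as a sketch only: `to_safe_field_name` is a hypothetical helper, not the existing safe_identifier from openapi_to_graphql, and the prefix and suffix conventions are assumptions. It avoids a leading underscore because Pydantic treats underscore-prefixed names as private attributes.

```python
import keyword
import re


def to_safe_field_name(raw, taken):
    """Map a raw schema name to a unique, valid Python identifier."""
    # Replace anything outside ASCII letters, digits, and underscore.
    name = re.sub(r"[^0-9A-Za-z_]", "_", raw)
    # Identifiers cannot be empty or start with a digit; also avoid a
    # leading underscore so the field is not treated as private.
    if not name or not name[0].isalpha():
        name = "f_" + name
    if keyword.iskeyword(name):
        name += "_"
    # De-duplicate: two raw names may sanitize to the same identifier.
    candidate, n = name, 2
    while candidate in taken:
        candidate = f"{name}_{n}"
        n += 1
    taken.add(candidate)
    return candidate


taken = set()
raw_names = ["iSO639-2LangCode", "iSO639_2LangCode", "first name", "class", "2ndLine"]
mapping = {raw: to_safe_field_name(raw, taken) for raw in raw_names}
for raw, safe in mapping.items():
    print(f"{raw!r} -> {safe!r}")
```

Keeping the raw-to-safe mapping around matters: one option is to register the raw name as a field alias (Pydantic's `Field(alias=...)`) so that incoming filters and payloads can still use the original schema name.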
Related: #1011, #1012 (GraphQL equivalent), #1014 (boundary validation), #1013 (the offending field).