
Copilot AI commented Dec 5, 2025

Tracking issue

Why are the changes needed?

Pydantic v2 generates JSON schemas with $ref references for nested models (e.g., {"$ref": "#/$defs/SingleObj"}). The schema parsing logic in type_engine.py was attempting to access property_val["type"] before resolving these references, causing KeyError: 'type'.

from typing import Optional

from pydantic import BaseModel

class SingleObj(BaseModel):
    a: str

class TestDatum(BaseModel):
    b: SingleObj                      # Direct ref: {"$ref": "#/$defs/SingleObj"}
    d: list[SingleObj]                # Array items ref: {"type": "array", "items": {"$ref": ...}}
    e: Optional[list[SingleObj]]      # anyOf with ref: {"anyOf": [{"type": "array", "items": {"$ref": ...}}, {"type": "null"}]}
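The three fields above place $ref at different depths. The stdlib-only sketch below walks a hand-written, abridged approximation of the schema Pydantic v2 produces for TestDatum (not verbatim model_json_schema() output; titles are omitted) and prints where each $ref sits. find_refs is a hypothetical helper used only for illustration.

```python
from typing import Any, Iterator

# Abridged, hand-written approximation of TestDatum's JSON schema (assumption).
SCHEMA: dict[str, Any] = {
    "$defs": {
        "SingleObj": {
            "type": "object",
            "properties": {"a": {"type": "string"}},
            "required": ["a"],
        }
    },
    "type": "object",
    "properties": {
        "b": {"$ref": "#/$defs/SingleObj"},
        "d": {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}},
        "e": {
            "anyOf": [
                {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}},
                {"type": "null"},
            ]
        },
    },
}


def find_refs(node: Any, path: str = "#") -> Iterator[tuple[str, str]]:
    """Yield (location, target) for every $ref found under node."""
    if isinstance(node, dict):
        if "$ref" in node:
            yield path, node["$ref"]
        for key, value in node.items():
            yield from find_refs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_refs(value, f"{path}/{index}")


for location, target in find_refs(SCHEMA["properties"], "#/properties"):
    print(location, "->", target)
```

Only field b has $ref at the top level; d and e carry it one or two levels down, which is why resolution has to happen inside every recursive call rather than once per property.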

What changes were proposed in this pull request?

Added $ref resolution logic:

  • _resolve_json_schema_ref() dereferences schema paths like #/$defs/ModelName with proper error handling
  • Resolves references before type access, preventing KeyError
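As a rough illustration of what a ref resolver with error handling can look like, here is a minimal sketch. It is not the code in this PR: resolve_ref and SchemaRefError are hypothetical names, it supports only local JSON Pointer refs ("#/..."), applies RFC 6901 "~1"/"~0" unescaping, and skips URI percent-decoding.

```python
from typing import Any


class SchemaRefError(ValueError):
    """Raised when a $ref cannot be resolved against the root schema."""


def resolve_ref(node: Any, root: dict[str, Any]) -> Any:
    """Return the schema node points to, or node unchanged if it has no $ref.

    Handles local refs such as "#/$defs/SingleObj" (Pydantic v2) or
    "#/definitions/SingleObj" (Pydantic v1). Chains of refs are followed;
    a chain that loops back on itself raises SchemaRefError.
    """
    seen: set[str] = set()
    while isinstance(node, dict) and "$ref" in node:
        ref = node["$ref"]
        if not isinstance(ref, str) or not ref.startswith("#/"):
            raise SchemaRefError(f"unsupported (non-local) $ref: {ref!r}")
        if ref in seen:
            raise SchemaRefError(f"circular $ref chain at {ref!r}")
        seen.add(ref)
        target: Any = root
        for raw in ref[2:].split("/"):
            # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
            token = raw.replace("~1", "/").replace("~0", "~")
            if not isinstance(target, dict) or token not in target:
                raise SchemaRefError(f"cannot resolve {ref!r}: no key {token!r}")
            target = target[token]
        node = target
    return node
```

Raising a dedicated error with the offending ref in the message is friendlier than the bare KeyError: 'type' that the original code surfaced.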

Updated schema processing functions:

  • _handle_json_schema_property() now accepts full schema and resolves $ref before processing
  • _get_element_type() handles resolved object types by converting them to dataclasses
  • Fixed type annotation: changed from Dict[str, str] to Dict[str, Any] for schema properties
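The processing order described above (resolve $ref first, then dispatch on "type", converting object schemas into dataclasses) can be sketched roughly as follows. This is an illustrative sketch, not Flyte's type_engine.py: schema_to_type and _resolve are hypothetical names, and only a handful of JSON types are mapped.

```python
import dataclasses
from typing import Any, Optional, Union, get_args, get_origin

_PRIMITIVES = {"string": str, "integer": int, "number": float, "boolean": bool}


def _resolve(node: dict[str, Any], root: dict[str, Any]) -> dict[str, Any]:
    # Follow local pointers such as "#/$defs/SingleObj" into the root schema.
    while "$ref" in node:
        target: Any = root
        for token in node["$ref"][2:].split("/"):
            target = target[token]
        node = target
    return node


def schema_to_type(prop: dict[str, Any], root: dict[str, Any], name: str = "Model") -> Any:
    """Map one JSON-schema property to a Python type; $ref is resolved first."""
    prop = _resolve(prop, root)  # before any access to prop["type"]
    if "anyOf" in prop:
        options = [o for o in prop["anyOf"] if o.get("type") != "null"]
        if not options:
            return type(None)
        members = [schema_to_type(o, root, name) for o in options]
        base = members[0] if len(members) == 1 else Union[tuple(members)]
        return Optional[base] if len(options) < len(prop["anyOf"]) else base
    kind = prop.get("type")
    if kind in _PRIMITIVES:
        return _PRIMITIVES[kind]
    if kind == "array":
        # The element schema may itself be {"$ref": ...}; recursion resolves it.
        return list[schema_to_type(prop["items"], root, f"{name}Item")]
    if kind == "object" and "properties" in prop:
        cls_name = prop.get("title", name)
        fields = [
            (field, schema_to_type(sub, root, f"{cls_name}_{field}"))
            for field, sub in prop["properties"].items()
        ]
        return dataclasses.make_dataclass(cls_name, fields)
    raise TypeError(f"unsupported schema node: {prop!r}")


ROOT: dict[str, Any] = {
    "$defs": {
        "SingleObj": {
            "title": "SingleObj",
            "type": "object",
            "properties": {"a": {"type": "string"}},
        }
    },
    "properties": {
        "b": {"$ref": "#/$defs/SingleObj"},
        "d": {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}},
        "e": {
            "anyOf": [
                {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}},
                {"type": "null"},
            ]
        },
    },
}

d_type = schema_to_type(ROOT["properties"]["d"], ROOT)
print(get_origin(d_type), get_args(d_type)[0](a="x"))
```

This sketch builds a fresh dataclass on every call; a real implementation would likely cache generated classes per $defs entry so that b, d and e share one SingleObj type.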

Propagated schema context:

  • generate_attribute_list_from_dataclass_json_mixin() passes schema to helper functions
  • All recursive calls maintain schema context for nested reference resolution
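Why the root schema has to travel through every recursive call: a nested definition can itself reference another definition, and that reference only resolves against the top-level $defs. The sketch below (hypothetical helpers resolve and flatten_fields, hand-written schema) threads root through unchanged and flattens leaf properties into dotted paths.

```python
from typing import Any


def resolve(node: dict[str, Any], root: dict[str, Any]) -> dict[str, Any]:
    # Follow local refs such as "#/$defs/Inner" against the *root* schema.
    while "$ref" in node:
        target: Any = root
        for token in node["$ref"][2:].split("/"):
            target = target[token]
        node = target
    return node


def flatten_fields(node: dict[str, Any], root: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Return {"dotted.path": json_type} for every leaf property.

    root is passed unchanged to every recursive call, so refs that appear
    deep inside a nested definition still resolve against "$defs".
    """
    out: dict[str, str] = {}
    for name, prop in resolve(node, root).get("properties", {}).items():
        prop = resolve(prop, root)
        path = f"{prefix}{name}"
        if prop.get("type") == "object":
            out.update(flatten_fields(prop, root, path + "."))
        else:
            out[path] = prop.get("type", "unknown")
    return out


# Hand-written example: Middle refers to Inner, and both live in the root $defs.
ROOT: dict[str, Any] = {
    "$defs": {
        "Inner": {"type": "object", "properties": {"x": {"type": "integer"}}},
        "Middle": {"type": "object", "properties": {"inner": {"$ref": "#/$defs/Inner"}}},
    },
    "type": "object",
    "properties": {"a": {"type": "string"}, "m": {"$ref": "#/$defs/Middle"}},
}

print(flatten_fields(ROOT, ROOT))
```

Passing the nested definition as its own root, e.g. flatten_fields(ROOT["$defs"]["Middle"], ROOT["$defs"]["Middle"]), raises KeyError: '$defs', which is the context loss the schema propagation avoids.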

How was this patch tested?

Added test_nested_pydantic_model_with_list covering:

  • Direct nested models with $ref
  • Lists of nested models with $ref in items
  • Optional lists with anyOf containing $ref

All existing pydantic transformer tests (30/30) and dataclass tests (38/38) pass.

Setup process

N/A

Screenshots

N/A

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Original prompt

Fix handling of JSON schema $ref references in nested Pydantic models

Problem

When Pydantic v2 generates JSON schemas for nested models (especially in lists like list[NestedModel]), it uses $ref references to definitions. The _handle_json_schema_property function in type_engine.py fails with KeyError: 'type' because it tries to access the "type" key before resolving the $ref.

Example that fails:

from pydantic import BaseModel

class SingleObj(BaseModel):
    a: str

class TestDatum(BaseModel):
    a: str
    b: SingleObj
    c: list[str]
    d: list[SingleObj]  # This fails - list of nested objects

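This generates a schema like the following (abridged; the anyOf wrapper with a "null" branch is what Pydantic v2 emits when the field is Optional, e.g. Optional[list[SingleObj]], while a plain list[SingleObj] produces the array schema directly):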

{
  "properties": {
    "d": {
      "anyOf": [
        {
          "type": "array",
          "items": {
            "$ref": "#/$defs/SingleObj"
          }
        },
        {
          "type": "null"
        }
      ]
    }
  }
}

The error occurs because:

  1. _handle_json_schema_property processes the anyOf and recursively calls itself for each item
  2. For the array item, it encounters {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}}
  3. When processing the items, it tries to access property_val["type"] on the $ref dict
  4. This fails because $ref dicts only have a "$ref" key, not a "type" key
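The failure sequence above can be reproduced without Flyte. In this minimal sketch, describe_naive mimics the buggy order (read "type" first) and describe_fixed resolves $ref before touching any other key; both functions and the abridged schema are hypothetical stand-ins, not the real _handle_json_schema_property.

```python
from typing import Any

ROOT: dict[str, Any] = {
    "$defs": {"SingleObj": {"type": "object", "properties": {"a": {"type": "string"}}}},
    "properties": {
        "d": {
            "anyOf": [
                {"type": "array", "items": {"$ref": "#/$defs/SingleObj"}},
                {"type": "null"},
            ]
        }
    },
}


def describe_naive(prop: dict[str, Any]) -> str:
    # Buggy order: reads prop["type"] without checking for "$ref".
    if "anyOf" in prop:
        return " | ".join(describe_naive(p) for p in prop["anyOf"])
    if prop["type"] == "array":  # KeyError: 'type' on {"$ref": ...}
        return f"list[{describe_naive(prop['items'])}]"
    return prop["type"]


def describe_fixed(prop: dict[str, Any], root: dict[str, Any]) -> str:
    if "$ref" in prop:  # resolve first, e.g. root["$defs"]["SingleObj"]
        target: Any = root
        for token in prop["$ref"][2:].split("/"):
            target = target[token]
        prop = target
    if "anyOf" in prop:
        return " | ".join(describe_fixed(p, root) for p in prop["anyOf"])
    if prop["type"] == "array":
        return f"list[{describe_fixed(prop['items'], root)}]"
    if prop["type"] == "object":
        fields = ", ".join(f"{k}: {describe_fixed(v, root)}" for k, v in prop["properties"].items())
        return "{" + fields + "}"
    return prop["type"]


try:
    describe_naive(ROOT["properties"]["d"])
except KeyError as exc:
    print("naive:", repr(exc))
print("fixed:", describe_fixed(ROOT["properties"]["d"], ROOT))
```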

Solution

The _handle_json_schema_property function needs to resolve $ref references before accessing any other schema key. Specifically, it should:

  1. Resolve the reference at the beginning of the function, before any property access
  2. Accept the full schema as a parameter, so references can be resolved against its definitions
  3. Handle both reference path formats, #/$defs/ModelName and #/definitions/ModelName

The existing generate_attribute_list_from_dataclass_json function already has logic to handle $ref for nested dataclasses, and we should apply similar logic to generate_attribute_list_from_dataclass_json_mixin.

$ref must also be handled in array items and other nested structures.
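One way to cover "array items and other nested structures" generically is to walk every dict and list rather than special-casing keys. The sketch below inlines each local $ref wherever it appears (items, anyOf, oneOf, allOf, properties, additionalProperties, ...). inline_refs is a hypothetical helper, and eager inlining is a swapped-in alternative to the lazy per-property resolution the PR describes; recursive models are left as a $ref at the point of recursion.

```python
from typing import Any


def inline_refs(node: Any, root: dict[str, Any], _stack: tuple[str, ...] = ()) -> Any:
    """Return a copy of node with every local $ref replaced by its target.

    Dicts and lists are walked generically, so refs nested under any
    keyword are handled. A ref already being expanded higher up the
    stack (a self-referencing model) is kept as-is to avoid infinite
    recursion.
    """
    if isinstance(node, list):
        return [inline_refs(item, root, _stack) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:
        ref = node["$ref"]
        if ref in _stack:
            return dict(node)  # recursion point: stop expanding here
        target: Any = root
        for token in ref[2:].split("/"):
            target = target[token]
        # Keep any sibling keywords that sit next to $ref.
        merged = {**target, **{k: v for k, v in node.items() if k != "$ref"}}
        return inline_refs(merged, root, _stack + (ref,))
    return {key: inline_refs(value, root, _stack) for key, value in node.items()}
```

The input schema is never mutated; new dicts and lists are built on the way back up.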

This pull request was created as a result of the prompt above, from Copilot chat.

Copilot AI self-assigned this Dec 5, 2025
@flyte-bot
Contributor

Bito Automatic Review Skipped - Draft PR

Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin.

Copilot AI changed the title from "[WIP] Fix handling of JSON schema $ref references in nested Pydantic models" to "Fix JSON schema $ref resolution in nested Pydantic models" on Dec 5, 2025
Copilot AI requested a review from davidmirror-ops December 5, 2025 14:30


3 participants