shreyashankar
Collaborator

Adds native Pydantic BaseModel support for output schemas across all DocETL operations (Map, Reduce, Filter, Resolve). This only works for the Python API.
Note that this PR was mainly AI-generated.
Example usage is in tests/test_pydantic_integration.py.

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

# The LLM API will handle the OpenAPI conversion internally
op_config["_pydantic_schema"] = output_schema
# Convert to dict format only for the list schema wrapper
dict_schema = convert_schema_to_dict_format(output_schema, model)

Bug: Schema Conversion Function Parameter Mismatch

The convert_schema_to_dict_format function is called with the LLM model name as its second argument. The function's model parameter, which defaults to 'gpt-4o-mini', seems intended for schema conversion logic rather than the LLM model name. This semantic mismatch could lead to incorrect schema conversion behavior.
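One way to make this kind of mismatch harder to introduce is to make the second parameter keyword-only and name it for what it controls. The sketch below is hypothetical: `convert_schema_sketch`, the `llm_model` parameter name, the simplified type mapping, and the `Summary` stand-in class are all assumptions for illustration, not DocETL's actual `convert_schema_to_dict_format` signature or behavior.

```python
from typing import Any, Dict, get_type_hints

# Hypothetical mapping from Python annotations to dict-schema type strings.
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def convert_schema_sketch(schema: Any, *, llm_model: str = "gpt-4o-mini") -> Dict[str, str]:
    """Return a {field: type-string} dict for a dict or an annotated class.

    `llm_model` is keyword-only, so a caller cannot pass an LLM name
    positionally by accident. It is unused in this sketch; a real converter
    might use it for model-specific adjustments (assumption).
    """
    if isinstance(schema, dict):
        return dict(schema)  # already in dict form; return a shallow copy
    hints = get_type_hints(schema)
    return {name: _TYPE_NAMES.get(tp, "string") for name, tp in hints.items()}


class Summary:  # stand-in for a Pydantic BaseModel subclass
    title: str
    score: float


# The call site now states explicitly what the second value means.
result = convert_schema_sketch(Summary, llm_model="gpt-4o-mini")
```

With a keyword-only parameter, a positional call such as `convert_schema_sketch(Summary, "gpt-4o-mini")` raises `TypeError`, so an ambiguous second argument is caught at the call site.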

self.config["output"]["schema"] = convert_schema_to_dict_format(
    raw_schema
)

Bug: Schema Mutation Causes Inconsistent Behavior

The ReduceOperation and ResolveOperation classes mutate self.config["output"]["schema"] in place. When a Pydantic model is provided for the output schema, it's converted to a dictionary and directly overwrites the original Pydantic model in the configuration. This can lead to inconsistent behavior or break functionality if the operation instance is reused, as subsequent executions will operate on the converted dictionary schema.
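A minimal sketch of one way to avoid the in-place overwrite: derive the dict schema into a separate attribute and leave the user-supplied config untouched. Every name here is a stand-in (`ReduceLikeOperation`, `to_dict_schema`, `FakeModel`); none of them is DocETL's real class or helper, and the type mapping is a simplified assumption.

```python
import copy
from typing import Any, Dict, get_type_hints


def to_dict_schema(schema: Any) -> Dict[str, str]:
    """Hypothetical converter: dicts are copied, annotated classes are mapped."""
    if isinstance(schema, dict):
        return copy.deepcopy(schema)
    names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    return {k: names.get(v, "string") for k, v in get_type_hints(schema).items()}


class ReduceLikeOperation:
    """Stand-in for an operation class; not DocETL's ReduceOperation."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config
        # Keep the converted schema beside the config instead of overwriting it.
        self._dict_schema = to_dict_schema(config["output"]["schema"])

    def execute(self) -> Dict[str, str]:
        # Read the derived schema; the config entry is never reassigned.
        return self._dict_schema


class FakeModel:  # stand-in for a Pydantic BaseModel subclass
    summary: str
    count: int


config = {"output": {"schema": FakeModel}}
op = ReduceLikeOperation(config)
first = op.execute()
second = op.execute()
```

Because the original schema object stays in `config`, repeated executions and any code that later inspects the config see the same model the caller supplied.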

Additional Locations (1)
