Description
When using `ChatDatabricks.with_structured_output()` with `method="json_schema"`, the API returns errors because the constructed `response_format` payload is missing required fields that the OpenAI-compatible API expects. There are three cascading issues, all stemming from the same root cause.
```python
from pydantic import BaseModel, Field
from databricks_langchain import ChatDatabricks


class TerminationReason(BaseModel):
    """Structured termination reason of a conversation between agent and user."""

    non_english: bool = Field(description="Whether or not the conversation is in English")
    frustration: bool = Field(description="If user sounds frustrated, angry or threatening, return True.")


conversation = [
    {"role": "user", "content": "I'm really upset that my order hasn't arrived yet."},
    {"role": "ai", "content": "I'm sorry to hear that. Let me check the status for you."},
]

llm = ChatDatabricks(endpoint="databricks-gpt-5")
result = llm.with_structured_output(
    schema=TerminationReason,
    method="json_schema",
).invoke(conversation)
```
Additionally, and specifically for reasoning models, parsing is broken because `_convert_dict_to_message` and `_convert_dict_to_message_chunk` call `json.dumps` on `content`, which for reasoning models is a list of content blocks (reasoning tokens and response tokens). The resulting string is passed to the output parser, leading to a schema validation error.
Errors
- Missing `response_format.json_schema.name`

  BadRequestError: Error code: 400 - "Missing required parameter: 'response_format.json_schema.name'."

  The API requires a `name` field in the `json_schema` object. The current code does not include one.
- Missing `additionalProperties: false` (if `name` is added to the response format)

  When `strict: true`, the OpenAI API spec requires `additionalProperties: false` at every object-level node. `model_json_schema()` does not include this by default.

  BadRequestError: Error code: 400 - "Invalid schema for response_format 'json_schema': In context=(), 'additionalProperties' is required to be supplied and to be false."
- `required` array mismatch (if `name` and `additionalProperties` are both specified)

  BadRequestError: Error code: 400 - "Invalid schema for response_format 'generic-schema-name': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'x' supplied."
- Always fails for reasoning models

  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='content', input_value=[{'type': 'reasoning', 's....."}'}]}], input_type=list])
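The first two gaps are visible directly on the raw schema. A minimal sketch, reusing the `TerminationReason` model from the repro above: the pydantic output carries a `title` but no `additionalProperties`, and nothing in the current payload supplies the `name` the API demands.

```python
from pydantic import BaseModel, Field


class TerminationReason(BaseModel):
    """Structured termination reason of a conversation between agent and user."""

    non_english: bool = Field(description="Whether or not the conversation is in English")
    frustration: bool = Field(description="If user sounds frustrated, angry or threatening, return True.")


schema = TerminationReason.model_json_schema()

# pydantic emits `title`, `properties`, and `required`, but no
# `additionalProperties` key - hence errors 1 and 2 above.
print("additionalProperties" in schema)
print(schema["title"])
```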
Root Cause
Errors 1-3:
The current implementation constructs the `response_format` manually from raw `model_json_schema()` output, which is not compliant with the OpenAI structured-output API requirements:
```python
response_format = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "schema": (pydantic_schema.model_json_schema() if pydantic_schema else schema),
    },
}
```
Which errors are raised also depends on the model and provider. For example:
- with databricks-gpt-oss, adding only the `name` field fixes everything
- with databricks-gpt-5..., OpenAI enforces the other attributes as well, so the requests still fail
Error 4:
Some reasoning models, such as the gpt-oss family, return `content` as a list of content blocks (including `reasoning` and `text` types). In the Chat Completions path, `_convert_dict_to_message` calls `json.dumps(content)`, which converts the list to a string before the `AIMessage` is created and passed to the output parser, leading to this error.
This is also related to langchain-ai/langchain#33116 and langchain-ai/langchain#32465 among others.
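The failure mode can be sketched in isolation. The content-block shapes below are illustrative assumptions (the exact keys vary by model); the point is that `json.dumps` turns the list into a string that is neither plain text nor the bare JSON the parser expects.

```python
import json

# Illustrative content list mimicking a reasoning model's response:
# a reasoning block followed by the actual text answer. The exact block
# shapes are assumptions, not taken from any specific model's output.
content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "thinking..."}]},
    {"type": "text", "text": '{"non_english": false, "frustration": true}'},
]

# What the current conversion does: the whole list is serialized to one
# string, so the downstream schema parser sees a JSON array, not the answer.
dumped = json.dumps(content)
print(dumped[:40])
```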
Possible Fixes
Errors 1-3
- Add the required fields directly when constructing the `response_format`
- Alternatively, do what langchain-openai does: use `_convert_to_openai_response_format`, which calls `convert_to_openai_function` from langchain_core with `strict=True` and handles all three requirements:
  - `name`: extracted from the Pydantic class name or the JSON schema `title` key
  - `additionalProperties: false`: recursively set on all object nodes via `_recursive_set_additional_properties_false`
  - `required`: set to exactly `list(properties.keys())` so it matches every property
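A minimal sketch of a compliant payload builder, mirroring those three requirements. The helper names (`_set_additional_properties_false`, `build_response_format`) are hypothetical, not existing library functions:

```python
import copy
from pydantic import BaseModel


def _set_additional_properties_false(node: dict) -> None:
    """Recursively add `additionalProperties: false` to every object node
    and make `required` cover every property (hypothetical helper)."""
    if node.get("type") == "object":
        node.setdefault("additionalProperties", False)
        if "properties" in node:
            node["required"] = list(node["properties"])
    for value in node.values():
        if isinstance(value, dict):
            _set_additional_properties_false(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    _set_additional_properties_false(item)


def build_response_format(pydantic_schema: type[BaseModel]) -> dict:
    """Build a response_format with the `name`, `additionalProperties`,
    and `required` fields the strict OpenAI spec expects."""
    schema = copy.deepcopy(pydantic_schema.model_json_schema())
    _set_additional_properties_false(schema)
    return {
        "type": "json_schema",
        "json_schema": {
            "name": schema.get("title", pydantic_schema.__name__),
            "strict": True,
            "schema": schema,
        },
    }


class TerminationReason(BaseModel):
    non_english: bool
    frustration: bool


rf = build_response_format(TerminationReason)
```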
Error 4
- Handle de-serialization and serialization in the output-parser step that does the parsing
- In the langchain-openai integration, the structured output is extracted via `message.parsed`, a field the OpenAI SDK provides when you use `client.beta.chat.completions.parse()`. This relies on server-side support native to OpenAI and would not work with Databricks, vLLM, Ollama serving, etc.
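A client-side alternative for the parser step can be sketched as follows: normalize `content` to plain text before JSON parsing, keeping only `text`-type blocks. The function name and the content-block shapes are assumptions for illustration:

```python
import json


def coerce_content_to_text(content) -> str:
    """Hypothetical client-side normalization: if `content` is a list of
    content blocks, keep only the `text`-type blocks and join them, so the
    structured-output parser receives a plain JSON string."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Illustrative reasoning-model content list (assumed shape).
content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "thinking..."}]},
    {"type": "text", "text": '{"non_english": false, "frustration": true}'},
]
parsed = json.loads(coerce_content_to_text(content))
```

This avoids any dependency on server-side parsing, so it works the same against Databricks, vLLM, or Ollama endpoints.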