Description
When using `ChatDatabricks.with_structured_output()` with `method="json_schema"`, the API returns errors because the constructed `response_format` payload is missing required fields that the OpenAI-compatible API expects. There are three cascading issues, all stemming from the same root cause.
```python
from pydantic import BaseModel, Field
from databricks_langchain import ChatDatabricks


class TerminationReason(BaseModel):
    """Structured termination reason of a conversation between agent and user."""

    non_english: bool = Field(description="Whether or not the conversation is in English")
    frustration: bool = Field(description="If user sounds frustrated, angry or threatening, return True.")


conversation = [
    {"role": "user", "content": "I'm really upset that my order hasn't arrived yet."},
    {"role": "ai", "content": "I'm sorry to hear that. Let me check the status for you."},
]

llm = ChatDatabricks(endpoint="databricks-gpt-5")
result = llm.with_structured_output(
    schema=TerminationReason,
    method="json_schema",
).invoke(conversation)
```
Additionally, and specifically for reasoning models, parsing is broken because `_convert_dict_to_message` and `_convert_dict_to_message_chunk` call `json.dumps` on `content`, which for reasoning models is a list of content blocks (reasoning tokens and response tokens). The resulting string is passed to the output parser, leading to a schema validation error.
Errors
- Missing `response_format.json_schema.name`

  BadRequestError: Error code: 400 - "Missing required parameter: 'response_format.json_schema.name'."

  The API requires a `name` field in the `json_schema` object. The current code does not include one.
- Missing `additionalProperties: false` (if `name` is added to the response format)

  When `strict: true`, the OpenAI API spec requires `additionalProperties: false` at every object-level node. `model_json_schema()` does not include this by default.

  BadRequestError: Error code: 400 - "Invalid schema for response_format 'json_schema': In context=(), 'additionalProperties' is required to be supplied and to be false."
- `required` array mismatch (if `name` and `additionalProperties` are both specified)

  BadRequestError: Error code: 400 - "Invalid schema for response_format 'generic-schema-name': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'x' supplied."
- Always fails for reasoning models

  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [field_name='content', input_value=[{'type': 'reasoning', 's....."}'}]}], input_type=list])
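The first two gaps are visible directly on the raw schema. A minimal sketch, reusing the `TerminationReason` model from the repro above: the pydantic output carries a `title` but no `additionalProperties`, and nothing in the current payload supplies the `name` the API demands.

```python
from pydantic import BaseModel, Field


class TerminationReason(BaseModel):
    """Structured termination reason of a conversation between agent and user."""

    non_english: bool = Field(description="Whether or not the conversation is in English")
    frustration: bool = Field(description="If user sounds frustrated, angry or threatening, return True.")


schema = TerminationReason.model_json_schema()

# pydantic emits `title`, `properties`, and `required`, but no
# `additionalProperties` key - hence errors 1 and 2 above.
print("additionalProperties" in schema)
print(schema["title"])
```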
Root Cause
Errors 1-3:
The current implementation constructs the `response_format` manually from raw `model_json_schema()` output, which is not compliant with the OpenAI structured-output API requirements:
```python
response_format = {
    "type": "json_schema",
    "json_schema": {
        "strict": True,
        "schema": (pydantic_schema.model_json_schema() if pydantic_schema else schema),
    },
}
```
Which errors are raised also depends on the model and provider. For example:
- with databricks-gpt-oss, adding only the `name` field fixes everything
- with databricks-gpt-5..., OpenAI enforces the other attributes as well, so the requests still fail
Error 4:
Some reasoning models, such as the gpt-oss family, return `content` as a list of content blocks (including `reasoning` and `text` types). In the Chat Completions path, `_convert_dict_to_message` calls `json.dumps(content)`, which converts the list to a string before the `AIMessage` is created and passed to the output parser, leading to this error.
This is also related to langchain-ai/langchain#33116 and langchain-ai/langchain#32465 among others.
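The failure mode can be sketched in isolation. The content-block shapes below are illustrative assumptions (the exact keys vary by model); the point is that `json.dumps` turns the list into a string that is neither plain text nor the bare JSON the parser expects.

```python
import json

# Illustrative content list mimicking a reasoning model's response:
# a reasoning block followed by the actual text answer. The exact block
# shapes are assumptions, not taken from any specific model's output.
content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "thinking..."}]},
    {"type": "text", "text": '{"non_english": false, "frustration": true}'},
]

# What the current conversion does: the whole list is serialized to one
# string, so the downstream schema parser sees a JSON array, not the answer.
dumped = json.dumps(content)
print(dumped[:40])
```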
Possible Fixes
Errors 1-3
- Add the required fields directly when constructing the `response_format`
- Alternatively, do what langchain-openai does: use `_convert_to_openai_response_format`, which calls `convert_to_openai_function` from langchain_core with `strict=True` and handles all three requirements:
  - `name`: extracted from the Pydantic class name or the JSON schema `title` key
  - `additionalProperties: false`: recursively set on all object nodes via `_recursive_set_additional_properties_false`
  - `required`: set to exactly `list(properties.keys())` so it matches every property
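A minimal sketch of a compliant payload builder, mirroring those three requirements. The helper names (`_set_additional_properties_false`, `build_response_format`) are hypothetical, not existing library functions:

```python
import copy
from pydantic import BaseModel


def _set_additional_properties_false(node: dict) -> None:
    """Recursively add `additionalProperties: false` to every object node
    and make `required` cover every property (hypothetical helper)."""
    if node.get("type") == "object":
        node.setdefault("additionalProperties", False)
        if "properties" in node:
            node["required"] = list(node["properties"])
    for value in node.values():
        if isinstance(value, dict):
            _set_additional_properties_false(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    _set_additional_properties_false(item)


def build_response_format(pydantic_schema: type[BaseModel]) -> dict:
    """Build a response_format with the `name`, `additionalProperties`,
    and `required` fields the strict OpenAI spec expects."""
    schema = copy.deepcopy(pydantic_schema.model_json_schema())
    _set_additional_properties_false(schema)
    return {
        "type": "json_schema",
        "json_schema": {
            "name": schema.get("title", pydantic_schema.__name__),
            "strict": True,
            "schema": schema,
        },
    }


class TerminationReason(BaseModel):
    non_english: bool
    frustration: bool


rf = build_response_format(TerminationReason)
```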
Error 4
- Handle de-serialization and serialization in the output-parser step that does the parsing
- In the langchain-openai integration, the structured output is extracted via `message.parsed`, a field the OpenAI SDK provides when you use `client.beta.chat.completions.parse()`. This relies on server-side support native to OpenAI and would not work with Databricks, vLLM, Ollama serving, etc.
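A client-side alternative for the parser step can be sketched as follows: normalize `content` to plain text before JSON parsing, keeping only `text`-type blocks. The function name and the content-block shapes are assumptions for illustration:

```python
import json


def coerce_content_to_text(content) -> str:
    """Hypothetical client-side normalization: if `content` is a list of
    content blocks, keep only the `text`-type blocks and join them, so the
    structured-output parser receives a plain JSON string."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Illustrative reasoning-model content list (assumed shape).
content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "thinking..."}]},
    {"type": "text", "text": '{"non_english": false, "frustration": true}'},
]
parsed = json.loads(coerce_content_to_text(content))
```

This avoids any dependency on server-side parsing, so it works the same against Databricks, vLLM, or Ollama endpoints.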