Structured output parsing cutted by ' #27065

LudoArno · 2024-10-03T09:46:35Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from typing import Optional

from pydantic import BaseModel, Field

from langchain_google_vertexai import ChatVertexAI


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10")


llm = ChatVertexAI(model="gemini-1.5-flash", project="PROJECT_ID")
structured_llm = llm.with_structured_output(Joke)

structured_llm.invoke("Tell me a joke about cats")

Output: Joke(setup='Why don’t cats play poker?', punchline='Why don', rating=7)

Adding include_raw=True doesn't solve the issue.

Error Message and Stack Trace (if applicable)

No response

Description

I followed the documentation to use structured output with Gemini, but the parsed output is wrong: string fields are cut off at the first ' in the generated text.
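As a sanity check on where the cut might come from, the following minimal sketch uses only the Python standard library (not LangChain or Vertex AI) to show that an apostrophe inside a JSON string decodes intact, and that a backslash-escaped apostrophe is not a valid JSON escape. The payloads are illustrative: they reuse the setup text from the report with a plain ASCII apostrophe and were not produced by the model.

```python
import json

# Illustrative payload (not a real model response): the setup text from the
# report, written with a plain ASCII apostrophe inside a JSON string.
plain = '{"setup": "Why don\'t cats play poker?"}'
decoded = json.loads(plain)
print(decoded["setup"])  # the full sentence, apostrophe included

# The same payload with the apostrophe backslash-escaped.
# The raw string keeps the backslash, so the JSON text contains \' literally.
escaped = r'{"setup": "Why don\'t cats play poker?"}'
try:
    json.loads(escaped)
    escaped_is_valid = True
except json.JSONDecodeError:
    # Strict JSON has no \' escape, so decoding fails here.
    escaped_is_valid = False
print(escaped_is_valid)
```

If this holds in your environment, ordinary JSON decoding is unlikely to be the step that drops the text after the apostrophe, so the cut probably happens earlier in the pipeline.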

System Info

$ python -m langchain_core.sys_info

System Information
------------------
> OS:  Windows
> OS Version:  10.0.19044
> Python Version:  3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]

Package Information
-------------------
> langchain_core: 0.3.6
> langchain: 0.3.1
> langchain_community: 0.3.1
> langsmith: 0.1.128
> langchain_google_vertexai: 2.0.3
> langchain_openai: 0.2.0
> langchain_text_splitters: 0.3.0

Optional packages not installed
-------------------------------
> langgraph
> langserve

Other Dependencies
------------------
> aiohttp: 3.10.6
> anthropic[vertexai]: Installed. No version info available.
> async-timeout: 4.0.3
> dataclasses-json: 0.6.7
> google-cloud-aiplatform: 1.64.0
> google-cloud-storage: 2.18.2
> httpx: 0.27.2
> httpx-sse: 0.4.0
> jsonpatch: 1.33
> langchain-mistralai: Installed. No version info available.
> numpy: 1.26.4
> openai: 1.43.0
> orjson: 3.10.7
> packaging: 24.1
> pydantic: 2.8.2
> pydantic-settings: 2.5.2
> PyYAML: 6.0.2
> requests: 2.32.3
> SQLAlchemy: 2.0.35
> tenacity: 8.5.0
> tiktoken: 0.7.0
> typing-extensions: 4.12.2


arindam-giri · 2024-10-04T04:30:46Z

Looks like the single quote is causing the issue. Can you run another prompt whose answer doesn't contain a single quote? For example: "Tell me an LLM joke".

LudoArno · 2024-10-04T09:48:24Z

Sometimes it works: the model outputs \' instead of a single quote. However, even when I ask it to use \' or no quotes at all, it generally still writes a plain single quote.

json_schema = {
    "title": "joke",
    "description": "Joke to tell user.",
    "type": "object",
    "properties": {
        "setup": {
            "type": "string",
            "description": "The setup of the joke",
        },
        "punchline": {
            "type": "string",
            "description": "The punchline to the joke",
        },
        "rating": {
            "type": "integer",
            "description": "How funny the joke is, from 1 to 10",
            "default": None,
        },
    },
    "required": ["setup", "punchline"],
}
llm = ChatVertexAI(model="gemini-1.5-flash", project="PROJECT_ID")
structured_llm = llm.with_structured_output(json_schema, include_raw=True)
response = structured_llm.invoke("Tell me a joke about cats without any quote and single quote, use \' instead")
response

Output:

{'raw': AIMessage(content='', additional_kwargs={'function_call': {'name': 'joke', 'arguments': '{"rating": 8.0, "punchline": "Why don", "setup": "Why don"}'}}, response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False, 'severity': 'HARM_SEVERITY_NEGLIGIBLE'}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False, 'severity': 'HARM_SEVERITY_NEGLIGIBLE'}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False, 'severity': 'HARM_SEVERITY_NEGLIGIBLE'}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False, 'severity': 'HARM_SEVERITY_NEGLIGIBLE'}], 'usage_metadata': {'prompt_token_count': 49, 'candidates_token_count': 9, 'total_token_count': 58}, 'finish_reason': 'STOP'}, id='run-0f7e8284-5c75-4989-ab5e-2f5f27d0078b-0', tool_calls=[{'name': 'joke', 'args': {'rating': 8.0, 'punchline': 'Why don', 'setup': 'Why don'}, 'id': '7af76e9d-a07b-4a2e-bc31-432fb409a36d', 'type': 'tool_call'}], usage_metadata={'input_tokens': 49, 'output_tokens': 9, 'total_tokens': 58}),
 'parsed': {'rating': 8.0, 'punchline': 'Why don', 'setup': 'Why don'},
 'parsing_error': None}
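One way to read the response above is to compare the function-call arguments in the raw message with the parsed result. The sketch below does that with the standard library only; both values are copied verbatim from the output above, and the variable names are made up for illustration.

```python
import json

# Copied from additional_kwargs['function_call']['arguments'] in the raw
# AIMessage shown above.
raw_arguments = '{"rating": 8.0, "punchline": "Why don", "setup": "Why don"}'

# Copied from the 'parsed' entry of the same response.
parsed = {"rating": 8.0, "punchline": "Why don", "setup": "Why don"}

# Decode the raw arguments and compare them with what the parser returned.
decoded = json.loads(raw_arguments)
already_truncated_in_raw = decoded == parsed
print(already_truncated_in_raw)
```

Here the two agree, which suggests the text is already cut in the function-call arguments before the final JSON parsing step. This comparison cannot tell whether the model itself stopped at the quote or whether the integration package truncated the text while building the arguments string.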

dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 3, 2024
