ChatGoogleGenerativeAI: tool-call arguments in additional_kwargs.function_call.arguments are emitted as \uXXXX-escaped strings (CJK / non-ASCII) #1789

@Wonder-donbury

Description

Package (Required)

  • langchain-google-genai
  • langchain-google-vertexai
  • langchain-google-community
  • Other / not sure / general

Checked other resources

  • I added a descriptive title to this issue
  • I searched the LangChain documentation and API reference (linked above)
  • I used the GitHub search to find a similar issue and didn't find it
  • I am sure this is a bug and not a question or request for help

Example Code (Python)

## Reproduction (no API key required)

  
  from types import SimpleNamespace
  from langchain_google_genai.chat_models import _parse_response_candidate

  fc = SimpleNamespace(name="echo", args={"text": "안녕하세요"})
  part = SimpleNamespace(
      text=None, thought=False, thought_signature=None,
      executable_code=None, code_execution_result=None,
      inline_data=None, function_call=fc,
  )
  candidate = SimpleNamespace(content=SimpleNamespace(parts=[part]))

  msg = _parse_response_candidate(candidate, streaming=False, model_name="gemini-2.5-flash")

  assert msg.tool_calls[0]["args"] == {"text": "안녕하세요"}                                    # OK
  assert msg.additional_kwargs["function_call"]["arguments"] == '{"text": "안녕하세요"}'         # FAILS
  # Actual:                                                  '{"text": "\\uc548\\ub155\\ud558\\uc138\\uc694"}'

Description

Repository / Version

  • Repo: langchain-ai/langchain-google
  • Package: langchain-google-genai
  • Installed version (where reproduced): 4.2.2

Buggy location

File: libs/genai/langchain_google_genai/chat_models.py

  # libs/genai/langchain_google_genai/chat_models.py
  if part.function_call:
      function_call = {"name": part.function_call.name}
      # dump to match other function calling llm for now
      # Convert function call args to dict first, then fix integer-like floats
      args_dict = dict(part.function_call.args) if part.function_call.args else {}
      function_call_args_dict = _convert_integer_like_floats(args_dict)
      function_call["arguments"] = json.dumps(                                  # ← bug
          {k: function_call_args_dict[k] for k in function_call_args_dict}
      )
      additional_kwargs["function_call"] = function_call

json.dumps() is called without ensure_ascii=False. Python's default (ensure_ascii=True) escapes every non-ASCII character as \uXXXX, so the escaped string lands in AIMessage.additional_kwargs["function_call"]["arguments"].
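
The escaping can be seen with the standard library alone. A minimal sketch, independent of langchain-google-genai (the variable names are illustrative only):

```python
import json

args = {"text": "안녕하세요"}

escaped = json.dumps(args)                       # default: ensure_ascii=True
readable = json.dumps(args, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(escaped)   # {"text": "\uc548\ub155\ud558\uc138\uc694"}
print(readable)  # {"text": "안녕하세요"}

# Both strings are valid JSON and decode to the same dict,
# which is why tool_calls[i]["args"] looks correct.
assert json.loads(escaped) == json.loads(readable) == args
```

Only the serialized string differs; the decoded value is identical in both cases.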

Reproduction (minimal)

  from langchain_google_genai import ChatGoogleGenerativeAI
  from langchain_core.tools import tool

  @tool
  def echo(text: str) -> str:
      """Echo the given text."""
      return text

  llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash").bind_tools([echo])
  msg = llm.invoke("Call the echo tool with the text: 안녕하세요")

  print("tool_calls:", msg.tool_calls)
   #→ tool_calls: [{'name': 'echo', 'args': {'text': '안녕하세요'}, ...}]   ✅ correct (decoded back)

  print("additional_kwargs:", msg.additional_kwargs)
   #→ {'function_call': {'name': 'echo',
   #                    'arguments': '{"text": "\\uc548\\ub155\\ud558\\uc138\\uc694"}'}}   ❌ escaped

The same input via langchain_openai.ChatOpenAI produces arguments containing the unescaped characters.

Why it matters

  1. Persistence layers downstream (DBs with JSON columns, file logs) store the escape sequences verbatim, making CJK / emoji / accented strings unreadable when inspected directly.
  2. Data duplication inconsistency: the same arguments are exposed twice on the same message — once correctly in tool_calls[i].args (clean dict, because parse_tool_calls round-trips through json.loads) and
    once incorrectly in additional_kwargs.function_call.arguments (escaped string). Consumers see different content depending on which field they read.
  3. Cross-provider inconsistency: langchain_openai already passes ensure_ascii=False in the analogous spot (langchain_openai/chat_models/base.py:3323):
    "arguments": json.dumps(tool_call["args"], ensure_ascii=False),
  4. Inconsistency with langchain_core: langchain_core passes ensure_ascii=False in all 8 of its json.dumps call sites that touch message content, which makes the google_genai line an outlier.
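
Until the library is patched, a consumer that persists additional_kwargs could re-serialize the string before storing it. A minimal sketch, assuming the arguments value is always valid JSON; normalize_arguments is a hypothetical helper, not part of any LangChain package:

```python
import json

def normalize_arguments(arguments: str) -> str:
    """Hypothetical helper: re-serialize a JSON string without ASCII escaping."""
    return json.dumps(json.loads(arguments), ensure_ascii=False)

# The escaped form that currently ends up in additional_kwargs
stored = '{"text": "\\uc548\\ub155\\ud558\\uc138\\uc694"}'

print(normalize_arguments(stored))  # {"text": "안녕하세요"}
```

This is only a workaround at the consumer side; it does not remove the inconsistency between the two fields on the message itself.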

Proposed fix (one-line)

  function_call["arguments"] = json.dumps(
      {k: function_call_args_dict[k] for k in function_call_args_dict},
      ensure_ascii=False,
  )
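
The effect of the change can be checked in isolation. A rough sketch, assuming a stand-in function that mirrors the patched branch; build_function_call is hypothetical, and the library's _convert_integer_like_floats step is left out:

```python
import json

def build_function_call(name: str, args: dict) -> dict:
    # Hypothetical stand-in for the patched branch in chat_models.py;
    # the _convert_integer_like_floats step is omitted here.
    return {"name": name, "arguments": json.dumps(dict(args), ensure_ascii=False)}

korean = build_function_call("echo", {"text": "안녕하세요"})
ascii_only = build_function_call("echo", {"text": "hello", "n": 3})

# Non-ASCII text is kept readable in the serialized string
assert korean["arguments"] == '{"text": "안녕하세요"}'
# ASCII-only payloads serialize exactly as they do without the flag
assert ascii_only["arguments"] == json.dumps({"text": "hello", "n": 3})
# The decoded value is unchanged, so tool_calls parsing is unaffected
assert json.loads(korean["arguments"]) == {"text": "안녕하세요"}
```

Under these assumptions the flag changes only how non-ASCII characters are written, not what the arguments decode to.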
