Add input_audio support to @langchain/google-common #9829

@hntrl

Description

Privileged issue

  • I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.

Issue Content

@langchain/google-common (used by @langchain/google-vertexai) does not support the OpenAI-standardized input_audio message content type. This causes an Unsupported type "input_audio" error when using Gemini models with prompts containing audio variables, particularly those from LangChain Hub or OpenAI-compatible tools.

Root Cause

The messageContentComplexToPart function in @langchain/google-common/dist/utils/gemini.js (around line 290) only handles text, image_url, media, and reasoning content types. It lacks support for input_audio.

Proposed Solution

Add an input_audio case to the messageContentComplexToPart function that converts OpenAI-format audio to Gemini's inlineData format:

case "input_audio":
    if ("input_audio" in content) {
        return {
            inlineData: {
                mimeType: `audio/${content.input_audio.format}`,
                data: content.input_audio.data
            }
        };
    }
    break;
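
For reference, here is a standalone sketch of the same conversion, which may be handy for a unit test. The interface names (InputAudioContent, GeminiInlineDataPart) and the helper inputAudioToPart are hypothetical and simplified for illustration; the actual types in @langchain/core and @langchain/google-common may differ. It also assumes the format value (e.g. "wav" or "mp3") can be appended directly to "audio/" to form the MIME type, as in the case above.

```typescript
// Hypothetical, simplified shapes for illustration only; the real types in
// @langchain/core and @langchain/google-common may differ.
interface InputAudioContent {
  type: "input_audio";
  input_audio: {
    data: string; // base64-encoded audio bytes
    format: string; // e.g. "wav" or "mp3"
  };
}

interface GeminiInlineDataPart {
  inlineData: {
    mimeType: string;
    data: string;
  };
}

// Convert an OpenAI-style input_audio block into a Gemini inlineData part.
// Assumption: the format string maps directly onto an "audio/<format>" MIME type.
function inputAudioToPart(content: InputAudioContent): GeminiInlineDataPart {
  const { data, format } = content.input_audio;
  return {
    inlineData: {
      mimeType: `audio/${format}`,
      data,
    },
  };
}

// Example: a short base64 payload tagged as WAV.
const part = inputAudioToPart({
  type: "input_audio",
  input_audio: { data: "UklGRg==", format: "wav" },
});
console.log(JSON.stringify(part));
```

Keeping the conversion in a small pure function like this would make it straightforward to test separately from the rest of messageContentComplexToPart, though whether to factor it out is a choice for the maintainers.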
