Skip to content

[FEATURE] Support audio input for standard Agent (Specially using Bedrock Provider, Converse API already supports AudioBlock). #2614

@dQuezada-P

Description

@dQuezada-P

Problem Statement

AWS Bedrock's Converse API natively supports audio inputs via the AudioBlock structure. Exposing this capability through our standard Agent interface will allow agents to perform direct transcription, summarization, and contextual analysis of voice data in a single step, bypassing the need for intermediate STT services.

Proposed Solution

Proposed Solution

Update Core Schema: Extend the standard Agent's input schema to accept audio payloads (e.g., base64 encoded strings, buffers, or file paths along with their MIME type).

Bedrock Provider Implementation: Map the new agent audio input type directly to the AudioBlock format required by the Bedrock Converse API.

Validation: Implement standard validation for supported audio formats (e.g., mp3, wav, flac) and size limits as defined by the underlying model providers.

Use Case

Primary Use Case: Direct Transcription & Speech-to-Text

Single-Step Audio Processing: Users can pass an audio file (e.g., a voicemail, meeting recording, or user voice note) directly to the agent with a prompt like "Transcribe this recording and extract all action items."

Pipeline Simplification: Eliminates the latency, cost, and architectural complexity of maintaining a separate transcription service just to feed text into the LLM.

Context-Aware Transcription: By processing the audio directly, multimodal LLMs can often capture nuances, speaker intent, and domain-specific terminology better than a standalone, context-blind STT model.

Alternatives Solutions

No response

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-agentRelated to the agent class or general agent questionsarea-modelRelated to models or model providersenhancementNew feature or requestready for contributionPull requests welcome

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions