[FEATURE] Support audio input for standard Agent (Specially using Bedrock Provider, Converse API already supports AudioBlock).

### Problem Statement

AWS Bedrock's Converse API natively supports audio inputs via the AudioBlock structure. Exposing this capability through our standard Agent interface will allow agents to perform direct transcription, summarization, and contextual analysis of voice data in a single step, bypassing the need for intermediate STT services.


### Proposed Solution

Proposed Solution

Update Core Schema: Extend the standard Agent's input schema to accept audio payloads (e.g., base64 encoded strings, buffers, or file paths along with their MIME type).

Bedrock Provider Implementation: Map the new agent audio input type directly to the AudioBlock format required by the Bedrock Converse API.

Validation: Implement standard validation for supported audio formats (e.g., mp3, wav, flac) and size limits as defined by the underlying model providers.

### Use Case

Primary Use Case: Direct Transcription & Speech-to-Text

Single-Step Audio Processing: Users can pass an audio file (e.g., a voicemail, meeting recording, or user voice note) directly to the agent with a prompt like "Transcribe this recording and extract all action items."

Pipeline Simplification: Eliminates the latency, cost, and architectural complexity of maintaining a separate transcription service just to feed text into the LLM.

Context-Aware Transcription: By processing the audio directly, multimodal LLMs can often capture nuances, speaker intent, and domain-specific terminology better than a standalone, context-blind STT model.

### Alternatives Solutions

_No response_

### Additional Context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Support audio input for standard Agent (Specially using Bedrock Provider, Converse API already supports AudioBlock). #2614

Problem Statement

Proposed Solution

Use Case

Alternatives Solutions

Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEATURE] Support audio input for standard Agent (Specially using Bedrock Provider, Converse API already supports AudioBlock). #2614

Description

Problem Statement

Proposed Solution

Use Case

Alternatives Solutions

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions