AI Agent: LLM capability matrix + multimodal tool results #7214

@maff

Is your feature request related to a problem? Please describe.

The native provider implementations from Phase 1 hardcode their capability profiles (modalities, context window, max output tokens, feature flags). An LLM's capabilities depend on the combination of model and backend: a given Claude model on Bedrock can differ from the same model on the direct API in context window or regional rate limits. The hardcoded approach therefore won't scale beyond the smallest viable shipping set. In addition, tool-result documents that contain images or PDFs are still routed through the synthetic-UserMessage fallback, regardless of whether the target model can read them natively.

Describe the solution you'd like

Introduce a configuration-driven LLM capability matrix that describes each supported (api family, backend, model) tuple. Wire each native chat model implementation to consult the matrix at request time, and route documents inside tool-call results to either native multimodal emission or the existing fallback path based on the resolved capabilities.
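
To make the proposal concrete, here is a rough sketch of how the lookup and routing could fit together. It is illustrative only: the names `Capabilities`, `CAPABILITY_MATRIX`, `resolve`, and `route_document` are hypothetical, the model name and all numeric values are placeholders rather than real provider limits, and an in-memory dict stands in for whatever configuration source the matrix would actually be loaded from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Capability profile for one (api family, backend, model) tuple."""
    context_window: int
    max_output_tokens: int
    # MIME types the model can read natively inside a tool result.
    tool_result_mime_types: frozenset = frozenset()


# Hypothetical matrix; in practice this would be loaded from configuration.
# All values below are placeholders, not real provider limits.
CAPABILITY_MATRIX = {
    ("anthropic", "direct", "example-model"): Capabilities(
        context_window=200_000,
        max_output_tokens=8_192,
        tool_result_mime_types=frozenset({"image/png", "application/pdf"}),
    ),
    ("anthropic", "bedrock", "example-model"): Capabilities(
        context_window=100_000,
        max_output_tokens=4_096,
        tool_result_mime_types=frozenset({"image/png"}),
    ),
}

# Conservative profile for unknown tuples: no native multimodal tool results.
DEFAULT_CAPABILITIES = Capabilities(context_window=8_000, max_output_tokens=1_024)


def resolve(api_family: str, backend: str, model: str) -> Capabilities:
    """Look up the capability profile at request time."""
    return CAPABILITY_MATRIX.get((api_family, backend, model), DEFAULT_CAPABILITIES)


def route_document(caps: Capabilities, mime_type: str) -> str:
    """Pick native multimodal emission or the synthetic-UserMessage fallback."""
    return "native" if mime_type in caps.tool_result_mime_types else "fallback"
```

With these placeholder entries, a PDF in a tool result would be emitted natively for the `direct` backend but take the fallback path for `bedrock`, even though the model name is the same. Unknown tuples resolve to the conservative default and always use the fallback.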

Describe alternatives you've considered

See the parent epic.
