Feature Request: Expand tool_call to an object for granular capabilities #342

@neilberkman

The Problem

The current models.dev schema defines tool_call as a simple boolean. This is great for simple cases but doesn't capture provider-specific limitations or track important capability details, forcing consumer libraries to maintain their own local override lists.

Concrete Example: Meta Llama 3.3 70B on Amazon Bedrock

The model meta.llama3-3-70b-instruct-v1:0 when served via Amazon Bedrock has tool_call = true in models.dev, but exhibits documented provider-level failures:

1. Streaming + Tools Not Supported

Issue: The Bedrock API rejects the request with an HTTP 400 error: "This model doesn't support tool use in streaming mode"

Evidence:

Affected Models: Not just Llama 3.3 - this affects Llama 3.1, Mistral Large, and other non-Anthropic models on Bedrock.
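A minimal sketch of how a consumer could avoid this 400 if the limitation were available as data. The `ToolCaps` shape and the `canStream` helper are hypothetical; the field names only mirror the flags proposed later in this issue.

```typescript
// Hypothetical resolved capability shape (not an existing models.dev type).
interface ToolCaps {
  supported: boolean;
  streaming: boolean;
}

// Decide whether a request may be sent in streaming mode.
function canStream(caps: ToolCaps, hasTools: boolean): boolean {
  // Requests without tools are unaffected by the tool-streaming limitation.
  if (!hasTools) return true;
  return caps.supported && caps.streaming;
}

// Values taken from the Bedrock Llama 3.3 example in this issue.
const llamaOnBedrock: ToolCaps = { supported: true, streaming: false };

console.log(canStream(llamaOnBedrock, true));  // false: fall back to a non-streaming call
console.log(canStream(llamaOnBedrock, false)); // true
```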

2. Type Coercion Issues

Issue: Bedrock coerces integers to strings in tool parameters, and drops integers ≥ 2^31 entirely (returns empty toolUse.input = {}).

Evidence:

AWS Response: "This behavior is actually a model-specific characteristic rather than a validator issue."

Impact: Breaks common cases like epoch-millisecond timestamps in tool parameters.
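To make the impact concrete, the sketch below checks a timestamp against the 2^31 threshold reported above and shows one way a consumer might undo the string coercion. `atRisk` and `restoreInteger` are hypothetical helpers for illustration, not part of Bedrock or any library.

```typescript
// 2^31: the threshold reported above, at or beyond which integers are dropped.
const INT32_LIMIT = 2 ** 31; // 2147483648

// True when an integer argument is at or above the reported threshold.
function atRisk(value: number): boolean {
  return Number.isInteger(value) && value >= INT32_LIMIT;
}

// Undo string coercion for a parameter the tool schema declares as an integer.
// Returns undefined when the raw value is not an integer or an integer string.
function restoreInteger(raw: unknown): number | undefined {
  if (typeof raw === "number" && Number.isInteger(raw)) return raw;
  if (typeof raw === "string" && /^-?\d+$/.test(raw)) return Number(raw);
  return undefined;
}

// 2024-01-01T00:00:00Z as epoch milliseconds and epoch seconds.
const epochMs = Date.UTC(2024, 0, 1); // 1704067200000
const epochSec = epochMs / 1000;      // 1704067200

console.log(atRisk(epochMs));       // true: millisecond timestamps exceed 2^31
console.log(atRisk(epochSec));      // false: second timestamps still fit
console.log(restoreInteger("42"));  // 42
console.log(restoreInteger("4.2")); // undefined
```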

3. Response Format Issues

Issue: Returns tool calls as JSON text strings instead of structured tool_calls objects.

Evidence:

Quote from pydantic-ai issue:

"The model outputs the tool call as a JSON string within regular text content rather than using the proper tool_calls message structure"
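A consumer that wants to tolerate this behavior needs a fallback parser along the following lines. This is only a sketch: the issue does not state the exact JSON shape the model emits, so the `name` plus `parameters` (or `arguments`) layout is an assumption, and `parseTextToolCall` is a hypothetical helper.

```typescript
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Try to recover a tool call from plain text content.
// ASSUMPTION: the text is a single JSON object with a "name" field and a
// "parameters" or "arguments" object; the real payload shape may differ.
function parseTextToolCall(text: string): ParsedToolCall | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text.trim());
  } catch {
    return null; // ordinary prose, not a tool call
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  const obj = parsed as Record<string, unknown>;
  const name = obj["name"];
  const args = obj["parameters"] ?? obj["arguments"];
  if (typeof name !== "string") return null;
  if (typeof args !== "object" || args === null || Array.isArray(args)) return null;
  return { name, arguments: args as Record<string, unknown> };
}

console.log(parseTextToolCall('{"name":"get_weather","parameters":{"city":"Paris"}}'));
console.log(parseTextToolCall("The weather in Paris is sunny.")); // null
```

Heuristics like this are fragile, which is part of why the behavior is worth recording in shared metadata rather than rediscovering per library.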

Additional Use Case: Parallel Tool Calling Tracking

Issue #202 requests tracking which models support parallel tool calling. Currently there's no way to document this capability in models.dev, making it difficult for developers to choose models based on throughput requirements.

Impact on Consumers

A consumer library like req_llm that relies on models.dev data will:

  • Incorrectly believe streaming tools are supported → tests fail with HTTP 400
  • Expect integer schema validation to work → tests fail with type mismatches
  • Expect structured tool responses → parsing fails
  • Be unable to determine parallel tool support → requires manual testing

Current workaround: Maintain provider-specific override lists in application code, defeating the purpose of a centralized model metadata API.

See: req_llm Issue #163: Capability Mismatch Problem
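For illustration, a local override list of the kind described above might look like the following sketch. The table layout, the key format, and the `supportsStreamingTools` helper are hypothetical and are not taken from req_llm.

```typescript
// Hypothetical local override table keyed by "provider:model".
// Each entry patches what the upstream boolean cannot express.
const OVERRIDES: Record<string, { streamingTools?: boolean }> = {
  "amazon-bedrock:meta.llama3-3-70b-instruct-v1:0": { streamingTools: false },
};

// Combine the upstream tool_call boolean with any local override.
function supportsStreamingTools(
  provider: string,
  model: string,
  toolCall: boolean,
): boolean {
  const override = OVERRIDES[`${provider}:${model}`];
  // Without an override, tool_call = true is taken to imply streaming works.
  return toolCall && (override?.streamingTools ?? true);
}

console.log(
  supportsStreamingTools("amazon-bedrock", "meta.llama3-3-70b-instruct-v1:0", true),
); // false
```

Every consumer library has to discover and maintain entries like this independently, which is the duplication this request aims to remove.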


The Proposed Solution

Change the zod schema for tool_call to be a union of a boolean OR an object with granular flags.

This solution is 100% backward-compatible. All existing TOML files with tool_call = true or tool_call = false will remain valid.

Schema Change

// Before
tool_call: z.boolean();

// After
tool_call: z.union([
  z.boolean(),
  z
    .object({
      supported: z.boolean(),
      streaming: z.boolean().optional(),
      parallel: z.boolean().optional(),
      coerces_types: z.boolean().optional(),
    })
    .strict(),
]);
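To show what the union accepts without pulling in zod, here is a dependency-free sketch of an equivalent check. The `ToolCall` type and `isValidToolCall` guard are hypothetical names for illustration; the zod definition above remains the actual proposal.

```typescript
// TypeScript view of the proposed union.
type ToolCall =
  | boolean
  | {
      supported: boolean;
      streaming?: boolean;
      parallel?: boolean;
      coerces_types?: boolean;
    };

const OPTIONAL_KEYS = ["streaming", "parallel", "coerces_types"];

// Hand-written approximation of the proposed schema, including .strict().
function isValidToolCall(value: unknown): value is ToolCall {
  // Existing data: plain booleans stay valid.
  if (typeof value === "boolean") return true;
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const obj = value as Record<string, unknown>;
  if (typeof obj["supported"] !== "boolean") return false;
  // .strict() rejects unknown keys; optional flags may be absent or boolean.
  return Object.keys(obj).every(
    (key) =>
      key === "supported" ||
      (OPTIONAL_KEYS.includes(key) &&
        (obj[key] === undefined || typeof obj[key] === "boolean")),
  );
}

console.log(isValidToolCall(true));                                  // true
console.log(isValidToolCall({ supported: true, streaming: false })); // true
console.log(isValidToolCall({ supported: true, typo_flag: true }));  // false
console.log(isValidToolCall({ streaming: false }));                  // false
```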

Example Usage

# providers/amazon-bedrock/models/meta.llama3-3-70b-instruct-v1:0.toml

[tool_call]
supported = true
streaming = false       # Documents the streaming limitation
coerces_types = true    # Documents the type coercion behavior

Note: parallel is omitted from this example because there is no evidence either way for this model. An omitted field falls back to its default under the semantics below (for parallel, that is false).

Default Semantics

For API consumers (like req_llm):

  • If tool_call = false: All tool sub-capabilities are false
  • If tool_call = true (boolean): Sub-capabilities take the defaults below, which match current assumptions
    • streaming = true (default - most models support this)
    • parallel = false (default - not universally supported, safer to assume no)
    • coerces_types = false (default - most models don't have this bug)
  • If tool_call = {...} (object):
    • Use supported flag as base truth
    • If streaming undefined → default to true (matches current tool_call = true assumption)
    • If parallel undefined → default to false (safer, not universal)
    • If coerces_types undefined → default to false (most models work correctly)

Rationale for defaults:

  • streaming = true: Current tool_call = true already implies streaming works. This is the norm.
  • parallel = false: Parallel tool calling is NOT universal. Safer to assume not supported unless proven.
  • coerces_types = false: Most models handle types correctly. This documents the exception.

This makes the granular flags exception-only: you specify a field only when it differs from its default.
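The default rules above can be sketched as a small resolver on the consumer side. `resolveToolCall` and both interfaces are hypothetical names. Treating `supported = false` inside an object the same as `tool_call = false` is one reading of "use supported flag as base truth", not something this proposal spells out.

```typescript
interface ToolCallFlags {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}

interface ResolvedToolCall {
  supported: boolean;
  streaming: boolean;
  parallel: boolean;
  coerces_types: boolean;
}

// Collapse the boolean-or-object value into fully populated flags.
function resolveToolCall(value: boolean | ToolCallFlags): ResolvedToolCall {
  const flags: ToolCallFlags =
    typeof value === "boolean" ? { supported: value } : value;

  // ASSUMPTION: unsupported tool calling disables every sub-capability.
  if (!flags.supported) {
    return { supported: false, streaming: false, parallel: false, coerces_types: false };
  }
  return {
    supported: true,
    streaming: flags.streaming ?? true,          // default: true
    parallel: flags.parallel ?? false,           // default: false
    coerces_types: flags.coerces_types ?? false, // default: false
  };
}

// Plain boolean, as in existing TOML files.
console.log(resolveToolCall(true));
// The Bedrock Llama 3.3 example from this issue.
console.log(resolveToolCall({ supported: true, streaming: false, coerces_types: true }));
```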


Benefits

  1. Eliminates local overrides - Consumer libraries can trust models.dev data
  2. Documents real-world behavior - Provider quirks are explicit, not hidden
  3. Enables parallel tool tracking - Addresses issue #202 ("Request: New column for parallel tool calling")
  4. 100% backward compatible - Existing TOML files need no changes, defaults match current assumptions
  5. Opt-in complexity - Simple cases stay simple
  6. Precedent exists - models.dev has evolved schemas before (modalities restructure, cache field renames)

Alternative Considered

Keep tool_call as boolean and add separate top-level fields like tool_streaming, tool_parallel, tool_coerces_types.

Rejected because: Pollutes top-level schema with tool-specific details; the union approach is cleaner and groups related metadata.
