


@neilberkman commented Nov 1, 2025

feat: Allow tool_call to be an object for granular capabilities

This PR implements the proposal from Issue #342.

It changes tool_call in schema.ts to be a z.union([z.boolean(), z.object(...)]) to capture provider-specific limitations for tool-calling.

Problem Summary

Consumer libraries like req_llm that rely on models.dev hit failures because a bare tool_call = true cannot express provider-specific quirks. For example:

  • Amazon Bedrock + Llama 3.3 70B: Returns HTTP 400 when streaming with tools (AWS re:Post)
  • Amazon Bedrock + Llama 3.1/Mistral: Coerces integers to strings, drops values ≥ 2^31 (AWS re:Post)

Additionally, Issue #202 requests tracking which models support parallel tool calling, a capability that is currently undocumented in models.dev but critical for test planning and feature-support detection.

Changes in this PR

1. packages/core/src/schema.ts

Updated the tool_call definition from:

tool_call: z.boolean(),

To:

tool_call: z.union([
  z.boolean(),
  z
    .object({
      supported: z.boolean(),
      streaming: z.boolean().optional(),
      parallel: z.boolean().optional(),
      coerces_types: z.boolean().optional(),
    })
    .strict(),
]).describe(
  "Supports tool calling. Can be a boolean or an object for granular capabilities.",
),
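For readers who don't use zod, the union above can be read as the following dependency-free check. This is an illustrative sketch, not code from models.dev; `isToolCall`, `ToolCallObject` and `OPTIONAL_FLAGS` are hypothetical names. It mirrors the three rules the schema encodes: a bare boolean is accepted, `supported` is required in the object form, and `.strict()` rejects unknown keys.

```typescript
// Hypothetical, dependency-free mirror of the proposed zod union.
// models.dev itself validates with zod in schema.ts.

/** Object form of tool_call; only `supported` is required. */
interface ToolCallObject {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}

type ToolCall = boolean | ToolCallObject;

const OPTIONAL_FLAGS = ["streaming", "parallel", "coerces_types"] as const;
const ALLOWED_KEYS = new Set<string>(["supported", ...OPTIONAL_FLAGS]);

/** Returns true if `value` would pass the proposed schema. */
function isToolCall(value: unknown): value is ToolCall {
  // Existing `tool_call = true/false` entries stay valid.
  if (typeof value === "boolean") return true;
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const obj = value as Record<string, unknown>;
  // .strict(): any key outside the known set is rejected.
  for (const key of Object.keys(obj)) {
    if (!ALLOWED_KEYS.has(key)) return false;
  }
  // `supported` is required and must be a boolean.
  if (typeof obj["supported"] !== "boolean") return false;
  // Optional flags may be absent; if present they must be booleans.
  return OPTIONAL_FLAGS.every(
    (flag) => obj[flag] === undefined || typeof obj[flag] === "boolean",
  );
}
```

For example, `isToolCall({ supported: true, streaming: false })` passes, while `{ streaming: true }` (missing `supported`) and `{ supported: true, fast: true }` (unknown key) are rejected.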

2. providers/amazon-bedrock/models/meta.llama3-3-70b-instruct-v1:0.toml

Updated from:

tool_call = true

To:

[tool_call]
supported = true
streaming = false       # HTTP 400: "This model doesn't support tool use in streaming mode"
coerces_types = true    # Integers ≥ 2^31 dropped, some coerced to strings

This serves as a real-world example of the new schema and documents the known Bedrock limitations. Note that parallel is not specified because we have no evidence either way for this model.

Benefits of this Approach

  1. 100% Backward-Compatible at the TOML Level: All existing TOML files with tool_call = true or tool_call = false remain valid. No data migration is needed.

  2. Solves the "Quirks" Problem: Provider-specific TOML files can now describe real-world limitations accurately. Consumers of the API can then skip or adjust tests accordingly instead of maintaining their own local override lists.

  3. Enables Parallel Tool Tracking: Addresses Issue #202 ("Request: New column for parallel tool calling") by providing a place to document which models support parallel tool calling.

  4. Opt-in Complexity: The new fields are optional, so they only need to be specified for models with known exceptions.
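As a sketch of benefit 2, a consumer's test suite could derive which tool-calling cases to run from a model's tool_call value instead of keeping a local override list. The test names (`basic`, `streaming`, `parallel`, `large_integer_args`) and the `planToolTests` helper are hypothetical; the flag handling follows the defaults described under "How Consumers Should Handle Defaults".

```typescript
// Hypothetical types mirroring the proposed tool_call schema.
interface ToolCallObject {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}
type ToolCall = boolean | ToolCallObject;

// Hypothetical test-case names a consumer's suite might define.
type ToolTest = "basic" | "streaming" | "parallel" | "large_integer_args";

function planToolTests(toolCall: ToolCall): ToolTest[] {
  const supported =
    typeof toolCall === "boolean" ? toolCall : toolCall.supported;
  if (!supported) return []; // no tool calling at all: skip every tool test

  // A bare `true` carries no exceptions, so treat it as an empty object.
  const caps: Partial<ToolCallObject> =
    typeof toolCall === "boolean" ? {} : toolCall;
  const tests: ToolTest[] = ["basic"];
  if (caps.streaming !== false) tests.push("streaming"); // default: true
  if (caps.parallel === true) tests.push("parallel"); // default: false
  if (caps.coerces_types !== true) tests.push("large_integer_args"); // default: false
  return tests;
}
```

With the Bedrock Llama 3.3 entry (`supported = true`, `streaming = false`, `coerces_types = true`) only the `basic` case runs; a plain `tool_call = true` runs `basic`, `streaming` and `large_integer_args`.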

How Consumers Should Handle Defaults

For API consumers (like req_llm):

  • If tool_call is false: All tool sub-capabilities are false
  • If tool_call is true (boolean): Tools are supported and every sub-capability takes its default (maintains current assumptions):
    • streaming = true (default: most models support this)
    • parallel = false (default: not universally supported, so it is safer to assume no)
    • coerces_types = false (default: most models don't have this bug)
  • If tool_call is an object:
    • Use the supported flag as the base truth
    • If streaming is undefined → default to true (matches current tool_call = true assumption)
    • If parallel is undefined → default to false (safer, not universal)
    • If coerces_types is undefined → default to false (most models work correctly)
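The rules above can be collapsed into a single normalization step that turns either form into fully resolved flags. This TypeScript sketch assumes the consumer has already parsed the model JSON; `resolveToolCall` and `ToolCapabilities` are hypothetical names, and treating an object with `supported: false` like `tool_call = false` is an interpretation of "use the supported flag as the base truth".

```typescript
// Hypothetical types mirroring the proposed tool_call schema.
interface ToolCallObject {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}
type ToolCall = boolean | ToolCallObject;

// Every flag resolved to a concrete boolean.
interface ToolCapabilities {
  supported: boolean;
  streaming: boolean;
  parallel: boolean;
  coerces_types: boolean;
}

// Defaults used for `tool_call = true` and for omitted fields.
const DEFAULTS = { streaming: true, parallel: false, coerces_types: false };

const UNSUPPORTED: ToolCapabilities = {
  supported: false,
  streaming: false,
  parallel: false,
  coerces_types: false,
};

function resolveToolCall(toolCall: ToolCall): ToolCapabilities {
  if (toolCall === false) return { ...UNSUPPORTED };
  if (toolCall === true) return { supported: true, ...DEFAULTS };
  // Assumption: `supported: false` overrides any sub-flags ("base truth").
  if (!toolCall.supported) return { ...UNSUPPORTED };
  return {
    supported: true,
    streaming: toolCall.streaming ?? DEFAULTS.streaming,
    parallel: toolCall.parallel ?? DEFAULTS.parallel,
    coerces_types: toolCall.coerces_types ?? DEFAULTS.coerces_types,
  };
}
```

For the Bedrock Llama 3.3 entry this yields `streaming: false`, `parallel: false` (defaulted, since the file omits it) and `coerces_types: true`.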

Rationale for defaults:

  • streaming = true: Current tool_call = true already implies streaming works. This is the norm.
  • parallel = false: Parallel tool calling is NOT universal. Safer to assume not supported unless proven.
  • coerces_types = false: Most models handle types correctly. This documents the exception.

In practice, the granular flags are written only for exceptions: you specify a field only when it differs from its default.
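Going the other way, a contributor (or a script that generates TOML) can check whether an entry needs the object form at all. A minimal sketch under the same defaults; `toMinimalToolCall` is a hypothetical helper that drops every flag equal to its default and falls back to the plain boolean when nothing else remains.

```typescript
// Hypothetical types mirroring the proposed tool_call schema.
interface ToolCallObject {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}
type ToolCall = boolean | ToolCallObject;

interface ToolCapabilities {
  supported: boolean;
  streaming: boolean;
  parallel: boolean;
  coerces_types: boolean;
}

// Same defaults as described above.
const DEFAULTS = { streaming: true, parallel: false, coerces_types: false };

function toMinimalToolCall(caps: ToolCapabilities): ToolCall {
  if (!caps.supported) return false;
  const entry: ToolCallObject = { supported: true };
  // Record only the flags that differ from their defaults.
  if (caps.streaming !== DEFAULTS.streaming) entry.streaming = caps.streaming;
  if (caps.parallel !== DEFAULTS.parallel) entry.parallel = caps.parallel;
  if (caps.coerces_types !== DEFAULTS.coerces_types) {
    entry.coerces_types = caps.coerces_types;
  }
  // Nothing but `supported` left: the plain boolean says the same thing.
  return Object.keys(entry).length === 1 ? true : entry;
}
```

Applied to the Bedrock capabilities, this reproduces the TOML table above (`supported = true`, `streaming = false`, `coerces_types = true`) with parallel left out, here because false is already the default.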

Breaking Change Note

This is a breaking change for consumers who expect tool_call to always be a boolean.

However, models.dev has precedent for breaking schema changes:

  • Commit d3e9843: Restructured input_modalities / output_modalities → modalities: {input: [], output: []}
  • Commit 19bc7d7: Renamed inputCached / outputCached → cache_read / cache_write

Consumers will need to update their parsing logic to handle tool_call as either a boolean or an object. This is a one-time update that provides long-term value.
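One pitfall worth flagging for JavaScript/TypeScript consumers: a truthiness check such as `if (model.tool_call)` keeps working at runtime after this change but silently misreads `{ supported: false }`, because every object is truthy. A before/after sketch; both function names are hypothetical.

```typescript
// Hypothetical types mirroring the proposed tool_call schema.
interface ToolCallObject {
  supported: boolean;
  streaming?: boolean;
  parallel?: boolean;
  coerces_types?: boolean;
}
type ToolCall = boolean | ToolCallObject;

// Before: fine while tool_call was always a boolean. Any object is truthy,
// so { supported: false } would wrongly count as tool support.
function supportsToolsNaive(toolCall: ToolCall): boolean {
  return Boolean(toolCall);
}

// After: branch on the shape and read `supported` from the object form.
function supportsTools(toolCall: ToolCall): boolean {
  return typeof toolCall === "boolean" ? toolCall : toolCall.supported;
}
```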

Migration Guide for Consumers

Before (req_llm example):

tool_call: Map.get(metadata, "tool_call", false)  # Always boolean

After:

case Map.get(metadata, "tool_call", false) do
  # Boolean case (backward compat)
  true -> %{supported: true, streaming: true, parallel: false, coerces_types: false}
  false -> %{supported: false, streaming: false, parallel: false, coerces_types: false}

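  # Object case (new granular data); `supported` is the base truth,
  # so every sub-capability is false when it is false
  %{"supported" => supported} = caps ->
    %{
      supported: supported,
      streaming: supported and Map.get(caps, "streaming", true),
      parallel: supported and Map.get(caps, "parallel", false),
      coerces_types: supported and Map.get(caps, "coerces_types", false)
    }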
end

Future Extensibility

If additional quirks are discovered, they can be added as optional fields without breaking existing consumers.

Testing

The schema change has been validated to:

  • Accept boolean values (true, false)
  • Accept object values with the required supported field
  • Accept object values with the optional streaming, parallel, and coerces_types fields
  • Reject objects with unknown fields (.strict() enforcement)

Related Issues:

- Update schema to accept boolean OR object for tool_call
- Add streaming, parallel, and coerces_types optional fields
- Update meta.llama3-3-70b-instruct-v1:0 as example
- Documents Bedrock streaming limitation and type coercion

Default semantics (do the least harm):
- streaming: undefined → true (matches current tool_call=true assumption)
- parallel: undefined → false (safer, not universally supported)
- coerces_types: undefined → false (most models work correctly)

Addresses provider-specific tool calling limitations:
- Streaming incompatibility (AWS Bedrock non-Anthropic models)
- Type coercion issues (Bedrock integer handling)
- Parallel tool calling tracking (addresses #202)

Evidence:
- AWS re:Post: streaming errors, integer coercion
- langchain-aws #140, #354: streaming validation issues
- pydantic-ai #1649: response format problems
- req_llm: agentjido/req_llm#163

100% backward compatible at the TOML level: existing tool_call = true/false values remain valid
@neilberkman force-pushed the feature/granular-tool-capabilities branch from a9d9744 to 0359798 on November 1, 2025 22:04