Feature Request: Expand tool_call to an object for granular capabilities
The Problem
The current models.dev schema defines tool_call as a simple boolean. This is great for simple cases but doesn't capture provider-specific limitations or track important capability details, forcing consumer libraries to maintain their own local override lists.
Concrete Example: Meta Llama 3.3 70B on Amazon Bedrock
The model meta.llama3-3-70b-instruct-v1:0, when served via Amazon Bedrock, has tool_call = true in models.dev but exhibits documented provider-level failures:
1. Streaming + Tools Not Supported
Issue: Bedrock API throws HTTP 400 error: "This model doesn't support tool use in streaming mode"
Evidence:
- AWS re:Post: Using Bedrock with Mistral 2 Large, converse API with tools would not let me use streaming feature
- langchain-aws Issue #140: Tool Calling Issue in AWS Bedrock Integration
- langchain-aws Issue #354: Streaming validator incorrectly disables streaming for Meta models
Affected Models: Not just Llama 3.3 - this affects Llama 3.1, Mistral Large, and other non-Anthropic models on Bedrock.
2. Type Coercion Issues
Issue: Bedrock coerces integers to strings in tool parameters, and drops integers ≥ 2^31 entirely (returns empty toolUse.input = {}).
Evidence:
AWS Response: "This behavior is actually a model-specific characteristic rather than a validator issue."
Impact: Breaks common cases like epoch-millisecond timestamps in tool parameters.
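For illustration, a hedged sketch of the coercion described above; the parameter name and values are invented for this example:

// What a client sends as tool input (an epoch-millisecond timestamp, well above 2^31):
const requestedInput = { timestamp_ms: 1735689600000 };

// Per the behavior described above, the call may come back with the integer
// coerced to a string...
const coercedInput = { timestamp_ms: "1735689600000" };

// ...or with the large integer dropped entirely (toolUse.input = {}).
const droppedInput = {};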
3. Response Format Issues
Issue: Returns tool calls as JSON text strings instead of structured tool_calls objects.
Evidence:
- pydantic-ai Issue #1649: Llama 3.3 Tool Calling on Bedrock
- StackOverflow: Inconsistent tool calling behavior with Llama 3.1 70B model on AWS Bedrock
Quote from pydantic-ai issue:
"The model outputs the tool call as a JSON string within regular text content rather than using the proper
tool_callsmessage structure"
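As a rough illustration of the workaround this forces, a hedged TypeScript sketch of a fallback parser that tries to recover a tool call from text content; the expected shape ({ name, arguments }) is an assumption for the example, not a documented format:

// Hypothetical fallback: attempt to recover a tool call that arrived as plain text.
function tryParseTextToolCall(text: string): { name: string; arguments: unknown } | null {
  try {
    const parsed = JSON.parse(text);
    if (parsed && typeof parsed.name === "string" && "arguments" in parsed) {
      return { name: parsed.name, arguments: parsed.arguments };
    }
  } catch {
    // Not JSON at all: treat as ordinary text content.
  }
  return null;
}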
Additional Use Case: Parallel Tool Calling Tracking
Issue #202 requests tracking which models support parallel tool calling. Currently there's no way to document this capability in models.dev, making it difficult for developers to choose models based on throughput requirements.
Impact on Consumers
A consumer library like req_llm that relies on models.dev data will:
- Incorrectly believe streaming tools are supported → tests fail with HTTP 400
- Expect integer schema validation to work → tests fail with type mismatches
- Expect structured tool responses → parsing fails
- Cannot determine parallel tool support → requires manual testing
Current workaround: Maintain provider-specific override lists in application code, defeating the purpose of a centralized model metadata API.
See: req_llm Issue #163: Capability Mismatch Problem
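To make that workaround concrete, a hedged sketch of the kind of provider-specific override table a consumer has to hard-code today (the structure and the entry shown are illustrative, not req_llm's actual code):

// Hypothetical hard-coded overrides a consumer maintains today because
// models.dev's boolean tool_call cannot express these quirks.
const toolCallOverrides: Record<string, { streaming?: boolean; coerces_types?: boolean }> = {
  "amazon-bedrock/meta.llama3-3-70b-instruct-v1:0": { streaming: false, coerces_types: true },
  // ...one entry per affected provider/model pair
};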
The Proposed Solution
Change the zod schema for tool_call to be a union of a boolean OR an object with granular flags.
This solution is 100% backward-compatible. All existing TOML files with tool_call = true or tool_call = false will remain valid.
Schema Change
// Before
tool_call: z.boolean();

// After
tool_call: z.union([
  z.boolean(),
  z
    .object({
      supported: z.boolean(),
      streaming: z.boolean().optional(),
      parallel: z.boolean().optional(),
      coerces_types: z.boolean().optional(),
    })
    .strict(),
]);

Example Usage
# providers/amazon-bedrock/models/meta.llama3-3-70b-instruct-v1:0.toml
[tool_call]
supported = true
streaming = false # Documents the streaming limitation
coerces_types = true # Documents the type coercion behavior

Note: parallel is not specified in this example because we have no evidence either way for this model. When undefined, it defaults based on the semantics below.
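Separately, as a quick backward-compatibility check, a sketch in which ToolCall is a hypothetical name wrapping the proposed union; both the existing boolean form and the new object form validate:

import { z } from "zod";

// Hypothetical name wrapping the proposed union for this example.
const ToolCall = z.union([
  z.boolean(),
  z
    .object({
      supported: z.boolean(),
      streaming: z.boolean().optional(),
      parallel: z.boolean().optional(),
      coerces_types: z.boolean().optional(),
    })
    .strict(),
]);

ToolCall.parse(true);                                  // existing boolean form still validates
ToolCall.parse({ supported: true, streaming: false }); // new granular form validates too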
Default Semantics
For API consumers (like req_llm):
- If tool_call = false: all tool sub-capabilities are false
- If tool_call = true (boolean): all sub-capabilities work correctly (maintains current assumptions)
  - streaming = true (default - most models support this)
  - parallel = false (default - not universally supported, safer to assume no)
  - coerces_types = false (default - most models don't have this bug)
- If tool_call = {...} (object):
  - Use the supported flag as the base truth
  - If streaming undefined → default to true (matches the current tool_call = true assumption)
  - If parallel undefined → default to false (safer, not universal)
  - If coerces_types undefined → default to false (most models work correctly)
Rationale for defaults:
- streaming = true: Current tool_call = true already implies streaming works. This is the norm.
- parallel = false: Parallel tool calling is NOT universal. Safer to assume not supported unless proven.
- coerces_types = false: Most models handle types correctly. This documents the exception.
This makes the granular flags "opt-out" for exceptions. You only specify fields when they differ from the defaults.
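A minimal TypeScript sketch of how a consumer could apply these defaults; the names normalizeToolCall and ToolCallCapabilities are illustrative, not part of models.dev:

// Illustrative names; not part of models.dev.
type ToolCallCapabilities = {
  supported: boolean;
  streaming: boolean;
  parallel: boolean;
  coerces_types: boolean;
};

// Apply the default semantics described above to either form of tool_call.
function normalizeToolCall(
  toolCall:
    | boolean
    | { supported: boolean; streaming?: boolean; parallel?: boolean; coerces_types?: boolean }
): ToolCallCapabilities {
  if (typeof toolCall === "boolean") {
    // Boolean form: true keeps today's assumptions, false disables everything.
    return {
      supported: toolCall,
      streaming: toolCall,   // tool_call = true already implies streaming works
      parallel: false,       // not universal, safer to assume unsupported
      coerces_types: false,  // most models handle types correctly
    };
  }
  // Object form: undefined sub-flags fall back to the same defaults.
  return {
    supported: toolCall.supported,
    streaming: toolCall.streaming ?? true,
    parallel: toolCall.parallel ?? false,
    coerces_types: toolCall.coerces_types ?? false,
  };
}

With this in place, the Bedrock Llama 3.3 entry above would normalize to streaming = false and coerces_types = true, while existing tool_call = true entries keep today's behavior.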
Benefits
- Eliminates local overrides - Consumer libraries can trust models.dev data
- Documents real-world behavior - Provider quirks are explicit, not hidden
- Enables parallel tool tracking - Addresses #202 (Request: New column for parallel tool calling)
- 100% backward compatible - Existing TOML files need no changes, defaults match current assumptions
- Opt-in complexity - Simple cases stay simple
- Precedent exists - models.dev has evolved schemas before (modalities restructure, cache field renames)
Alternative Considered
Keep tool_call as boolean and add separate top-level fields like tool_streaming, tool_parallel, tool_coerces_types.
Rejected because: Pollutes top-level schema with tool-specific details; the union approach is cleaner and groups related metadata.