feat(deepinfra): add new models [bot]#1023
Conversation
region: "*"
limits:
  context_window: 200000
  max_tokens: 200000
max_tokens incorrectly equals context_window size
Medium Severity
The max_tokens value is set equal to the context_window for each new Anthropic model, which is incorrect. The canonical Anthropic provider configs show claude-haiku-4-5 has max_tokens: 64000 (not 200000), claude-opus-4-7 has max_tokens: 128000 (not 1000000), and claude-sonnet-4-6 has max_tokens: 64000 (not 1000000). These inflated values could mislead consumers about actual output token limits.
Additional Locations (2)
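Based on the canonical numbers quoted in this comment, the haiku config's limits block should presumably look like the sketch below (values taken from the review text, not from the repo's actual canonical file):

```yaml
limits:
  context_window: 200000
  max_tokens: 64000  # canonical claude-haiku-4-5 output limit per this review, not 200000
```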
Reviewed by Cursor Bugbot for commit 6dc3a58.
/test-models
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 3871f6c.
- json_output
limits:
  context_window: 1000000
  max_tokens: 1000000
max_tokens incorrectly equals context_window across all new models
High Severity
The max_tokens value is set equal to context_window in all new model files, but this is incorrect. The canonical provider definitions show much lower actual limits: claude-opus-4-7 has max_tokens: 128000 (not 1000000), claude-sonnet-4-6 has max_tokens: 64000 (not 1000000), and claude-haiku-4-5 has max_tokens: 64000 (not 200000). The same issue applies to the Gemini models — gemini-3.1-pro and gemini-3.1-flash-lite both set max_tokens: 1000000 while the canonical Google Gemini definitions specify max_tokens: 65536. Consumers relying on these values will believe the models can produce far more output tokens than they actually support.
Additional Locations (2)
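The pattern flagged here can be caught mechanically. Below is a hypothetical sanity check; the dictionaries hard-code the values quoted in this review (in the repo they would be loaded from the YAML files), and `find_inflated` is an illustrative helper, not part of the codebase:

```python
# Proposed limits as stated in this review: max_tokens was set equal
# to context_window in every new model file.
proposed = {
    "claude-opus-4-7": {"context_window": 1000000, "max_tokens": 1000000},
    "claude-sonnet-4-6": {"context_window": 1000000, "max_tokens": 1000000},
    "claude-haiku-4-5": {"context_window": 200000, "max_tokens": 200000},
    "gemini-3.1-pro": {"context_window": 1000000, "max_tokens": 1000000},
    "gemini-3.1-flash-lite": {"context_window": 1000000, "max_tokens": 1000000},
}

# Canonical output limits quoted in the review comment above.
canonical_max_tokens = {
    "claude-opus-4-7": 128000,
    "claude-sonnet-4-6": 64000,
    "claude-haiku-4-5": 64000,
    "gemini-3.1-pro": 65536,
    "gemini-3.1-flash-lite": 65536,
}

def find_inflated(limits, canonical):
    """Return models whose max_tokens mirrors context_window or exceeds the canonical limit."""
    bad = []
    for model, l in limits.items():
        if l["max_tokens"] == l["context_window"] or l["max_tokens"] > canonical[model]:
            bad.append(model)
    return sorted(bad)

print(find_inflated(proposed, canonical_max_tokens))
```

A check like this could run in CI so that inflated limits are rejected before review.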
Reviewed by Cursor Bugbot for commit 3871f6c.
Gateway test results
Failures (16)
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

Skipped (1)
Skip reason:


Auto-generated by model-addition-agent for provider deepinfra.

Note
Low Risk
Low risk: this PR only adds new provider model metadata files (no code changes), with impact limited to model catalog/selection and cost/limit configuration.
Overview
Registers new DeepInfra-hosted models by adding YAML configs for Anthropic claude-haiku-4-5, claude-sonnet-4-6, and claude-opus-4-7, and Google gemini-3.1-pro and gemini-3.1-flash-lite, plus the image-generation model gemini-3-pro-image. Each config defines token pricing, supported modalities/modes, context/max token limits, and feature flags like function_calling, json_output, and thinking (where applicable).
Reviewed by Cursor Bugbot for commit 3871f6c. Bugbot is set up for automated code reviews on this repo.
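Putting the fields the overview lists together, one of these model config files plausibly looks like the sketch below. The field names and pricing values are illustrative assumptions, not the repo's actual schema; the limits use the canonical numbers quoted in the review:

```yaml
# Hypothetical config sketch for one DeepInfra model; field names and
# prices are illustrative, limits follow the canonical values above.
model_id: deepinfra/anthropic-claude-haiku-4-5
modes:
  - chat
features:
  - function_calling
  - json_output
pricing:
  input_per_million: 1.00   # placeholder, not the real rate
  output_per_million: 5.00  # placeholder, not the real rate
limits:
  context_window: 200000
  max_tokens: 64000  # canonical output limit, not 200000
```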