feat(deepinfra): add new models [bot]#1023
Conversation
region: "*"
limits:
  context_window: 200000
  max_tokens: 200000
max_tokens incorrectly equals context_window size
Medium Severity
The max_tokens value is set equal to the context_window for each new Anthropic model, which is incorrect. The canonical Anthropic provider configs show claude-haiku-4-5 has max_tokens: 64000 (not 200000), claude-opus-4-7 has max_tokens: 128000 (not 1000000), and claude-sonnet-4-6 has max_tokens: 64000 (not 1000000). These inflated values could mislead consumers about actual output token limits.
Additional Locations (2)
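Based on the canonical numbers quoted in this comment, the haiku config's limits block should presumably look like the sketch below (values taken from the review text, not from the repo's actual canonical file):

```yaml
limits:
  context_window: 200000
  max_tokens: 64000  # canonical claude-haiku-4-5 output limit per this review, not 200000
```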
Reviewed by Cursor Bugbot for commit 6dc3a58.
/test-models
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 3871f6c.
- json_output
limits:
  context_window: 1000000
  max_tokens: 1000000
max_tokens incorrectly equals context_window across all new models
High Severity
The max_tokens value is set equal to context_window in all new model files, but this is incorrect. The canonical provider definitions show much lower actual limits: claude-opus-4-7 has max_tokens: 128000 (not 1000000), claude-sonnet-4-6 has max_tokens: 64000 (not 1000000), and claude-haiku-4-5 has max_tokens: 64000 (not 200000). The same issue applies to the Gemini models — gemini-3.1-pro and gemini-3.1-flash-lite both set max_tokens: 1000000 while the canonical Google Gemini definitions specify max_tokens: 65536. Consumers relying on these values will believe the models can produce far more output tokens than they actually support.
Additional Locations (2)
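The pattern flagged here can be caught mechanically. Below is a hypothetical sanity check; the dictionaries hard-code the values quoted in this review (in the repo they would be loaded from the YAML files), and `find_inflated` is an illustrative helper, not part of the codebase:

```python
# Proposed limits as stated in this review: max_tokens was set equal
# to context_window in every new model file.
proposed = {
    "claude-opus-4-7": {"context_window": 1000000, "max_tokens": 1000000},
    "claude-sonnet-4-6": {"context_window": 1000000, "max_tokens": 1000000},
    "claude-haiku-4-5": {"context_window": 200000, "max_tokens": 200000},
    "gemini-3.1-pro": {"context_window": 1000000, "max_tokens": 1000000},
    "gemini-3.1-flash-lite": {"context_window": 1000000, "max_tokens": 1000000},
}

# Canonical output limits quoted in the review comment above.
canonical_max_tokens = {
    "claude-opus-4-7": 128000,
    "claude-sonnet-4-6": 64000,
    "claude-haiku-4-5": 64000,
    "gemini-3.1-pro": 65536,
    "gemini-3.1-flash-lite": 65536,
}

def find_inflated(limits, canonical):
    """Return models whose max_tokens mirrors context_window or exceeds the canonical limit."""
    bad = []
    for model, l in limits.items():
        if l["max_tokens"] == l["context_window"] or l["max_tokens"] > canonical[model]:
            bad.append(model)
    return sorted(bad)

print(find_inflated(proposed, canonical_max_tokens))
```

A check like this could run in CI so that inflated limits are rejected before review.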
Reviewed by Cursor Bugbot for commit 3871f6c.
Gateway test results
Failures (16)
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None
if _message and getattr(_message, "content", None) is not None:
    print(_message.content)
if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True
if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True
    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True
if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)
if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
Error: Code snippet:

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)
if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

Skipped (1)
Skip reason:


Auto-generated by model-addition-agent for provider deepinfra.

Note
Low Risk
Low risk: this PR only adds new provider model metadata files (no code changes), with impact limited to model catalog/selection and cost/limit configuration.
Overview
Registers new DeepInfra-hosted models by adding YAML configs for Anthropic claude-haiku-4-5, claude-sonnet-4-6, and claude-opus-4-7, and Google gemini-3.1-pro and gemini-3.1-flash-lite, plus the image-generation model gemini-3-pro-image. Each config defines token pricing, supported modalities/modes, context/max token limits, and feature flags like function_calling, json_output, and thinking (where applicable).
Reviewed by Cursor Bugbot for commit 3871f6c. Bugbot is set up for automated code reviews on this repo.
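Putting the fields the overview lists together, one of these model config files plausibly looks like the sketch below. The field names and pricing values are illustrative assumptions, not the repo's actual schema; the limits use the canonical numbers quoted in the review:

```yaml
# Hypothetical config sketch for one DeepInfra model; field names and
# prices are illustrative, limits follow the canonical values above.
model_id: deepinfra/anthropic-claude-haiku-4-5
modes:
  - chat
features:
  - function_calling
  - json_output
pricing:
  input_per_million: 1.00   # placeholder, not the real rate
  output_per_million: 5.00  # placeholder, not the real rate
limits:
  context_window: 200000
  max_tokens: 64000  # canonical output limit, not 200000
```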