
feat(deepinfra): add new models [bot] #1023

Open
models-bot[bot] wants to merge 2 commits into main from bot/add-deepinfra-20260514-000356

Conversation

@models-bot (Contributor, bot) commented May 14, 2026

Auto-generated by model-addition-agent for provider deepinfra.


Note

Low risk: this PR only adds new provider model metadata files (no code changes); impact is limited to the model catalog/selection and cost/limit configuration.

Overview
Registers new DeepInfra-hosted models by adding YAML configs for Anthropic claude-haiku-4-5, claude-sonnet-4-6, and claude-opus-4-7, and for Google gemini-3.1-pro and gemini-3.1-flash-lite, plus the image-generation model gemini-3-pro-image.

Each config defines token pricing, supported modalities/modes, context/max token limits, and feature flags like function_calling, json_output, and thinking (where applicable).
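For illustration, one of these configs might look roughly like the following. This is a hypothetical sketch only: the key names are inferred from the description above and from the fragments quoted in the review comments below, not from the actual schema.

```yaml
# Hypothetical sketch; key names inferred, not the real schema.
model: anthropic/claude-haiku-4-5
region: "*"
modes:
  - chat
limits:
  context_window: 200000
  max_tokens: 64000        # output cap; normally smaller than context_window
features:
  - function_calling
  - json_output
  - thinking
pricing:                   # placeholder values
  input_per_token: 0.0
  output_per_token: 0.0
```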

Reviewed by Cursor Bugbot for commit 3871f6c.

Comment thread: providers/deepinfra/anthropic/claude-haiku-4-5.yaml

region: "*"
limits:
  context_window: 200000
  max_tokens: 200000

max_tokens incorrectly equals context_window size

Medium Severity

The max_tokens value is set equal to the context_window for each new Anthropic model, which is incorrect. The canonical Anthropic provider configs show claude-haiku-4-5 has max_tokens: 64000 (not 200000), claude-opus-4-7 has max_tokens: 128000 (not 1000000), and claude-sonnet-4-6 has max_tokens: 64000 (not 1000000). These inflated values could mislead consumers about actual output token limits.

Additional Locations (2)

Reviewed by Cursor Bugbot for commit 6dc3a58.

@github-actions (Contributor) commented:

/test-models


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



Reviewed by Cursor Bugbot for commit 3871f6c.

- json_output
limits:
  context_window: 1000000
  max_tokens: 1000000

max_tokens incorrectly equals context_window across all new models

High Severity

The max_tokens value is set equal to context_window in all new model files, but this is incorrect. The canonical provider definitions show much lower actual limits: claude-opus-4-7 has max_tokens: 128000 (not 1000000), claude-sonnet-4-6 has max_tokens: 64000 (not 1000000), and claude-haiku-4-5 has max_tokens: 64000 (not 200000). The same issue applies to the Gemini models — gemini-3.1-pro and gemini-3.1-flash-lite both set max_tokens: 1000000 while the canonical Google Gemini definitions specify max_tokens: 65536. Consumers relying on these values will believe the models can produce far more output tokens than they actually support.
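Using the canonical values cited above, the fix for claude-haiku-4-5, for example, would be (file layout assumed from the quoted fragment):

```yaml
limits:
  context_window: 200000
  max_tokens: 64000   # was 200000; canonical Anthropic output cap
```

Correspondingly: claude-opus-4-7 → max_tokens: 128000, claude-sonnet-4-6 → max_tokens: 64000, and gemini-3.1-pro / gemini-3.1-flash-lite → max_tokens: 65536, each with its context_window left unchanged.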

Additional Locations (2)

Reviewed by Cursor Bugbot for commit 3871f6c.

@harshiv-26 (Collaborator) commented:

Gateway test results

  • Total: 41
  • Passed: 24
  • Failed: 11
  • Validation failed: 5
  • Errored: 0
  • Skipped: 1
  • Success rate: 60.0%
Per-model scenarios (provider: deepinfra)

anthropic/claude-haiku-4-5
  success: tool-call, tool-call:stream, params, params:stream
  failure: json-output, json-output:stream
  validation_failure: reasoning, reasoning:stream

anthropic/claude-opus-4-7
  success: params, params:stream, tool-call, tool-call:stream
  failure: json-output, json-output:stream, reasoning
  validation_failure: reasoning:stream

anthropic/claude-sonnet-4-6
  success: tool-call, tool-call:stream, params, params:stream
  failure: json-output, json-output:stream
  validation_failure: reasoning, reasoning:stream

google/gemini-3-pro-image
  skipped: skip-check

google/gemini-3.1-flash-lite
  success: tool-call, tool-call:stream, params, params:stream, reasoning, reasoning:stream
  failure: json-output, json-output:stream

google/gemini-3.1-pro
  success: params, params:stream, tool-call, tool-call:stream, reasoning, reasoning:stream
  failure: json-output, json-output:stream
Failures (16)

deepinfra/anthropic/claude-haiku-4-5 — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp866oiriy/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
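The "Expecting value: line 1 column 1 (char 0)" error means the first character of the response is not JSON; a common cause (not confirmed from this log, since the failing content is not shown) is the model wrapping its JSON in a markdown code fence or prefacing it with prose. A hedged diagnostic sketch, separate from the harness above, for recovering JSON from such output:

```python
import json
import re

def parse_loose_json(text: str):
    """Try strict parsing first; then strip a markdown code fence;
    then fall back to the first {...} span. Illustrative only."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strip ```json ... ``` fences if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Fall back to the outermost {...} span in the text.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        return json.loads(text[start:end + 1])
    raise json.JSONDecodeError("no JSON object found", text, 0)

# Fenced output like this fails plain json.loads but parses here.
raw = '```json\n{"colors": [{"name": "red", "hex": "#FF0000"}]}\n```'
print(parse_loose_json(raw))
```

If the failures turn out to be fenced output, a tolerant parse like this in the validation step would distinguish "malformed JSON" from "JSON wrapped in markdown".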

deepinfra/anthropic/claude-haiku-4-5 — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpy4sv9uya/snippet.py", line 27, in <module>
    _json.loads(_accumulated)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")

deepinfra/anthropic/claude-haiku-4-5 — reasoning (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp4lusjfh6/snippet.py", line 40, in <module>
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
Exception: VALIDATION FAILED: reasoning - no reasoning information in response
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

deepinfra/anthropic/claude-haiku-4-5 — reasoning:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpogv1288v/snippet.py", line 32, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
Exception: VALIDATION FAILED: reasoning stream - no reasoning information in stream
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-haiku-4-5",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

deepinfra/anthropic/claude-opus-4-7 — reasoning:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpwyh_ebz8/snippet.py", line 32, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
Exception: VALIDATION FAILED: reasoning stream - no reasoning information in stream
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

deepinfra/anthropic/claude-opus-4-7 — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpcc2ly3qb/snippet.py", line 27, in <module>
    _json.loads(_accumulated)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")

deepinfra/anthropic/claude-opus-4-7 — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpzgzg56oe/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

deepinfra/anthropic/claude-opus-4-7 — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmps6r_pm4p/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'status': 'failure', 'message': 'Invalid response received from deepinfra: {"error":{"message":"{\\"type\\":\\"error\\",\\"error\\":{\\"type\\":\\"invalid_request_error\\",\\"message\\":\\"\\\\\\"thinking.type.enabled\\\\\\" is not supported for this model. Use \\\\\\"thinking.type.adaptive\\\\\\" and \\\\\\"output_config.effort\\\\\\" to control thinking behavior.\\"},\\"request_id\\":\\"req_vrtx_011Cb1VpwJLvP7rZiwonZt9R\\"}"}}', 'error': {'message': 'Invalid response received from deepinfra: {"error":{"message":"{\\"type\\":\\"error\\",\\"error\\":{\\"type\\":\\"invalid_request_error\\",\\"message\\":\\"\\\\\\"thinking.type.enabled\\\\\\" is not supported for this model. Use \\\\\\"thinking.type.adaptive\\\\\\" and \\\\\\"output_config.effort\\\\\\" to control thinking behavior.\\"},\\"request_id\\":\\"req_vrtx_011Cb1VpwJLvP7rZiwonZt9R\\"}"}}', 'type': 'APIError', 'code': '500'}, 'error_origin_level': 'api_error', 'provider': 'deepinfra'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-opus-4-7",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
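The 500 above surfaces an upstream error saying that thinking.type "enabled" is not supported for this model, and that "thinking.type.adaptive" plus "output_config.effort" should be used instead. A hedged sketch of the request-body rewrite that error suggests; the field names come only from the error text, not from verified provider documentation:

```python
def adapt_thinking_config(payload: dict) -> dict:
    """Rewrite a request body using thinking.type == "enabled" into the
    shape the error message asks for: thinking.type == "adaptive" plus
    output_config.effort. Field names taken from the error text only;
    treat this as illustrative, not canonical."""
    body = dict(payload)
    thinking = dict(body.get("thinking") or {})
    if thinking.get("type") == "enabled":
        thinking["type"] = "adaptive"
        # Map the old effort hint onto output_config.effort.
        effort = body.pop("reasoning_effort", "medium")
        body["output_config"] = {**body.get("output_config", {}), "effort": effort}
    body["thinking"] = thinking
    return body

fixed = adapt_thinking_config(
    {"thinking": {"type": "enabled"}, "reasoning_effort": "medium"}
)
print(fixed)
```

If the gateway performed a translation along these lines for claude-opus-4-7, the non-streaming reasoning scenario would at least avoid the invalid_request_error, though whether reasoning tokens are then reported is a separate question.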

deepinfra/google/gemini-3.1-flash-lite — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpdi2hp18l/snippet.py", line 27, in <module>
    _json.loads(_accumulated)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")

deepinfra/google/gemini-3.1-flash-lite — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpa2cj9ggp/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-flash-lite",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

deepinfra/anthropic/claude-sonnet-4-6 — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp3wd5h8jo/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

deepinfra/anthropic/claude-sonnet-4-6 — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpe62rnxpb/snippet.py", line 27, in <module>
    _json.loads(_accumulated)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")

deepinfra/anthropic/claude-sonnet-4-6 — reasoning (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpcmwwc0bg/snippet.py", line 40, in <module>
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
Exception: VALIDATION FAILED: reasoning - no reasoning information in response
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

deepinfra/anthropic/claude-sonnet-4-6 — reasoning:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpga8cwhto/snippet.py", line 32, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
Exception: VALIDATION FAILED: reasoning stream - no reasoning information in stream
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/anthropic-claude-sonnet-4-6",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

deepinfra/google/gemini-3.1-pro — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpksob_ocr/snippet.py", line 27, in <module>
    _json.loads(_accumulated)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")

deepinfra/google/gemini-3.1-pro — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpoka0maup/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/google-gemini-3.1-pro",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")

Skipped (1)

deepinfra/google/gemini-3-pro-image — skip-check (skipped)

Skip reason:

unsupported mode 'image'
