
feat(together-ai): update model YAMLs [bot]#1016

Merged
harshiv-26 merged 3 commits into main from bot/update-together-ai-20260513-022331 on May 13, 2026

Conversation

@models-bot (Bot, Contributor) commented May 13, 2026

Auto-generated by poc-agent for provider together-ai.


Note

Low Risk
Low-risk, metadata-only updates to Together.ai model YAMLs; the main risk is that incorrect capability declarations (modalities, context window, features) cause misrouting or failed requests.

Overview
Updates Together.ai’s NVIDIA Nemotron model YAMLs to better describe capabilities and availability.

NVIDIA-Nemotron-3-Super-120B-A12B-BF16 now declares the function_calling and system_messages features, adds source URLs, and flags thinking support.

nemotron-3-nano-omni-30b-a3b-reasoning-fp8 changes type from unknown to chat, raises context_window to 256000, declares multimodal inputs, and adds provisioning/source/status metadata plus thinking support.
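The changes described above imply a model YAML shaped roughly like the following. This is an illustrative sketch only: the exact field names, value spellings, and layout of this repo's model YAMLs are assumptions, not taken from the diff.

```yaml
# Hypothetical shape of nemotron-3-nano-omni-30b-a3b-reasoning-fp8.yaml after this PR.
# Field names and values are illustrative assumptions.
type: chat                      # was: unknown
context_window: 256000
modalities:
  input: [text, image, audio]   # "multimodal inputs"; exact list assumed
features:
  - thinking
provisioning: dedicated         # provisioning/status metadata added; values assumed
status: available
source:
  - https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8
```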

Reviewed by Cursor Bugbot for commit f391da8. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions (Contributor)

/test-models

@harshiv-26 (Collaborator)

Gateway test results

  • Total: 7
  • Passed: 0
  • Failed: 6
  • Validation failed: 0
  • Errored: 0
  • Skipped: 1
  • Success rate: 0.0%
| Provider | Model | Scenarios |
| --- | --- | --- |
| together-ai | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | skipped: skip-check |
| together-ai | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 | failure: reasoning, params:stream, params, tool-call, reasoning:stream, tool-call:stream |
Failures (6)

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpw92su60d/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpuknawst9/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmploe9ury_/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    stream=False,
)

print(response.choices[0].message.content)

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp1d5p2ma9/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpf9rdtpag/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpr6lfrwq4/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'status': 'failure', 'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'error': {'message': 'together-ai error: Unable to access non-serverless model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8. Please visit https://api.together.ai/models/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 to create and start a new dedicated endpoint for the model.', 'type': 'APIError', 'code': '400'}, 'error_origin_level': 'api_error', 'provider': 'together-ai'}
Code snippet
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-together-ai/nvidia-nemotron-3-nano-omni-30b-a3b-reasoning-fp8",
    messages=[
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Skipped (1)

together-ai/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 — skip-check (skipped)

Skip reason:

Provisioned model

@github-actions (Contributor)

/test-models

harshiv-26 enabled auto-merge (squash) on May 13, 2026 at 12:46
@github-actions (Contributor)

/test-models

harshiv-26 merged commit cbfa9a0 into main on May 13, 2026
8 checks passed
harshiv-26 deleted the bot/update-together-ai-20260513-022331 branch on May 13, 2026 at 12:47
@harshiv-26 (Collaborator)

Gateway test results

  • Total: 2
  • Passed: 0
  • Failed: 0
  • Validation failed: 0
  • Errored: 0
  • Skipped: 2
  • Success rate: 0.0%
| Provider | Model | Scenarios |
| --- | --- | --- |
| together-ai | nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | skipped: skip-check |
| together-ai | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 | skipped: skip-check |
Skipped (2)

together-ai/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8 — skip-check (skipped)

Skip reason:

Provisioned model

together-ai/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 — skip-check (skipped)

Skip reason:

Provisioned model

1 similar comment

@cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed snippet:

```yaml
region: "*"
features:
- function_calling
- system_messages
```


BF16 model missing features present in equivalent FP8 variant

Medium Severity

The newly added features list for the BF16 variant only includes function_calling and system_messages, while the equivalent FP8 variant (NVIDIA-Nemotron-3-Super-120B-A12B-FP8.yaml) declares function_calling, tool_choice, structured_output, and system_messages. Since both are the same base model at different quantization levels (and both are provisioned), the BF16 variant is likely missing tool_choice and structured_output. This could cause the gateway to incorrectly withhold these capabilities for the BF16 model.
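The fix Bugbot suggests would make the BF16 feature list mirror the one it reports for the FP8 variant. A sketch, assuming the surrounding YAML layout matches the reviewed snippet above:

```yaml
# Sketch of the suggested fix for NVIDIA-Nemotron-3-Super-120B-A12B-BF16.yaml:
# align the features list with the FP8 variant cited in the review.
# Surrounding fields and exact file layout are assumptions.
features:
- function_calling
- tool_choice
- structured_output
- system_messages
```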

