Skip to content

[BUG]: LLM proxy sends legacy max_tokens / fixed temperature to Azure OpenAI reasoning models (gpt-5.x, o-series) -> 400 #5021

@simone-gasparini

Description

@simone-gasparini

🐞 Bug Summary

The Azure OpenAI request builder always sends the legacy max_tokens parameter and forwards a non-default temperature. Newer reasoning-class Azure deployments (gpt-5.x, o1/o3/o-series, gpt-chat-latest) reject both, returning HTTP 400. Both the Admin UI Test button and POST /v1/chat/completions fail for these models unless the caller manually omits max_tokens and sets temperature=1.


🧩 Affected Component

  • mcpgateway - API
  • mcpgateway - UI (admin panel)
  • mcpgateway.wrapper - stdio wrapper
  • Federation or Transports
  • CLI, Makefiles, or shell scripts
  • Container setup (Docker/Podman/Compose)
  • Other (explain below)

🔁 Steps to Reproduce

  1. Add an Azure OpenAI provider for a reasoning-class deployment (e.g. gpt-5.4-mini or gpt-chat-latest), api_base = https://<resource>.openai.azure.com.
  2. Admin UI -> test the model (or POST /v1/chat/completions with max_tokens set).
  3. Get 400. Omit max_tokens -> next 400 on temperature (0.7 unsupported). Set temperature=1 AND omit max_tokens -> 200 OK.

Note: the Admin test path defaults max_tokens = body.get("max_tokens", 100) and always sends it, so the Test button cannot succeed against a reasoning model at all.


🤔 Expected Behavior

For Azure OpenAI / OpenAI reasoning-class models the gateway should:

  1. Send max_completion_tokens instead of max_tokens (or omit when unset, rather than defaulting to 100 in the admin test path).
  2. Not force an unsupported temperature - omit it, or only send non-default values when the model accepts them.

Ideally detect reasoning models (by deployment/model name or a provider flag) and map parameters accordingly so the Test button and /v1/chat/completions work out of the box.


📓 Logs / Error Output

LLM request failed: 400 - {"error":{"message":"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.","type":"invalid_request_error","param":"max_tokens","code":"unsupported_parameter"}}

LLM request failed: 400 - {"error":{"message":"Unsupported value: 'temperature' does not support 0.7 with this model. Only the default (1) value is supported.","type":"invalid_request_error","param":"temperature","code":"unsupported_value"}}

UI test result:

{ "success": false, "error": "Request failed: 400", "metrics": { "duration": 365 } }

Relevant code:

  • mcpgateway/services/llm_proxy_service.py - _build_azure_request (~L296): builds body with max_tokens.
  • mcpgateway/routers/llm_admin_router.py - admin_test_api (~L531): max_tokens = body.get("max_tokens", 100), always set on the request (~L569).

🧠 Environment Info

Key Value
Version or commit v1.0.2 (ghcr.io/ibm/mcp-context-forge:v1.0.2)
Runtime Python 3.12, Gunicorn (uvicorn workers)
Platform / OS macOS (Docker Desktop)
Container Docker (docker compose)

🧩 Additional Context

  • Provider type: azure_openai, API version 2024-02-15-preview.
  • Workaround: set provider default_temperature=1.0, leave default_max_tokens null, clear the Max Tokens field in the test dialog. Real inference via /v1/chat/completions then returns 200 (verified with gpt-5.4-mini and gpt-chat-latest).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriageIssues / Features awaiting triage

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions