🐞 Bug Summary
The Azure OpenAI request builder always sends the legacy max_tokens parameter and forwards a non-default temperature. Newer reasoning-class Azure deployments (gpt-5.x, o1/o3/o-series, gpt-chat-latest) reject both, returning HTTP 400. Both the Admin UI Test button and POST /v1/chat/completions fail for these models unless the caller manually omits max_tokens and sets temperature=1.
🧩 Affected Component
🔁 Steps to Reproduce
- Add an Azure OpenAI provider for a reasoning-class deployment (e.g. gpt-5.4-mini or gpt-chat-latest), api_base = https://<resource>.openai.azure.com.
- Admin UI -> test the model (or POST /v1/chat/completions with max_tokens set).
- Get 400. Omit max_tokens -> next 400 on temperature (0.7 unsupported). Set temperature=1 AND omit max_tokens -> 200 OK.
Note: the Admin test path defaults max_tokens = body.get("max_tokens", 100) and always sends it, so the Test button cannot succeed against a reasoning model at all.
🤔 Expected Behavior
For Azure OpenAI / OpenAI reasoning-class models the gateway should:
- Send max_completion_tokens instead of max_tokens (or omit when unset, rather than defaulting to 100 in the admin test path).
- Not force an unsupported temperature - omit it, or only send non-default values when the model accepts them.
Ideally detect reasoning models (by deployment/model name or a provider flag) and map parameters accordingly so the Test button and /v1/chat/completions work out of the box.
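The mapping described above could look roughly like the sketch below. It is not the gateway's actual code: the function names, the provider_flag argument, and the prefix list are hypothetical, with the prefixes taken only from the model families named in this report. Azure deployment names are user-chosen, so the name check is a best-effort guess and the explicit flag takes precedence when set.

```python
from typing import Any, Optional

# Hypothetical prefix list, based only on the families named in this report.
REASONING_PREFIXES = ("gpt-5", "o1", "o3", "gpt-chat-latest")


def is_reasoning_model(name: str, provider_flag: Optional[bool] = None) -> bool:
    """An explicit provider flag wins; otherwise guess from the name prefix."""
    if provider_flag is not None:
        return provider_flag
    return name.lower().startswith(REASONING_PREFIXES)


def map_chat_params(
    model: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    provider_flag: Optional[bool] = None,
) -> dict[str, Any]:
    """Return only the sampling/limit parameters the target model should accept."""
    params: dict[str, Any] = {}
    if is_reasoning_model(model, provider_flag):
        # Reasoning-class models: rename the token limit, and leave
        # temperature out so the service applies its default (1).
        if max_tokens is not None:
            params["max_completion_tokens"] = max_tokens
    else:
        # Other models keep the legacy parameters, sent only when set.
        if max_tokens is not None:
            params["max_tokens"] = max_tokens
        if temperature is not None:
            params["temperature"] = temperature
    return params
```

With these assumptions, map_chat_params("gpt-5.4-mini", max_tokens=100, temperature=0.7) yields {"max_completion_tokens": 100}, while the same call for a deployment named "gpt-4o" keeps max_tokens and temperature unchanged.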
📓 Logs / Error Output
LLM request failed: 400 - {"error":{"message":"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.","type":"invalid_request_error","param":"max_tokens","code":"unsupported_parameter"}}
LLM request failed: 400 - {"error":{"message":"Unsupported value: 'temperature' does not support 0.7 with this model. Only the default (1) value is supported.","type":"invalid_request_error","param":"temperature","code":"unsupported_value"}}
UI test result:
{ "success": false, "error": "Request failed: 400", "metrics": { "duration": 365 } }
Relevant code:
mcpgateway/services/llm_proxy_service.py - _build_azure_request (~L296): builds body with max_tokens.
mcpgateway/routers/llm_admin_router.py - admin_test_api (~L531): max_tokens = body.get("max_tokens", 100), always set on the request (~L569).
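For the admin test path, one possible shape of the change is to drop the implicit default of 100 and forward a token limit only when the caller supplied one. This is a minimal sketch under assumptions: admin_test_params is a hypothetical helper, not the body of admin_test_api, and it assumes a cleared Max Tokens field arrives as a missing key, null, or an empty string.

```python
from typing import Any


def admin_test_params(body: dict[str, Any]) -> dict[str, Any]:
    """Collect optional test-request parameters without inventing defaults."""
    params: dict[str, Any] = {}
    max_tokens = body.get("max_tokens")  # no fallback to 100
    # Assumption: a cleared form field may be absent, None, or "".
    if max_tokens not in (None, ""):
        params["max_tokens"] = int(max_tokens)
    return params
```

An empty test dialog then produces no token limit at all, so the request builder is free to omit the parameter.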
🧠 Environment Info
| Key | Value |
| --- | --- |
| Version or commit | v1.0.2 (ghcr.io/ibm/mcp-context-forge:v1.0.2) |
| Runtime | Python 3.12, Gunicorn (uvicorn workers) |
| Platform / OS | macOS (Docker Desktop) |
| Container | Docker (docker compose) |
🧩 Additional Context
- Provider type: azure_openai, API version 2024-02-15-preview.
- Workaround: set provider default_temperature=1.0, leave default_max_tokens null, clear the Max Tokens field in the test dialog. Real inference via /v1/chat/completions then returns 200 (verified with gpt-5.4-mini and gpt-chat-latest).