[BUG]: LLM proxy sends legacy max_tokens / fixed temperature to Azure OpenAI reasoning models (gpt-5.x, o-series) -> 400

### 🐞 Bug Summary
The Azure OpenAI request builder always sends the legacy `max_tokens` parameter and forwards a non-default `temperature`. Newer reasoning-class Azure deployments (gpt-5.x, o1/o3/o-series, `gpt-chat-latest`) reject both, returning HTTP 400. Both the Admin UI **Test** button and `POST /v1/chat/completions` fail for these models unless the caller manually omits `max_tokens` and sets `temperature=1`.

---

### 🧩 Affected Component

- [x] `mcpgateway` - API
- [x] `mcpgateway` - UI (admin panel)
- [ ] `mcpgateway.wrapper` - stdio wrapper
- [ ] Federation or Transports
- [ ] CLI, Makefiles, or shell scripts
- [ ] Container setup (Docker/Podman/Compose)
- [ ] Other (explain below)

---

### 🔁 Steps to Reproduce

1. Add an Azure OpenAI provider for a reasoning-class deployment (e.g. `gpt-5.4-mini` or `gpt-chat-latest`), `api_base = https://<resource>.openai.azure.com`.
2. In the Admin UI, test the model, or call `POST /v1/chat/completions` with `max_tokens` set.
3. The request fails with `400` on `max_tokens`. Omitting `max_tokens` yields a second `400`, this time on `temperature` (0.7 is unsupported). Setting `temperature=1` and omitting `max_tokens` returns `200 OK`.

Note: the Admin test path defaults `max_tokens = body.get("max_tokens", 100)` and always sends it, so the Test button cannot succeed against a reasoning model at all.

---

### 🤔 Expected Behavior

For Azure OpenAI / OpenAI reasoning-class models the gateway should:
1. Send `max_completion_tokens` instead of `max_tokens` (or omit when unset, rather than defaulting to 100 in the admin test path).
2. Not force an unsupported `temperature` - omit it, or only send non-default values when the model accepts them.

Ideally detect reasoning models (by deployment/model name or a provider flag) and map parameters accordingly so the Test button and `/v1/chat/completions` work out of the box.
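
A minimal sketch of the parameter mapping described above, not the gateway's actual code. The helper names (`is_reasoning_model`, `map_reasoning_params`) and the name patterns are hypothetical. Azure deployment names are user-chosen, so name matching is only a heuristic, and a per-provider flag may turn out to be more reliable.

```python
import re
from typing import Any

# Assumption: illustrative name patterns only (gpt-5.x, o-series, gpt-chat-latest).
# Azure deployment names are arbitrary, so a provider-level flag may be safer.
_REASONING_NAME = re.compile(r"^(gpt-5|o\d|gpt-chat-latest)", re.IGNORECASE)


def is_reasoning_model(model: str) -> bool:
    """Heuristically decide whether a model/deployment name is reasoning-class."""
    return bool(_REASONING_NAME.match(model))


def map_reasoning_params(model: str, body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request body with parameters adjusted for reasoning models."""
    if not is_reasoning_model(model):
        return dict(body)

    # Drop the two parameters that the 400 responses complain about.
    mapped = {k: v for k, v in body.items() if k not in ("max_tokens", "temperature")}

    # Carry the token limit over under the new name, unless the caller already set it.
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and "max_completion_tokens" not in mapped:
        mapped["max_completion_tokens"] = max_tokens

    # temperature is omitted entirely so the service applies its default (1).
    return mapped
```

Omitting `temperature` instead of forcing it to `1` keeps the payload valid even if the accepted default changes later.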

---

### 📓 Logs / Error Output
```
LLM request failed: 400 - {"error":{"message":"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.","type":"invalid_request_error","param":"max_tokens","code":"unsupported_parameter"}}

LLM request failed: 400 - {"error":{"message":"Unsupported value: 'temperature' does not support 0.7 with this model. Only the default (1) value is supported.","type":"invalid_request_error","param":"temperature","code":"unsupported_value"}}
```
UI test result:
```json
{ "success": false, "error": "Request failed: 400", "metrics": { "duration": 365 } }
```
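
The two error bodies above carry machine-readable `param` and `code` fields, so one possible fallback is to adjust the payload and retry once when detection by name is not available. The sketch below is an assumption about how that could look; `adjust_for_unsupported` is a hypothetical helper and does not exist in the gateway.

```python
import json
from typing import Any, Optional


def adjust_for_unsupported(payload: dict[str, Any], error_text: str) -> Optional[dict[str, Any]]:
    """Return a corrected copy of the payload, or None if the error is not one handled here."""
    try:
        error = json.loads(error_text)["error"]
    except (ValueError, KeyError, TypeError):
        return None  # Not the JSON error shape shown in the logs above.
    if not isinstance(error, dict):
        return None

    param, code = error.get("param"), error.get("code")
    fixed = dict(payload)

    # "Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead."
    if code == "unsupported_parameter" and param == "max_tokens" and "max_tokens" in fixed:
        fixed["max_completion_tokens"] = fixed.pop("max_tokens")
        return fixed

    # "Unsupported value: 'temperature' ... Only the default (1) value is supported."
    if code == "unsupported_value" and param == "temperature" and "temperature" in fixed:
        del fixed["temperature"]
        return fixed

    return None
```

A caller would apply this to the body of a `400` response and resend the returned payload, giving up when the result is `None`. This costs an extra round trip per unsupported parameter, so it fits better as a safety net than as the primary fix.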

Relevant code:
- `mcpgateway/services/llm_proxy_service.py` - `_build_azure_request` (~L296): builds body with `max_tokens`.
- `mcpgateway/routers/llm_admin_router.py` - `admin_test_api` (~L531): `max_tokens = body.get("max_tokens", 100)`, always set on the request (~L569).
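
For the admin test path, a sketch of the "omit when unset" behavior might look like the following. The function name `build_test_params` and the shape of `body` are assumptions made for illustration; the real `admin_test_api` handler may be structured differently.

```python
from typing import Any


def build_test_params(body: dict[str, Any]) -> dict[str, Any]:
    """Collect optional generation parameters, including only those the caller set."""
    params: dict[str, Any] = {}

    # No default of 100: a missing, null, or empty field means "do not send".
    max_tokens = body.get("max_tokens")
    if max_tokens not in (None, ""):
        params["max_tokens"] = int(max_tokens)

    return params
```

With this shape, clearing the **Max Tokens** field in the test dialog results in a request with no token limit at all, which the reasoning-class deployments accept.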

---

### 🧠 Environment Info

| Key | Value |
|-----|-------|
| Version or commit | `v1.0.2` (ghcr.io/ibm/mcp-context-forge:v1.0.2) |
| Runtime | `Python 3.12, Gunicorn (uvicorn workers)` |
| Platform / OS | `macOS (Docker Desktop)` |
| Container | `Docker (docker compose)` |

---

### 🧩 Additional Context
- Provider type: `azure_openai`, API version `2024-02-15-preview`.
- Workaround: set provider `default_temperature=1.0`, leave `default_max_tokens` null, clear the **Max Tokens** field in the test dialog. Real inference via `/v1/chat/completions` then returns `200` (verified with `gpt-5.4-mini` and `gpt-chat-latest`).
