[Bug]: LiteLLM triton provider adds ~5s delay to time-to-first-token (TTFT)

### Check for existing issues

- [x] I have searched the existing issues and checked that my issue is not a duplicate.

### What happened?

When using LiteLLM with the Triton provider to proxy requests to a Triton server (vLLM backend), every request shows an extra ~5.0s delay to TTFT compared to calling Triton directly.

**What I see:**
- Request via LiteLLM → Triton: TTFT 5.10s
- Same request directly to Triton: TTFT 0.10s
- The ~5s delay occurs for every model served by Triton when proxied through LiteLLM.
- Using LiteLLM → vLLM (with openai provider) shows no meaningful extra delay versus direct requests.

**Environment**
- LiteLLM version: v1.83.7-stable
- Triton with vLLM backend
- Both LiteLLM and Triton running in same Kubernetes namespace
- Fully air-gapped (no Internet access)

**Debugging done**
- Checked LiteLLM logs: no obvious errors or warnings pointing to the delay.
- Observed a repeated internal API call that includes header Authorization: Bearer <invalid JWT>; unclear whether this call is related.
- Timing is consistently ~5.0s, suggesting a timeout or retry behavior.

**Impact**

This consistent 5s TTFT addition prevents using LiteLLM + Triton in production.

**Requested help**
- Any pointers where LiteLLM might add a fixed ~5s wait (timeouts, retries, health checks, auth calls)?
- Guidance on which additional logs/traces or configuration settings would be most useful to capture next

### Steps to Reproduce

1. Make LiteLLM request to model hosted in triton via triton provider


### Relevant log output

```shell

```

### What part of LiteLLM is this about?

Proxy

### What LiteLLM version are you on ?

v1.83.7-stable

### Twitter / LinkedIn details

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug]: LiteLLM triton provider adds ~5s delay to time-to-first-token (TTFT) #26699

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: LiteLLM triton provider adds ~5s delay to time-to-first-token (TTFT) #26699

Description

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions