Check for existing issues
What happened?
When using LiteLLM with the Triton provider to proxy requests to a Triton server (vLLM backend), every request shows an extra ~5.0s delay to TTFT compared to calling Triton directly.
What I see:
- Request via LiteLLM → Triton: TTFT 5.10s
- Same request directly to Triton: TTFT 0.10s
- The ~5s delay occurs for every model served by Triton when proxied through LiteLLM.
- Using LiteLLM → vLLM (with openai provider) shows no meaningful extra delay versus direct requests.
Environment
- LiteLLM version: v1.83.7-stable
- Triton with vLLM backend
- Both LiteLLM and Triton running in same Kubernetes namespace
- Fully air-gapped (no Internet access)
Debugging done
- Checked LiteLLM logs: no obvious errors or warnings pointing to the delay.
- Observed a repeated internal API call that includes header Authorization: Bearer ; unclear whether this call is related.
- Timing is consistently ~5.0s, suggesting a timeout or retry behavior.
Impact
This consistent 5s TTFT addition prevents using LiteLLM + Triton in production.
Requested help
- Any pointers where LiteLLM might add a fixed ~5s wait (timeouts, retries, health checks, auth calls)?
- Guidance on which additional logs/traces or configuration settings would be most useful to capture next
Steps to Reproduce
- Make LiteLLM request to model hosted in triton via triton provider
Relevant log output
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on ?
v1.83.7-stable
Twitter / LinkedIn details
No response
Check for existing issues
What happened?
When using LiteLLM with the Triton provider to proxy requests to a Triton server (vLLM backend), every request shows an extra ~5.0s delay to TTFT compared to calling Triton directly.
What I see:
Environment
Debugging done
Impact
This consistent 5s TTFT addition prevents using LiteLLM + Triton in production.
Requested help
Steps to Reproduce
Relevant log output
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on ?
v1.83.7-stable
Twitter / LinkedIn details
No response