
Commit 4db7435

erictang000 and claude authored
[deps] cap fastapi<0.137 to fix vLLM /health 500 (_IncludedRouter) (NovaSky-AI#1810)
## Summary

Cap `fastapi<0.137` to fix the vLLM inference server failing to start.

**fastapi 0.137.0** refactored `include_router()` to store `_IncludedRouter` wrapper objects in `app.routes`. `prometheus-fastapi-instrumentator` 8.0.0 (pulled in transitively by vLLM) iterates `app.routes` in its metrics middleware and reads `route.path` unconditionally, which raises on the new objects:

```
GET /health HTTP/1.1" 500 Internal Server Error
AttributeError: '_IncludedRouter' object has no attribute 'path'
prometheus_fastapi_instrumentator/routing.py:55 in _get_route_name -> route.path
```

So **every request through the instrumented vLLM app returns 500**, including the `/health` probe the trainer waits on. The server never becomes healthy, and startup fails with `TimeoutError: Server failed to become healthy within 600s` (after which Ray retries).

This started biting us after a re-lock bumped fastapi to 0.137.1. It is an upstream ecosystem incompatibility, not a bug in SkyRL code. Cap fastapi below the breaking release until `prometheus-fastapi-instrumentator` ships a fix and vLLM picks it up.

## Change

- `pyproject.toml`: add `"fastapi<0.137"` to `[tool.uv].constraint-dependencies`.
- `uv.lock`: re-locked, moving `fastapi 0.137.1 → 0.136.3` (with consistent `huggingface-hub` / `typer` resolution).

## Test plan

Verified that a fully-async GSM8K run (`--extra fsdp`, vLLM 0.20.2) now starts healthy and trains, whereas with fastapi 0.137.1 it returns 500 on `/health` and times out.

Refs: trallnag/prometheus-fastapi-instrumentator#370, vllm-project/vllm#45596

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent e1b995f commit 4db7435

2 files changed

Lines changed: 79 additions & 178 deletions


pyproject.toml

Lines changed: 8 additions & 0 deletions
    @@ -204,6 +204,14 @@ required-environments = [
     constraint-dependencies = [
     "flashinfer-jit-cache==0.6.8.post1",
     "flashinfer-cubin==0.6.8.post1",
    +# fastapi 0.137.0 refactored include_router() to store `_IncludedRouter` wrapper objects in
    +# `app.routes`, which prometheus-fastapi-instrumentator (pulled in transitively by vLLM) cannot
    +# handle: `_get_route_name` accesses `route.path` and raises
    +# `AttributeError: '_IncludedRouter' object has no attribute 'path'`, so the vLLM server's /health
    +# endpoint 500s and the server never becomes healthy. Cap below 0.137 until the instrumentator is
    +# fixed. See https://github.com/trallnag/prometheus-fastapi-instrumentator/issues/370 and
    +# https://github.com/vllm-project/vllm/issues/45596
    +"fastapi<0.137",
     ]
     # each backend should have separate dependencies that can potentially clash
     # megatron also clashes with the jax dependency from gpu and tpu extras
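The constraint only shapes what uv resolves; a small runtime guard can confirm that an environment actually honors it. The sketch below is hypothetical (not part of SkyRL): `FASTAPI_CAP`, `below_cap`, and `installed_fastapi_ok` are illustrative names, and the comparison deliberately simplifies to the leading numeric release segment instead of implementing full PEP 440 semantics. For real use, `packaging.specifiers.SpecifierSet("<0.137").contains(version)` is the more robust choice.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Hypothetical constant mirroring the `"fastapi<0.137"` constraint in pyproject.toml.
FASTAPI_CAP: Tuple[int, ...] = (0, 137)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading dotted-integer release segment, e.g. "0.136.3" -> (0, 136, 3).

    Suffixes such as "rc1" or ".post1" are dropped, so "0.137.0rc1" maps to
    (0, 137, 0) and is treated as NOT below the cap, consistent with PEP 440's
    rule that `<V` excludes pre-releases of V. Epochs and local labels are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def below_cap(version: str, cap: Tuple[int, ...] = FASTAPI_CAP) -> bool:
    """True if `version` is strictly below `cap`; zero-pads so (0, 137) equals (0, 137, 0)."""
    release = release_tuple(version)
    width = max(len(release), len(cap))
    padded_release = release + (0,) * (width - len(release))
    padded_cap = cap + (0,) * (width - len(cap))
    return padded_release < padded_cap


def installed_fastapi_ok() -> Optional[bool]:
    """Check the installed fastapi against the cap; None if fastapi is not installed."""
    try:
        return below_cap(metadata.version("fastapi"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(below_cap("0.136.3"))  # prints "True"
    print(below_cap("0.137.1"))  # prints "False"
    print(installed_fastapi_ok())
```

A guard like this could run before launching the inference server, so a bad re-lock fails fast with a clear message instead of surfacing as a 600s health-check timeout.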
