[deps] Pin torch to pytorch-cpu index in the vllm extra #4663
Open
AlienKevin wants to merge 2 commits into main from
Conversation
The vllm extra installs vllm-tpu, which transitively depends on torch but does not pin which index to use. Without an explicit binding, uv resolves the transitive dep against the default PyPI index and installs the CUDA build wheel. On TPU workers this crashes at module init with `libcublas.so.*[0-9] not found in the system path`, because torch's `_load_global_deps` preloads the CUDA runtime libraries that don't exist on the TPU workers.

The bug is normally hidden by uv.lock pinning torch to the cpu index from the cpu/tpu extras. It surfaces on Iris workers because those workers drop uv.lock from the workspace bundle when it exceeds 1MB (the Kubernetes ConfigMap limit) and fall back to a fresh `uv sync --extra vllm` resolve, which then picks the wrong torch wheel.

Fix: add an explicit `torch==2.9.0` (and matching torchvision) pin to the vllm extra and route it to the pytorch-cpu index via `[tool.uv.sources]`. Also declare a vllm/gpu mutual exclusion in `[tool.uv.conflicts]`, since marin only ships vllm-tpu (there is no vllm-cuda variant) and the two extras would otherwise conflict over which torch index to use during full-workspace locking.

Verified by running `uv sync --package marin --extra vllm` in a clean worktree off main: torch resolves to `2.9.0+cpu`, `torch.version.cuda` is None, and `import torch` succeeds without libcublas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 12, 2026

AlienKevin added a commit that referenced this pull request on Apr 12, 2026
Pin shards away from europe-west4 region by default.

Some workers in that region have a broken vllm-tpu venv (CUDA torch instead of CPU torch_xla) that crashes vLLM at engine-core init. Iris max-retries reassigns to the SAME worker, so a single bad worker poisons all 5 retries of a shard. us-east5 and us-east1 v6e-4 workers consistently work in our experience (verified by the multilang topup and the Step 6 32K runs).

The flag is configurable so we can broaden the pool once #4663 lands and the worker-image divergence is resolved.

Part of #4666
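The region pin above could be sketched as a simple filter over the worker pool. This is a hypothetical illustration, not the actual Iris/marin API: the flag name, default set, and worker record shape are all assumptions.

```python
# Hypothetical default for the configurable exclusion flag described above.
DEFAULT_EXCLUDED_REGIONS = frozenset({"europe-west4"})


def eligible_workers(workers, excluded_regions=DEFAULT_EXCLUDED_REGIONS):
    """Filter out workers in regions with a known-broken vllm-tpu venv,
    so shard retries cannot keep landing in a poisoned worker pool."""
    return [w for w in workers if w["region"] not in excluded_regions]
```

Passing an empty set would restore the full pool once the worker-image divergence is resolved.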
The vllm extra installs vllm-tpu, which transitively depends on torch but does not pin which index to use. Without an explicit binding, uv resolves the transitive dep against the default PyPI index and installs the CUDA build wheel. On TPU workers this crashes at module init with `libcublas.so.*[0-9] not found in the system path`, because torch's `_load_global_deps` preloads the CUDA runtime libraries that don't exist on the TPU workers.
The bug is normally hidden by `uv.lock` pinning torch to the cpu index from the cpu/tpu extras. It surfaces on Iris workers because those workers drop `uv.lock` from the workspace bundle when it exceeds 1MB (the Kubernetes ConfigMap limit) and fall back to a fresh `uv sync --extra vllm` resolve, which then picks the wrong torch wheel. This was first hit by the SWE-ZERO multi-language experiment in #4653: every preempted worker that had to do a fresh resolve crashed at vLLM startup until the script was rewritten to spawn vllm via subprocess and users were instructed to pass `--extra vllm --extra tpu` manually.
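The lock-dropping behavior just described can be sketched as follows. This is a hypothetical model of the Iris bundling step, not its actual code: the function name, file-map shape, and exact byte limit are assumptions.

```python
CONFIGMAP_LIMIT = 1_000_000  # ~1MB Kubernetes ConfigMap budget (approximate)


def bundle_workspace(files: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical sketch of the Iris bundling behavior described above.

    When the bundle would exceed the ConfigMap limit, uv.lock is dropped,
    which forces workers into a fresh `uv sync` resolve on startup.
    """
    total = sum(len(data) for data in files.values())
    if total > CONFIGMAP_LIMIT and "uv.lock" in files:
        trimmed = dict(files)
        # Without the lock, the fresh resolve may pick the wrong torch wheel.
        del trimmed["uv.lock"]
        return trimmed
    return files
```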
Fix: add an explicit `torch==2.9.0` (and matching torchvision) pin to the vllm extra and route it to the pytorch-cpu index via `[tool.uv.sources]`. Also declare a vllm/gpu mutual exclusion in `[tool.uv.conflicts]`, since marin only ships vllm-tpu (there is no vllm-cuda variant) and the two extras would otherwise conflict over which torch index to use during full-workspace locking.
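The pyproject.toml change could look roughly like the sketch below, assuming uv's standard syntax for extra-scoped sources, explicit indexes, and conflicting extras. The torchvision version, index name, and extra names are illustrative, not copied from the actual diff.

```toml
[project.optional-dependencies]
vllm = [
    "vllm-tpu",
    "torch==2.9.0",
    "torchvision==0.24.0",  # illustrative "matching torchvision" pin
]

[tool.uv]
# vllm and gpu extras can never be installed together, so uv does not
# have to reconcile their torch indexes during full-workspace locking.
conflicts = [
    [
        { extra = "vllm" },
        { extra = "gpu" },
    ],
]

# Route the pinned packages to the CPU wheel index for the vllm extra.
[tool.uv.sources]
torch = [{ index = "pytorch-cpu", extra = "vllm" }]
torchvision = [{ index = "pytorch-cpu", extra = "vllm" }]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true
```

With `explicit = true`, the pytorch-cpu index is only consulted for packages routed to it in `[tool.uv.sources]`, so the rest of the workspace keeps resolving against PyPI.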
Verified by running `uv sync --package marin --extra vllm` in a clean worktree off main: torch resolves to `2.9.0+cpu`, `torch.version.cuda` is None, and `import torch` succeeds without libcublas. After this lands, `--extra vllm` alone is sufficient on Iris TPU workers and the `--extra vllm --extra tpu` workaround can be dropped.
Fixes the `libcublas.so.*[0-9] not found in the system path` crash hit by #4653.
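One way to sanity-check which wheel a resolve landed on, without triggering the CUDA preload, is to inspect the PEP 440 local version tag on `torch.__version__`. The helper below is illustrative; it assumes the pytorch-cpu index tags its builds `+cpu`, while the default PyPI torch wheel carries no tag and bundles the CUDA runtime.

```python
def wheel_flavor(version: str) -> str:
    """Classify a torch wheel by its PEP 440 local version tag.

    "2.9.0+cpu" comes from the pytorch-cpu index; a bare "2.9.0" is
    the default PyPI wheel, which is the CUDA build that crashes on
    TPU workers at import time.
    """
    _, _, local = version.partition("+")
    if local == "cpu":
        return "cpu"
    if local.startswith(("cu", "rocm")):
        return local  # explicitly tagged accelerator build
    return "cuda (default PyPI wheel)"
```

For example, `wheel_flavor(torch.__version__)` should report `"cpu"` after a resolve with this fix applied.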