
Commit 90704e9

Merge branch 'vllm-project:main' into add_custom_voice
2 parents: 1f93f21 + 87847a2

28 files changed: +886 / -504 lines

.buildkite/test-nightly.yaml

Lines changed: 45 additions & 0 deletions
````diff
@@ -0,0 +1,45 @@
+steps:
+- label: ":docker: Build image"
+  key: image-build
+  commands:
+  - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
+  - "docker build --file docker/Dockerfile.ci -t vllm-omni-ci ."
+  - "docker tag vllm-omni-ci public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
+  - "docker push public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
+  agents:
+    queue: "cpu_queue_premerge"
+
+- label: "Omni Model Test with H100"
+  timeout_in_minutes: 180
+  depends_on: image-build
+  commands:
+  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
+  - pytest -s -v tests/e2e/online_serving/test_qwen3_omni_expansion.py
+  agents:
+    queue: "mithril-h100-pool"
+  plugins:
+  - kubernetes:
+      podSpec:
+        containers:
+        - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+          resources:
+            limits:
+              nvidia.com/gpu: 2
+          volumeMounts:
+          - name: devshm
+            mountPath: /dev/shm
+          - name: hf-cache
+            mountPath: /root/.cache/huggingface
+          env:
+          - name: HF_HOME
+            value: /root/.cache/huggingface
+        nodeSelector:
+          node.kubernetes.io/instance-type: gpu-h100-sxm
+        volumes:
+        - name: devshm
+          emptyDir:
+            medium: Memory
+        - name: hf-cache
+          hostPath:
+            path: /mnt/hf-cache
+            type: DirectoryOrCreate
````
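For local reproduction, the H100 test step reduces to its two commands; a minimal sketch, assuming a machine with at least two GPUs and the dependencies baked into the CI image:

```bash
# Rough local equivalent of the "Omni Model Test with H100" step.
export VLLM_WORKER_MULTIPROC_METHOD=spawn
pytest -s -v tests/e2e/online_serving/test_qwen3_omni_expansion.py
```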

docker/Dockerfile.npu

Lines changed: 1 addition & 5 deletions
````diff
@@ -7,12 +7,8 @@ WORKDIR ${APP_DIR}
 
 COPY . .
 
-# Remove this replace when the dispatch of requirements is ready
-RUN sed -i -E 's/^([[:space:]]*)"fa3-fwd==0\.0\.1",/\1# "fa3-fwd==0.0.1",/' pyproject.toml \
-    && sed -i -E 's/\bonnxruntime\b/onnxruntime-cann/g' pyproject.toml
-
 # Install vllm-omni with dev dependencies
-RUN pip install --no-cache-dir -e .
+RUN pip install --no-cache-dir -e . --no-build-isolation
 
 ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
 
````

docker/Dockerfile.npu.a3

Lines changed: 1 addition & 5 deletions
````diff
@@ -7,12 +7,8 @@ WORKDIR ${APP_DIR}
 
 COPY . .
 
-# Remove this replace when the dispatch of requirements is ready
-RUN sed -i -E 's/^([[:space:]]*)"fa3-fwd==0\.0\.1",/\1# "fa3-fwd==0.0.1",/' pyproject.toml \
-    && sed -i -E 's/\bonnxruntime\b/onnxruntime-cann/g' pyproject.toml
-
 # Install vllm-omni with dev dependencies
-RUN pip install --no-cache-dir -e .
+RUN pip install --no-cache-dir -e . --no-build-isolation
 
 ENV VLLM_WORKER_MULTIPROC_METHOD=spawn
 
````

docker/Dockerfile.rocm

Lines changed: 1 addition & 5 deletions
````diff
@@ -15,11 +15,7 @@ RUN mkdir -p ${COMMON_WORKDIR}/vllm-omni
 
 # Step 2: Copy vllm-omni code and install without uv
 COPY . ${COMMON_WORKDIR}/vllm-omni
-RUN cd ${COMMON_WORKDIR}/vllm-omni && uv pip install --python "$(python3 -c 'import sys; print(sys.executable)')" --no-cache-dir ".[dev]"
-
-# When we are installing onnxruntime-rocm, we need to uninstall the system-installed onnxruntime first.
-# These are the dependencies of Qwen3-TTS.
-RUN uv pip uninstall onnxruntime --system && uv pip install --no-cache-dir onnxruntime-rocm sox --system
+RUN cd ${COMMON_WORKDIR}/vllm-omni && uv pip install --python "$(python3 -c 'import sys; print(sys.executable)')" --no-cache-dir ".[dev]" --no-build-isolation
 
 RUN ln -sf /usr/bin/python3 /usr/bin/python
 
````

docs/getting_started/installation/gpu.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -26,6 +26,8 @@ vLLM-Omni is a Python library that supports the following GPU variants. The libr
 
 ### Pre-built wheels
 
+Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
+
 === "NVIDIA CUDA"
 
     --8<-- "docs/getting_started/installation/gpu/cuda.inc.md:pre-built-wheels"
````
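A pinned install of one of the listed releases would look like the sketch below; the 0.14.0 pin comes from the note above, and the command assumes the wheel is published under the same `vllm-omni` distribution name used elsewhere in these docs:

```bash
# Install a specific pre-built vLLM-Omni release (sketch; version from the note above).
uv pip install vllm-omni==0.14.0
```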

docs/getting_started/installation/gpu/cuda.inc.md

Lines changed: 0 additions & 2 deletions
````diff
@@ -17,8 +17,6 @@ Therefore, it is recommended to install vLLM and vLLM-Omni with a **fresh new**
 # --8<-- [start:pre-built-wheels]
 
 #### Installation of vLLM
-Note: Pre-built wheels are currently only available for vLLM-Omni 0.11.0rc1, 0.12.0rc1, 0.14.0rc1, 0.14.0. For the latest version, please [build from source](https://docs.vllm.ai/projects/vllm-omni/en/latest/getting_started/installation/gpu/#build-wheel-from-source).
-
 
 vLLM-Omni is built based on vLLM. Please install it with command below.
 ```bash
````

docs/getting_started/installation/gpu/rocm.inc.md

Lines changed: 50 additions & 0 deletions
````diff
@@ -9,10 +9,60 @@ vLLM-Omni current recommends the steps in under setup through Docker Images.
 
 # --8<-- [start:pre-built-wheels]
 
+#### Installation of vLLM
+
+vLLM-Omni is built based on vLLM. Please install it with the command below.
+```bash
+uv pip install vllm==0.14.0+rocm700 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
+```
+
+#### Installation of vLLM-Omni
+
+```bash
+# We need to add --no-build-isolation since torch is not obtained from PyPI;
+# we have to install using the torch already present in our environment.
+uv pip install vllm-omni
+
+# Optional, if you want to run Qwen3 TTS
+uv pip uninstall onnxruntime  # should be removed before we can install onnxruntime-rocm
+uv pip install onnxruntime-rocm sox
+```
+
 # --8<-- [end:pre-built-wheels]
 
 # --8<-- [start:build-wheel-from-source]
 
+#### Installation of vLLM
+If you do not need to modify the source code of vLLM, you can directly install the stable 0.14.0 release of the library:
+
+```bash
+uv pip install vllm==0.14.0+rocm700 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
+```
+
+The 0.14.0 release of vLLM requires a ROCm 7.0 environment.
+
+#### Installation of vLLM-Omni
+Since vllm-omni is rapidly evolving, it is recommended to install it from source:
+```bash
+git clone https://github.com/vllm-project/vllm-omni.git
+cd vllm-omni
+VLLM_OMNI_TARGET_DEVICE=rocm uv pip install -e .
+# OR
+uv pip install -e . --no-build-isolation
+```
+
+<details><summary>(Optional) Installation of vLLM from source</summary>
+If you want to check, modify, or debug the source code of vLLM, install the library from source with the following instructions:
+
+```bash
+git clone https://github.com/vllm-project/vllm.git
+cd vllm
+git checkout v0.14.0
+python3 -m pip install -r requirements/rocm.txt
+python3 setup.py develop
+```
+
 # --8<-- [end:build-wheel-from-source]
 
 # --8<-- [start:build-docker]
````
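As a quick sanity check after either install path above, the installed distribution can be queried by name; a minimal sketch (the `vllm-omni` distribution name is taken from the install commands in the diff):

```bash
# Print the installed vllm-omni version (works for wheel or editable installs).
python3 -c "from importlib.metadata import version; print(version('vllm-omni'))"
```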

docs/getting_started/installation/npu/npu.inc.md

Lines changed: 2 additions & 5 deletions
````diff
@@ -38,12 +38,9 @@ docker run --rm \
 cd /vllm-workspace
 git clone -b v0.14.0 https://github.com/vllm-project/vllm-omni.git
 
-# Remove this replace when the dispatch of requirements is ready
-RUN sed -i -E 's/^([[:space:]]*)"fa3-fwd==0\.0\.1",/\1# "fa3-fwd==0.0.1",/' pyproject.toml \
-    && sed -i -E 's/\bonnxruntime\b/onnxruntime-cann/g' pyproject.toml
-
 cd vllm-omni
-pip install -v -e .
+VLLM_OMNI_TARGET_DEVICE=npu pip install -v -e .
+# OR pip install -v -e . --no-build-isolation
 export VLLM_WORKER_MULTIPROC_METHOD=spawn
 ```
 
````

docs/user_guide/diffusion/parallelism_acceleration.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -49,7 +49,7 @@ The following table shows which models are currently supported by parallelism me
 
 | Model | Model Identifier | Ulysses-SP | Ring-SP | Tensor-Parallel |
 |-------|------------------|------------|---------|-----------------|
-| **Wan2.2** | `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | ✅ | ✅ | |
+| **Wan2.2** | `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | ✅ | ✅ | ✅ |
 
 ### Tensor Parallelism
 
````

examples/offline_inference/text_to_video/text_to_video.py

Lines changed: 8 additions & 2 deletions
````diff
@@ -109,7 +109,12 @@ def parse_args() -> argparse.Namespace:
         choices=[1, 2],
         help="Number of GPUs used for classifier free guidance parallel size.",
     )
-
+    parser.add_argument(
+        "--tensor_parallel_size",
+        type=int,
+        default=1,
+        help="Number of GPUs used for tensor parallelism (TP) inside the DiT.",
+    )
     return parser.parse_args()
 
 
@@ -141,6 +146,7 @@ def main():
         ulysses_degree=args.ulysses_degree,
         ring_degree=args.ring_degree,
         cfg_parallel_size=args.cfg_parallel_size,
+        tensor_parallel_size=args.tensor_parallel_size,
     )
 
     # Check if profiling is requested via environment variable
@@ -173,7 +179,7 @@
     print(f" Inference steps: {args.num_inference_steps}")
     print(f" Frames: {args.num_frames}")
     print(
-        f" Parallel configuration: ulysses_degree={args.ulysses_degree}, ring_degree={args.ring_degree}, cfg_parallel_size={args.cfg_parallel_size}"
+        f" Parallel configuration: ulysses_degree={args.ulysses_degree}, ring_degree={args.ring_degree}, cfg_parallel_size={args.cfg_parallel_size}, tensor_parallel_size={args.tensor_parallel_size}"
     )
     print(f" Video size: {args.width}x{args.height}")
     print(f"{'=' * 60}\n")
````
