Commit ee90710

Merge remote-tracking branch 'upstream/main' into feat/cli-runtime-overrides-a

# Conflicts:
#	sglang_omni/models/ming_omni/config.py

2 parents: 47b5447 + c4da436

72 files changed: 5,464 additions & 1,824 deletions


.github/workflows/test-qwen3-omni-ci.yaml

Lines changed: 85 additions & 0 deletions

```diff
@@ -181,3 +181,88 @@ jobs:
         shell: bash
         run: |
           bash .github/scripts/delete_gpu_process.sh
+
+  stage-4-mmsu:
+    name: stage 4 - MMSU accuracy + speed
+    needs: stage-3-mmmu-tts-consistency
+    runs-on: [self-hosted]
+    timeout-minutes: 20
+    container:
+      image: frankleeeee/sglang-omni:dev
+      options: --gpus all --rm -v /dev/shm:/dev/shm
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - uses: ./.github/actions/omni-setup
+        with:
+          venv-name: omni-qwen3
+
+      - name: Run MMSU CI (accuracy + speed)
+        shell: bash
+        run: |
+          source omni-qwen3/bin/activate
+          export PYTHONPATH=$PWD
+          pytest tests/test_model/test_qwen3_omni_mmsu_ci.py -v -s -x
+        env:
+          HF_ENDPOINT: https://hf-mirror.com
+
+      - name: Print MMSU CI artifacts (accuracy + speed)
+        if: always()
+        shell: bash
+        run: |
+          source omni-qwen3/bin/activate
+          echo "=== Qwen3-Omni MMSU CI results (summary only) ==="
+          for f in $(find /tmp -path '*/mmsu/mmsu_results.json' 2>/dev/null); do
+            echo "--- $f ---"
+            python -c "import json,sys; d=json.load(open(sys.argv[1])); d.pop('per_sample',None); print(json.dumps(d, indent=2, ensure_ascii=False))" "$f"
+            echo ""
+          done
+
+      - name: Kill GPU processes
+        if: always()
+        shell: bash
+        run: |
+          bash .github/scripts/delete_gpu_process.sh
+
+  stage-5-mmsu-tts-consistency:
+    name: stage 5 - MMSU TTS consistency
+    needs: stage-4-mmsu
+    runs-on: [self-hosted]
+    timeout-minutes: 15
+    container:
+      image: frankleeeee/sglang-omni:dev
+      options: --gpus all --rm -v /dev/shm:/dev/shm
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - uses: ./.github/actions/omni-setup
+        with:
+          venv-name: omni-qwen3
+
+      - name: Run MMSU TTS Consistency CI (WER + speed)
+        shell: bash
+        run: |
+          source omni-qwen3/bin/activate
+          export PYTHONPATH=$PWD
+          pytest tests/test_model/test_qwen3_omni_mmsu_tts_consistency_ci.py -v -s -x
+        env:
+          HF_ENDPOINT: https://hf-mirror.com
+
+      - name: Print MMSU TTS Consistency CI artifacts (WER + speed)
+        if: always()
+        shell: bash
+        run: |
+          echo "=== Qwen3-Omni MMSU TTS Consistency CI results ==="
+          for f in $(find /tmp -path '*/mmsu_audio/mmsu_results.json' 2>/dev/null); do
+            echo "--- $f ---"
+            cat "$f"
+            echo ""
+          done
+
+      - name: Kill GPU processes
+        if: always()
+        shell: bash
+        run: |
+          bash .github/scripts/delete_gpu_process.sh
```
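The `python -c` one-liner in the "Print MMSU CI artifacts" step keeps the CI log readable by dropping the bulky `per_sample` list before pretty-printing the results JSON. A minimal standalone sketch of the same filtering (the keys other than `per_sample` are illustrative, not taken from the real results schema):

```python
import json

# Illustrative results payload: only the "per_sample" key is known from the
# workflow above; "accuracy" and "num_samples" are hypothetical summary fields.
results = {
    "accuracy": 0.87,
    "num_samples": 50,
    "per_sample": [{"id": 0, "correct": True}, {"id": 1, "correct": False}],
}

# Same trick as the CI one-liner: drop per-sample detail, keep the summary.
results.pop("per_sample", None)
summary = json.dumps(results, indent=2, ensure_ascii=False)
print(summary)
```

Using `pop(..., None)` rather than `del` means the step still succeeds on result files that happen to have no `per_sample` key.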

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -32,7 +32,7 @@ repos:
     hooks:
       - id: ruff
         args: [--select=F401, --fixable=F401]
-        files: ^(benchmark/|docs/|examples/)
+        files: ^(benchmarks/|docs/|examples/)
         exclude: \.ipynb$
   - repo: https://github.com/psf/black
     rev: 24.10.0
```

benchmarks/README.md

Lines changed: 44 additions & 6 deletions

````diff
@@ -7,7 +7,7 @@ and accuracy (WER, MMSU, MMMU) across supported modality combinations.
 
 ```
 benchmarks/
-├── tasks/        # Per-task logic (tts, mmsu, visual_understand)
+├── tasks/        # Per-task logic (tts, audio_understanding, visual_understand)
 ├── metrics/      # Metric computation (performance, accuracy)
 ├── dataset/      # Dataset loaders + download helpers
 ├── benchmarker/  # Framework: runner, data structures, utilities
@@ -29,6 +29,10 @@ python -m sglang_omni.cli.cli serve \
   --model-path fishaudio/s2-pro \
   --config examples/configs/s2pro_tts.yaml --port 8000
 
+# Voxtral-4B-TTS — for section 2d (plain TTS, no voice cloning)
+python -m sglang_omni.cli.cli serve \
+  --model-path mistralai/Voxtral-4B-TTS-2603 --port 8000
+
 # Qwen3-Omni, speech mode — for section 3 (SeedTTS; multi-GPU)
 python -m sglang_omni.cli.cli serve \
   --model-path Qwen/Qwen3-Omni-30B-A3B-Instruct --port 8000
@@ -56,11 +60,35 @@ python -m benchmarks.eval.benchmark_tts_seedtts \
   --model fishaudio/s2-pro \
   --output-dir results/s2pro_en --lang en --device cuda:0
 
-# 3. Qwen3-Omni — same two-phase pipeline
+# 2d. Voxtral — full pipeline without voice cloning
+python -m benchmarks.eval.benchmark_tts_seedtts \
+  --meta seedtts_testset/en/meta.lst \
+  --model mistralai/Voxtral-4B-TTS-2603 --port 8000 \
+  --max-concurrency 16 \
+  --output-dir results/voxtral_en --lang en --max-samples 50 \
+  --no-ref-audio --voice cheerful_female
+
+# 3a. Qwen3-Omni — full pipeline (generate + transcribe)
 python -m benchmarks.eval.benchmark_omni_seedtts \
   --meta seedtts_testset/en/meta.lst \
-  --model qwen3-omni --port 8000 \
-  --output-dir results/qwen3_omni_en --max-samples 50
+  --output-dir results/qwen3_omni_en \
+  --max-concurrency 16 \
+  --model qwen3-omni --port 8000 --max-samples 50
+
+# 3b. Qwen3-Omni — generate only (server required; use in CI to split phases)
+python -m benchmarks.eval.benchmark_omni_seedtts \
+  --generate-only \
+  --meta seedtts_testset/en/meta.lst \
+  --output-dir results/qwen3_omni_en \
+  --max-concurrency 16 \
+  --model qwen3-omni --port 8000 --max-samples 50
+
+# 3c. Qwen3-Omni — transcribe only (reuses audio; no server)
+python -m benchmarks.eval.benchmark_omni_seedtts \
+  --transcribe-only \
+  --meta seedtts_testset/en/meta.lst \
+  --output-dir results/qwen3_omni_en \
+  --model qwen3-omni --lang en --device cuda:0
 
 # 4. Qwen3-Omni — MMSU (audio comprehension)
 python -m benchmarks.eval.benchmark_omni_mmsu \
@@ -76,7 +104,7 @@ python -m benchmarks.eval.benchmark_omni_mmmu \
 
 | Script | Task | Model | API |
 |--------|------|-------|-----|
-| `eval/benchmark_tts_seedtts.py` | TTS speed + WER (unified) | S2-Pro | `/v1/audio/speech` |
+| `eval/benchmark_tts_seedtts.py` | TTS speed + WER (unified) | e.g. S2-Pro, Voxtral | `/v1/audio/speech` |
 | `eval/benchmark_omni_seedtts.py` | TTS speed + WER (unified) | Qwen3-Omni | `/v1/chat/completions` |
 | `eval/benchmark_omni_mmsu.py` | MMSU (audio comprehension) | Qwen3-Omni | `/v1/chat/completions` |
 | `eval/benchmark_omni_mmmu.py` | MMMU (VLM accuracy + speed) | Qwen3-Omni | `/v1/chat/completions` |
@@ -85,7 +113,10 @@ The two `*_seedtts.py` scripts merge the previous `benchmark_*_tts_speed.py`
 and `voice_clone_*_wer.py` pairs into a single two-phase pipeline: phase 1
 generates + persists WAVs while the server runs, phase 2 transcribes offline
 to avoid GPU contention with the server. Use `--generate-only` or
-`--transcribe-only` to run a single phase.
+`--transcribe-only` to run a single phase. For TTS, `--concurrency` and
+`--max-concurrency` are equivalent (see `benchmark_tts_seedtts.py`).
+`benchmark_omni_seedtts.py` documents local vs CI GPU usage in its module
+docstring (sequential phases on CI to reduce OOM risk).
 
 ## Adding a New Model or Task
 
@@ -104,5 +135,12 @@ Download helpers live in `benchmarks/dataset/prepare.py`:
 python -m benchmarks.dataset.prepare --dataset seedtts       # full SeedTTS
 python -m benchmarks.dataset.prepare --dataset seedtts-mini  # smoke-test subset
 python -m benchmarks.dataset.prepare --dataset seedtts-50    # 50-sample subset
+python -m benchmarks.dataset.prepare --dataset mmmu          # full MMMU (30 subjects)
 python -m benchmarks.dataset.prepare --dataset mmmu-ci-50    # MMMU CI subset
+python -m benchmarks.dataset.prepare --dataset mmsu          # full MMSU (ddwang2000/MMSU)
 ```
+
+SeedTTS datasets are materialized into `./seedtts_testset/` (override with
+`--local-dir`). MMMU/MMSU datasets are pre-warmed into the default
+HuggingFace cache and consumed via `datasets.load_dataset(repo_id)`, so
+`--local-dir` is a no-op for them.
````
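The `--generate-only`/`--transcribe-only` split that the README describes can be sketched as a pair of mutually exclusive flags selecting which phases run. This is a hedged sketch only: the flag names match the README, but the real scripts in `benchmarks/eval/` take many more options and run the actual generation/transcription work.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Sketch of the two-phase flag handling; everything besides the two
    # flag names is illustrative.
    p = argparse.ArgumentParser(description="two-phase SeedTTS benchmark sketch")
    group = p.add_mutually_exclusive_group()
    group.add_argument("--generate-only", action="store_true",
                       help="phase 1 only: query the server and persist WAVs")
    group.add_argument("--transcribe-only", action="store_true",
                       help="phase 2 only: transcribe persisted WAVs offline")
    return p


def select_phases(args: argparse.Namespace) -> tuple:
    # Default (neither flag): run generation then transcription back to back.
    run_generate = not args.transcribe_only
    run_transcribe = not args.generate_only
    return (run_generate, run_transcribe)


if __name__ == "__main__":
    print(select_phases(build_parser().parse_args()))
```

Passing both flags at once exits with an argparse error thanks to the mutually exclusive group, which matches the CI pattern above of running the phases as separate sequential steps.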
