[Fix] Ensure HuggingFace downloads complete before initialization. #1213
zzhuoxin1508 wants to merge 14 commits into vllm-project:main from
Conversation
Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1edaa8168
vllm_omni/entrypoints/omni.py
Outdated
allow_patterns=[
    "**/*.json", "**/*.bin", "**/*.safetensors", "**/*.pt",
    "**/*.txt", "**/*.model", "**/*.yaml"
Include legacy .npy assets in HF prefetch
The new prefetch list omits *.npy, but the Qwen2.5 Omni loader explicitly supports legacy speaker assets stored under inputs/*spk_emb.npy and inputs/*ref_mel.npy in the model directory (see _init_token2wav_model in vllm_omni/model_executor/models/qwen2_5_omni/qwen2_5_omni.py). Because omni_snapshot_download now converts repo IDs into a local snapshot path, downstream loaders will treat it as a local directory and won’t fall back to Hugging Face to fetch missing files. For models that only ship the legacy .npy assets (no spk_dict.pt), this change silently drops conditioning data and forces the fallback zeros path, which breaks speaker conditioning quality. Consider adding **/*.npy (or using * for the prefetch) to avoid losing these files.
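For illustration, a minimal sketch of the suggested fix: the same `allow_patterns` list from the hunk above with the recursive `.npy` pattern appended (only the list is shown; the surrounding call is unchanged).

```python
# Hypothetical pattern list incorporating the Codex suggestion; the trailing
# entry covers the legacy Qwen2.5 Omni speaker assets under inputs/.
allow_patterns = [
    "**/*.json", "**/*.bin", "**/*.safetensors", "**/*.pt",
    "**/*.txt", "**/*.model", "**/*.yaml",
    "**/*.npy",  # inputs/*spk_emb.npy and inputs/*ref_mel.npy
]
```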
fix precommits please
could you please also test the qwen2.5-omni model?
ok i'll try it
Pull request overview
This pull request addresses initialization issues in multimodal models within multi-stage pipelines by ensuring that the Orchestrator completes all critical file downloads before spawning Stage Workers. This eliminates concurrent download conflicts and initialization timeouts in multi-process environments.
Changes:
- Added `require_all` parameter to `download_weights_from_hf_specific` to force downloading all matching patterns
- Refactored `omni_snapshot_download` to use recursive glob patterns (`**/*.ext`) and the new `require_all` functionality
- Added local path validation to `omni_snapshot_download` for optimization
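For context on the first item, a rough sketch of how a `require_all` flag could behave. The function and parameter names come from this PR, but the body below is an assumption, not the actual helper in `weight_utils.py`.

```python
from huggingface_hub import snapshot_download

def download_weights_from_hf_specific(
    model_id: str,
    cache_dir: str | None,
    allow_patterns: list[str],
    revision: str | None = None,
    require_all: bool = False,
) -> str:
    """Sketch only: download weights matching allow_patterns from the Hub."""
    if require_all:
        # Fetch every file matching any of the patterns in a single snapshot
        # call, instead of stopping at the first pattern that yields matches.
        return snapshot_download(
            model_id,
            cache_dir=cache_dir,
            revision=revision,
            allow_patterns=allow_patterns,
        )
    # Legacy behavior (elided in this sketch): try patterns one at a time and
    # return after the first pattern that matches any files.
    raise NotImplementedError("single-pattern fallback elided")
```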
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vllm_omni/model_executor/model_loader/weight_utils.py | Added require_all parameter to control whether all patterns should be downloaded |
| vllm_omni/entrypoints/omni.py | Refactored snapshot download to use recursive patterns and ensure complete downloads |
# If it's already a local path, just return it
if os.path.exists(model_id):
    return model_id
# TODO: this is just a workaround for quickly use modelscope, we should support
Line 77 has trailing whitespace. Remove the trailing space after "return model_id".
else:
    return _dummy_snapshot_download(model_id)
# For other cases (Hugging Face), perform a real download to ensure all
# necessary files (including *.pt for audio/diffusion) are available locally
Line 84 has trailing whitespace. Remove the trailing space after "return snapshot_download(model_id)".
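Piecing the two hunks above together, the new `omni_snapshot_download` flow is roughly the following. This is a condensed sketch: the ModelScope branch that calls `_dummy_snapshot_download` is elided, and `snapshot_download` here is the plain `huggingface_hub` helper rather than the pattern-restricted call in the actual diff.

```python
import os
from huggingface_hub import snapshot_download

def omni_snapshot_download(model_id: str) -> str:
    # If it's already a local path, just return it.
    if os.path.exists(model_id):
        return model_id
    # (ModelScope workaround elided: that branch returns
    # _dummy_snapshot_download(model_id).)
    # For other cases (Hugging Face), perform a real download so that all
    # necessary files (including *.pt for audio/diffusion) exist locally
    # before any stage worker starts.
    return snapshot_download(model_id)
```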
qwen2.5-omni test @hsliuustc0106 @lishunyang12 @nussejzz PTAL
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: zhou zhuoxin <zhouzhuoxin1508@outlook.com>
Tested with Qwen-Image-Edit; it breaks:
(workspace) root@925981d52983:/workspace/vllm-omni/examples/offline_inference/image_to_image# python image_edit.py \
--image qwen-bear.png \
--prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
--output output_image_edit.png \
--num_inference_steps 50 \
--cfg_scale 4.0
INFO 02-06 15:54:04 [weight_utils.py:50] Using model weights format ['**/*.json', '**/*.bin', '**/*.safetensors', '**/*.pt', '**/*.txt', '**/*.model', '**/*.yaml']
added_tokens.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 605/605 [00:00<00:00, 1.68MB/s]
preprocessor_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 788/788 [00:00<00:00, 2.61MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 613/613 [00:00<00:00, 2.54MB/s]
processor/tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11.4M/11.4M [00:01<00:00, 11.0MB/s]
tokenizer_config.json: 4.73kB [00:00, 16.3MB/s]
video_preprocessor_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 904/904 [00:00<00:00, 7.05MB/s]
vocab.json: 2.78MB [00:00, 18.2MB/s]
scheduler_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 485/485 [00:00<00:00, 2.00MB/s]
config.json: 3.22kB [00:00, 7.97MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 244/244 [00:00<00:00, 1.02MB/s]
model.safetensors.index.json: 57.7kB [00:00, 102MB/s]
tokenizer_config.json: 4.69kB [00:00, 12.3MB/s]
vocab.json: 3.38MB [00:00, 53.7MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 339/339 [00:00<00:00, 1.37MB/s]
(…)ion_pytorch_model.safetensors.index.json: 199kB [00:00, 110MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 730/730 [00:00<00:00, 3.07MB/s]
text_encoder/model-00001-of-00004.safete(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [00:12<00:00, 398MB/s]
text_encoder/model-00002-of-00004.safete(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.99G/4.99G [00:12<00:00, 408MB/s]
text_encoder/model-00003-of-00004.safete(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.93G/4.93G [00:12<00:00, 399MB/s]
text_encoder/model-00004-of-00004.safete(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1.69G/1.69G [00:04<00:00, 353MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.99G/4.99G [00:11<00:00, 422MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.98G/4.98G [00:11<00:00, 421MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.95G/4.95G [00:33<00:00, 149MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.98G/4.98G [00:12<00:00, 411MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.95G/4.95G [00:11<00:00, 422MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.95G/4.95G [00:11<00:00, 427MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.91G/4.91G [00:12<00:00, 394MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 4.98G/4.98G [00:12<00:00, 398MB/s]
transformer/diffusion_pytorch_model-0000(…): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1.17G/1.17G [00:03<00:00, 317MB/s]
vae/diffusion_pytorch_model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 254M/254M [00:01<00:00, 156MB/s]
merges.txt: 1.67MB [00:00, 41.0MB/s]
INFO 02-06 15:56:58 [weight_utils.py:71] Time spent downloading weights for Qwen/Qwen-Image-Edit: 174.753237 seconds
INFO 02-06 15:56:58 [omni.py:132] Initializing stages for model: /workspace/.cache/huggingface/hub/models--Qwen--Qwen-Image-Edit/snapshots/ac7f9318f633fc4b5778c59367c8128225f1e3de
Traceback (most recent call last):
File "/workspace/.venv/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 604, in get_config
raise ValueError(
ValueError: Could not detect config format for no config file found. With config_format 'auto', ensure your model has either config.json (HF format) or params.json (Mistral format). Otherwise please specify your_custom_config_format in engine args for customized config parser.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/vllm-omni/vllm_omni/entrypoints/utils.py", line 139, in resolve_model_config_path
hf_config = get_config(model, trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/.venv/lib/python3.12/site-packages/vllm/transformers_utils/config.py", line 625, in get_config
raise ValueError(error_message) from e
ValueError: Invalid repository ID or local directory specified: '/workspace/.cache/huggingface/hub/models--Qwen--Qwen-Image-Edit/snapshots/ac7f9318f633fc4b5778c59367c8128225f1e3de'.
Please verify the following requirements:
1. Provide a valid Hugging Face repository ID.
2. Specify a local directory that contains a recognized configuration file.
- For Hugging Face models: ensure the presence of a 'config.json'.
- For Mistral models: ensure the presence of a 'params.json'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/vllm-omni/examples/offline_inference/image_to_image/image_edit.py", line 492, in <module>
main()
File "/workspace/vllm-omni/examples/offline_inference/image_to_image/image_edit.py", line 362, in main
omni = Omni(
^^^^^
File "/workspace/vllm-omni/vllm_omni/entrypoints/omni.py", line 535, in __init__
super().__init__(model, **kwargs)
File "/workspace/vllm-omni/vllm_omni/entrypoints/omni.py", line 133, in __init__
self._initialize_stages(model, kwargs)
File "/workspace/vllm-omni/vllm_omni/entrypoints/omni.py", line 221, in _initialize_stages
self.config_path = resolve_model_config_path(model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/vllm-omni/vllm_omni/entrypoints/utils.py", line 162, in resolve_model_config_path
raise ValueError(
ValueError: Could not determine model_type for model: /workspace/.cache/huggingface/hub/models--Qwen--Qwen-Image-Edit/snapshots/ac7f9318f633fc4b5778c59367c8128225f1e3de. Model is not in standard transformers format and does not have model_index.json. Please ensure the model has proper configuration files with 'model_type' field
(workspace) root@925981d52983:/workspace/vllm-omni/examples/offline_inference/image_to_image#
Can you take a look at how diffusers and vLLM handle this situation? Trace the respective code and try to run their examples.
ok
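As one reference point for the question above: diffusers resolves a complete pipeline snapshot itself, driven by the repo's `model_index.json` rather than a hand-written pattern list. A minimal usage sketch follows (the `DiffusionPipeline.download` classmethod exists in recent diffusers releases, though exact keyword arguments may vary by version):

```python
from diffusers import DiffusionPipeline

# Downloads the full pipeline snapshot (model_index.json plus all component
# subfolders) and returns the local directory it was cached to.
local_dir = DiffusionPipeline.download("Qwen/Qwen-Image-Edit")
print(local_dir)
```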
Purpose
This PR enhances the startup stability of multimodal models within multi-stage pipelines. By ensuring the Orchestrator completes all critical file downloads before spawning any Stage Workers, it eliminates issues related to concurrent download conflicts and initialization timeouts in multi-process environments.
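A minimal sketch of the intended ordering (the stage names and worker body below are illustrative, not the actual Omni internals):

```python
import multiprocessing as mp
from huggingface_hub import snapshot_download

def stage_worker(local_path: str, stage_name: str) -> None:
    # Workers receive an already-complete local snapshot and never download.
    print(f"{stage_name}: loading from {local_path}")

if __name__ == "__main__":
    # The orchestrator downloads once, up front, before any worker exists.
    local_path = snapshot_download("Tongyi-MAI/Z-Image-Turbo")
    procs = [mp.Process(target=stage_worker, args=(local_path, name))
             for name in ("llm_stage", "diffusion_stage")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```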
Solution
Test Plan
Validated the fix using the Tongyi-MAI/Z-Image-Turbo model.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.