Update Qwen3-ASR.md for AMD #299
base: main
@@ -2,7 +2,7 @@
Qwen3-ASR is a speech-to-text model that achieves accurate and robust speech recognition performance, supporting 11 languages and multiple accents. Qwen3-ASR lets users prompt the model with text context in any format to obtain customized ASR results, and it is also good at singing voice recognition. This guide demonstrates how to deploy Qwen3-ASR efficiently with vLLM.

## Installing vLLM

### CUDA

```bash
uv venv
source .venv/bin/activate
```

@@ -12,13 +12,30 @@

```bash
uv pip install -U vllm --pre \
--index-strategy unsafe-best-match
uv pip install "vllm[audio]"  # For additional audio dependencies
```

### ROCm

> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).

```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
uv pip install "vllm[audio]"  # For additional audio dependencies
```
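The Python and glibc thresholds quoted in the note above can be sanity-checked before installing. A minimal stdlib-only sketch, where the helper name and defaults are illustrative (the ROCm 7.0 requirement is not covered here and still has to be verified separately, e.g. with `rocminfo`):

```python
import platform
import sys


def meets_wheel_requirements(py_version=None, glibc_version=None,
                             min_py=(3, 12), min_glibc=(2, 35)):
    """Compare Python/glibc versions against the thresholds in the note.

    Defaults to the current interpreter and system when versions are not
    passed in. The ROCm version itself is not checked here.
    """
    if py_version is None:
        py_version = tuple(sys.version_info[:2])
    if glibc_version is None:
        _, ver = platform.libc_ver()
        glibc_version = tuple(int(p) for p in ver.split(".")[:2]) if ver else (0, 0)
    return py_version >= min_py and glibc_version >= min_glibc


print(meets_wheel_requirements((3, 12), (2, 35)))  # True: both thresholds met
print(meets_wheel_requirements((3, 10), (2, 36)))  # False: Python too old
```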
## Launching Qwen3-ASR with vLLM

### Online Serving

You can easily deploy Qwen3-ASR with vLLM by running the following command.

### CUDA
```bash
vllm serve Qwen/Qwen3-ASR-1.7B
```
### ROCm
```bash
SAFETENSORS_FAST_GPU=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve Qwen/Qwen3-ASR-1.7B
```
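Once either serve command is running, you can confirm the server is up by querying vLLM's OpenAI-compatible `/v1/models` endpoint. A minimal sketch, assuming the default port 8000 (the helper names are illustrative):

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def model_ids(payload):
    """Extract model ids from a /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def served_models(base_url="http://localhost:8000", timeout=5):
    """Return the ids served by a running vLLM server, or [] if unreachable."""
    try:
        with urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return model_ids(json.load(resp))
    except (URLError, OSError):
        return []


if __name__ == "__main__":
    # Expect something like ['Qwen/Qwen3-ASR-1.7B'] once the server is ready.
    print(served_models())
```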
After the model server is successfully deployed, you can interact with it in multiple ways.

#### Using OpenAI SDK

@@ -121,4 +138,4 @@

```python
sampling_params = SamplingParams(temperature=0.01, max_tokens=256)

# Run inference using .chat()
outputs = llm.chat(conversation, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```
> **Review comment:** The heading level for `CUDA` appears to be incorrect. It's currently a level 3 heading (`###`), which makes it a sibling of `Online Serving` in the document structure. To correctly place it as a sub-section of `Online Serving`, it should be a level 4 heading (`####`). This will also fix the document outline, ensuring subsequent sections like `Using OpenAI SDK` are correctly nested.
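The body of the `Using OpenAI SDK` section is elided by the diff above. For context, a hedged sketch of the request shape involved: vLLM's OpenAI-compatible server accepts audio in chat completions as a base64-encoded `input_audio` content part. The helper below only builds the payload; the function name and the optional `context` prompt wiring are illustrative, not part of the guide:

```python
import base64


def build_asr_request(audio_bytes, audio_format="wav",
                      model="Qwen/Qwen3-ASR-1.7B", context=""):
    """Build a chat-completions payload carrying base64-encoded audio.

    `context` mirrors the optional text prompt the guide describes for
    customizing transcription; leave it empty for plain ASR.
    """
    content = []
    if context:
        content.append({"type": "text", "text": context})
    content.append({
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": audio_format,
        },
    })
    return {"model": model, "messages": [{"role": "user", "content": content}]}


req = build_asr_request(b"\x00\x01", context="Names: Qwen, vLLM.")
print(req["messages"][0]["content"][0]["type"])  # text part comes first
```

To actually send it, POST the payload to the server's `/v1/chat/completions` route, or pass the same `messages` list to the OpenAI SDK's `client.chat.completions.create`.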