Qwen3-ASR is a speech-to-text model that achieves accurate and robust speech recognition performance, supporting 11 languages and multiple accents. Qwen3-ASR allows users to prompt the model with text context in any format to obtain customized ASR results, and is also good at singing voice recognition. This guide demonstrates how to deploy Qwen3-ASR efficiently with vLLM.

## Installing vLLM

### CUDA
```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
--index-strategy unsafe-best-match
uv pip install "vllm[audio]" # For additional audio dependencies
```
### ROCm
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
uv pip install "vllm[audio]" # For additional audio dependencies
```

## Launching Qwen3-ASR with vLLM
### Online Serving
You can easily deploy Qwen3-ASR with vLLM by running the following command:
#### CUDA
```bash
vllm serve Qwen/Qwen3-ASR-1.7B
```
#### ROCm

```bash
SAFETENSORS_FAST_GPU=1 \
VLLM_ROCM_USE_AITER=1 \
vllm serve Qwen/Qwen3-ASR-1.7B
```
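
With either command, vLLM exposes an OpenAI-compatible API on `http://localhost:8000` by default. A quick way to confirm the model loaded (assuming the default host and port):

```shell
# List the models the server is currently serving;
# the JSON response should include the served model id (Qwen/Qwen3-ASR-1.7B)
curl http://localhost:8000/v1/models
```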
After the model server is successfully deployed, you can interact with it in multiple ways.

#### Using OpenAI SDK
```python
sampling_params = SamplingParams(temperature=0.01, max_tokens=256)
# Run inference using .chat()
outputs = llm.chat(conversation, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```