
Commit a5cb8e4

[doc]Modify quantization tutorials (#5026)
### What this PR does / why we need it?

Modify the quantization tutorials `Qwen3-32B-W4A4.md` and `Qwen3-8B-W4A8.md` to correct a few mistakes:

- Qwen3-8B-W4A8: one idle NPU card must be set.
- Qwen3-32B-W4A4: two idle NPU cards must be set for the FlatQuant training, and the `calib_file` path is changed because the old one did not match the ModelSlim version.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: IncSec <[email protected]>
1 parent e90e8af commit a5cb8e4


2 files changed (+5, -1 lines)


docs/source/tutorials/Qwen3-32B-W4A4.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -55,10 +55,12 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-32B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-32B-w4a4
+# Set two idle NPU cards
+export ASCEND_RT_VISIBLE_DEVICES=0,1
 
 python3 w4a4.py --model_path $MODEL_PATH \
 --save_directory $SAVE_PATH \
---calib_file ../common/qwen_qwen3_cot_w4a4.json \
+--calib_file ./calib_data/qwen3_cot_w4a4.json \
 --trust_remote_code True \
 --batch_size 1
 ```
````
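Both W4A4 fixes (two visible NPU cards for FlatQuant, and the calibration file now under `./calib_data/`) are easy to miss, so a pre-flight check before launching `w4a4.py` can catch them early. The following is a minimal sketch, assuming bash and that it runs from `example/Qwen` (the directory in the hunk header), so the relative path resolves the same way it does for `w4a4.py`. The function name `check_w4a4_env` is hypothetical and not part of ModelSlim or the tutorial. It only counts the listed device IDs. It cannot tell whether those cards are actually idle.

```shell
# Hypothetical pre-flight check for the Qwen3-32B W4A4 tutorial; not part of ModelSlim.
# Requires bash (arrays, here-strings). Run it from example/Qwen so that the
# relative calibration path resolves the same way it does for w4a4.py.
check_w4a4_env() {
  local calib_file="./calib_data/qwen3_cot_w4a4.json"
  local required_cards=2
  local -a devices

  # The tutorial selects cards through ASCEND_RT_VISIBLE_DEVICES.
  if [ -z "${ASCEND_RT_VISIBLE_DEVICES:-}" ]; then
    echo "error: ASCEND_RT_VISIBLE_DEVICES is not set" >&2
    return 1
  fi

  # Split the comma-separated device list, e.g. "0,1" -> (0 1).
  IFS=',' read -r -a devices <<< "$ASCEND_RT_VISIBLE_DEVICES"
  if [ "${#devices[@]}" -lt "$required_cards" ]; then
    echo "error: FlatQuant training needs $required_cards NPU cards, got ${#devices[@]}" >&2
    return 1
  fi

  # The PR reports that the old ../common/ path did not match the ModelSlim version.
  if [ ! -f "$calib_file" ]; then
    echo "error: calibration file not found: $calib_file" >&2
    return 1
  fi

  echo "ok"
}
```

For example, after `export ASCEND_RT_VISIBLE_DEVICES=0,1`, running `check_w4a4_env && python3 w4a4.py --model_path $MODEL_PATH ...` starts quantization only when both checks pass.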

docs/source/tutorials/Qwen3-8B-W4A8.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -47,6 +47,8 @@ cd example/Qwen
 MODEL_PATH=/home/models/Qwen3-8B
 # Path to save converted weight, Replace with your local path
 SAVE_PATH=/home/models/Qwen3-8B-w4a8
+# Set an idle NPU card
+export ASCEND_RT_VISIBLE_DEVICES=0
 
 python quant_qwen.py \
 --model_path $MODEL_PATH \
````
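The W4A8 fix uses `export`, so `ASCEND_RT_VISIBLE_DEVICES=0` stays set for the rest of the shell session. If you would rather pin only the quantization command to one card, a wrapper like the one below scopes the variable to a single command. This is a sketch, assuming bash. `run_on_single_npu` is a hypothetical helper, not part of the tutorial or ModelSlim. It checks only that exactly one numeric device ID was given, not that the card is idle.

```shell
# Hypothetical wrapper, not part of the tutorial or ModelSlim: run one command
# pinned to exactly one NPU card without exporting the variable to the shell.
run_on_single_npu() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_on_single_npu DEVICE_ID COMMAND [ARGS...]" >&2
    return 1
  fi
  local card="$1"
  shift

  # Reject empty values and lists such as "0,1"; the W4A8 tutorial uses one card.
  case "$card" in
    ''|*[!0-9]*)
      echo "error: expected a single numeric device ID, got '$card'" >&2
      return 1
      ;;
  esac

  # A prefix assignment puts the variable in the environment of this command only.
  ASCEND_RT_VISIBLE_DEVICES="$card" "$@"
}
```

For example, `run_on_single_npu 0 python quant_qwen.py --model_path $MODEL_PATH ...` behaves like the tutorial's `export` followed by the command, but leaves the calling shell's environment unchanged.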
