
Commit 2f8e510

AWQ
1 parent 0f831a1 commit 2f8e510

1 file changed

Lines changed: 1 addition & 1 deletion


benchmarks_and_experiments/coding_vs_vllm/start_kvboost.sh

@@ -33,7 +33,7 @@ set -euo pipefail
 
 # int4 (Marlin) by default — the single biggest decode lever on Ampere (~4× less
 # weight bandwidth). Override MODEL=Qwen/Qwen2.5-3B-Instruct for plain fp16.
-MODEL="${MODEL:-Qwen/Qwen2.5-3B-Instruct}"
+MODEL="${MODEL:-Qwen/Qwen2.5-3B-Instruct-AWQ}"
 PORT="${PORT:-9000}"
 # KV-cache budget for cross-request chunk reuse. The int4 model is only ~2 GB
 # (vs ~6 GB fp16) so on a 12 GB 3060 there's far more room for cache → bigger
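The change itself only swaps the default model string, but the surrounding comments carry two pieces of reasoning: the `${MODEL:-...}` default-with-override pattern, and back-of-envelope memory arithmetic (4-bit vs 16-bit weights, ~2 GB vs ~6 GB of weights on a 12 GB card). The following is a minimal bash sketch of both. The helper names (`resolve_model`, `weight_traffic_ratio`, `cache_headroom_gb`) are hypothetical and not part of start_kvboost.sh, and the GB figures are the script's own rough numbers, not measurements.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mirrors the ${VAR:-default} expansion on the MODEL line.
# $1 stands in for the caller's MODEL value; an unset OR empty value falls
# back to the AWQ default, anything else is passed through unchanged.
resolve_model() {
  local model="${1:-}"
  printf '%s\n' "${model:-Qwen/Qwen2.5-3B-Instruct-AWQ}"
}

# Hypothetical helper: in single-stream decode, roughly all weights are read
# once per generated token, so weight traffic scales with bits per weight.
# Ignores the per-group scale/zero-point overhead of int4 formats.
weight_traffic_ratio() {
  local from_bits=$1 to_bits=$2
  echo $(( from_bits / to_bits ))
}

# Hypothetical helper: VRAM (GB) left after loading weights. Integer GB only;
# it ignores CUDA context, activations and allocator fragmentation, so the
# real KV-cache budget is somewhat smaller.
cache_headroom_gb() {
  local total_gb=$1 weights_gb=$2
  echo $(( total_gb - weights_gb ))
}

resolve_model ""                           # -> Qwen/Qwen2.5-3B-Instruct-AWQ
resolve_model "Qwen/Qwen2.5-3B-Instruct"   # -> Qwen/Qwen2.5-3B-Instruct
weight_traffic_ratio 16 4                  # -> 4  (fp16 vs int4)
cache_headroom_gb 12 2                     # -> 10 (int4 weights, ~2 GB)
cache_headroom_gb 12 6                     # -> 6  (fp16 weights, ~6 GB)
```

To run the fp16 model instead, set the variable for a single invocation, e.g. `MODEL=Qwen/Qwen2.5-3B-Instruct bash benchmarks_and_experiments/coding_vs_vllm/start_kvboost.sh`. Because the script uses `:-` rather than `-`, exporting an empty `MODEL=` still selects the AWQ default.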
