We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
1 parent 0f831a1 commit 2f8e510Copy full SHA for 2f8e510
1 file changed
benchmarks_and_experiments/coding_vs_vllm/start_kvboost.sh
@@ -33,7 +33,7 @@ set -euo pipefail
33
34
# int4 (Marlin) by default — the single biggest decode lever on Ampere (~4× less
35
# weight bandwidth). Override MODEL=Qwen/Qwen2.5-3B-Instruct for plain fp16.
36
-MODEL="${MODEL:-Qwen/Qwen2.5-3B-Instruct}"
+MODEL="${MODEL:-Qwen/Qwen2.5-3B-Instruct-AWQ}"
37
PORT="${PORT:-9000}"
38
# KV-cache budget for cross-request chunk reuse. The int4 model is only ~2 GB
39
# (vs ~6 GB fp16) so on a 12 GB 3060 there's far more room for cache → bigger
0 commit comments