Commit 6dc454d

[main][CI] update nightly DeepSeek-V3_2-W8A8-EP (#11024)
### What this PR does / why we need it?

This PR updates the GPU memory utilization configuration for the nightly `DeepSeek-V3_2-W8A8-EP` multi-node test. It lowers `--gpu-memory-utilization` to `0.90` at all four occurrences in the deployment config: two were previously set to `0.95` and two to `0.92`. The lower setting helps prevent potential Out-Of-Memory (OOM) issues during nightly test runs.

### Does this PR introduce _any_ user-facing change?

No. This is a test configuration update and does not affect any user-facing APIs or behaviors.

### How was this patch tested?

Tested via nightly CI multi-node end-to-end tests.

- vLLM version: v0.23.0
- vLLM main: vllm-project/vllm@967c5c3

Signed-off-by: pppeng <372907983@qq.com>
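
To make the effect of the change concrete, the sketch below computes how much device memory stays outside the budget that `--gpu-memory-utilization` grants to the vLLM instance at each setting. The 64 GiB device size and the `headroom_gib` helper are assumptions for illustration only; the PR does not state the hardware capacity.

```python
def headroom_gib(total_gib: float, utilization: float) -> float:
    """Memory (GiB) left outside the budget set by --gpu-memory-utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return round(total_gib * (1.0 - utilization), 2)


TOTAL_GIB = 64.0  # assumption: hypothetical per-device memory, not from the PR

# Compare each old setting from the diff against the new 0.90 value.
for old in (0.95, 0.92):
    gained = round(headroom_gib(TOTAL_GIB, 0.90) - headroom_gib(TOTAL_GIB, old), 2)
    print(f"{old:.2f} -> 0.90: +{gained} GiB headroom per device")
```

Under that assumed capacity, the extra headroom is what absorbs allocations made outside the engine's budget, which is the plausible mechanism behind the OOM reduction the PR aims for.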
1 parent 7baeb54 commit 6dc454d

1 file changed

Lines changed: 4 additions & 4 deletions

tests/e2e/nightly/multi_node/internal_dp/config/DeepSeek-V3_2-W8A8-EP.yaml

@@ -49,7 +49,7 @@ deployment:
   --max-model-len 133000
   --max-num-batched-tokens 8192
   --trust-remote-code
-  --gpu-memory-utilization 0.95
+  --gpu-memory-utilization 0.90
   --enforce-eager
   --no-enable-prefix-caching
   --additional-config '{"enable_cpu_binding" : false, "enable_sfa_cp":false,"layer_sharding": ["q_b_proj", "o_proj"]}'
@@ -95,7 +95,7 @@ deployment:
   --max-model-len 133000
   --max-num-batched-tokens 8192
   --trust-remote-code
-  --gpu-memory-utilization 0.95
+  --gpu-memory-utilization 0.90
   --enforce-eager
   --no-enable-prefix-caching
   --additional-config '{"enable_cpu_binding" : false, "enable_sfa_cp":false,"layer_sharding": ["q_b_proj", "o_proj"]}'
@@ -141,7 +141,7 @@ deployment:
   --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes":[3,6,9,12,15,18,21,24,27,30,33,36,39,42]}'
   --trust-remote-code
   --max-num-seqs 14
-  --gpu-memory-utilization 0.92
+  --gpu-memory-utilization 0.90
   --no-enable-prefix-caching
   --additional-config '{"enable_cpu_binding" : false,"recompute_scheduler_enable" : true}'
   --tokenizer-mode deepseek_v32
@@ -188,7 +188,7 @@ deployment:
   --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY", "cudagraph_capture_sizes":[3,6,9,12,15,18,21,24,27,30,33,36,39,42]}'
   --trust-remote-code
   --max-num-seqs 14
-  --gpu-memory-utilization 0.92
+  --gpu-memory-utilization 0.90
   --no-enable-prefix-caching
   --additional-config '{"enable_cpu_binding" : false,"recompute_scheduler_enable" : true}'
   --tokenizer-mode deepseek_v32
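
Because the same flag is set in four separate places, a value can drift back up unnoticed. A minimal sketch of a guard follows, assuming the flag and its value appear on one line as in the hunks above. The helper names `find_utilizations` and `exceeding` and the `0.90` limit are hypothetical and not part of the repository.

```python
import re

# Matches "--gpu-memory-utilization 0.90" or "--gpu-memory-utilization=0.90".
FLAG_RE = re.compile(r"--gpu-memory-utilization[ =]+([0-9]*\.?[0-9]+)")


def find_utilizations(config_text: str) -> list[float]:
    """Return every --gpu-memory-utilization value found in the text, in order."""
    return [float(value) for value in FLAG_RE.findall(config_text)]


def exceeding(config_text: str, limit: float = 0.90) -> list[float]:
    """Return the values strictly above the limit (hypothetical ceiling)."""
    return [v for v in find_utilizations(config_text) if v > limit]


# Sample text mixing one updated and one stale setting, for illustration.
sample = """\
--trust-remote-code
--gpu-memory-utilization 0.90
--enforce-eager
--max-num-seqs 14
--gpu-memory-utilization 0.92
"""

print(exceeding(sample))
```

In practice the text would be read from the YAML file touched by this commit; a plain-text scan is used here so the sketch does not depend on the file's exact schema.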
