Commit 3f7b266

Evrard-Nil and claude committed
fix: reduce gpt-oss-120b GPU memory utilization to 0.90
Lower --gpu-memory-utilization from 0.95 to 0.90 to address CUDA OOM crashes in vllm-gpt-oss containers under load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d92a32e commit 3f7b266

1 file changed

Lines changed: 1 addition & 1 deletion

small-models.yaml

@@ -70,7 +70,7 @@ x-gpt-oss-common: &gpt-oss-common
 command: >
   openai/gpt-oss-120b
   --tensor-parallel-size 1
-  --gpu-memory-utilization 0.95
+  --gpu-memory-utilization 0.90
   --enable-prefix-caching
   --async-scheduling
   --max-num-seqs 64
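
To make the effect of this one-line change concrete, here is a minimal sketch of the headroom arithmetic. It assumes an 80 GiB device, which the commit does not state, and `headroom_gib` is a hypothetical helper for illustration, not part of vLLM. It only models the fraction passed to `--gpu-memory-utilization`; actual OOM behavior also depends on other processes and allocations on the GPU.

```python
def headroom_gib(total_gib: float, utilization: float) -> float:
    """Return the GPU memory (GiB) left outside the fraction vLLM may claim."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gib * (1.0 - utilization)


# Assumption: an 80 GiB GPU. The commit does not name the hardware.
ASSUMED_TOTAL_GIB = 80.0

if __name__ == "__main__":
    # Compare the old (0.95) and new (0.90) settings from the diff.
    for utilization in (0.95, 0.90):
        free = headroom_gib(ASSUMED_TOTAL_GIB, utilization)
        print(f"--gpu-memory-utilization {utilization:.2f}: {free:.1f} GiB headroom")
```

Under that assumption, the change roughly doubles the memory left outside vLLM's budget, from about 4 GiB to about 8 GiB, at the cost of a smaller KV cache.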
