Commit 6f44b7e

Evrard-Nil and claude committed
fix: limit GLM-5 max running requests and update sglang image
Add --max-running-requests 16 to prevent server from hanging under load (EAGLE speculative decoding default of 48 is too aggressive at 90% memory).

Update sglang image from glm5-hopper to glm5-hopper-patched (Feb 25).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 639ac0d commit 6f44b7e

1 file changed

Lines changed: 2 additions & 1 deletion

GLM-5.yaml
@@ -65,7 +65,7 @@ services:
 
   glm:
     <<: *vllm-common
-    image: lmsysorg/sglang:glm5-hopper@sha256:e1876a9b43494fa8e0205f420db71e0e263081ed6da7173b30647d238a429bac
+    image: lmsysorg/sglang:glm5-hopper-patched@sha256:abf8deb5e81cd7f942be8be10b1a92d4360d2f0a245b50ca8d9e27e9c05a98d6
     container_name: glm
     command: >
       sglang serve
@@ -79,6 +79,7 @@ services:
       --speculative-eagle-topk 1
       --speculative-num-draft-tokens 4
       --mem-fraction-static 0.90
+      --max-running-requests 16
       --port 8000
       --host 0.0.0.0
       --enable-cache-report
