Commit 6f44b7e
fix: limit GLM-5 max running requests and update sglang image
Add --max-running-requests 16 to prevent the server from hanging under load
(the EAGLE speculative decoding default of 48 is too aggressive at 90% memory).
Update sglang image from glm5-hopper to glm5-hopper-patched (Feb 25).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent: 639ac0d
1 file changed
Lines changed: 2 additions & 1 deletion
| Original line | New line | Change |
|---|---|---|
| … | … | |
| 65 | 65 | |
| 66 | 66 | |
| 67 | 67 | |
| 68 | | - |
| | 68 | + |
| 69 | 69 | |
| 70 | 70 | |
| 71 | 71 | |
| … | … | |
| 79 | 79 | |
| 80 | 80 | |
| 81 | 81 | |
| | 82 | + |
| 82 | 83 | |
| 83 | 84 | |
| 84 | 85 | |
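The changed file's contents are not shown above, so the following is only a minimal sketch of what an SGLang launch command with this cap could look like. It assumes that "EAGLE speculative decoding" is enabled with `--speculative-algorithm EAGLE` and that "90% memory" corresponds to `--mem-fraction-static 0.9`. The model path and the helper name `sglang_launch_args` are hypothetical placeholders and do not come from this commit.

```python
import shlex

# Hypothetical placeholder: the real model path is not shown in this commit.
MODEL_PATH = "path/to/GLM-5"


def sglang_launch_args(model_path: str, max_running_requests: int = 16) -> list[str]:
    """Build an SGLang server argv list that includes the concurrency cap from this commit."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Assumption: EAGLE speculative decoding is enabled via this flag.
        "--speculative-algorithm", "EAGLE",
        # Assumption: "90% memory" refers to a static memory fraction of 0.9.
        "--mem-fraction-static", "0.9",
        # The change described in this commit: cap concurrent requests at 16
        # instead of the default of 48 that the commit message reports.
        "--max-running-requests", str(max_running_requests),
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for inspection.
    print(shlex.join(sglang_launch_args(MODEL_PATH)))
```

With the flag set, the server admits at most 16 requests into its running batch at once. Any additional requests wait in the queue instead of competing for the memory that remains at a 0.9 fraction.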