
Commit d8364e0

gemma-4-31B-it: drop context-length and max-running-requests caps
Lets sglang auto-detect gemma4's native context length (it was capped at 32768 tokens, which caused the upstream 400 errors on long prompts) and removes the explicit --max-running-requests 64 limit, so the scheduler sizes concurrency from the available KV cache instead.
1 parent 91e6917 commit d8364e0

1 file changed

Lines changed: 0 additions & 2 deletions


small-models.yaml

@@ -515,8 +515,6 @@ services:
 --reasoning-parser gemma4
 --tool-call-parser gemma4
 --mem-fraction-static 0.85
---max-running-requests 64
---context-length 32768
 --chunked-prefill-size 8192
 --num-continuous-decode-steps 5
 --enable-mixed-chunk
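The reasoning in the commit message (long prompts rejected by a fixed context cap, and concurrency bounded by KV cache rather than by a hard request limit) can be illustrated with some back-of-envelope arithmetic. This is a minimal sketch, not sglang's actual allocator or admission logic. Every model dimension, memory size, and the larger context window used below is a hypothetical placeholder rather than gemma-4-31B-it's real configuration. The per-token formula assumes plain full attention and ignores sliding-window layers, paging granularity, and the `--mem-fraction-static` split. The helper names (`kv_bytes_per_token`, `fits_context`, and so on) are illustrative only.

```python
# Illustrative arithmetic only. All numbers are HYPOTHETICAL placeholders,
# not the real gemma-4-31B-it config or sglang's memory-pool sizing.


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV-cache bytes one token occupies under plain full attention.

    The leading factor 2 accounts for separate K and V tensors per layer;
    bytes_per_elem=2 assumes a bf16/fp16 cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(kv_budget_bytes: int, per_token_bytes: int) -> int:
    """Number of whole tokens that fit in a KV pool of the given size."""
    return kv_budget_bytes // per_token_bytes


def concurrency_bound(total_tokens: int, tokens_per_request: int) -> int:
    """Upper bound on requests of one fixed length resident at the same time."""
    return total_tokens // tokens_per_request


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_length: int) -> bool:
    """Hypothetical admission check: prompt plus generation must fit the window."""
    return prompt_tokens + max_new_tokens <= context_length


if __name__ == "__main__":
    # Hypothetical model shape: 60 layers, 16 KV heads, head_dim 128, bf16.
    per_token = kv_bytes_per_token(num_layers=60, num_kv_heads=16, head_dim=128)
    pool = max_cached_tokens(40 * 2**30, per_token)  # hypothetical 40 GiB pool
    print(per_token, pool, concurrency_bound(pool, 8192))

    # Under the old 32768 cap a ~40k-token prompt is rejected; a larger
    # window (131072 here, also hypothetical) admits it.
    print(fits_context(40_000, 1_024, 32_768), fits_context(40_000, 1_024, 131_072))
```

Running it prints `491520 87381 10` and then `False True`. With these toy numbers, KV capacity (about 10 concurrent 8192-token requests) would bind well before a fixed cap of 64. That is the kind of sizing the commit hands back to the scheduler.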
