Commit d8364e0
gemma-4-31B-it: drop context-length and max-running-requests caps
Lets sglang auto-detect gemma4's native context (was capped at 32768,
the source of upstream 400s for long prompts) and removes the explicit
--max-running-requests 64 limit so the scheduler sizes concurrency from
available KV instead.

1 parent 91e6917 commit d8364e0
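The changed lines themselves are not shown on this page, so the following is only a minimal sketch of the kind of change the message describes. It assumes the server is started through `sglang.launch_server` with `--context-length 32768` and `--max-running-requests 64` as separate arguments; the model path and the `drop_flags` helper are hypothetical and used for illustration only.

```python
# Hypothetical launch arguments; the real file contents are not shown in this diff.
BEFORE = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "google/gemma-4-31B-it",  # assumed model path
    "--context-length", "32768",              # cap described in the commit message
    "--max-running-requests", "64",           # limit described in the commit message
]


def drop_flags(args, flags):
    """Return a copy of args without the given value-taking flags and their values."""
    out = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This item is the value of a flag we just dropped.
            skip_next = False
            continue
        if arg in flags:
            skip_next = True
            continue
        out.append(arg)
    return out


AFTER = drop_flags(BEFORE, {"--context-length", "--max-running-requests"})
print(" ".join(AFTER))
```

With both flags gone, the server falls back to its own defaults, which is the behavior the commit message relies on: context length read from the model and concurrency sized from available KV cache.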
1 file changed
Lines changed: 0 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 515 | 515 | |
| 516 | 516 | |
| 517 | 517 | |
| 518 | | - |
| 519 | | - |
| 520 | 518 | |
| 521 | 519 | |
| 522 | 520 | |