
fix: warn about non-obvious chunked prefill budget behavior #29531

Open
ntny wants to merge 1 commit into sgl-project:main from ntny:fix-chunked-prefill-budget-warning


ntny commented Jun 27, 2026


Motivation

chunked_prefill_size can be read as only a per-request chunk size, while max_prefill_tokens looks like the batch-level budget. In practice, when chunked prefill is enabled, both parameters cap the prefill batch budget, and the effective limit is min(max_prefill_tokens, chunked_prefill_size).
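To make the min() rule concrete, here is a minimal Python sketch of how the effective per-batch prefill budget resolves. The helper name effective_prefill_budget is hypothetical. It also assumes that a None or non-positive chunked_prefill_size means chunked prefill is disabled, in which case only max_prefill_tokens applies. This is an illustration, not the scheduler code.

```python
from typing import Optional


def effective_prefill_budget(
    max_prefill_tokens: int, chunked_prefill_size: Optional[int]
) -> int:
    """Hypothetical helper: token budget for a single prefill batch.

    Assumption: chunked prefill counts as disabled when chunked_prefill_size
    is None or non-positive, so only max_prefill_tokens caps the batch.
    """
    if chunked_prefill_size is None or chunked_prefill_size <= 0:
        return max_prefill_tokens
    # With chunked prefill enabled, both parameters cap the batch budget,
    # so the smaller one wins.
    return min(max_prefill_tokens, chunked_prefill_size)


print(effective_prefill_budget(16384, 8192))  # 8192: chunk size is the real cap
print(effective_prefill_budget(4096, 8192))   # 4096: max_prefill_tokens is the cap
print(effective_prefill_budget(16384, -1))    # 16384: chunked prefill disabled
```

In the first call, raising max_prefill_tokens above 8192 has no effect on the batch budget. That is the confusing case this PR targets.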

Modifications

Added a startup warning when max_prefill_tokens > chunked_prefill_size to make the effective prefill budget explicit.
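A startup check of this kind could look roughly like the Python sketch below. The function name warn_if_prefill_budget_is_capped and the log message text are hypothetical. The sketch also assumes that a None or non-positive chunked_prefill_size disables chunked prefill. It is not the code added in this PR.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def warn_if_prefill_budget_is_capped(
    max_prefill_tokens: int, chunked_prefill_size: Optional[int]
) -> bool:
    """Hypothetical startup check; returns True if a warning was logged.

    Assumption: chunked prefill is disabled when chunked_prefill_size is
    None or non-positive, and then no warning is needed.
    """
    chunked_enabled = chunked_prefill_size is not None and chunked_prefill_size > 0
    if chunked_enabled and max_prefill_tokens > chunked_prefill_size:
        # The effective budget is min(...), which here equals chunked_prefill_size.
        logger.warning(
            "max_prefill_tokens (%d) > chunked_prefill_size (%d): with chunked "
            "prefill enabled, the effective prefill batch budget is "
            "min(max_prefill_tokens, chunked_prefill_size) = %d tokens.",
            max_prefill_tokens,
            chunked_prefill_size,
            chunked_prefill_size,
        )
        return True
    return False


logging.basicConfig(level=logging.WARNING)
warn_if_prefill_budget_is_capped(16384, 8192)  # logs the warning
warn_if_prefill_budget_is_capped(4096, 8192)   # silent: no hidden cap
```

Warning rather than failing keeps existing launch configurations working while still surfacing the effective limit in the startup logs.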

Accuracy Tests

Not applicable. This change only adds a warning.

Speed Tests and Profiling

Not applicable. This change only adds a warning.

Checklist

Not applicable. This change only adds a warning.


CI States

Latest PR Test (Base): ⏳ Run #28304069792
Latest PR Test (Extra): ⏳ Run #28304069702

chunked_prefill_size is easy to read as only a per-request chunk size, while max_prefill_tokens looks like the batch-level budget. In practice, when chunked prefill is enabled, both parameters cap the prefill batch budget, and the effective limit is min(max_prefill_tokens, chunked_prefill_size).

Add a warning for the confusing case where max_prefill_tokens is greater than chunked_prefill_size, so users can understand the non-obvious effective limit from startup logs.

Signed-off-by: Anton Pechenin <ntny1986@gmail.com>
