Enable chunked decode in the routing proxy #603

Draft
andreyod wants to merge 1 commit into llm-d:main from andreyod:ch-decode

@andreyod (Contributor) commented Feb 9, 2026

Add the ability for the routing proxy sidecar to run a decode request in chunks by setting max_completion_tokens as the output chunk size. This breaks a long decode request into multiple smaller requests.
When all iterations are done, a single response is returned.
This improves fairness and prevents head-of-line blocking.
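
To illustrate the idea, here is a minimal Python sketch of a chunked decode loop. It is not the code from this PR: the names `chunked_decode`, `ChunkResult`, `send_chunk`, and `make_fake_backend` are hypothetical. It also assumes that the text generated so far is appended to the prompt for the next iteration, and that the backend reports an OpenAI-style finish reason ("length" when the chunk cap was hit, "stop" when generation ended). The fake backend treats each character as one token so the loop can be run without a model server.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChunkResult:
    """Result of one decode iteration (hypothetical shape)."""
    text: str               # text generated in this iteration
    finish_reason: str      # "length" if the chunk cap was hit, "stop" if done
    completion_tokens: int  # number of tokens generated in this iteration


def chunked_decode(
    prompt: str,
    max_total_tokens: int,
    chunk_size: int,
    send_chunk: Callable[[str, int], ChunkResult],
) -> Tuple[str, str, int]:
    """Run one long decode as several smaller requests and merge the output.

    send_chunk(context, max_completion_tokens) stands in for the call to the
    decode worker; each call is capped at chunk_size tokens.
    """
    pieces: List[str] = []
    used = 0
    finish_reason = "length"
    while used < max_total_tokens:
        # Never ask for more than the caller's overall token limit.
        budget = min(chunk_size, max_total_tokens - used)
        # Assumption: the next iteration continues from prompt + output so far.
        result = send_chunk(prompt + "".join(pieces), budget)
        pieces.append(result.text)
        used += result.completion_tokens
        finish_reason = result.finish_reason
        # Stop when the model finished on its own or made no progress.
        if result.finish_reason != "length" or result.completion_tokens == 0:
            break
    # A single merged response is returned once all iterations are done.
    return "".join(pieces), finish_reason, used


def make_fake_backend(prompt: str, target: str):
    """Build a stand-in backend that emits `target` one character per token."""
    calls: List[int] = []

    def send(context: str, max_completion_tokens: int) -> ChunkResult:
        calls.append(max_completion_tokens)
        already = context[len(prompt):]
        remaining = target[len(already):]
        piece = remaining[:max_completion_tokens]
        reason = "stop" if len(piece) == len(remaining) else "length"
        return ChunkResult(piece, reason, len(piece))

    return send, calls


if __name__ == "__main__":
    send, calls = make_fake_backend("Q:", "abcdefghij")
    print(chunked_decode("Q:", max_total_tokens=100, chunk_size=4, send_chunk=send))
    print(calls)
```

Because each iteration is a separate short request, other requests can be scheduled between chunks, which is where the fairness benefit described above would come from in this sketch.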

github-actions bot commented Feb 9, 2026

🚨 Unsigned commits detected! Please sign your commits.

For instructions on how to set up GPG/SSH signing and verify your commits, please see GitHub Documentation.

Signed-off-by: andreyod <andreyo@il.ibm.com>