
Optimize TTFT: send first token immediately after prefill for streaming#701

Open
RishabhSaini wants to merge 3 commits into llm-d:main from RishabhSaini:streamPrefill

Conversation


@RishabhSaini RishabhSaini commented Mar 10, 2026

Reduces Time To First Token (TTFT) for streaming clients by sending the first token immediately after prefill completes, before KV cache transfer to decode.

  • Convert non-streaming prefill response to SSE and forward first token to streaming clients
  • Fix token budget: decrement max_tokens by 1 for decode stage
  • Fix TTFT metrics to measure when first token actually reaches user (streaming vs non-streaming)
  • Strip internal kv_transfer_params from user-facing responses
  • Add headersSentWriter to prevent duplicate WriteHeader errors

RishabhSaini (Author) commented:

On H200s, with a 1P/1D (TP=2) deployment of GPT-OSS-120B using always_pd_disagg_decider, on the sanity_concurrent benchmark, averaged across 3 runs per configuration:

| Metric | Baseline (runs 1-3) | Optimized (runs 4-6) | Change | % Change |
| --- | --- | --- | --- | --- |
| TTFT p50 | 50.24 ms | 31.41 ms | -18.83 ms | -37.5% |
| TTFT p75 | 51.41 ms | 32.57 ms | -18.84 ms | -36.6% |
| TTFT p99 | 65.00 ms | 52.35 ms | -12.65 ms | -19.5% |
| TPOT p50 | 3.70 ms/token | 3.99 ms/token | +0.29 ms/token | +7.8% |
| TPOT p75 | 3.71 ms/token | 4.00 ms/token | +0.29 ms/token | +7.8% |
| TPOT p99 | 3.97 ms/token | 5.96 ms/token | +1.99 ms/token | +50.1% |

@RishabhSaini RishabhSaini force-pushed the streamPrefill branch 3 times, most recently from 8d4df8c to 528c60b Compare March 10, 2026 16:53
@RishabhSaini RishabhSaini requested a review from kfswain March 10, 2026 19:29
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
vMaroon (Member) commented Mar 12, 2026

How does this affect the standard UX of staring at a blank screen and then getting a fast stream of tokens? Does the time between the first and second tokens match the average ITL, or would the user experience the equivalent of "lag"?


github-actions bot commented Apr 4, 2026

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.


3 participants