Fix disagg PD bootstrap and KV transfer metrics by Kangyan-Zhou · Pull Request #19009 · sgl-project/sglang

Kangyan-Zhou · 2026-02-19T12:07:18Z

Summary

Add bootstrap_done_time to TimeStats and correctly compute bootstrap_duration and alloc_waiting_duration on both prefill and decode paths (replacing the previous # TODO: correct set them placeholder)
Add KV transfer latency, size, and speed metrics computation in the disagg prefill inflight queue processing
Add missing log_prefill_stats call in the disagg prefill batch result path to match the non-disagg path
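
For reviewers, the duration bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TimeStatsSketch` and `fill_durations` are hypothetical names, and the two subtraction formulas are an assumption about how `bootstrap_done_time` splits the interval between queue entry timestamps.

```python
from dataclasses import dataclass


@dataclass
class TimeStatsSketch:
    """Hypothetical stand-in for TimeStats, limited to fields named in this PR."""

    prefill_bootstrap_queue_entry_time: float = 0.0
    bootstrap_done_time: float = 0.0  # when bootstrap completes
    wait_queue_entry_time: float = 0.0
    bootstrap_duration: float = 0.0
    alloc_waiting_duration: float = 0.0


def fill_durations(ts: TimeStatsSketch) -> None:
    """Assumed formulas: bootstrap_done_time splits the pre-wait-queue interval."""
    if ts.bootstrap_done_time <= 0.0:
        return  # bootstrap was never recorded; keep the zero defaults
    if ts.prefill_bootstrap_queue_entry_time > 0.0:
        ts.bootstrap_duration = (
            ts.bootstrap_done_time - ts.prefill_bootstrap_queue_entry_time
        )
    if ts.wait_queue_entry_time > 0.0:
        ts.alloc_waiting_duration = (
            ts.wait_queue_entry_time - ts.bootstrap_done_time
        )
```

Guarding on unset (zero) timestamps keeps a request that never finished bootstrapping from reporting a large negative duration.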

Test plan

Verify metrics are correctly reported in Prometheus on a PD disaggregation setup
Confirm bootstrap_duration and alloc_waiting_duration have reasonable values under normal load
Confirm aborted requests are excluded from KV transfer metrics

🤖 Generated with Claude Code

Add bootstrap_done_time to TimeStats and correctly compute bootstrap_duration, alloc_waiting_duration, and KV transfer latency/speed/size metrics for both prefill and decode paths. Also add missing log_prefill_stats call in the disagg prefill path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-02-19T12:07:36Z

Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the observability of prefill-decode (PD) disaggregation by refining its performance metrics. It records when the bootstrap phase completes, which allows the bootstrap and allocation waiting durations to be computed accurately. It also adds KV transfer metrics (latency, size, and speed) for disaggregated prefill requests. Together these make it easier to locate performance bottlenecks in disaggregated setups.

Highlights

Bootstrap Completion Time Tracking: Introduced bootstrap_done_time in TimeStats to accurately mark when the bootstrap phase completes for both prefill and decode requests.
Accurate Duration Calculation: Implemented correct calculation of bootstrap_duration and alloc_waiting_duration by utilizing the new bootstrap_done_time.
KV Transfer Metrics: Added new metrics to measure KV transfer latency, total size, and speed for disaggregated prefill requests, excluding aborted ones.
Prefill Stats Logging: Ensured log_prefill_stats is called for disaggregated prefill batch results, aligning its behavior with non-disaggregated paths.

Changelog

python/sglang/srt/disaggregation/decode.py
- Recorded bootstrap_done_time when a decode request enters the WaitingForInput state.
- Calculated bootstrap_duration and alloc_waiting_duration when a preallocated decode request is processed.
python/sglang/srt/disaggregation/prefill.py
- Imported FINISH_ABORT to correctly handle aborted requests in metrics calculations.
- Recorded bootstrap_done_time for prefill requests upon reaching the WaitingForInput state.
- Calculated bootstrap_duration and alloc_waiting_duration for bootstrapped prefill requests.
- Added a call to log_prefill_stats within the disaggregated prefill batch result processing.
- Implemented logic to compute KV transfer latency, total size, and speed for completed disaggregated prefill requests, filtering out aborted ones.
python/sglang/srt/metrics/collector.py
- Added bootstrap_done_time as a new float field to the TimeStats class.

Activity

No human activity (comments, reviews, etc.) has been recorded on this pull request yet.

gemini-code-assist

Code Review

The pull request correctly addresses the missing metrics for disaggregated prefill and decode paths, replacing placeholders with actual computations for bootstrap and allocation waiting durations. It also adds KV transfer metrics (latency, size, and speed) and ensures prefill stats are logged in the disaggregated path. However, there are a few issues regarding metric reporting consistency and missing field definitions in the TimeStats class that should be addressed to ensure robust monitoring.

gemini-code-assist · 2026-02-19T12:10:03Z

python/sglang/srt/disaggregation/prefill.py

+                    self.kv_transfer_latency_ms = transfer_latency_s * 1000
+
+                    # Transfer size
+                    num_tokens = len(req.origin_input_ids)
+                    num_pages = kv_to_page_num(num_tokens, page_size)
+                    total_bytes = bytes_per_page_all_layers * num_pages
+                    total_mb = total_bytes / (1024 * 1024)
+                    self.kv_transfer_total_mb = total_mb
+                    ts.transfer_total_mb = total_mb
+
+                    # Transfer speed
+                    if transfer_latency_s > 0:
+                        speed = (total_mb / 1024) / transfer_latency_s
+                        self.kv_transfer_speed_gb_s = speed
+                        ts.transfer_speed_gb_s = speed
+
+                # Bootstrap and alloc durations
+                if (
+                    ts.prefill_bootstrap_queue_entry_time > 0
+                    and ts.wait_queue_entry_time > 0
+                ):
+                    self.kv_transfer_bootstrap_ms = ts.bootstrap_duration * 1000
+                    self.kv_transfer_alloc_ms = ts.alloc_waiting_duration * 1000


The metrics self.kv_transfer_latency_ms, self.kv_transfer_total_mb, self.kv_transfer_speed_gb_s, self.kv_transfer_bootstrap_ms, and self.kv_transfer_alloc_ms are being overwritten in a loop for each request in done_reqs. If multiple requests finish in the same iteration, the scheduler's state will only reflect the metrics of the last request processed. These should likely be observed in a histogram via self.metrics_collector inside the loop, or aggregated if batch-level metrics are intended.
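
One way to avoid the overwrite is to record one sample per finished request and let the collector aggregate them. The sketch below is only an illustration under stated assumptions: `TransferSample`, `TransferMetricsSketch`, and `observe` are hypothetical names, not part of `self.metrics_collector`, and the `aborted` flag stands in for the `FINISH_ABORT` check in the PR. The size and speed arithmetic mirrors the diff above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransferSample:
    latency_ms: float
    total_mb: float
    speed_gb_s: Optional[float]  # None when latency is not positive


@dataclass
class TransferMetricsSketch:
    """Hypothetical per-request sink; a real one would feed histograms."""

    samples: List[TransferSample] = field(default_factory=list)

    def observe(
        self,
        latency_s: float,
        num_pages: int,
        bytes_per_page: int,
        aborted: bool = False,
    ) -> None:
        if aborted:
            return  # aborted requests are excluded from transfer metrics
        total_mb = bytes_per_page * num_pages / (1024 * 1024)
        # Dividing MB by 1024 yields binary gigabytes, as in the diff.
        speed = (total_mb / 1024) / latency_s if latency_s > 0 else None
        self.samples.append(TransferSample(latency_s * 1000, total_mb, speed))
```

Calling `observe` once per entry in `done_reqs` keeps every request's measurement, so a batch that finishes several requests in one iteration no longer reports only the last one.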

gemini-code-assist · 2026-02-19T12:10:03Z

python/sglang/srt/metrics/collector.py

+    bootstrap_done_time: float = (
+        0.0  # When bootstrap completes (poll -> WaitingForInput)
+    )


The fields transfer_total_mb and transfer_speed_gb_s are assigned to TimeStats instances in prefill.py (lines 650 and 656) but are not defined in the TimeStats class. These should be added to the class definition to ensure they are properly handled by any logic that iterates over the dataclass fields (e.g., serialization or logging).

Suggested change

-    bootstrap_done_time: float = (
-        0.0  # When bootstrap completes (poll -> WaitingForInput)
-    )
+    bootstrap_done_time: float = (
+        0.0  # When bootstrap completes (poll -> WaitingForInput)
+    )
+    transfer_total_mb: float = 0.0
+    transfer_speed_gb_s: float = 0.0
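
To illustrate why the declaration matters, here is a small standalone sketch. It assumes `TimeStats` is a plain (non-slotted) dataclass; the `Undeclared` and `Declared` classes are hypothetical stand-ins. An attribute assigned at runtime is stored on the instance, but `dataclasses.fields` and `dataclasses.asdict` only see declared fields.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Undeclared:
    bootstrap_done_time: float = 0.0


@dataclass
class Declared:
    bootstrap_done_time: float = 0.0
    transfer_total_mb: float = 0.0
    transfer_speed_gb_s: float = 0.0


u = Undeclared()
u.transfer_total_mb = 12.5  # works at runtime, but is not a dataclass field

d = Declared()
d.transfer_total_mb = 12.5

# Only declared fields are visible to field iteration and serialization.
undeclared_names = [f.name for f in fields(u)]
declared_names = [f.name for f in fields(d)]
```

Any logging or serialization that walks the dataclass fields would silently drop the value on `u` while keeping it on `d`.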

ShangmingCai

LGTM

Kangyan-Zhou requested review from ByronHsu, ShangmingCai and hnyls2002 as code owners February 19, 2026 12:07

gemini-code-assist bot reviewed Feb 19, 2026

ShangmingCai approved these changes Feb 19, 2026

Kangyan-Zhou wants to merge 1 commit into sgl-project:main from Kangyan-Zhou:fix_metrics
