Fix disagg PD bootstrap and KV transfer metrics#19009
Fix disagg PD bootstrap and KV transfer metrics#19009Kangyan-Zhou wants to merge 1 commit intosgl-project:mainfrom
Conversation
Add bootstrap_done_time to TimeStats and correctly compute bootstrap_duration, alloc_waiting_duration, and KV transfer latency/speed/size metrics for both prefill and decode paths. Also add missing log_prefill_stats call in the disagg prefill path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of ChangesHello @Kangyan-Zhou, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability of disaggregated Paging and Data (PD) operations by refining performance metrics. It introduces precise tracking of the bootstrap completion time, enabling more accurate calculation of bootstrap and allocation waiting durations. Furthermore, it adds crucial metrics for KV transfer, including latency, size, and speed, specifically for disaggregated prefill requests. These improvements provide deeper insights into the performance bottlenecks and overall efficiency of disaggregated setups. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request correctly addresses the missing metrics for disaggregated prefill and decode paths, replacing placeholders with actual computations for bootstrap and allocation waiting durations. It also adds KV transfer metrics (latency, size, and speed) and ensures prefill stats are logged in the disaggregated path. However, there are a few issues regarding metric reporting consistency and missing field definitions in the TimeStats class that should be addressed to ensure robust monitoring.
| self.kv_transfer_latency_ms = transfer_latency_s * 1000 | ||
|
|
||
| # Transfer size | ||
| num_tokens = len(req.origin_input_ids) | ||
| num_pages = kv_to_page_num(num_tokens, page_size) | ||
| total_bytes = bytes_per_page_all_layers * num_pages | ||
| total_mb = total_bytes / (1024 * 1024) | ||
| self.kv_transfer_total_mb = total_mb | ||
| ts.transfer_total_mb = total_mb | ||
|
|
||
| # Transfer speed | ||
| if transfer_latency_s > 0: | ||
| speed = (total_mb / 1024) / transfer_latency_s | ||
| self.kv_transfer_speed_gb_s = speed | ||
| ts.transfer_speed_gb_s = speed | ||
|
|
||
| # Bootstrap and alloc durations | ||
| if ( | ||
| ts.prefill_bootstrap_queue_entry_time > 0 | ||
| and ts.wait_queue_entry_time > 0 | ||
| ): | ||
| self.kv_transfer_bootstrap_ms = ts.bootstrap_duration * 1000 | ||
| self.kv_transfer_alloc_ms = ts.alloc_waiting_duration * 1000 |
There was a problem hiding this comment.
The metrics self.kv_transfer_latency_ms, self.kv_transfer_total_mb, self.kv_transfer_speed_gb_s, self.kv_transfer_bootstrap_ms, and self.kv_transfer_alloc_ms are being overwritten in a loop for each request in done_reqs. If multiple requests finish in the same iteration, the scheduler's state will only reflect the metrics of the last request processed. These should likely be observed in a histogram via self.metrics_collector inside the loop, or aggregated if batch-level metrics are intended.
| bootstrap_done_time: float = ( | ||
| 0.0 # When bootstrap completes (poll -> WaitingForInput) | ||
| ) |
There was a problem hiding this comment.
The fields transfer_total_mb and transfer_speed_gb_s are assigned to TimeStats instances in prefill.py (lines 650 and 656) but are not defined in the TimeStats class. These should be added to the class definition to ensure they are properly handled by any logic that iterates over the dataclass fields (e.g., serialization or logging).
| bootstrap_done_time: float = ( | |
| 0.0 # When bootstrap completes (poll -> WaitingForInput) | |
| ) | |
| bootstrap_done_time: float = ( | |
| 0.0 # When bootstrap completes (poll -> WaitingForInput) | |
| ) | |
| transfer_total_mb: float = 0.0 | |
| transfer_speed_gb_s: float = 0.0 |
Summary
bootstrap_done_timetoTimeStatsand correctly computebootstrap_durationandalloc_waiting_durationon both prefill and decode paths (replacing the previous# TODO: correct set themplaceholder)log_prefill_statscall in the disagg prefill batch result path to match the non-disagg pathTest plan
bootstrap_durationandalloc_waiting_durationhave reasonable values under normal load🤖 Generated with Claude Code