Fix disagg PD bootstrap and KV transfer metrics #19009

Open
Kangyan-Zhou wants to merge 1 commit into sgl-project:main from Kangyan-Zhou:fix_metrics


@Kangyan-Zhou
Collaborator

Summary

  • Add bootstrap_done_time to TimeStats and correctly compute bootstrap_duration and alloc_waiting_duration on both prefill and decode paths (replacing the previous # TODO: correct set them placeholder)
  • Add KV transfer latency, size, and speed metrics computation in the disagg prefill inflight queue processing
  • Add missing log_prefill_stats call in the disagg prefill batch result path to match the non-disagg path
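
As a rough illustration of the first bullet, the sketch below derives the two durations from three timestamps. The class is a simplified, hypothetical stand-in for TimeStats, and the exact subtractions are an assumption about how the durations are defined rather than a copy of the PR's code.

```python
from dataclasses import dataclass


@dataclass
class TimeStatsSketch:
    """Simplified, hypothetical stand-in for TimeStats (not the real class)."""

    bootstrap_queue_entry_time: float = 0.0  # request enters the bootstrap queue
    bootstrap_done_time: float = 0.0  # bootstrap completes (poll -> WaitingForInput)
    wait_queue_entry_time: float = 0.0  # request enters the waiting queue
    bootstrap_duration: float = 0.0
    alloc_waiting_duration: float = 0.0

    def compute_durations(self) -> None:
        # Assumption: bootstrap spans queue entry to bootstrap completion,
        # and alloc waiting spans bootstrap completion to wait-queue entry.
        if self.bootstrap_done_time > 0:
            self.bootstrap_duration = (
                self.bootstrap_done_time - self.bootstrap_queue_entry_time
            )
            if self.wait_queue_entry_time > 0:
                self.alloc_waiting_duration = (
                    self.wait_queue_entry_time - self.bootstrap_done_time
                )


ts = TimeStatsSketch(
    bootstrap_queue_entry_time=10.0,
    bootstrap_done_time=10.25,
    wait_queue_entry_time=10.75,
)
ts.compute_durations()
print(ts.bootstrap_duration, ts.alloc_waiting_duration)
```

Guarding on a positive timestamp keeps unset fields (left at 0.0) from producing large negative durations.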

Test plan

  • Verify metrics are correctly reported in Prometheus on a PD disaggregation setup
  • Confirm bootstrap_duration and alloc_waiting_duration have reasonable values under normal load
  • Confirm aborted requests are excluded from KV transfer metrics

🤖 Generated with Claude Code

Add bootstrap_done_time to TimeStats and correctly compute
bootstrap_duration, alloc_waiting_duration, and KV transfer
latency/speed/size metrics for both prefill and decode paths.
Also add missing log_prefill_stats call in the disagg prefill path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the observability of prefill-decode (PD) disaggregation by correcting and extending its performance metrics. It records when the bootstrap phase completes, so that bootstrap and allocation-waiting durations can be computed accurately. It also adds KV transfer latency, size, and speed metrics for disaggregated prefill requests. Together, these changes make it easier to locate performance bottlenecks in disaggregated setups.

Highlights

  • Bootstrap Completion Time Tracking: Introduced bootstrap_done_time in TimeStats to accurately mark when the bootstrap phase completes for both prefill and decode requests.
  • Accurate Duration Calculation: Implemented correct calculation of bootstrap_duration and alloc_waiting_duration by utilizing the new bootstrap_done_time.
  • KV Transfer Metrics: Added new metrics to measure KV transfer latency, total size, and speed for disaggregated prefill requests, excluding aborted ones.
  • Prefill Stats Logging: Ensured log_prefill_stats is called for disaggregated prefill batch results, aligning its behavior with non-disaggregated paths.

Changelog
  • python/sglang/srt/disaggregation/decode.py
    • Recorded bootstrap_done_time when a decode request enters the WaitingForInput state.
    • Calculated bootstrap_duration and alloc_waiting_duration when a preallocated decode request is processed.
  • python/sglang/srt/disaggregation/prefill.py
    • Imported FINISH_ABORT to correctly handle aborted requests in metrics calculations.
    • Recorded bootstrap_done_time for prefill requests upon reaching the WaitingForInput state.
    • Calculated bootstrap_duration and alloc_waiting_duration for bootstrapped prefill requests.
    • Added a call to log_prefill_stats within the disaggregated prefill batch result processing.
    • Implemented logic to compute KV transfer latency, total size, and speed for completed disaggregated prefill requests, filtering out aborted ones.
  • python/sglang/srt/metrics/collector.py
    • Added bootstrap_done_time as a new float field to the TimeStats class.
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request correctly addresses the missing metrics for disaggregated prefill and decode paths, replacing placeholders with actual computations for bootstrap and allocation waiting durations. It also adds KV transfer metrics (latency, size, and speed) and ensures prefill stats are logged in the disaggregated path. However, there are a few issues regarding metric reporting consistency and missing field definitions in the TimeStats class that should be addressed to ensure robust monitoring.

Comment on lines +642 to +664
self.kv_transfer_latency_ms = transfer_latency_s * 1000

# Transfer size
num_tokens = len(req.origin_input_ids)
num_pages = kv_to_page_num(num_tokens, page_size)
total_bytes = bytes_per_page_all_layers * num_pages
total_mb = total_bytes / (1024 * 1024)
self.kv_transfer_total_mb = total_mb
ts.transfer_total_mb = total_mb

# Transfer speed
if transfer_latency_s > 0:
    speed = (total_mb / 1024) / transfer_latency_s
    self.kv_transfer_speed_gb_s = speed
    ts.transfer_speed_gb_s = speed

# Bootstrap and alloc durations
if (
    ts.prefill_bootstrap_queue_entry_time > 0
    and ts.wait_queue_entry_time > 0
):
    self.kv_transfer_bootstrap_ms = ts.bootstrap_duration * 1000
    self.kv_transfer_alloc_ms = ts.alloc_waiting_duration * 1000

medium

The metrics self.kv_transfer_latency_ms, self.kv_transfer_total_mb, self.kv_transfer_speed_gb_s, self.kv_transfer_bootstrap_ms, and self.kv_transfer_alloc_ms are being overwritten in a loop for each request in done_reqs. If multiple requests finish in the same iteration, the scheduler's state will only reflect the metrics of the last request processed. These should likely be observed in a histogram via self.metrics_collector inside the loop, or aggregated if batch-level metrics are intended.
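
One way to act on this comment is to compute the metrics per request and record every sample, instead of storing only the last value on the scheduler. The sketch below is hypothetical: `FakeReq`, `ceil_div_pages` (standing in for `kv_to_page_num`, assumed here to be a ceiling division), and the plain list of samples are illustrative and not part of SGLang. A real fix would presumably observe each sample into a histogram on `self.metrics_collector`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeReq:
    """Hypothetical request record, not the real request class."""

    num_input_tokens: int
    transfer_latency_s: float
    aborted: bool = False


def ceil_div_pages(num_tokens: int, page_size: int) -> int:
    # Assumption: the page count is the token count rounded up to whole pages.
    return (num_tokens + page_size - 1) // page_size


def collect_transfer_samples(
    done_reqs: List[FakeReq], page_size: int, bytes_per_page_all_layers: int
) -> List[Dict[str, float]]:
    """Return one metrics sample per finished, non-aborted request."""
    samples: List[Dict[str, float]] = []
    for req in done_reqs:
        if req.aborted:
            continue  # aborted requests are excluded from transfer metrics
        num_pages = ceil_div_pages(req.num_input_tokens, page_size)
        total_mb = bytes_per_page_all_layers * num_pages / (1024 * 1024)
        sample = {
            "latency_ms": req.transfer_latency_s * 1000,
            "total_mb": total_mb,
        }
        if req.transfer_latency_s > 0:
            # MB -> GB, then divide by seconds to get GB/s.
            sample["speed_gb_s"] = (total_mb / 1024) / req.transfer_latency_s
        samples.append(sample)  # every request is kept, none overwritten
    return samples


reqs = [
    FakeReq(num_input_tokens=100, transfer_latency_s=0.5),
    FakeReq(num_input_tokens=33, transfer_latency_s=0.25),
    FakeReq(num_input_tokens=64, transfer_latency_s=0.1, aborted=True),
]
samples = collect_transfer_samples(
    reqs, page_size=16, bytes_per_page_all_layers=2 * 1024 * 1024
)
for s in samples:
    print(s)
```

With two requests finishing in the same iteration, both samples survive. If batch-level gauges are wanted instead, the same list can be summed or averaged after the loop.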

Comment on lines +70 to +72
bootstrap_done_time: float = (
    0.0  # When bootstrap completes (poll -> WaitingForInput)
)

medium

The fields transfer_total_mb and transfer_speed_gb_s are assigned to TimeStats instances in prefill.py (lines 650 and 656) but are not defined in the TimeStats class. These should be added to the class definition to ensure they are properly handled by any logic that iterates over the dataclass fields (e.g., serialization or logging).

Suggested change
- bootstrap_done_time: float = (
-     0.0  # When bootstrap completes (poll -> WaitingForInput)
- )
+ bootstrap_done_time: float = (
+     0.0  # When bootstrap completes (poll -> WaitingForInput)
+ )
+ transfer_total_mb: float = 0.0
+ transfer_speed_gb_s: float = 0.0
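
The concern can be reproduced in isolation. The sketch below assumes TimeStats is a plain dataclass without `__slots__` and uses a hypothetical stand-in class with a single declared field. It shows that an attribute assigned after construction is stored on the instance but is invisible to `dataclasses.fields` and `dataclasses.asdict`.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class TimeStatsSketch:
    """Hypothetical stand-in for TimeStats with only one declared field."""

    bootstrap_done_time: float = 0.0


ts = TimeStatsSketch()
ts.transfer_total_mb = 12.5  # allowed at runtime, but not a declared field

declared = [f.name for f in fields(ts)]
as_dict = asdict(ts)
print(declared)
print(as_dict)
```

Any logging or serialization code that walks the declared fields would silently skip `transfer_total_mb` here, which is why declaring the fields with defaults, as suggested above, is the safer option.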

Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM
