Skip to content

spilling peak and average#60809

Draft
xinyuangui2 wants to merge 1 commit intoray-project:masterfrom
xinyuangui2:spilling-peak
Draft

spilling peak and average#60809
xinyuangui2 wants to merge 1 commit intoray-project:masterfrom
xinyuangui2:spilling-peak

Conversation

@xinyuangui2
Copy link
Contributor

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: xgui <xgui@anyscale.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a SpillMetricsMonitor actor to compute and report peak and average object store spilling rates, which is a valuable addition for performance monitoring in benchmarks. The implementation uses a detached Ray actor with a background polling thread, which is a suitable design. The integration into RayDataLoaderFactory is clean. My review includes a couple of suggestions to enhance the robustness of the metric calculation and improve code consistency.

)
return memory_info.store_stats.spilled_bytes_total

def _poll_loop(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _poll_loop method is missing a return type hint. For consistency with the rest of the codebase's type annotations, it should be specified. Since this method doesn't return a value, the hint should be -> None.

Suggested change
def _poll_loop(self):
def _poll_loop(self) -> None:

Comment on lines +62 to +65
if delta_time > 0:
rate_gb_s = (delta_bytes / (1024**3)) / delta_time
with self._lock:
self._spill_rates_gb_s.append(rate_gb_s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The spilled_bytes_total counter could theoretically reset (e.g., on GCS restart), which would cause delta_bytes to be negative. This would result in a negative spill rate being recorded, skewing the average calculation. It's safer to only calculate the rate for non-negative delta_bytes.

Suggested change
if delta_time > 0:
rate_gb_s = (delta_bytes / (1024**3)) / delta_time
with self._lock:
self._spill_rates_gb_s.append(rate_gb_s)
if delta_time > 0 and delta_bytes >= 0:
rate_gb_s = (delta_bytes / (1024**3)) / delta_time
with self._lock:
self._spill_rates_gb_s.append(rate_gb_s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

1 participant