[WIP][Core][GPU fraction][6/n] Migrate Node struct and callers from original class to base class pointer type#63508

Open
dancingactor wants to merge 5 commits into ray-project:master from dancingactor:newNodeResource_third

@dancingactor
Contributor

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@dancingactor dancingactor requested a review from a team as a code owner May 19, 2026 16:11
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a polymorphic resource management architecture by establishing a NodeResourcesBase abstract class and a NodeResourcesV2 implementation. The V2 implementation enables per-instance resource tracking (specifically for GPU fractional scheduling) using NodeResourceInstanceSet. The Node structure and ClusterResourceManager have been refactored to manage these resources via unique pointers. The review feedback identifies several critical issues, including unsafe static_cast operations that will cause undefined behavior when V2 nodes are enabled, object slicing in GetNodeResources resulting in data loss, and malformed JSON generation in the DebugString method. There is also a recommendation to replace manual type checking with a virtual Clone pattern to improve the robustness of polymorphic copying.

Comment thread src/ray/raylet/scheduling/cluster_resource_manager.cc Outdated
Comment thread src/ray/raylet/scheduling/cluster_resource_manager.cc Outdated
Comment thread src/ray/raylet/scheduling/cluster_resource_manager.cc
Comment thread src/ray/common/scheduling/cluster_resource_data.cc
Comment thread src/ray/common/scheduling/cluster_resource_data.h
@dancingactor dancingactor force-pushed the newNodeResource_third branch 2 times, most recently from 36cd79f to 1a5bd77 Compare May 19, 2026 16:13
Comment thread src/ray/raylet/scheduling/cluster_resource_manager.cc Outdated
Comment thread src/ray/common/scheduling/cluster_resource_data.h
@ray-gardener ray-gardener Bot added the core (Issues that should be addressed in Ray Core) and community-contribution (Contributed by the community) labels May 19, 2026
… wrapper methods

Signed-off-by: dancingactor <s990346@gmail.com>
…ass for branch-by-abstraction migration strategy

Signed-off-by: dancingactor <s990346@gmail.com>
…d related methods from scalar to per-instance view version

Signed-off-by: dancingactor <s990346@gmail.com>
…al class to base class pointer type

Signed-off-by: dancingactor <s990346@gmail.com>
2. Change GetNodeResources parameter from NodeResourcesBase back to NodeResources to avoid

Signed-off-by: dancingactor <s990346@gmail.com>
@dancingactor dancingactor force-pushed the newNodeResource_third branch from 1a5bd77 to c605a54 Compare May 20, 2026 17:10

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit c605a54.

Comment thread src/ray/common/scheduling/cluster_resource_data.cc
Comment thread src/ray/raylet/scheduling/cluster_resource_manager.cc
Labels

community-contribution: Contributed by the community
core: Issues that should be addressed in Ray Core

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

1 participant