
[Data] Move estimate_object_store_usage logic into physical op #63961

Open

owenowenisme wants to merge 3 commits into ray-project:master from owenowenisme:data/plasma-accounting-method-dispatch



owenowenisme (Member) commented Jun 9, 2026

Description

Some operators (e.g. ShuffleMap) need object store accounting that differs from the framework's generic model: their outputs are pipeline-internal intermediates that shouldn't count against the global object store budget.

This PR refactors ResourceManager._estimate_object_store_memory_usage into PhysicalOperator.estimate_object_store_usage so that ops can override their own accounting cleanly. The base implementation matches the existing inline logic, so existing ops see no behavior change.
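The dispatch pattern described above can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than Ray's actual code: the class names (PhysicalOperatorSketch, ShuffleMapSketch, OpMetricsSketch), the metric fields, and the reading of the two tuple elements as (internal, output) bytes are all assumptions made for the example.

```python
from typing import Iterable, Tuple


class OpMetricsSketch:
    """Hypothetical stand-in for per-operator object store metrics."""

    def __init__(self, internal_bytes: float, output_bytes: float):
        self.internal_bytes = internal_bytes
        self.output_bytes = output_bytes


class PhysicalOperatorSketch:
    """Hypothetical base class that owns the generic accounting model."""

    def __init__(self, metrics: OpMetricsSketch):
        self.metrics = metrics

    def estimate_object_store_usage(self) -> Tuple[int, int]:
        # Generic model: charge both components as measured. Metrics may be
        # floats, so cast to honor the Tuple[int, int] annotation.
        return int(self.metrics.internal_bytes), int(self.metrics.output_bytes)


class ShuffleMapSketch(PhysicalOperatorSketch):
    """Hypothetical op whose outputs are pipeline-internal intermediates."""

    def estimate_object_store_usage(self) -> Tuple[int, int]:
        internal, _outputs = super().estimate_object_store_usage()
        # Outputs stay inside the pipeline, so they are not charged
        # against the global object store budget.
        return internal, 0


def estimate_total_usage(ops: Iterable[PhysicalOperatorSketch]) -> int:
    # The caller no longer branches on operator type; it just dispatches
    # to whatever each op's estimate_object_store_usage() reports.
    return sum(sum(op.estimate_object_store_usage()) for op in ops)


ops = [
    PhysicalOperatorSketch(OpMetricsSketch(internal_bytes=100, output_bytes=50)),
    ShuffleMapSketch(OpMetricsSketch(internal_bytes=10, output_bytes=500)),
]
print(estimate_total_usage(ops))  # 160
```

The point of the design is that the override lives next to the operator that needs it, so the resource manager needs no special cases for individual operator types.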


Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
owenowenisme requested a review from a team as a code owner June 9, 2026 16:54
owenowenisme added the labels data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) Jun 9, 2026

gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request refactors the object store memory estimation logic by moving the estimation code from resource_manager.py into a new method estimate_object_store_usage on the PhysicalOperator class. The review feedback points out a potential type mismatch in the return value of this new method, where a float could be returned instead of an integer, and suggests casting the metric value to int to align with the Tuple[int, int] return type annotation.
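The type concern raised in this review can be shown in isolation. The function names and the metric value below are hypothetical; the sketch only demonstrates that Python does not enforce return annotations at runtime and that an explicit int() cast resolves the mismatch.

```python
from typing import Tuple


def estimate_without_cast(metric_bytes: float) -> Tuple[int, int]:
    # Annotations are not checked at runtime: a float metric silently
    # leaks through despite the Tuple[int, int] signature.
    return metric_bytes, 0


def estimate_with_cast(metric_bytes: float) -> Tuple[int, int]:
    # Casting makes the returned value match the annotation. int()
    # truncates toward zero, which drops at most a fraction of a byte.
    return int(metric_bytes), 0


print(type(estimate_without_cast(1024.5)[0]).__name__)  # float
print(estimate_with_cast(1024.5))  # (1024, 0)
```

A static checker such as mypy would flag the first function, but nothing stops it from running.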


Comment thread python/ray/data/_internal/execution/interfaces/physical_operator.py

edoakes (Collaborator) left a comment

Nice, much better IMO. Consider writing a unit test for a fake operator that overrides the implementation. I don't know how hard that would be :)
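A test along the lines suggested here might look like the pytest-style sketch below. It does not touch Ray's real classes: BaseOpSketch, FakeOverridingOp, and charged_bytes are hypothetical stand-ins. A real test would presumably subclass PhysicalOperator and exercise the actual resource manager, whose setup is not shown in this thread.

```python
from typing import Tuple


class BaseOpSketch:
    """Hypothetical base operator using the generic accounting model."""

    def __init__(self, internal_bytes: int, output_bytes: int):
        self.internal_bytes = internal_bytes
        self.output_bytes = output_bytes

    def estimate_object_store_usage(self) -> Tuple[int, int]:
        return self.internal_bytes, self.output_bytes


class FakeOverridingOp(BaseOpSketch):
    """Fake operator that opts its outputs out of the global budget."""

    def estimate_object_store_usage(self) -> Tuple[int, int]:
        internal, _ = super().estimate_object_store_usage()
        return internal, 0


def charged_bytes(op: BaseOpSketch) -> int:
    # Hypothetical stand-in for the resource manager's per-op accounting:
    # it only dispatches, it never inspects the operator's type.
    return sum(op.estimate_object_store_usage())


def test_base_op_uses_generic_accounting():
    assert charged_bytes(BaseOpSketch(internal_bytes=8, output_bytes=32)) == 40


def test_override_is_dispatched():
    # Same metrics as above, but the override drops the output component.
    assert charged_bytes(FakeOverridingOp(internal_bytes=8, output_bytes=32)) == 8
```

Giving both tests identical metrics makes the difference in the result attributable to the override alone.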

Comment thread python/ray/data/_internal/execution/interfaces/physical_operator.py (outdated)

edoakes (Collaborator) left a comment

LGTM, let’s have @bveeramani do a quick pass though



Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants