[Data] Move estimate_object_store_usage logic into physical op#63961
[Data] Move estimate_object_store_usage logic into physical op#63961owenowenisme wants to merge 3 commits into
estimate_object_store_usage logic into physical op#63961Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the object store memory estimation logic by moving the estimation code from resource_manager.py into a new method estimate_object_store_usage on the PhysicalOperator class. The review feedback points out a potential type mismatch in the return value of this new method, where a float could be returned instead of an integer, and suggests casting the metric value to int to align with the Tuple[int, int] return type annotation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
edoakes
left a comment
There was a problem hiding this comment.
Nice, much better IMO. Consider writing a unit test for a fake operator that overrides the implementation. I don't know how hard that would be :)
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
edoakes
left a comment
There was a problem hiding this comment.
LGTM, let’s have @bveeramani do a quick pass though
Description
Some Operators (e.g. ShuffleMap) need to declare object store accounting that differs from the framework's generic model — their outputs are pipeline-internal intermediates that shouldn't count against the global budget.
This PR refactors
ResourceManager._estimate_object_store_memory_usageintoPhysicalOperator.estimate_object_store_usageso ops can cleanly override their own accounting. The base implementation matches the existing inline logic, so no behavior change for existing ops.Related issues
Additional information