[Data] Cleaning up ResourceManager and ReservationOpResourceAllocator#60273

Merged
alexeykudinkin merged 24 commits into master from ak/rm-div-rec
Jan 31, 2026

@alexeykudinkin
Contributor

@alexeykudinkin alexeykudinkin commented Jan 18, 2026

Description

Cleaning up ResourceManager

  • Removing duplicated methods
  • Fixing the semantics of _should_unblock_streaming_output_backpressure
  • Abstracting common _is_blocking_materializing_op util to determine if operation is a blocking materializing op

Cleaning up ReservationOpResourceAllocator

  • Adjusting can_submit_new_task to check for available Object Store when launching tasks
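
The description does not show the adjusted check itself. As a minimal sketch of the idea only, assuming a hypothetical `Resources` bundle and a free-standing function (neither is Ray's actual class or signature), a task is admitted only when its object store request fits the remaining budget along with CPU and GPU:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource bundle for illustration; not Ray's own class."""

    cpu: float = 0.0
    gpu: float = 0.0
    object_store_memory: float = 0.0  # bytes


def can_submit_new_task(budget: Resources, task_request: Resources) -> bool:
    """Return True only if the request fits the budget on every dimension,
    including object store memory (assumed signature, for illustration)."""
    return (
        task_request.cpu <= budget.cpu
        and task_request.gpu <= budget.gpu
        and task_request.object_store_memory <= budget.object_store_memory
    )


budget = Resources(cpu=4, object_store_memory=256 * 1024**2)
small_task = Resources(cpu=1, object_store_memory=128 * 1024**2)
large_task = Resources(cpu=1, object_store_memory=512 * 1024**2)

fits_small = can_submit_new_task(budget, small_task)  # fits on all dimensions
fits_large = can_submit_new_task(budget, large_task)  # CPU fits, object store does not
```

Checking only CPU and GPU would admit `large_task` here even though its output could not fit in the remaining object store budget.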

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a solid cleanup of the ResourceManager and related components. The changes clarify the handling of materializing operators, refactor memory usage accounting, and centralize logic for completed operators. The removal of object store memory estimation when no samples are available simplifies the code. The test updates are also comprehensive and improve test quality by using more realistic mocks.

I have a couple of suggestions to improve code clarity by avoiding shadowed variables in list/generator comprehensions.
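
The review does not quote the comprehensions in question. As a generic illustration with made-up names (`ops`, `op`, `other_op`), shadowing in a comprehension looks like this:

```python
ops = ["read", "map", "write"]
op = "map"  # outer variable, e.g. the operator currently being processed

# Shadowed: the loop variable reuses the outer name `op`. In Python 3 the
# comprehension has its own scope, so the outer `op` is not overwritten,
# but a reader must stop and work out which `op` each expression means.
shadowed = [op.upper() for op in ops]

# Clearer: a distinct loop variable name keeps the two roles apart.
renamed = [other_op.upper() for other_op in ops]
```

Both comprehensions produce the same list; the rename only removes the ambiguity for the reader.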

@ray-gardener ray-gardener bot added the "data" (Ray Data-related issues) label Jan 18, 2026
@alexeykudinkin alexeykudinkin added the "regression" and "go" (add ONLY when ready to merge, run all tests) labels and removed the "regression" label Jan 20, 2026
cursor[bot]

This comment was marked as outdated.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Revisiting selection of eligible ops

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>

# Conflicts:
#	python/ray/data/_internal/execution/resource_manager.py

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…bytes`

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Updated usages

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin changed the title [WIP][Data] Cleaning up ResourceManager and ReservationOpResourceAllocator [Data] Cleaning up ResourceManager and ReservationOpResourceAllocator Jan 30, 2026
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

        logger.warning(msg)

    def __init__(self, topology: "Topology"):
        self._topology = topology
Missing rate limiting update in idle detection

Medium Severity

The detect_idle method no longer updates last_detection_time[op] when detecting idle state. Previously, this timestamp was updated in both branches (active and idle). Without this update, after the first idle detection, every subsequent call will pass the interval check because last_detection_time remains stale, breaking the rate-limiting behavior. This causes print_warning_if_idle_for_too_long to be called repeatedly and _should_unblock_streaming_output_backpressure to always return True for idle operators.
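
The rate-limiting behavior this report describes can be sketched as follows. This is a hypothetical, simplified detector (the class name, the `output_count` and `now` parameters, and the internal bookkeeping are all assumptions), not Ray's IdleDetector, and it does not settle whether the report applies to the PR:

```python
class IdleDetectorSketch:
    """Hypothetical, simplified idle detector; not Ray's implementation."""

    def __init__(self, interval_s: float):
        self._interval_s = interval_s
        self._last_detection_time = {}  # op -> time of the last check
        self._last_output_count = {}  # op -> outputs seen at the last check

    def detect_idle(self, op: str, output_count: int, now: float) -> bool:
        last_time = self._last_detection_time.get(op)
        if last_time is None:
            # First observation: record a baseline, report not idle.
            self._last_detection_time[op] = now
            self._last_output_count[op] = output_count
            return False
        if now - last_time < self._interval_s:
            # Rate limited: less than one interval since the last check.
            return False
        idle = output_count == self._last_output_count[op]
        # Refresh the timestamp in BOTH the idle and the active branch so
        # that the next check waits a full interval again.
        self._last_detection_time[op] = now
        self._last_output_count[op] = output_count
        return idle


detector = IdleDetectorSketch(interval_s=10.0)
baseline = detector.detect_idle("map", output_count=0, now=0.0)
first_idle = detector.detect_idle("map", output_count=0, now=10.0)
rate_limited = detector.detect_idle("map", output_count=0, now=11.0)
active_again = detector.detect_idle("map", output_count=5, now=21.0)
```

If the timestamp were refreshed only in the active branch, the call at `now=11.0` would pass the interval check and report idle again immediately, which is the failure mode the report claims.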

Contributor Author

This is incorrect. I've added a test covering the expected behavior of IdleDetector.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) January 30, 2026 23:28
@alexeykudinkin alexeykudinkin merged commit b570dc2 into master Jan 31, 2026
7 checks passed
@alexeykudinkin alexeykudinkin deleted the ak/rm-div-rec branch January 31, 2026 00:05
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
…tor` (ray-project#60273)

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: 400Ping <jiekaichang@apache.org>
rayhhome pushed a commit to rayhhome/ray that referenced this pull request Feb 4, 2026
…tor` (ray-project#60273)

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Sirui Huang <ray.huang@anyscale.com>
Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants