Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -72,9 +72,10 @@ def _get_node_resource_spec_and_count() -> Dict[_NodeResourceSpec, int]:
for node in ray.nodes()
if node["Alive"] and "node:__internal_head__" not in node["Resources"]
]

for r in node_resources:
node_resource_spec = _NodeResourceSpec.of(
cpu=r["CPU"], gpu=r.get("GPU", 0), mem=r["memory"]
cpu=r.get("CPU", 0), gpu=r.get("GPU", 0), mem=r.get("memory", 0)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a regression test? e.g., a test where ray.nodes() returns nodes with no logical resources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey yes that makes sense , sorry for missing that , I will add that first thing tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am sorry I know this is the right platform for all these questions,

  1. Is there a specific Python version (e.g., 3.9, 3.10) that is most stable/recommended for development right now?
  2. I noticed the dev guide covers Linux, macOS, and Windows. Given that Ray has a complex C++ backend with OS-level dependencies, how does the project ensure changes on one OS don't break others? Is there anything specific I should watch out for on macOS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no need to apologize for asking questions!

I'd recommend using 3.10, because it's the lowest supported version.

I think we test against different operating systems as part of our release process. AFAIK most Ray Data devs use Mac, so you should be good on that front

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the test as needed please review again, sorry for the delay was facing issues with python 3.13 for deps installation for testing but switching to 3.10 helped. Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad for using 2 github accounts will be using this one from now on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nw!

nodes_resource_spec_count[node_resource_spec] += 1

Expand Down
41 changes: 38 additions & 3 deletions python/ray/data/tests/test_default_cluster_autoscaler_v2.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,9 +109,12 @@ def test_get_node_resource_spec_and_count(self):
}

# Patch cluster config to return None
with patch("ray.nodes", return_value=node_table), patch(
"ray._private.state.state.get_cluster_config",
return_value=None,
with (
patch("ray.nodes", return_value=node_table),
patch(
"ray._private.state.state.get_cluster_config",
return_value=None,
),
):
assert _get_node_resource_spec_and_count() == expected

Expand Down Expand Up @@ -312,6 +315,38 @@ def test_get_node_resource_spec_and_count_skips_max_count_zero(self):
result = _get_node_resource_spec_and_count()
assert result == expected

def test_get_node_resource_spec_and_count_missing_all_resources(self):
"""Regression test for nodes with empty resources (ie missing CPU, GPU, and memory keys entirely)."""

# Simulate a node with no standard resources defined
node_empty_resources = {
"Alive": True,
"Resources": {
"dummy_resource": 1,
},
}

node_table = [
{
"Resources": self._head_node,
"Alive": True,
},
node_empty_resources,
]

# Expect everything to default to 0
expected = {_NodeResourceSpec.of(cpu=0, gpu=0, mem=0): 1}

with (
patch("ray.nodes", return_value=node_table),
patch(
"ray._private.state.state.get_cluster_config",
return_value=None,
),
):
result = _get_node_resource_spec_and_count()
assert result == expected


if __name__ == "__main__":
import sys
Expand Down