
Implementing Null Handling(is_finite, is_inf, is_nan) #59876

Open
Rob12312368 wants to merge 7 commits into ray-project:master from Rob12312368:issue_58674
Conversation

@Rob12312368
Contributor

@Rob12312368 Rob12312368 commented Jan 6, 2026

Description

As the title suggests, this PR implements the is_finite, is_inf, and is_nan expressions.

Related issues

Related to #58674.

Cursor Bugbot reviewed your changes and found no issues for commit 58b3306

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the is_nan expression for null handling, along with a corresponding test case. The implementation is a good start, but I've identified a high-severity issue with the return type of the is_nan function that could lead to schema inconsistencies. I've also suggested an improvement to the test coverage to handle more edge cases and a fix for an incorrect docstring in the test file. Addressing these points will improve the correctness and robustness of the new functionality.

    expected_results,
    test_id,
):
    """Test arithmetic helper expressions: negate, sign, power, abs."""

medium

The docstring for this test function appears to be copied from another test and is incorrect. This test is for null handling operations, not arithmetic expressions. Please update the docstring to accurately describe its purpose.

Suggested change
-    """Test arithmetic helper expressions: negate, sign, power, abs."""
+    """Test null handling helper expressions."""

Comment on lines 1668 to 1030
        pytest.param(
            [{"x": float("Nan")}, {"x": -3}, {"x": 0}],
            lambda: col("x").is_nan(),
            [True, False, False],
            "is_nan",
        ),

medium

The test case for is_nan is a good start, but it could be more comprehensive. I suggest expanding it to include positive floats, infinity, negative infinity, and None (null) values to ensure the behavior is correct across more edge cases. pc.is_nan should return False for infinities and propagate nulls.

        pytest.param(
            [
                {"x": float("nan")},
                {"x": -3.0},
                {"x": 0.0},
                {"x": 3.14},
                {"x": float("inf")},
                {"x": float("-inf")},
                {"x": None},
            ],
            lambda: col("x").is_nan(),
            [True, False, False, False, False, False, None],
            "is_nan",
        ),
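
For readers following the review, here is a minimal stdlib-only sketch of the semantics described above: NaN is flagged only by is_nan, infinities are not NaN, and nulls pass through as None. It does not call PyArrow or Ray; the helper `_null_propagating` and the module-level names are hypothetical, and the null behavior shown for is_inf and is_finite is an assumption that mirrors what the review states for is_nan.

```python
import math
from typing import Callable, List, Optional


def _null_propagating(
    predicate: Callable[[float], bool],
) -> Callable[[Optional[float]], Optional[bool]]:
    """Wrap a float predicate so that a null (None) input yields None."""

    def wrapped(value: Optional[float]) -> Optional[bool]:
        return None if value is None else predicate(value)

    return wrapped


# Hypothetical scalar stand-ins for the three column expressions.
is_nan = _null_propagating(math.isnan)
is_inf = _null_propagating(math.isinf)
is_finite = _null_propagating(math.isfinite)

# The same inputs as the suggested test case, including a null.
values: List[Optional[float]] = [
    float("nan"), -3.0, 0.0, 3.14, float("inf"), float("-inf"), None,
]

nan_mask = [is_nan(v) for v in values]
inf_mask = [is_inf(v) for v in values]
finite_mask = [is_finite(v) for v in values]

print(nan_mask)
print(inf_mask)
print(finite_mask)
```

The `nan_mask` list lines up with the expected results in the suggested test case; the other two masks show how the same inputs would be classified under the assumption stated above.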

…or discussing fill_null

Signed-off-by: Rob12312368 <rob12312368@gmail.com>
@Rob12312368 Rob12312368 marked this pull request as ready for review January 19, 2026 04:06
@Rob12312368 Rob12312368 requested a review from a team as a code owner January 19, 2026 04:06
@Rob12312368 Rob12312368 changed the title from "Implementing Null Handling(fill_null, is_finite, is_inf, is_nan)" to "Implementing Null Handling(is_finite, is_inf, is_nan)" Jan 19, 2026
@Rob12312368
Contributor Author

Hi, the CI did not pass, but I think the failure is unrelated to my patch. Should I do anything to proceed? @goutamvenkat-anyscale

@Rob12312368
Contributor Author

Rob12312368 commented Jan 19, 2026

I am aware that fill_null is missing. I will open a separate draft PR to discuss the approach.

@ray-gardener ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Jan 19, 2026
@goutamvenkat-anyscale
Contributor

> Hi, the CI did not pass, but I think the failure is unrelated to my patch. Should I do anything to proceed? @goutamvenkat-anyscale

Thanks. I just merged master into your branch; hopefully it includes fixes for the affected tests.

The Cursor and Gemini comments are valid and should be addressed.

Rob12312368 and others added 2 commits January 24, 2026 08:09
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tsao-Ching, Kao <56261402+Rob12312368@users.noreply.github.com>
Signed-off-by: Rob12312368 <rob12312368@gmail.com>
@Rob12312368
Contributor Author

Hello @goutamvenkat-anyscale, I changed the code based on the suggestions. The CI failure still does not seem to be related to my patch, although it is a different failure this time.

@@ -1014,6 +1014,63 @@ async def __call__(self, x):
assert rows_same(result_df, expected_after_fix)


@pytest.mark.skipif(

Can you please move these tests to python/ray/data/tests/expressions/test_arithmetic.py?

Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale left a comment

LGTM, please just move the tests.

@goutamvenkat-anyscale
Contributor

> Hi, the CI did not pass, but I think the failure is unrelated to my patch. Should I do anything to proceed? @goutamvenkat-anyscale

I just merged master into your branch. That should resolve it.

@goutamvenkat-anyscale goutamvenkat-anyscale added the go (add ONLY when ready to merge, run all tests) label Feb 6, 2026
Signed-off-by: Rob12312368 <rob12312368@gmail.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

    def is_inf(self) -> "UDFExpr":
        return _create_pyarrow_compute_udf(pc.is_inf, return_dtype=DataType.bool())(
            self
        )

Missing docstrings on new public API methods

Low Severity

The new is_nan, is_finite, and is_inf methods are the only public methods on this class without docstrings. Every other method — including short ones like ceil, floor, and trunc — has at least a one-line docstring. This inconsistency makes these methods harder to discover and use, especially since they're part of a public API.
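
As a hedged illustration of the one-line docstrings this comment asks for, the sketch below documents the three methods on a stand-in class. `FloatColumnSketch` and `_map` are hypothetical names and not part of Ray; in the PR itself the method bodies would presumably stay as the `_create_pyarrow_compute_udf(...)` calls shown in the diff, with only the docstring lines added. The docstring wording is a suggestion, not the text that was merged.

```python
import math
from typing import Callable, List, Optional


class FloatColumnSketch:
    """Hypothetical stand-in for an expression over a nullable float column."""

    def __init__(self, values: List[Optional[float]]) -> None:
        self.values = values

    def _map(self, predicate: Callable[[float], bool]) -> List[Optional[bool]]:
        # Nulls (None) pass through unchanged; every other value is tested.
        return [None if v is None else predicate(v) for v in self.values]

    def is_nan(self) -> List[Optional[bool]]:
        """Return True for each value that is NaN."""
        return self._map(math.isnan)

    def is_inf(self) -> List[Optional[bool]]:
        """Return True for each value that is positive or negative infinity."""
        return self._map(math.isinf)

    def is_finite(self) -> List[Optional[bool]]:
        """Return True for each value that is neither NaN nor infinite."""
        return self._map(math.isfinite)
```

With docstrings in place, `help(FloatColumnSketch.is_nan)` shows the one-line summary, which is the discoverability benefit the comment refers to.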

Signed-off-by: Rob12312368 <rob12312368@gmail.com>
Labels

community-contribution: Contributed by the community
data: Ray Data-related issues
go: add ONLY when ready to merge, run all tests


2 participants