Skip to content

test: div by zero returns nan #2636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jrw34
Copy link

@jrw34 jrw34 commented Jun 3, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

  • The floordiv test is raising a 'div by zero' error with the pyarrow backend
  • This div by zero error is reproducible with the following code and indicates that dividing by zero will not return NaN when a floordiv is applied with the pyarrow backend
>>> import narwhals as nw
>>> import pyarrow as pa
>>> data = {"a":[3], "b":[0]}
>>> test_pa_table = pa.table(data)
>>> test_df = nw.from_native(test_pa_table)
>>> test_df["a"]  / / test_df["b"]

returns
pyarrow.lib.ArrowInvalid: divide by zero

@jrw34 jrw34 changed the title Add test for div by zero [test]: div by zero returns nan Jun 3, 2025
@jrw34 jrw34 changed the title [test]: div by zero returns nan test: div by zero returns nan Jun 3, 2025
@jrw34
Copy link
Author

jrw34 commented Jun 4, 2025

@FBruzzesi, given that the division by zero causes the pyarrow backend to error, should I incorporate that behavior into the tests? The original description of the issue stated that all divisions by zero returned NaN or None but that is not true for all backend cases (e.g. pyarrow and pandas[pyarrow]).

@FBruzzesi
Copy link
Member

Hey @jrw34 here is how I would go about this:

  • Add a test for expressions as well, in order to test lazy constructors as well
  • xfail pyarrow backends by adding:
    if "pyarrow" in str(constructor_eager):
        request.applymarker(pytest.mark.xfail)
  • Open an issue to track this inconsistency and eventually fix it - this was the main point for adding this such test i.e. figure out inconsistencies and eventually fix them

@jrw34
Copy link
Author

jrw34 commented Jun 4, 2025

@FBruzzesi, I avoided using assert_equal_data because for every division by zero there are at least three different outputs (depending upon the numerator [+n, 0, -n]). Due to this constraint I called the .collect() method on the LazyFrames, are there any problems with doing that? One alternative would be to create three different expected LazyFrames instead and use the assert_equal_data for each of those 3 conditions. I am happy to try that instead if you think it will be a more robust testing structure.

I will also document all of the inconsistencies and create a new issue after this PR is closed. Thank you very much for walking me through all of this, it has been a great experience thus far!

Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jrw34 , sorry for the slow feedback, I have been sick in the last few days.

I left a couple of comments in the code. The new approach seems a bit too convoluted in my opinion but we are not too far from the end goal.

Regarding collect-ing, I left a suggestion for how to use assert_data_equal, which internally takes care of collecting whenever necessary

Comment on lines 173 to 176
truediv_result = df["a"] / df["b"] # truediv
floordiv_result = df["a"] // df["b"] # floordiv
assert truediv_result[0] in div_by_zero_results or truediv_result.is_nan()[0]
assert floordiv_result[0] in div_by_zero_results or floordiv_result.is_nan()[0]
Copy link
Member

@FBruzzesi FBruzzesi Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can skip any operation and check if a branch is xfailing

Comment on lines 179 to 180
assert truediv_result[0] in div_by_zero_results or truediv_result.is_nan()[0]
assert floordiv_result[0] in div_by_zero_results or floordiv_result.is_nan()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having an expected parameter together with left and right so that you can specify the exact expected output for each pair instead of having it generic? Then you can use assert_data_equal with manually constructed dicts:

assert_data_equal({"x": truediv_result}, {"x": [expected]}



@pytest.mark.parametrize(("left", "right"), [(-2, 0), (0, 0), (2, 0)])
def test_div_by_zero(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ideas applied here, and using assert_data_equal should be even simpler since you can simply pass a frame as first object and that will take care of collecting when necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That drastically simplified everything, thanks for all the comments. I just pushed my changes and ended up separating the tests by their respective operation.

@FBruzzesi
Copy link
Member

FBruzzesi commented Jun 8, 2025

Hey @jrw34 there are a few tests failing, from what I can see there are many edge cases and different behaviors, would you be interested in trying to align the behavior with the latest polars version?

I know it's a shift from the original issue and PR, but I feel somehow uneasy to know that (almost) each backend has a different output

@jrw34
Copy link
Author

jrw34 commented Jun 8, 2025

@FBruzzesi, the tests were all passing on my machine before I pushed this so I thought I was in the clear. I will resolve all of the failures before the next push.

For the polars behavior, I will add that in.

I also share the same sense of uneasiness pertaining to the discrepancies in backend behavior. It is interesting that floor division is causing all of the errors but not true division. I am currently documenting all of the differences for each backend and will include that in a new issue.

I imagine down the road it would be a nice feature to standardize the behavior of division by zero for all backends once they are ingested into Narwhals (everything either returns None or nan). Once this testing business gets settled I would be interesting in claiming that feature addition if it is something worth doing.

@jrw34
Copy link
Author

jrw34 commented Jun 9, 2025

@FBruzzesi, I included the polars specific behavior in the floordiv tests. Should I rebase down to a single commit before this gets merged?

@jrw34 jrw34 marked this pull request as ready for review June 9, 2025 15:57
@jrw34
Copy link
Author

jrw34 commented Jun 10, 2025

@FBruzzesi, how should I go about running the tests for all versions and platforms? When I run the test locally it passes but fails on many of the windows tests when it is on github. The failures are coming from the polars floordiv.

To provide a sanity check, I am attaching what it looks like when I run it locally below.

(narwhals-dev) *****@*****:~/narwhals-dev$ pytest tests/expr_and_series/arithmetic_test.py
================================================= test session starts ==================================================
platform linux -- Python 3.12.11, pytest-8.4.0, pluggy-1.6.0
Using --randomly-seed=1637843562
rootdir: /*****/*****/narwhals-dev
configfile: pyproject.toml
plugins: cov-6.1.1, randomly-3.16.0, env-1.1.5, hypothesis-6.135.4
collected 336 items

tests/expr_and_series/arithmetic_test.py .....x.........x........x..x...x.........................x......x...... [ 21%]
............x..x............x.x........................x...............................x.......s.............s.. [ 54%]
x........s...............x.....s..s..............s.....................................................x....x... [ 87%]
....xx.............xx...xx.............x.                                                                        [100%]

====================================== 306 passed, 6 skipped, 24 xfailed in 3.36s ======================================

@FBruzzesi
Copy link
Member

@jrw34 We used to have nox setup, but it has gone a bit out of sync (an attempt to get it back on track is #2358)
I would suggest to tweak that to run multiple python and package versions locally. Sadly this feature seems to be very version dependent, so kudos for starting with it

@jrw34
Copy link
Author

jrw34 commented Jun 10, 2025

@FBruzzesi, I will delve into running multiple versions/packages. It looks like skipping some pandas versions dwindled down the number of failures, so we are on the right track.

This has been an unexpected adventure after the very welcoming issue title, but it has definitely enhanced my appreciation for floor division by zero.

Also, would it be acceptable to add a conditional for pandas[nullable], because after reviewing the failures it appears that this constructor is likely the last edge case without coverage.
Example:

elif "pandas_nullable" in str(constructor):
     assert_equal_data(floordiv_result, {"a": [0]})

I feel as thought it will add even more bulk to test but it would help rule out more of the edge cases, let me know your thoughts!

@FBruzzesi
Copy link
Member

This has been an unexpected adventure after the very welcoming issue title, but it has definitely enhanced my appreciation for floor division by zero.

I can 100% related - I have been there sooo many times πŸ˜‚

would it be acceptable to add a conditional for pandas[nullable], because after reviewing the failures it appears that this constructor is likely the last edge case without coverage.

Sure! I think it's fine to catch all the cases, so that we know what we need to address afterwards


Also I forgot to address the following:

Should I rebase down to a single commit before this gets merged?

Don't worry about it, we will squash and merge as one commit at the end πŸ‘Œ

Comment on lines +195 to +197
elif all(x in str(constructor_eager) for x in ["pandas", "nullable"]):
floordiv_result = df["a"] // df["b"]
assert_equal_data({"a": floordiv_result}, {"a": [0]})
Copy link
Member

@FBruzzesi FBruzzesi Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli this might be worth reporting upstream as well. WDYT? Is this expected?
Repro:

pd.Series([-1, 0, 1]).convert_dtypes("numpy_nullable")//0
Out[4]: 
0    0
1    0
2    0
dtype: Int64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test: add test for division by zero
3 participants