Make np.digitize unit-aware#644
Open
gaoflow wants to merge 1 commit into
Open
Conversation
np.digitize was listed in NOOP_FUNCTIONS on the assumption that it only returns plain indices, but it compares the data against the bin edges and so must take units into account. Without a handler the bins were compared as raw numbers, so e.g. digitizing grams against kilogram bins gave the wrong indices, and dimensionally incompatible bins were silently accepted. Add a handler that converts the bins to the data's units via the same _sanitize_bins helper used by the histogram functions, which also raises on incompatible dimensions. Closes yt-project#633.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #633.
Problem
np.digitizewas listed inNOOP_FUNCTIONSwith the note "returns pure numbers", on the assumption that because it returns plain indices it needs no unit handling. Butdigitizecompares the data against the bin edges, so the units of both matter. With no handler the two were compared as raw numbers:and dimensionally incompatible bins were silently accepted instead of raising:
Fix
Add an
@implements(np.digitize)handler that converts the bins to the data arrays units before delegating to NumPy, reusing the existing_sanitize_binshelper that the histogram functions already use. Converting through the unit system means compatible units are reconciled and dimensionally incompatible bins raiseUnitConversionError:np.digitizeis removed fromNOOP_FUNCTIONSaccordingly (thetest_wrapping_completenessmeta-test enforces that handled and not-handled sets stay disjoint).A unit-less
binsargument is coerced to an array first so the existing dimensionless-data path keeps working (a plain Python list does not support the division done inside_sanitize_bins).Tests
Added
test_digitize_converts_bins,test_digitize_right,test_digitize_dimensionless_plain_binsandtest_digitize_mixed_units. Fulltest_array_functions.py(494 passed) and the whole suite (785 passed) are green; ruff lint/format clean.