Make np.digitize unit-aware #644

Open
gaoflow wants to merge 1 commit into yt-project:main from gaoflow:fix-633-digitize-units
gaoflow commented Jun 2, 2026

Closes #633.

Problem

np.digitize was listed in NOOP_FUNCTIONS with the note "returns pure numbers", on the assumption that because it returns plain indices it needs no unit handling. But digitize compares the data against the bin edges, so the units of both matter. With no handler the two were compared as raw numbers:

>>> np.digitize([1.1, 2.2, 3.3] * u.g, [0, 1, 2, 3] * u.kg)
array([2, 3, 4])   # wrong: all three values are < 1 kg, so this should be array([1, 1, 1])

and dimensionally incompatible bins were silently accepted instead of raising:

>>> np.digitize([1.1, 2.2, 3.3] * u.g, [0, 1, 2, 3] * u.s)
array([2, 3, 4])   # should raise
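
The same effect can be reproduced with plain NumPy arrays, independent of any unit library. The sketch below is illustrative only: the variable names and the hard-coded factor of 1000 g per kg are assumptions standing in for what a unit system would do.

```python
import numpy as np

data_g = np.array([1.1, 2.2, 3.3])        # data values, in grams
bins_kg = np.array([0.0, 1.0, 2.0, 3.0])  # bin edges, in kilograms

# Units ignored: grams are compared directly against kilogram edges.
raw = np.digitize(data_g, bins_kg)

# Units respected: express the edges in grams before comparing.
G_PER_KG = 1000.0  # assumed conversion factor, written out by hand
converted = np.digitize(data_g, bins_kg * G_PER_KG)

print(raw)        # [2 3 4]
print(converted)  # [1 1 1]
```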

Fix

Add an @implements(np.digitize) handler that converts the bins to the data array's units before delegating to NumPy, reusing the existing _sanitize_bins helper that the histogram functions already use. Converting through the unit system means compatible units are reconciled and dimensionally incompatible bins raise UnitConversionError:

>>> np.digitize([1.1, 2.2, 3.3] * u.g, [0, 1, 2, 3] * u.kg)
array([1, 1, 1])
>>> np.digitize([1.1, 2.2, 3.3] * u.g, [0, 1, 2, 3] * u.s)
UnitConversionError: ...

np.digitize is removed from NOOP_FUNCTIONS accordingly (the test_wrapping_completeness meta-test enforces that handled and not-handled sets stay disjoint).
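
The disjointness invariant can be pictured as a simple set check. This is a rough sketch, not the actual meta-test: HANDLED, NOOP and overlapping_names are hypothetical stand-ins for the library's real registries and test code, filled with a few NumPy functions purely for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the "handled" and "no-op" registries.
HANDLED = {np.histogram, np.digitize}
NOOP = {np.shape, np.ndim}


def overlapping_names(handled, noop):
    """Return the sorted names of functions listed in both sets."""
    return sorted(func.__name__ for func in handled & noop)


# A function must be either handled or a no-op, never both.
assert overlapping_names(HANDLED, NOOP) == []

# Leaving np.digitize in the no-op set after adding a handler would be caught.
print(overlapping_names(HANDLED, NOOP | {np.digitize}))  # ['digitize']
```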

A unit-less bins argument is coerced to an array first so the existing dimensionless-data path keeps working (a plain Python list does not support the division done inside _sanitize_bins).
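
The overall shape of such a handler can be sketched with NumPy's `__array_function__` protocol. Everything below is a toy under stated assumptions, not the code in this PR: the Quantity class, the UNITS table, the HANDLED registry and this `implements` decorator are hypothetical, and the toy raises ValueError where the real handler raises UnitConversionError. Treating unit-less bins as dimensionless is likewise an assumption of the sketch.

```python
import numpy as np

HANDLED = {}  # hypothetical registry: NumPy function -> unit-aware handler


def implements(numpy_function):
    """Register a handler for a NumPy function (toy decorator)."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


# Toy unit table: unit name -> (dimension, factor to the base unit).
UNITS = {
    "g": ("mass", 1.0),
    "kg": ("mass", 1000.0),
    "s": ("time", 1.0),
    "dimensionless": ("none", 1.0),
}


class Quantity:
    """Minimal array-with-unit wrapper, for illustration only."""

    def __init__(self, values, unit):
        self.values = np.asarray(values, dtype=float)
        self.unit = unit

    def to(self, unit):
        dim_from, factor_from = UNITS[self.unit]
        dim_to, factor_to = UNITS[unit]
        if dim_from != dim_to:
            raise ValueError(f"cannot convert {self.unit} to {unit}")
        return Quantity(self.values * factor_from / factor_to, unit)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.digitize)
def digitize(x, bins, right=False):
    # This sketch assumes x is a Quantity.
    if not isinstance(bins, Quantity):
        # Coerce unit-less bins (e.g. a plain list) to an array first.
        bins = Quantity(np.asarray(bins), "dimensionless")
    # Convert the edges to the data's unit; incompatible dimensions raise.
    edges = bins.to(x.unit).values
    return np.digitize(x.values, edges, right=right)


grams = Quantity([1.1, 2.2, 3.3], "g")
print(np.digitize(grams, Quantity([0, 1, 2, 3], "kg")))  # [1 1 1]
```

In this toy, passing bins in seconds for data in grams raises, and a plain list of bins is accepted only when the data is dimensionless.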

Tests

Added test_digitize_converts_bins, test_digitize_right, test_digitize_dimensionless_plain_bins and test_digitize_mixed_units. Full test_array_functions.py (494 passed) and the whole suite (785 passed) are green; ruff lint/format clean.
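
For context on why `right` deserves its own test: that flag only changes the result for values that land exactly on a bin edge, so it has to be evaluated after the edges are expressed in the data's units. The snippet below is a plain-NumPy illustration with a hand-written conversion factor, not a copy of test_digitize_right, whose contents are not shown here.

```python
import numpy as np

data_g = np.array([1000.0])                 # exactly 1 kg, expressed in grams
edges_g = np.array([0.0, 1.0, 2.0]) * 1000  # kilogram edges expressed in grams

# right=False: bins[i-1] <= x < bins[i], so an edge value goes to the upper bin.
left_closed = np.digitize(data_g, edges_g, right=False)

# right=True: bins[i-1] < x <= bins[i], so an edge value goes to the lower bin.
right_closed = np.digitize(data_g, edges_g, right=True)

print(left_closed)   # [2]
print(right_closed)  # [1]
```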

np.digitize was listed in NOOP_FUNCTIONS on the assumption that it only
returns plain indices, but it compares the data against the bin edges and
so must take units into account. Without a handler the bins were compared
as raw numbers, so e.g. digitizing grams against kilogram bins gave the
wrong indices, and dimensionally incompatible bins were silently accepted.

Add a handler that converts the bins to the data's units via the same
_sanitize_bins helper used by the histogram functions, which also raises
on incompatible dimensions. Closes yt-project#633.

Development

Successfully merging this pull request may close these issues.

np.digitize does not convert units when required
