Fix/negative values#17
Merged
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes incorrect rank-sum/tie handling in sparse MWU routines when non-zero values can be negative by computing the true insertion position of implicit zeros and adjusting rank sums accordingly.
Changes:
- Extend ranking utilities (Rust + Numba) to account for implicit zeros via a
zero_values_offsetand returnzero_pos. - Update sparse OVR/OVO kernels to use
zero_poswhen adding zero contributions to rank sums (instead of assuming zeros come first). - Expand test coverage/fixtures to include negative-value scenarios; bump version + changelog entry.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/ranking.rs |
Adds zero-offset support and returns zero_pos from ranking helpers. |
src/sparse_ovr.rs |
Uses returned zero_pos to correctly add zero rank contributions for sparse OVR. |
src/sparse_ovo.rs |
Uses zero_pos to compute zero ranks correctly for sparse OVO. |
src/dense_ovr.rs |
Adapts to new return type from accumulate_rank_and_tie_sums_from_argsort (currently with a compile issue). |
src/dense_ovo.rs |
Updates call site for new rank_sum_and_ties signature. |
illico/utils/ranking.py |
Adds zero-offset handling and returns zero_pos from Numba ranking helpers. |
illico/ovr/sparse_ovr.py |
Adjusts sparse OVR zero rank contribution using zero_pos. |
illico/ovr/dense_ovr.py |
Updates unpacking for new _accumulate_group_ranksums_from_argsort return. |
illico/ovo/sparse_ovo.py |
Adjusts sparse OVO zero rank contribution using zero_pos. |
illico/ovo/dense_ovo.py |
Updates unpacking for new rank_sum_and_ties_from_sorted return. |
tests/utils/test_ranking.py |
Adds parametrized tests covering dense vs sparse (implicit zeros) with negative values. |
tests/conftest.py |
Modifies randomized fixture to include negative-value columns/group patterns. |
pyproject.toml |
Bumps project version to 0.5.1. |
changelog.md |
Documents the negative-values sparse bugfix in 0.5.1. |
Comments suppressed due to low confidence (1)
illico/utils/ranking.py:84
- The type hint/docstring for this function still suggests it returns a 2-tuple (and even
np.ndarray), but it now returns three scalars:(ranksum_B, tie_sum, zero_pos). Please update the return annotation and the "Returns" section of the docstring to reflect the new API (e.g.,tuple[float, float, int]).
@njit(nogil=True)
def rank_sum_and_ties_from_sorted(A: np.ndarray, B: np.ndarray, zero_values_offset: int = 0) -> tuple[np.ndarray]:
"""Compute rank sums and tie sums from two 1-d sorted arrays.
This routine is similar to the leetcode "merge two sorted arrays", except it
never returns to sorted array, instead it accumulate rank sums of the second array
and tie sums for the combined arrays.
This routine sits at the core of the one-versus-one (or one-versus-control) asymptotic
wilcoxon rank sum test as it allows to sort controls only once.
Args:
A (np.ndarray): The first sorted array (controls)
B (np.ndarray): The second sorted array (perturbed)
zero_values_offset (int): If > 0, it means that there are zeros not present in the input arrays but that
they should be accounted for. This is only used when the input adata is sparse, and ranksum is computed
on non zero values.
Returns:
tuple[np.ndarray]: Ranks sum from the second array, and tie sums for the combined
arrays.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes a bug in the sparse routines when input data contains negative values. Now, instead of artificially adding ranksum of zero values considering zeros are positioned before any other value, zero position is computed and ranksums is adjusted accordingly, cf this change