PERF: Big slowdown of searchsorted on pd.Series #65840
"""
if self._hasna:
    ndarray = self._ndarray
    if len(ndarray) and libmissing.checknull(ndarray[-1]):
While NA values do generally get sorted to the end, they can also be sorted to the front via na_position. I think we should consider either front or back as being sorted, so need to also check the front of the array.
Can you also add a test for this?
@rhshadrach Thank you for the review! I added code changes and tests that account for na_position.
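The na_position point from the review can be illustrated with a small sketch. It uses plain Python floats rather than pandas internals, and `sort_with_na` is a hypothetical helper written only for this illustration, not part of the PR. It shows why a check on the last element alone would miss NA values that were sorted to the front.

```python
import math


def sort_with_na(values, na_position="last"):
    """Sort floats, grouping NaN at the requested end of the result.

    Hypothetical stand-in for the idea behind na_position; not pandas code.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    non_na = sorted(v for v in values if not math.isnan(v))
    na = [v for v in values if math.isnan(v)]
    return na + non_na if na_position == "first" else non_na + na


data = [3.0, float("nan"), 1.0, 2.0]
na_last = sort_with_na(data, na_position="last")
na_first = sort_with_na(data, na_position="first")

# With NaN placed last, the final element reveals the NaN.
print(math.isnan(na_last[-1]))
# With NaN placed first, the final element is a regular value,
# so only a check on the first element finds the NaN.
print(math.isnan(na_first[-1]), math.isnan(na_first[0]))
```

Both layouts count as sorted for the purposes of the review comment, which is why both ends need to be inspected.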
rhshadrach left a comment
lgtm
@jbrockmendel - this PR replaces the O(n) check for NA values with an O(1) check, under the assumption that the input is sorted; this assumption is already being made about the non-NA values. Are you okay with that?
Yes
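The trade-off discussed above can be sketched as follows. This is a minimal illustration on plain Python lists, assuming NaN is the only NA marker; `has_na_full_scan` and `has_na_assuming_sorted` are hypothetical names and do not correspond to functions in pandas.

```python
import math


def has_na_full_scan(values):
    """O(n): inspect every element, with no assumption about ordering."""
    return any(math.isnan(v) for v in values)


def has_na_assuming_sorted(values):
    """O(1): inspect only the two ends.

    Correct only if the input is sorted, so that any NaN values are
    grouped at the front or at the back.
    """
    return len(values) > 0 and (math.isnan(values[0]) or math.isnan(values[-1]))


nan = float("nan")
sorted_na_last = [1.0, 2.0, nan]
sorted_na_first = [nan, 1.0, 2.0]
no_na = [1.0, 2.0, 3.0]
unsorted = [1.0, nan, 2.0]  # violates the sortedness assumption

for sample in (sorted_na_last, sorted_na_first, no_na, unsorted):
    print(has_na_full_scan(sample), has_na_assuming_sorted(sample))
```

The two functions agree on every sorted input and differ only on the unsorted one, where searchsorted results would already be unreliable because the non-NA values are assumed to be sorted as well.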
ndarray = self._ndarray
if len(ndarray) and (
    libmissing.checknull(ndarray[0]) or libmissing.checknull(ndarray[-1])
):
Could you add a comment explaining why this isn't using self._hasna?