Commit 84f8291

Change sorts to stable instead of mergesort
1 parent 6b4a45d commit 84f8291

3 files changed: +10 -7 lines changed

doc/source/whatsnew/v3.0.0.rst

+3
@@ -68,6 +68,7 @@ Other enhancements
 - :meth:`DataFrameGroupBy.transform`, :meth:`SeriesGroupBy.transform`, :meth:`DataFrameGroupBy.agg`, :meth:`SeriesGroupBy.agg`, :meth:`RollingGroupby.apply`, :meth:`ExpandingGroupby.apply`, :meth:`Rolling.apply`, :meth:`Expanding.apply`, :meth:`DataFrame.apply` with ``engine="numba"`` now supports positional arguments passed as kwargs (:issue:`58995`)
 - :meth:`Rolling.agg`, :meth:`Expanding.agg` and :meth:`ExponentialMovingWindow.agg` now accept :class:`NamedAgg` aggregations through ``**kwargs`` (:issue:`28333`)
 - :meth:`Series.map` can now accept kwargs to pass on to func (:issue:`59814`)
+- :meth:`Series.nlargest` has improved performance when there are duplicate values in the index (:issue:`55767`)
 - :meth:`Series.str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
 - :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
 - :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
@@ -148,6 +149,8 @@ These improvements also fixed certain bugs in groupby:
 - :meth:`.DataFrameGroupBy.sum` would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (:issue:`43891`)
 - :meth:`.DataFrameGroupBy.value_counts` would produce incorrect results when used with some categorical and some non-categorical groupings and ``observed=False`` (:issue:`56016`)

+- :meth:`Series.nlargest`
+
 .. _whatsnew_300.notable_bug_fixes.notable_bug_fix2:

 notable_bug_fix2
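
The release note above concerns :meth:`Series.nlargest` on a Series whose index contains duplicate labels. A minimal sketch of that situation follows; the values and labels are illustrative only and do not come from the commit or its tests.

```python
import pandas as pd

# Illustrative data only: labels "a" and "b" each appear more than once.
s = pd.Series([3, 1, 4, 1, 5], index=["a", "a", "b", "b", "b"])

# nlargest returns the n largest values, largest first, each keeping
# its own (possibly duplicated) index label.
top2 = s.nlargest(2)
print(top2)
```

Both selected rows carry the label ``"b"``, so label-based lookups cannot tell them apart. That is presumably why the implementation in the diff works on a positional index obtained with ``reset_index(drop=True)``.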

pandas/core/methods/selectn.py

+5 -5
@@ -119,18 +119,18 @@ def compute(self, method: str) -> Series:
         original_index: Index = self.obj.index
         cur_series = self.obj.reset_index(drop=True)

-        dropped = cur_series.dropna()
-        nan_index = cur_series.drop(dropped.index)
-
         # slow method
         if n >= len(cur_series):
             ascending = method == "nsmallest"
             final_series = cur_series.sort_values(
-                ascending=ascending, kind="mergesort"
+                ascending=ascending, kind="stable"
             ).head(n)
             final_series.index = original_index.take(final_series.index)
             return final_series

+        dropped = cur_series.dropna()
+        nan_index = cur_series.drop(dropped.index)
+
         # fast method
         new_dtype = dropped.dtype

@@ -291,4 +291,4 @@ def get_indexer(current_indexer: Index, other_indexer: Index) -> Index:

         ascending = method == "nsmallest"

-        return frame.sort_values(columns, ascending=ascending, kind="mergesort")
+        return frame.sort_values(columns, ascending=ascending, kind="stable")
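
The first hunk above changes two things in the "slow method" branch: it sorts with ``kind="stable"``, and it moves the ``dropna`` bookkeeping below the early return, so that branch no longer computes ``dropped`` and ``nan_index`` without using them. The sketch below restates that branch as a standalone function. The name ``select_n_by_full_sort`` and the sample data are hypothetical and are not part of pandas.

```python
import pandas as pd


def select_n_by_full_sort(obj: pd.Series, n: int, method: str) -> pd.Series:
    """Hypothetical standalone restatement of the 'slow method' branch."""
    original_index = obj.index
    # Work on a positional index so duplicate labels cannot be ambiguous.
    cur_series = obj.reset_index(drop=True)
    ascending = method == "nsmallest"
    # A stable sort keeps tied values in their original relative order.
    final_series = cur_series.sort_values(ascending=ascending, kind="stable").head(n)
    # Map the surviving positions back to the original labels.
    final_series.index = original_index.take(final_series.index)
    return final_series


# Illustrative data only: tied values under duplicated labels.
s = pd.Series([2, 1, 2, 1], index=["x", "x", "y", "y"])
result = select_n_by_full_sort(s, n=10, method="nsmallest")
print(result)
```

Because ``n`` is at least the length of the Series, the whole Series comes back sorted, with each pair of tied values in its original order.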

pandas/tests/frame/methods/test_nlargest.py

+2 -2
@@ -153,11 +153,11 @@ def test_nlargest_n_duplicate_index(self, n, order, request):
             index=[0, 0, 1, 1, 1],
         )
         result = df.nsmallest(n, order)
-        expected = df.sort_values(order, kind="mergesort").head(n)
+        expected = df.sort_values(order, kind="stable").head(n)
         tm.assert_frame_equal(result, expected)

         result = df.nlargest(n, order)
-        expected = df.sort_values(order, ascending=False, kind="mergesort").head(n)
+        expected = df.sort_values(order, ascending=False, kind="stable").head(n)
         if Version(np.__version__) >= Version("1.25") and (
             (order == ["a"] and n in (1, 2, 3, 4)) or ((order == ["a", "b"]) and n == 5)
         ):
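
The updated expectations rely on ``kind="stable"`` ordering rows the same way ``kind="mergesort"`` did. NumPy documents the two options as mapping to the same underlying implementation. The check below is a small sketch of that equivalence; the column values are made up for illustration and are not the frame used in the test.

```python
import pandas as pd

# Illustrative frame only: duplicate index labels and ties in column "a".
df = pd.DataFrame(
    {"a": [1, 2, 2, 1, 2], "b": [5, 4, 3, 2, 1]},
    index=[0, 0, 1, 1, 1],
)

by_stable = df.sort_values("a", kind="stable")
by_mergesort = df.sort_values("a", kind="mergesort")

# Raises AssertionError if the two results differ in any way.
pd.testing.assert_frame_equal(by_stable, by_mergesort)
print(by_stable)
```

Within each group of equal ``a`` values, the rows keep their original relative order. That ordering is what lets the test compare ``nsmallest`` and ``nlargest`` against a plain sort followed by ``head(n)``.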
