Fix bug in `Series.describe` where the median is included any time the `percentiles` argument is not None #61158

MartinBraquet · 2025-03-21T03:03:30Z

closes ENH: Passing a single value to .describe(percentiles = [0.25]) returns 25th- and 50th-percentile #60550
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This PR fixes the bug described in the issue referenced above. Several PRs (#61024, #61023, #60986, #60557) have already attempted to resolve it, but all of them were closed or have been stale for more than two weeks, so I took the liberty of applying here all the comments from the issue and from those previous PRs.

One open question concerns backward compatibility. The only case in which the result changes is when `percentiles` is a list that does not contain 0.5. Before this PR, 0.5 was added to the result anyway; after this PR, it is not included.
If we added a warning message in that case, it would be:

useful for the people who read the median even though they didn't include it in the list of percentiles
spamming / undesirable for the rest

So, what is pandas' policy on warning messages that may produce a large number of false alerts?
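
To make the behaviour change concrete, here is a minimal pure-Python sketch of the two selection rules described above. The names `percentiles_before` and `percentiles_after` are hypothetical and exist only for illustration; this is a simplified model of the description in this PR, not the actual pandas implementation.

```python
DEFAULT_PERCENTILES = [0.25, 0.5, 0.75]


def percentiles_before(percentiles=None):
    """Model of the old rule: the median is always part of the result."""
    if percentiles is None:
        return list(DEFAULT_PERCENTILES)
    selected = list(percentiles)
    if 0.5 not in selected:
        # Old behaviour: 0.5 is silently added when the caller left it out.
        selected.append(0.5)
    return sorted(selected)


def percentiles_after(percentiles=None):
    """Model of the new rule: only the requested percentiles are returned."""
    if percentiles is None:
        return list(DEFAULT_PERCENTILES)
    return sorted(percentiles)


print(percentiles_before([0.25]))  # [0.25, 0.5]
print(percentiles_after([0.25]))   # [0.25]
print(percentiles_after([]))       # []
```

In this model the two rules differ only when a list without 0.5 is passed, which is the single case a warning would have to target.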

MartinBraquet · 2025-03-21T03:12:21Z

pandas/io/formats/format.py

+    if len(percentiles) == 0:
+        return []
+


This is backward-compatible because it only extends the range of values that the input parameter can take.
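
As a rough illustration of why the early return is safe, the sketch below shows a simplified label formatter with the same guard. The function `format_percentiles_sketch` is hypothetical: it is not the pandas function in `pandas/io/formats/format.py`, it rounds labels to whole percents, and its range check and error message are assumptions made for the example.

```python
def format_percentiles_sketch(percentiles):
    """Hypothetical, simplified stand-in for a percentile label formatter."""
    # Early return mirroring the guard added in the diff above:
    # an empty input yields no labels instead of failing later on.
    if len(percentiles) == 0:
        return []
    for p in percentiles:
        if not 0 <= p <= 1:
            # Assumed validation; the wording of the message is illustrative.
            raise ValueError("percentiles should all be in the interval [0, 1]")
    # Whole-percent labels, using the same format spec as the test in this PR.
    return [f"{p:.0%}" for p in percentiles]


print(format_percentiles_sketch([]))                 # []
print(format_percentiles_sketch([0.25, 0.5, 0.75]))  # ['25%', '50%', '75%']
```

Every input that was accepted before still takes the same path; only the previously unsupported empty case is new.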

mroeschke · 2025-03-21T15:56:03Z

pandas/core/generic.py

-            fall between 0 and 1. The default is
-            ``[.25, .5, .75]``, which returns the 25th, 50th, and
-            75th percentiles.
+            fall between 0 and 1. Here are the options:
+
+            - A list-like of numbers : To include the percentiles listed. If
+              that list is empty, no percentiles will be returned.
+            - None (default) : To include the default percentiles, which are the
+              25th, 50th, and 75th ones.


I think you can just keep the old description but clarify: "The default, ``None``, will automatically return the 25th, 50th, and 75th percentiles."

mroeschke · 2025-03-21T15:57:13Z

pandas/tests/frame/methods/test_describe.py

+                **{f"{p:.0%}": df.a.quantile(p) for p in percentiles},
+                "max": df.a.max(),
+            },
+        ).to_frame(name="a")


Can you just create expected = DataFrame(...) instead of using to_frame?

Is the new formulation appropriate? I passed index=['a'] and had to transpose. Is there a cleaner way that avoids the transpose?

You can do:

DataFrame([len(df.a), df.a.mean(), ..., *[df.a.quantile(p) for p in percentiles], df.a.max()], index=pd.Index(["count", ..., *[f"{p:.0%}" for p in percentiles], ...]), columns=["a"])

Good point, although I am not sure it is really cleaner: the iteration over the percentiles is duplicated, and the mapping from key to value becomes less clear. Is it fine as it currently is, or would you prefer the version you suggested just above?

I would prefer the suggested method, so that this test does not exercise extraneous pandas APIs such as transpose or to_frame.

That makes total sense; will update it right now.
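
For reference, a self-contained version of the suggested construction could look like the sketch below. The sample data, the choice of percentiles, and the `std` row are assumptions filled in for illustration (the suggestion above elides them); only the overall pattern of building `expected` directly with `DataFrame(...)` comes from the review.

```python
import pandas as pd

# Assumed sample data; the real test fixture may differ.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0]})
percentiles = [0.1, 0.9]  # deliberately excludes the median

# Build the expected frame directly, without to_frame() or transpose.
expected = pd.DataFrame(
    [
        len(df.a),
        df.a.mean(),
        df.a.std(),
        df.a.min(),
        *[df.a.quantile(p) for p in percentiles],
        df.a.max(),
    ],
    index=pd.Index(
        ["count", "mean", "std", "min", *[f"{p:.0%}" for p in percentiles], "max"]
    ),
    columns=["a"],
)
print(expected)
```

On a pandas build that includes this fix, `df.describe(percentiles=percentiles)` should match this frame; earlier versions would be expected to add a `50%` row.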

doc/source/whatsnew/v3.0.0.rst

Co-authored-by: Matthew Roeschke <[email protected]>

MartinBraquet · 2025-03-21T16:22:56Z

@mroeschke Thanks for the feedback! I applied your comments. Lmk if there's anything else to update.

mroeschke · 2025-03-21T21:13:10Z

Thanks @MartinBraquet

MartinBraquet added 2 commits March 21, 2025 09:32

Fix bug in ~Series.describe where median percentile is included when the `percentiles` argument is passed

e3b0b5d

Refine docstrings

8bb3cf3

MartinBraquet mentioned this pull request Mar 21, 2025

ENH: Passing a single value to .describe(percentiles = [0.25]) returns 25th- and 50th-percentile #60550

Closed

3 tasks

Update test_describe in groupby

a0a0c63

mroeschke reviewed Mar 21, 2025

doc/source/whatsnew/v3.0.0.rst

mroeschke added the Series Series data structure label Mar 21, 2025

MartinBraquet and others added 2 commits March 21, 2025 23:14

Minor fixes

bf1effa

Update doc/source/whatsnew/v3.0.0.rst

28756ad

Co-authored-by: Matthew Roeschke <[email protected]>

MartinBraquet requested a review from mroeschke March 21, 2025 16:18

MartinBraquet added 2 commits March 22, 2025 02:01

Refactor expected df to avoid transpose

5ed786c

Merge remote-tracking branch 'origin/describe' into describe

c57dabb

mroeschke added this to the 3.0 milestone Mar 21, 2025

mroeschke approved these changes Mar 21, 2025

mroeschke merged commit dc8401a into pandas-dev:main Mar 21, 2025
42 checks passed

MartinBraquet deleted the describe branch March 22, 2025 05:38

TomAugspurger mentioned this pull request Mar 24, 2025

DataFrame.describe(percentiles=[]) still returns 50% percentile. #11866

Closed

This was referenced Mar 24, 2025

ENH: Passing a single value to .describe(percentiles = [0.25]) returns 25th- and 50th-percentile #61109

Closed

Update describe.py #60986

Closed
