@@ -79,7 +79,7 @@ or sample.
 :func:`median`           Median (middle value) of data.
 :func:`median_low`       Low median of data.
 :func:`median_high`      High median of data.
-:func:`median_grouped`   Median, or 50th percentile, of grouped data.
+:func:`median_grouped`   Median (50th percentile) of grouped data.
 :func:`mode`             Single mode (most common value) of discrete or nominal data.
 :func:`multimode`        List of modes (most common values) of discrete or nominal data.
 :func:`quantiles`        Divide data into intervals with equal probability.
@@ -329,55 +329,56 @@ However, for reading convenience, most of the examples show sorted sequences.
    be an actual data point rather than interpolated.
 
 
-.. function:: median_grouped(data, interval=1)
+.. function:: median_grouped(data, interval=1.0)
 
-   Return the median of grouped continuous data, calculated as the 50th
-   percentile, using interpolation. If *data* is empty, :exc:`StatisticsError`
-   is raised. *data* can be a sequence or iterable.
+   Estimates the median for numeric data that has been `grouped or binned
+   <https://en.wikipedia.org/wiki/Data_binning>`_ around the midpoints
+   of consecutive, fixed-width intervals.
 
-   .. doctest::
+   The *data* can be any iterable of numeric data with each value being
+   exactly the midpoint of a bin. At least one value must be present.
 
-      >>> median_grouped([52, 52, 53, 54])
-      52.5
+   The *interval* is the width of each bin.
 
-   In the following example, the data are rounded, so that each value represents
-   the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
-   is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
-   given, the middle value falls somewhere in the class 3.5--4.5, and
-   interpolation is used to estimate it:
+   For example, demographic information may have been summarized into
+   consecutive ten-year age groups with each group being represented
+   by the 5-year midpoints of the intervals:
 
    .. doctest::
 
-      >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
-      3.7
-
-   Optional argument *interval* represents the class interval, and defaults
-   to 1. Changing the class interval naturally will change the interpolation:
+      >>> from collections import Counter
+      >>> demographics = Counter({
+      ...    25: 172,   # 20 to 30 years old
+      ...    35: 484,   # 30 to 40 years old
+      ...    45: 387,   # 40 to 50 years old
+      ...    55:  22,   # 50 to 60 years old
+      ...    65:   6,   # 60 to 70 years old
+      ... })
+      ...
+
+   The 50th percentile (median) is the 536th person out of the 1071
+   member cohort. That person is in the 30 to 40 year old age group.
+
+   The regular :func:`median` function would assume that everyone in the
+   tricenarian age group was exactly 35 years old. A more tenable
+   assumption is that the 484 members of that age group are evenly
+   distributed between 30 and 40. For that, we use
+   :func:`median_grouped`:
 
    .. doctest::
 
-      >>> median_grouped([1, 3, 3, 5, 7], interval=1)
-      3.25
-      >>> median_grouped([1, 3, 3, 5, 7], interval=2)
-      3.5
-
-   This function does not check whether the data points are at least
-   *interval* apart.
-
-   .. impl-detail::
-
-      Under some circumstances, :func:`median_grouped` may coerce data points to
-      floats. This behaviour is likely to change in the future.
-
-   .. seealso::
+      >>> data = list(demographics.elements())
+      >>> median(data)
+      35
+      >>> round(median_grouped(data, interval=10), 1)
+      37.5
 
-      * "Statistics for the Behavioral Sciences", Frederick J Gravetter and
-        Larry B Wallnau (8th Edition).
+   The caller is responsible for making sure the data points are separated
+   by exact multiples of *interval*. This is essential for getting a
+   correct result. The function does not check this precondition.
 
-      * The `SSMEDIAN
-        <https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
-        function in the Gnome Gnumeric spreadsheet, including `this discussion
-        <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
+   Inputs may be any numeric type that can be coerced to a float during
+   the interpolation step.
 
 
 .. function:: mode(data)
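The interpolation described in the new ``median_grouped()`` text (treating the members of the median bin as evenly spread across its width) can be spelled out directly. The following is a minimal sketch, not the statistics module's own code: ``grouped_median_sketch`` is a hypothetical name, and the sketch assumes the common grouped-median formula ``L + interval * (n/2 - cf) / f``, where ``L`` is the lower edge of the bin containing the middle position, ``cf`` is the number of values in lower bins and ``f`` is the number of values in that bin. It also adds the spacing check that, per the new text, ``median_grouped()`` leaves to the caller; the ``1e-9`` tolerance in that check is an arbitrary assumption.

```python
import math
from collections import Counter
from statistics import StatisticsError, median_grouped


def grouped_median_sketch(data, interval=1.0):
    """Hypothetical helper: estimate the median of binned data.

    Assumes each value is the midpoint of a bin of width *interval* and
    uses the textbook formula L + interval * (n/2 - cf) / f (an assumption,
    not a copy of the statistics module's source).
    """
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise StatisticsError("no median for empty data")
    interval = float(interval)

    # Extra precondition check that median_grouped() does not perform:
    # every midpoint must lie a whole number of bin widths from the smallest.
    # The 1e-9 tolerance is an arbitrary allowance for float rounding.
    base = data[0]
    for value in data:
        steps = (value - base) / interval
        if not math.isclose(steps, round(steps), abs_tol=1e-9):
            raise ValueError(f"{value!r} is not a whole number of bins from {base!r}")

    x = data[n // 2]                                 # midpoint of the median bin
    counts = Counter(data)
    f = counts[x]                                    # values in the median bin
    cf = sum(c for v, c in counts.items() if v < x)  # values in lower bins
    lower = float(x) - interval / 2                  # lower edge of the median bin
    # Linear interpolation: assume the f values are spread evenly over the bin.
    return lower + interval * (n / 2 - cf) / f


# Usage with the age-group counts from the documentation example.
demographics = Counter({25: 172, 35: 484, 45: 387, 55: 22, 65: 6})
data = list(demographics.elements())
print(round(grouped_median_sketch(data, interval=10), 1))  # 37.5
print(round(median_grouped(data, interval=10), 1))         # 37.5
```

For the demographics counts, the median bin is the 30 to 40 group, so ``L = 30``, ``cf = 172`` and ``f = 484``, giving ``30 + 10 * (535.5 - 172) / 484``, about 37.51. Because the sketch raises ``ValueError`` for midpoints that are not whole multiples of *interval* apart, it rejects some inputs that ``median_grouped()`` would accept without complaint.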