Percentile 1 method
The percentile method finds the value at a given percentile in a dataset. The same algorithm is used by the PERCENTILE and PERCENTILE.INC functions in Microsoft Excel and by the PERCENTILE function in Google Sheets. It is the same as the 7th sample quantile method from the Hyndman and Fan paper (1996).
The function receives two parameters:
- values: array of values in the dataset.
- percentile: percentile between 0 and 1 inclusive.
For example, suppose we want to calculate the 40th percentile for the following measurements: 35, 20, 50, 40, 15. We call the percentile function and pass the measurements and the percentile as the decimal value 0.4.
Sigma.percentile([35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29
We use the following algorithm to calculate the percentile value:
Firstly, we sort the dataset from lowest to highest values.
sortedValues = [15, 20, 35, 40, 50]
Secondly, we find the rank of the 40th percentile. The rank is the position of an element in the sorted dataset, counting from 1: rank 1 is the first element, rank 2 is the second, and so on. The rank can be a decimal. For example, a rank of 3.35 is used to find a value between the third and fourth elements.
Equation 1
rank = percentile * (count - 1) + 1
Where:
- percentile is the percentile argument of the function.
- count is the size of the dataset, which is equal to the size of the array passed as the values argument.
We substitute the arguments into Equation 1:
rank = 0.4 * (5 - 1) + 1 = 2.6
Next we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2 and the fractional part of 2.6 is 0.6.
rankInteger = 2
rankFraction = 0.6
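The rank calculation and its split into integer and fractional parts can be sketched in Swift as follows. The helper name rankParts is hypothetical and used only for illustration. It is not part of the library's API.

```swift
// Hypothetical helper (an assumption, not the library's API):
// evaluates Equation 1 and splits the rank into its integer and fractional parts.
func rankParts(percentile: Double, count: Int) -> (integer: Int, fraction: Double) {
    // Equation 1: the rank is 1-based and may be fractional.
    let rank = percentile * Double(count - 1) + 1
    let integerPart = rank.rounded(.down)
    return (Int(integerPart), rank - integerPart)
}

let parts = rankParts(percentile: 0.4, count: 5)
print(parts.integer)   // 2
print(parts.fraction)  // approximately 0.6, subject to floating-point rounding
```

Because the fraction is computed in floating point, it may differ from 0.6 in the last few digits, so comparisons are best made with a small tolerance.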
Then, we look at the ordered dataset {15, 20, 35, 40, 50} and find the element corresponding to the rankInteger and rankInteger + 1. In our example, rankInteger is 2, therefore, we need to find the second and third elements which are 20 and 35.
elementValue = 20
elementPlusOneValue = 35
Finally, we calculate the resulting percentile value by interpolating between elementValue and elementPlusOneValue values according to the rankFraction.
percentileValue = elementValue + rankFraction * (elementPlusOneValue - elementValue)
percentileValue = 20 + 0.6 * (35 - 20) = 29
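Putting the steps together, a minimal sketch of the whole procedure might look like the code below. The function name percentile1Sketch is an assumption made for this example, and the library's actual implementation may differ. The sketch only mirrors the steps described above, including returning nil for undefined input.

```swift
// Hypothetical sketch of the steps described above; not the library's own code.
func percentile1Sketch(_ values: [Double], percentile: Double) -> Double? {
    // Undefined for an empty dataset or a percentile outside [0, 1].
    guard !values.isEmpty, percentile >= 0, percentile <= 1 else { return nil }

    // Step 1: sort from lowest to highest.
    let sortedValues = values.sorted()

    // Step 2: Equation 1 gives a 1-based, possibly fractional rank.
    let rank = percentile * Double(sortedValues.count - 1) + 1
    let rankInteger = Int(rank.rounded(.down))
    let rankFraction = rank - Double(rankInteger)

    // Step 3: ranks are 1-based, array indices are 0-based.
    let elementValue = sortedValues[rankInteger - 1]

    // If the rank points at the last element, there is nothing to interpolate towards.
    guard rankInteger < sortedValues.count else { return elementValue }
    let elementPlusOneValue = sortedValues[rankInteger]

    // Step 4: linear interpolation between the two neighbouring elements.
    return elementValue + rankFraction * (elementPlusOneValue - elementValue)
}

if let value = percentile1Sketch([35, 20, 50, 40, 15], percentile: 0.4) {
    print(value)  // approximately 29
}
```

The explicit check for the last element is a design choice of this sketch: it avoids reading past the end of the array when percentile is 1.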
The algorithm is undefined and returns nil in the following situations:
- The supplied values array is empty.
- The supplied percentile value is negative or greater than 1.

For valid arguments, the following special cases hold:
- A percentile argument of 0 returns the minimum value in the dataset.
- A percentile argument of 0.5 returns the median value.
- A percentile argument of 1 returns the maximum value in the dataset.
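One way to see why percentiles of 0, 0.5 and 1 return the minimum, median and maximum is to evaluate Equation 1 for the example dataset. The snippet below is an illustrative sketch, and the helper name rank(forPercentile:count:) is hypothetical.

```swift
// Hypothetical helper implementing Equation 1 (an assumption for illustration).
func rank(forPercentile percentile: Double, count: Int) -> Double {
    return percentile * Double(count - 1) + 1
}

let sortedValues: [Double] = [15, 20, 35, 40, 50]
let count = sortedValues.count

// Ranks are 1-based, so subtract 1 to index the array.
// For these three percentiles the rank is a whole number, so no interpolation is needed.
let minimum = sortedValues[Int(rank(forPercentile: 0, count: count)) - 1]    // rank 1 -> 15
let median  = sortedValues[Int(rank(forPercentile: 0.5, count: count)) - 1]  // rank 3 -> 35
let maximum = sortedValues[Int(rank(forPercentile: 1, count: count)) - 1]    // rank 5 -> 50
```

For a dataset with an even number of elements, a percentile of 0.5 gives a rank ending in .5, and the interpolation step then yields the average of the two middle elements, which is the usual definition of the median.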
The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.
- Percentiles from NIST/SEMATECH e-Handbook of Statistical Methods.
- Percentile Wikipedia article.