@@ -1089,7 +1089,7 @@ The final prediction goes to the largest posterior. This is known as the
 Kernel density estimation
 *************************
 
-It is possible to estimate a continuous probability density function
+It is possible to estimate a continuous probability distribution
 from a fixed number of discrete samples.
 
 The basic idea is to smooth the data using `a kernel function such as a
@@ -1100,14 +1100,27 @@ which is called the *bandwidth*.
 
 .. testcode::
 
-   def kde_normal(sample, h):
-       "Create a continuous probability density function from a sample."
-       # Smooth the sample with a normal distribution kernel scaled by h.
-       kernel_h = NormalDist(0.0, h).pdf
-       n = len(sample)
+   from random import choice, random
+
+   def kde_normal(data, h):
+       "Create a continuous probability distribution from discrete samples."
+
+       # Smooth the data with a normal distribution kernel scaled by h.
+       K_h = NormalDist(0.0, h)
+
        def pdf(x):
-           return sum(kernel_h(x - x_i) for x_i in sample) / n
-       return pdf
+           'Probability density function.  P(x <= X < x+dx) / dx'
+           return sum(K_h.pdf(x - x_i) for x_i in data) / len(data)
+
+       def cdf(x):
+           'Cumulative distribution function.  P(X <= x)'
+           return sum(K_h.cdf(x - x_i) for x_i in data) / len(data)
+
+       def rand():
+           'Random selection from the probability distribution.'
+           return choice(data) + K_h.inv_cdf(random())
+
+       return pdf, cdf, rand
 
 `Wikipedia has an example
 <https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
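Since the estimate built by ``kde_normal()`` is the average of one kernel density per data point, and each normal kernel integrates to 1, the result is itself a proper density. A quick standalone check of this (a sketch, not part of the patch; the data values and bandwidth here are illustrative):

```python
from statistics import NormalDist

# Illustrative data and bandwidth (chosen for this sketch, not from the patch).
data = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
h = 1.5

# KDE pdf: mean of normal kernel densities centered on each data point.
K_h = NormalDist(0.0, h)

def pdf(x):
    return sum(K_h.pdf(x - x_i) for x_i in data) / len(data)

# Riemann sum over a wide interval; nearly all the mass lies in [-20, 25].
dx = 0.01
area = sum(pdf(-20 + i * dx) * dx for i in range(4500))
print(round(area, 3))
```

An average of densities that each integrate to 1 also integrates to 1, so the numeric result should come out at 1.0 up to discretization error.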
@@ -1117,15 +1130,38 @@ a probability density function estimated from a small sample:
 .. doctest::
 
    >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
-   >>> f_hat = kde_normal(sample, h=1.5)
+   >>> pdf, cdf, rand = kde_normal(sample, h=1.5)
    >>> xarr = [i/100 for i in range(-750, 1100)]
-   >>> yarr = [f_hat(x) for x in xarr]
+   >>> yarr = [pdf(x) for x in xarr]
 
 The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:
 
 .. image:: kde_example.png
    :alt: Scatter plot of the estimated probability density function.
 
+`Resample <https://en.wikipedia.org/wiki/Resampling_(statistics)>`_
+the data to produce 100 new selections:
+
+.. doctest::
+
+   >>> new_selections = [rand() for i in range(100)]
+
+Determine the probability of a new selection being below ``2.0``:
+
+.. doctest::
+
+   >>> round(cdf(2.0), 4)
+   0.5794
+
+Add a new sample data point and find the new CDF at ``2.0``:
+
+.. doctest::
+
+   >>> sample.append(4.9)
+   >>> round(cdf(2.0), 4)
+   0.5005
+
+
 ..
    # This modelines must appear within the last ten lines of the file.
    kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;
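
For readers who want to try the new API outside the doctest harness, the pieces added in this patch assemble into a standalone script (same sample values and bandwidth as the doctests above):

```python
from random import choice, random
from statistics import NormalDist

def kde_normal(data, h):
    "Create a continuous probability distribution from discrete samples."
    # Smooth the data with a normal distribution kernel scaled by h.
    K_h = NormalDist(0.0, h)

    def pdf(x):
        # Probability density function:  P(x <= X < x+dx) / dx
        return sum(K_h.pdf(x - x_i) for x_i in data) / len(data)

    def cdf(x):
        # Cumulative distribution function:  P(X <= x)
        return sum(K_h.cdf(x - x_i) for x_i in data) / len(data)

    def rand():
        # Inverse transform sampling from a randomly chosen kernel.
        return choice(data) + K_h.inv_cdf(random())

    return pdf, cdf, rand

sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
pdf, cdf, rand = kde_normal(sample, h=1.5)
print(round(cdf(2.0), 4))        # 0.5794, as in the doctest
new_selections = [rand() for i in range(100)]
```

The `rand()` helper works because a KDE is an equal-weight mixture of kernels: pick a mixture component uniformly with `choice()`, then draw from that kernel via `inv_cdf()` applied to a uniform variate.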