|
190 | 190 | " - **Bonus**: Calculate the standard deviation and explain what it tells you about the variability in monthly rainfall patterns" |
191 | 191 | ] |
192 | 192 | }, |
| 193 | + { |
| 194 | + "cell_type": "markdown", |
| 195 | + "id": "c0fa8e75", |
| 196 | + "metadata": {}, |
| 197 | + "source": [ |
| 198 | + "### Probability of rain calculation" |
| 199 | + ] |
| 200 | + }, |
193 | 201 | { |
194 | 202 | "cell_type": "code", |
195 | | - "execution_count": 7, |
| 203 | + "execution_count": null, |
196 | 204 | "id": "996e854d", |
197 | 205 | "metadata": {}, |
198 | 206 | "outputs": [ |
|
205 | 213 | } |
206 | 214 | ], |
207 | 215 | "source": [ |
208 | | - "# 1. Calculate probability of rain\n", |
209 | 216 | "rainy_days = len(df[df['weather_condition'] == 'Rainy'])\n", |
210 | 217 | "total_days = len(df)\n", |
211 | 218 | "p_rain = rainy_days / total_days\n", |
212 | 219 | "\n", |
213 | 220 | "print(f\"Based on our data, there's a {p_rain*100:.1f}% chance of rain on any given day\")" |
214 | 221 | ] |
215 | 222 | }, |
| 223 | + { |
| 224 | + "cell_type": "markdown", |
| 225 | + "id": "32786f54", |
| 226 | + "metadata": {}, |
| 227 | + "source": [ |
| 228 | + "### Binomial distribution calculation" |
| 229 | + ] |
| 230 | + }, |
216 | 231 | { |
217 | 232 | "cell_type": "code", |
218 | 233 | "execution_count": 8, |
|
226 | 241 | "probabilities = stats.binom.pmf(k_values, n_days, p_rain)" |
227 | 242 | ] |
228 | 243 | }, |
| 244 | + { |
| 245 | + "cell_type": "markdown", |
| 246 | + "id": "22f972af", |
| 247 | + "metadata": {}, |
| 248 | + "source": [ |
| 249 | + "### Binomial distribution plot" |
| 250 | + ] |
| 251 | + }, |
229 | 252 | { |
230 | 253 | "cell_type": "code", |
231 | 254 | "execution_count": 9, |
|
270 | 293 | "**Probability of 15+ rainy days**: We can calculate this using the cumulative distribution function." |
271 | 294 | ] |
272 | 295 | }, |
| 296 | + { |
| 297 | + "cell_type": "markdown", |
| 298 | + "id": "0ecbe5a9", |
| 299 | + "metadata": {}, |
| 300 | + "source": [ |
| 301 | + "### Probability of >= 15 rainy days" |
| 302 | + ] |
| 303 | + }, |
273 | 304 | { |
274 | 305 | "cell_type": "code", |
275 | | - "execution_count": 13, |
| 306 | + "execution_count": null, |
276 | 307 | "id": "5bc3a33e", |
277 | 308 | "metadata": {}, |
278 | 309 | "outputs": [ |
|
285 | 316 | } |
286 | 317 | ], |
287 | 318 | "source": [ |
288 | | - "# Calculate probability of 15 or more rainy days\n", |
289 | 319 | "prob_15_or_more = 1 - stats.binom.cdf(14, n_days, p_rain)\n", |
290 | 320 | "\n", |
291 | 321 | "print(f\"Probability of 15+ rainy days: {prob_15_or_more:.4f} ({prob_15_or_more*100:.2f}%)\")" |
292 | 322 | ] |
293 | 323 | }, |
| 324 | + { |
| 325 | + "cell_type": "markdown", |
| 326 | + "id": "981ac816", |
| 327 | + "metadata": {}, |
| 328 | + "source": [ |
| 329 | + "### Extra: Binomial cumulative distribution function (CDF) visualization" |
| 330 | + ] |
| 331 | + }, |
294 | 332 | { |
295 | 333 | "cell_type": "code", |
296 | 334 | "execution_count": 14, |
|
401 | 439 | " - **Bonus**: Repeat the experiment with different sample sizes (n=5, n=10, n=50). How does sample size affect the spread and normality of the sampling distribution?" |
402 | 440 | ] |
403 | 441 | }, |
| 442 | + { |
| 443 | + "cell_type": "markdown", |
| 444 | + "id": "c879505c", |
| 445 | + "metadata": {}, |
| 446 | + "source": [ |
| 447 | + "### Population distribution" |
| 448 | + ] |
| 449 | + }, |
404 | 450 | { |
405 | 451 | "cell_type": "code", |
406 | | - "execution_count": 16, |
| 452 | + "execution_count": null, |
407 | 453 | "id": "da2ffd4c", |
408 | 454 | "metadata": {}, |
409 | 455 | "outputs": [ |
|
427 | 473 | } |
428 | 474 | ], |
429 | 475 | "source": [ |
430 | | - "# 1. Examine population distribution\n", |
431 | 476 | "population_mean = df['rainfall_inches'].mean()\n", |
432 | 477 | "population_std = df['rainfall_inches'].std()\n", |
433 | 478 | "\n", |
|
449 | 494 | "The population distribution is highly right-skewed with many zero values (no rain) and a long tail of higher rainfall amounts." |
450 | 495 | ] |
451 | 496 | }, |
| 497 | + { |
| 498 | + "cell_type": "markdown", |
| 499 | + "id": "d2b5ce7c", |
| 500 | + "metadata": {}, |
| 501 | + "source": [ |
| 502 | + "### Sampling" |
| 503 | + ] |
| 504 | + }, |
452 | 505 | { |
453 | 506 | "cell_type": "code", |
454 | | - "execution_count": 17, |
| 507 | + "execution_count": null, |
455 | 508 | "id": "9b1cacd5", |
456 | 509 | "metadata": {}, |
457 | 510 | "outputs": [], |
458 | 511 | "source": [ |
459 | | - "# 2. Create sampling distribution\n", |
460 | 512 | "n_samples = 1000\n", |
461 | 513 | "sample_size = 30\n", |
462 | 514 | "sample_means = []\n", |
|
468 | 520 | "sample_means = np.array(sample_means)" |
469 | 521 | ] |
470 | 522 | }, |
| 523 | + { |
| 524 | + "cell_type": "markdown", |
| 525 | + "id": "af68a64a", |
| 526 | + "metadata": {}, |
| 527 | + "source": [ |
| 528 | + "### Sampling distribution plot" |
| 529 | + ] |
| 530 | + }, |
471 | 531 | { |
472 | 532 | "cell_type": "code", |
473 | | - "execution_count": 24, |
| 533 | + "execution_count": null, |
474 | 534 | "id": "6dd93618", |
475 | 535 | "metadata": {}, |
476 | 536 | "outputs": [ |
|
486 | 546 | } |
487 | 547 | ], |
488 | 548 | "source": [ |
489 | | - "# 3. Visualize sampling distribution\n", |
490 | 549 | "standard_error = population_std / np.sqrt(sample_size)\n", |
491 | 550 | "\n", |
492 | 551 | "# Normal curve\n", |
|
502 | 561 | "plt.show()" |
503 | 562 | ] |
504 | 563 | }, |
| 564 | + { |
| 565 | + "cell_type": "markdown", |
| 566 | + "id": "ebf6dae2", |
| 567 | + "metadata": {}, |
| 568 | + "source": [ |
| 569 | + "### Sampling distribution versus population comparison" |
| 570 | + ] |
| 571 | + }, |
505 | 572 | { |
506 | 573 | "cell_type": "code", |
507 | 574 | "execution_count": 23, |
|
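Review note on the "Probability of >= 15 rainy days" cell: the expression `1 - stats.binom.cdf(14, n_days, p_rain)` gives the upper-tail probability P(X >= 15). The sketch below checks that reading against SciPy's survival function and a direct sum of the PMF. The values of `n_days` and `p_rain` are assumptions for illustration only; in the notebook, `p_rain` comes from `df` and `n_days` is set in a cell this diff does not show.

```python
import numpy as np
from scipy import stats

# Hypothetical inputs for illustration; the notebook computes p_rain from df
# and defines n_days in a cell that is not part of this diff.
n_days = 30
p_rain = 0.3

# Form used in the notebook: P(X >= 15) = 1 - P(X <= 14)
prob_via_cdf = 1 - stats.binom.cdf(14, n_days, p_rain)

# Equivalent form via the survival function: sf(k) = P(X > k) = P(X >= k + 1)
prob_via_sf = stats.binom.sf(14, n_days, p_rain)

# Cross-check: sum the PMF over k = 15, ..., n_days
k_tail = np.arange(15, n_days + 1)
prob_via_pmf = stats.binom.pmf(k_tail, n_days, p_rain).sum()

print(f"1 - cdf(14): {prob_via_cdf:.6f}")
print(f"sf(14):      {prob_via_sf:.6f}")
print(f"sum of pmf:  {prob_via_pmf:.6f}")
```

All three values should agree to within floating-point error. `sf` is usually the more precise choice when the tail probability is very small, because it avoids subtracting a number close to 1 from 1; for a teaching notebook, the `1 - cdf` form is arguably easier to read.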