Commit c830051

chore: code snippets
1 parent 42f9c06 commit c830051

1 file changed

+192
-0
lines changed

_posts/2024-10-28-understanding normality tests a deep dive into their power and limitations.md

Lines changed: 192 additions & 0 deletions
@@ -17,6 +17,9 @@ keywords:
 - Statistics
 - Data Analysis
 - QQ Plots
+- python
+- r
+- ruby
 seo_description: An in-depth exploration of normality tests, their limitations, and the importance of visual inspection for assessing whether data follow a normal distribution.
 seo_title: 'Understanding Normality Tests: A Deep Dive'
 seo_type: article
@@ -25,6 +28,9 @@ tags:
 - Normality Tests
 - Statistical Methods
 - Data Visualization
+- python
+- r
+- ruby
 title: 'Understanding Normality Tests: A Deep Dive into Their Power and Limitations'
 ---

@@ -289,3 +295,189 @@ Assessing normality is a nuanced process that requires more than a one-size-fits
6. **Understand the Limitations:** Be aware of sample size effects, test assumptions, and the potential for Type II errors.

In practice, a comprehensive approach that combines statistical tests with visual inspection and a thorough understanding of the data will lead to more robust and reliable conclusions. By appreciating the strengths and limitations of various normality tests, statisticians and data analysts can make informed decisions that enhance the quality of their analyses.
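
The sample size effect mentioned in point 6 can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it draws from a standard logistic distribution (chosen here as an example of a mildly heavy-tailed alternative, not taken from the article), uses a hand-rolled Jarque-Bera test with its asymptotic chi-square p-value so that it needs nothing beyond the standard library, and the helper names `jarque_bera`, `logistic_draw`, and `rejection_rate` are hypothetical.

```python
import math
import random


def jarque_bera(sample):
    """Return (JB statistic, asymptotic p-value) for a list of floats."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2), so no lookup table is needed.
    return jb, math.exp(-jb / 2.0)


def logistic_draw(rng):
    """Draw from a standard logistic distribution via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def rejection_rate(n, trials=200, alpha=0.05, seed=0):
    """Fraction of simulated non-normal samples of size n that are rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [logistic_draw(rng) for _ in range(n)]
        _, p_value = jarque_bera(sample)
        if p_value < alpha:
            rejections += 1
    return rejections / trials


# The data are never normal here, so every non-rejection is a Type II error.
rates = {n: rejection_rate(n) for n in (20, 200, 2000)}
for n, rate in rates.items():
    print(f"n = {n:>4}: rejected normality in {rate:.0%} of trials")
```

With small samples the test tends to miss the departure from normality most of the time, while with large samples it flags it almost always, even though the underlying distribution is the same. This is one reason a p-value should be read alongside a QQ plot rather than on its own.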

## Appendix: Python Code for Normality Tests

```python
# Import necessary libraries
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns


# Generate a bimodal sample: an equal mixture of N(0, 1) and N(3, 0.5^2)
def generate_bimodal_distribution(size=1000, seed=None):
    rng = np.random.default_rng(seed)
    mean1, mean2 = 0, 3
    std1, std2 = 1, 0.5
    data1 = rng.normal(mean1, std1, size // 2)
    data2 = rng.normal(mean2, std2, size // 2)
    return np.concatenate([data1, data2])


# Generate data
data = generate_bimodal_distribution()

# Plot QQ plot
plt.figure(figsize=(8, 6))
stats.probplot(data, dist="norm", plot=plt)
plt.title('QQ Plot')
plt.show()

# Plot empirical CDF against a normal CDF fitted to the sample
plt.figure(figsize=(8, 6))
sns.ecdfplot(data, label='Empirical CDF')
x = np.linspace(min(data), max(data), 1000)
plt.plot(x, stats.norm.cdf(x, np.mean(data), np.std(data)), label='Theoretical CDF', linestyle='--')
plt.title('Empirical CDF vs Theoretical CDF')
plt.legend()
plt.show()

# Shapiro-Wilk test
shapiro_stat, shapiro_p = stats.shapiro(data)
print(f"Shapiro-Wilk Test: W = {shapiro_stat}, p-value = {shapiro_p}")

# Kolmogorov-Smirnov test
# Note: the mean and standard deviation are estimated from the same data,
# so this p-value is conservative (the Lilliefors correction addresses this).
ks_stat, ks_p = stats.kstest(data, 'norm', args=(np.mean(data), np.std(data)))
print(f"Kolmogorov-Smirnov Test: D = {ks_stat}, p-value = {ks_p}")

# Anderson-Darling test
# Reject normality at a given level when A² exceeds the matching critical value.
ad_result = stats.anderson(data, dist='norm')
print(f"Anderson-Darling Test: A² = {ad_result.statistic}, critical values = {ad_result.critical_values}")
print(f"Significance levels (%): {ad_result.significance_level}")

# Jarque-Bera test
jb_stat, jb_p = stats.jarque_bera(data)
print(f"Jarque-Bera Test: JB = {jb_stat}, p-value = {jb_p}")

# Geary's kurtosis: mean absolute deviation divided by the standard deviation
# (about 0.7979 = sqrt(2/pi) for normal data)
mean_abs_dev = np.mean(np.abs(data - np.mean(data)))
sd = np.std(data)
geary_ratio = mean_abs_dev / sd
print(f"Geary's Kurtosis: {geary_ratio}")
```
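
Geary's ratio is easier to interpret next to its reference value. As conventionally defined, it is the mean absolute deviation about the mean divided by the standard deviation (with the variance divided by n), and for normal data it is close to sqrt(2/pi) ≈ 0.7979. The following standard-library sketch compares a normal sample with the same kind of bimodal mixture used in this appendix; the helper name `geary_ratio`, the seed, and the sample sizes are assumptions made for illustration only.

```python
import math
import random


def geary_ratio(sample):
    """Mean absolute deviation about the mean divided by the population SD."""
    n = len(sample)
    mean = sum(sample) / n
    mean_abs_dev = sum(abs(x - mean) for x in sample) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean_abs_dev / std_dev


rng = random.Random(42)

# A plain normal sample and an equal mixture of N(0, 1) and N(3, 0.5^2)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
bimodal_sample = (
    [rng.gauss(0.0, 1.0) for _ in range(10000)]
    + [rng.gauss(3.0, 0.5) for _ in range(10000)]
)

reference = math.sqrt(2.0 / math.pi)  # expected ratio under normality
print(f"Reference value: {reference:.4f}")
print(f"Normal sample:   {geary_ratio(normal_sample):.4f}")
print(f"Bimodal sample:  {geary_ratio(bimodal_sample):.4f}")
```

The normal sample should land close to the reference value, while the bimodal mixture should sit noticeably above it, which is the pattern expected for distributions with lighter tails than the normal.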

## Appendix: R Code for Normality Tests

```r
# Load necessary libraries
library(nortest)  # ad.test
library(moments)  # jarque.test
library(ggplot2)

# Generate a bimodal sample: an equal mixture of N(0, 1) and N(3, 0.5^2)
generate_bimodal_distribution <- function(size = 1000) {
  mean1 <- 0
  mean2 <- 3
  std1 <- 1
  std2 <- 0.5
  data1 <- rnorm(size %/% 2, mean = mean1, sd = std1)
  data2 <- rnorm(size %/% 2, mean = mean2, sd = std2)
  c(data1, data2)
}

# Generate data
data <- generate_bimodal_distribution()

# QQ Plot
qqnorm(data, main = "QQ Plot")
qqline(data, col = "blue")

# Empirical CDF vs Theoretical CDF
ggplot(data.frame(x = data), aes(x)) +
  stat_ecdf(geom = "step", color = "blue") +
  stat_function(fun = pnorm, args = list(mean = mean(data), sd = sd(data)),
                color = "red", linetype = "dashed") +
  labs(title = "Empirical CDF vs Theoretical CDF")

# Shapiro-Wilk Test
shapiro_test <- shapiro.test(data)
print(paste("Shapiro-Wilk Test: W =", shapiro_test$statistic, ", p-value =", shapiro_test$p.value))

# Kolmogorov-Smirnov Test
# Note: the parameters are estimated from the data, so this p-value is
# conservative; nortest::lillie.test applies the Lilliefors correction.
ks_test <- ks.test(data, "pnorm", mean(data), sd(data))
print(paste("Kolmogorov-Smirnov Test: D =", ks_test$statistic, ", p-value =", ks_test$p.value))

# Anderson-Darling Test
ad_test <- ad.test(data)
print(paste("Anderson-Darling Test: A² =", ad_test$statistic, ", p-value =", ad_test$p.value))

# Jarque-Bera Test
jb_test <- jarque.test(data)
print(paste("Jarque-Bera Test: JB =", jb_test$statistic, ", p-value =", jb_test$p.value))

# Geary's kurtosis: mean absolute deviation divided by the standard deviation
# (about 0.7979 = sqrt(2/pi) for normal data). R's mad() is the scaled median
# absolute deviation, so the mean absolute deviation is computed directly.
mean_abs_dev <- mean(abs(data - mean(data)))
pop_sd <- sqrt(mean((data - mean(data))^2))
geary_ratio <- mean_abs_dev / pop_sd
print(paste("Geary's Kurtosis:", geary_ratio))
```

## Appendix: Ruby Code for Normality Tests

```ruby
# Load necessary libraries.
# Assumes the `distribution`, `gnuplotrb`, and `rinruby` gems, plus a local R
# installation with the `nortest` package for the tests delegated to R.
require 'distribution'
require 'gnuplotrb'
require 'rinruby'
include GnuplotRB

# Ruby's Array has no built-in mean or standard deviation, so define helpers
def mean(values)
  values.sum / values.size.to_f
end

def std_dev(values)
  mu = mean(values)
  Math.sqrt(values.sum { |x| (x - mu)**2 } / values.size)
end

# Generate a bimodal sample: an equal mixture of N(0, 1) and N(3, 0.5^2)
def generate_bimodal_distribution(size = 1000)
  rng1 = Distribution::Normal.rng(0, 1)
  rng2 = Distribution::Normal.rng(3, 0.5)
  Array.new(size / 2) { rng1.call } + Array.new(size / 2) { rng2.call }
end

# Generate data
data = generate_bimodal_distribution
sorted_data = data.sort
n = sorted_data.size
mu = mean(data)
sigma = std_dev(data)

# QQ plot: theoretical standard-normal quantiles against sorted sample values
theoretical = (1..n).map { |i| Distribution::Normal.p_value((i - 0.5) / n) }
qq_plot = Plot.new([[theoretical, sorted_data], with: 'points', title: 'Sample quantiles'],
                   title: 'QQ Plot')
qq_plot.to_png('qq_plot.png')

# Empirical CDF vs Theoretical CDF (using Gnuplot)
ecdf = (1..n).map { |i| i.to_f / n }
normal_cdf = sorted_data.map { |x| Distribution::Normal.cdf((x - mu) / sigma) }
cdf_plot = Plot.new(
  [[sorted_data, ecdf], with: 'lines', title: 'Empirical CDF'],
  [[sorted_data, normal_cdf], with: 'lines', title: 'Theoretical CDF']
)
cdf_plot.to_png('cdf_plot.png')

# Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling tests
# (delegated to R through RinRuby; the `distribution` gem does not provide them)
R.assign 'x', data
R.eval <<-EOF
  library(nortest)
  shapiro_test <- shapiro.test(x)
  shapiro_stat <- as.numeric(shapiro_test$statistic)
  shapiro_p_value <- shapiro_test$p.value
  ks_test <- ks.test(x, "pnorm", mean(x), sd(x))
  ks_stat <- as.numeric(ks_test$statistic)
  ks_p_value <- ks_test$p.value
  ad_test <- ad.test(x)
  ad_stat <- as.numeric(ad_test$statistic)
  ad_p_value <- ad_test$p.value
EOF

puts "Shapiro-Wilk Test: W = #{R.shapiro_stat}, p-value = #{R.shapiro_p_value}"
puts "Kolmogorov-Smirnov Test: D = #{R.ks_stat}, p-value = #{R.ks_p_value}"
puts "Anderson-Darling Test: A² = #{R.ad_stat}, p-value = #{R.ad_p_value}"

# Jarque-Bera Test (computed directly; the asymptotic p-value uses the
# chi-square distribution with 2 degrees of freedom, whose tail is exp(-x / 2))
m2 = data.sum { |x| (x - mu)**2 } / n
m3 = data.sum { |x| (x - mu)**3 } / n
m4 = data.sum { |x| (x - mu)**4 } / n
skewness = m3 / m2**1.5
excess_kurtosis = m4 / m2**2 - 3
jb_stat = n / 6.0 * (skewness**2 + excess_kurtosis**2 / 4.0)
jb_p_value = Math.exp(-jb_stat / 2.0)
puts "Jarque-Bera Test: JB = #{jb_stat}, p-value = #{jb_p_value}"

# Geary's kurtosis: mean absolute deviation divided by the standard deviation
# (about 0.7979 = sqrt(2/pi) for normal data)
mean_abs_dev = data.sum { |x| (x - mu).abs } / n
geary_ratio = mean_abs_dev / sigma
puts "Geary's Kurtosis: #{geary_ratio}"
```
