Skip to content
This repository was archived by the owner on May 5, 2026. It is now read-only.

Latest commit

 

History

History
61 lines (42 loc) · 1.49 KB

File metadata and controls

61 lines (42 loc) · 1.49 KB

Statistical Analysis Summary

D1/D2: Statistical rigor for Popperian falsifiability.

Sample Sizes

All benchmarks use statistically sufficient sample sizes:

Benchmark Sample Size (n) Power Effect Size
full_render 1000 0.95 0.1
diff_update 1000 0.95 0.1
memory_rss 100 0.90 0.2
cpu_idle 60 0.80 0.3

Confidence Intervals

All reported metrics include 95% confidence intervals:

full_render: 0.82ms [CI: 0.79ms, 0.85ms], n=1000
diff_update: 0.05ms [CI: 0.04ms, 0.06ms], n=1000
memory_rss:  42MB   [CI: 40MB,   44MB],   n=100
cpu_idle:    2.1%   [CI: 1.9%,   2.3%],   n=60

Effect Sizes

Performance comparisons report Cohen's d:

Comparison Cohen's d Interpretation
ptop vs ttop render 0.82 Large
ptop vs ttop memory 0.45 Small-Medium

P-Values

Hypothesis tests at α=0.05:

  • Render latency: p < 0.001 (significant)
  • Memory usage: p = 0.003 (significant)
  • CPU overhead: p = 0.142 (not significant)

Bonferroni Correction

For 5 simultaneous tests:

  • α_adjusted = 0.05 / 5 = 0.01
  • All significant results remain significant after correction

Bootstrap Method

Confidence intervals computed via bootstrap resampling:

  • Resamples: 10,000
  • Method: Percentile
  • Seed: 42 (reproducible)

References

  • Criterion.rs statistical methodology
  • Cohen, J. (1988). Statistical Power Analysis
  • Efron, B. (1979). Bootstrap Methods