|
| 1 | +# StaMojo <!-- omit in toc --> |
| 2 | + |
| 3 | + |
| 4 | + |
| 5 | +A statistical computing library for [Mojo](https://www.modular.com/mojo), inspired by `scipy.stats` and `statsmodels` in Python. |
| 6 | + |
| 7 | +**[Repository on GitHub»](https://github.com/mojomath/stamojo)** | **[Discord channel»](https://discord.gg/3rGH87uZTk)** |
| 8 | + |
| 9 | +- [Overview](#overview) |
| 10 | +- [Status](#status) |
| 11 | +- [Background](#background) |
| 12 | +- [Installation](#installation) |
| 13 | +- [Examples](#examples) |
| 14 | +- [Architecture](#architecture) |
| 15 | +- [Roadmap](#roadmap) |
| 16 | +- [License](#license) |
| 17 | + |
| 18 | +## Overview |
| 19 | + |
| 20 | +StaMojo (Statistics + Mojo) aims to bring comprehensive statistical computing to the Mojo ecosystem. The library is organized into two parts: |
| 21 | + |
| 22 | +| Part | Scope | Dependencies | |
| 23 | +| --------------------------------------------- | --------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | |
| 24 | +| **Part I — Statistical Computing Foundation** | Special functions, distributions, descriptive statistics, hypothesis tests, correlation | Mojo stdlib only | |
| 25 | +| **Part II — Statistical Modeling** | OLS, GLM, logistic regression, and model diagnostics | [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) + [MatMojo](https://github.com/mojomath/matmojo) | |
| 26 | + |
| 27 | +Part I is **available now** with zero external dependencies. Part II will begin once the upstream linear-algebra ecosystem stabilizes on a compatible Mojo release. |
| 28 | + |
| 29 | +### Why a separate library? <!-- omit in toc --> |
| 30 | + |
| 31 | +In the Python ecosystem, `scipy` bundles statistics, optimization, signal processing, integration, interpolation, and more into one giant package. For Mojo, a modular approach is more appropriate: |
| 32 | + |
| 33 | +| Python package | Mojo equivalent | Focus | |
| 34 | +| ------------------------------- | -------------------------------------------------------------------------- | --------------------------------- | |
| 35 | +| `numpy` | [**NuMojo**](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) | N-dimensional arrays, basic math | |
| 36 | +| `decimal` / `mpmath` | [**DeciMojo**](https://github.com/mojomath/decimojo) | Arbitrary-precision arithmetic | |
| 37 | +| `numpy.linalg` / `scipy.linalg` | [**MatMojo**](https://github.com/mojomath/matmojo) | Linear algebra | |
| 38 | +| `scipy.stats` | **StaMojo** (distributions + tests) | Statistical distributions & tests | |
| 39 | +| `statsmodels` | **StaMojo** (models) | Statistical models & econometrics | |
| 40 | + |
| 41 | +Placing `scipy.stats`-like functionality and `statsmodels`-like regression in **one library** is intentional: regression models inherently depend on distribution functions (for p-values, confidence intervals, etc.), so co-locating them avoids circular dependencies, simplifies versioning, and provides a cohesive API. |
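
To illustrate why regression output depends on distribution functions, here is a minimal sketch in Python (standard library only, not StaMojo code). It turns a coefficient estimate and its standard error into a two-sided p-value and a confidence interval using the standard normal CDF and PPF. The function names, the input numbers, and the large-sample normal approximation are all assumptions made for this illustration; a real regression summary would typically use Student's t with the residual degrees of freedom.

```python
from statistics import NormalDist

# Standard normal distribution; stands in for the role a Normal or StudentT
# distribution object plays in a regression summary.
STD_NORMAL = NormalDist(0.0, 1.0)


def two_sided_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient == 0 (normal approximation)."""
    z = coef / std_err
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def confidence_interval(coef: float, std_err: float, level: float = 0.95):
    """Symmetric confidence interval built from the normal quantile (PPF)."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return coef - z_crit * std_err, coef + z_crit * std_err


# Hypothetical estimate and standard error, chosen only for illustration.
coef, std_err = 1.2, 0.5
p_value = two_sided_p_value(coef, std_err)
ci_low, ci_high = confidence_interval(coef, std_err)
print("p-value:", p_value)
print("95% CI :", (ci_low, ci_high))
```

Both helpers need only a CDF and a PPF, which is why the distribution layer has to exist before any model layer can report inference results.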
| 42 | + |
| 43 | +## Status |
| 44 | + |
| 45 | +**v0.1**: The core of Part I (Phases 0–2 of the roadmap) is complete and ready for use. The current release provides: |
| 46 | + |
| 47 | +| Category | Functions | |
| 48 | +| ----------------- | ----------------------------------------------------------------------------------------------- | |
| 49 | +| Special functions | `gammainc`, `gammaincc`, `beta`, `lbeta`, `betainc`, `erfinv`, `ndtri` | |
| 50 | +| Distributions | `Normal`, `StudentT`, `ChiSquared`, `FDist` — each with PDF, log-PDF, CDF, SF, PPF, `rvs` | |
| 51 | +| Descriptive stats | `mean`, `variance`, `std`, `median`, `quantile`, `skewness`, `kurtosis`, `data_min`, `data_max` | |
| 52 | +| Correlation | `pearsonr`, `spearmanr`, `kendalltau` (with p-values) | |
| 53 | +| Hypothesis tests | `ttest_1samp`, `ttest_ind`, `ttest_rel`, `chi2_gof`, `chi2_ind`, `ks_1samp`, `f_oneway` | |
| 54 | + |
| 55 | +All 30 functions are self-contained (Mojo stdlib only) and covered by unit tests validated against SciPy reference values. |
| 56 | + |
| 57 | +> **What about Part II (statistical models)?** |
| 58 | +> OLS regression and GLMs require matrix operations that depend on [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) and [MatMojo](https://github.com/mojomath/matmojo). These upstream libraries are still evolving rapidly. Work on Part II will resume once they stabilize on a compatible Mojo release. See the [full roadmap](docs/roadmap.md) for details. |
| 59 | + |
| 60 | +## Background |
| 61 | + |
| 62 | +Because of my academic and professional background, I work with hypothesis tests and regression models daily, and I have long been a user of Stata and `statsmodels`. It has been two years since Mojo first appeared, and [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) now has its core functionality in place. Driven by my enthusiasm for Mojo, I felt it was time to start migrating some of my personal research projects to the Mojo ecosystem, and that is how StaMojo was born. |
| 63 | + |
| 64 | +The library is designed around two pillars: |
| 65 | + |
| 66 | +1. **Part I — Statistical computing foundation** (self-contained) — special functions, probability distributions, descriptive statistics, hypothesis tests, and correlation. |
| 67 | +2. **Part II — Statistical modeling** (depends on NuMojo and MatMojo) — OLS, GLM, logistic regression, and related diagnostics. |
| 68 | + |
| 69 | +At the moment I am still building out the project scaffolding and solidifying the core functionality. Because Mojo has not yet reached v1.0, breaking changes are frequent across compiler releases, so **pull requests are not preferred at this time**. If you have any suggestions, questions, or feedback, please feel free to open an [issue](https://github.com/mojomath/stamojo/issues), start a [discussion](https://github.com/mojomath/stamojo/discussions), or reach out on our [Discord channel](https://discord.gg/3rGH87uZTk). Thank you for your understanding! |
| 70 | + |
| 71 | +## Installation |
| 72 | + |
| 73 | +StaMojo is available in the modular-community `https://repo.prefix.dev/modular-community` package repository. To access this repository, add it to your `channels` list in your `pixi.toml` file: |
| 74 | + |
| 75 | +```toml |
| 76 | +channels = ["https://conda.modular.com/max", "https://repo.prefix.dev/modular-community", "conda-forge"] |
| 77 | +``` |
| 78 | + |
| 79 | +Then, you can install StaMojo using any of these methods: |
| 80 | + |
| 81 | +1. From the `pixi` CLI, run the command `pixi add stamojo`. This fetches the latest version and makes it immediately available for import. |
| 82 | + |
| 83 | +1. In the `mojoproject.toml` file of your project, add the following dependency: |
| 84 | + |
| 85 | + ```toml |
| 86 | + stamojo = "==0.1.0" |
| 87 | + ``` |
| 88 | + |
| 89 | + Then run `pixi install` to download and install the package. |
| 90 | + |
| 91 | +1. For the latest development version in the `main` branch, clone [this GitHub repository](https://github.com/mojomath/stamojo) and build the package locally using the command `pixi run package`. |
| 92 | + |
| 93 | + ```bash |
| 94 | + git clone https://github.com/mojomath/stamojo.git |
| 95 | + cd stamojo |
| 96 | + pixi install |
| 97 | + pixi run package |
| 98 | + ``` |
| 99 | + |
| 100 | +The following table summarizes the package versions and their corresponding Mojo versions: |
| 101 | + |
| 102 | +| `stamojo` | `mojo` | package manager | |
| 103 | +| --------- | -------- | --------------- | |
| 104 | +| v0.1.0 | ==0.26.1 | pixi | |
| 105 | + |
| 106 | +## Examples |
| 107 | + |
| 108 | +The file [`examples/examples.mojo`](examples/examples.mojo) demonstrates the key APIs available in Part I. Run it with: |
| 109 | + |
| 110 | +```bash |
| 111 | +mojo run -I src examples/examples.mojo |
| 112 | +``` |
| 113 | + |
| 114 | +```mojo |
| 115 | +from stamojo.special import gammainc, gammaincc, beta, lbeta, betainc, erfinv, ndtri |
| 116 | +from stamojo.distributions import Normal, StudentT, ChiSquared, FDist |
| 117 | +from stamojo.stats import ( |
| 118 | + mean, variance, std, median, quantile, skewness, kurtosis, |
| 119 | + pearsonr, spearmanr, kendalltau, |
| 120 | + ttest_1samp, ttest_ind, ttest_rel, |
| 121 | + chi2_gof, chi2_ind, ks_1samp, f_oneway, |
| 122 | +) |
| 123 | + |
| 124 | + |
| 125 | +fn main() raises: |
| 126 | + # --- Special functions --------------------------------------------------- |
| 127 | + print("gammainc(1, 2) =", gammainc(1.0, 2.0)) # 0.8646647167628346 |
| 128 | + print("gammaincc(1, 2) =", gammaincc(1.0, 2.0)) # 0.13533528323716537 |
| 129 | + print("beta(2, 3) =", beta(2.0, 3.0)) # 0.08333333333323925 |
| 130 | + print("betainc(2, 3, 0.5) =", betainc(2.0, 3.0, 0.5)) # 0.6875000000000885 |
| 131 | + print("erfinv(0.5) =", erfinv(0.5)) # 0.4769362762044701 |
| 132 | + print("ndtri(0.975) =", ndtri(0.975)) # 1.9599639845400543 |
| 133 | + |
| 134 | + # --- Distributions ------------------------------------------------------- |
| 135 | + var n = Normal(0.0, 1.0) |
| 136 | + print("Normal(0,1).pdf(0) =", n.pdf(0.0)) # 0.3989422804014327 |
| 137 | + print("Normal(0,1).cdf(1.96)=", n.cdf(1.96)) # 0.9750021048517795 |
| 138 | + print("Normal(0,1).ppf(0.975)=", n.ppf(0.975)) # 1.9599639845400543 |
| 139 | + print("Normal(0,1).sf(1.96) =", n.sf(1.96)) # 0.02499789514822043 |
| 140 | + |
| 141 | + var t = StudentT(10.0) |
| 142 | + print("StudentT(10).cdf(2.0)=", t.cdf(2.0)) # 0.9633059826444078 |
| 143 | + print("StudentT(10).ppf(0.975)=", t.ppf(0.975)) # 2.2281388540534057 |
| 144 | + |
| 145 | + var c = ChiSquared(5.0) |
| 146 | + print("ChiSquared(5).cdf(11.07)=", c.cdf(11.07)) # 0.9499903814759155 |
| 147 | + |
| 148 | + var f = FDist(5.0, 10.0) |
| 149 | + print("FDist(5,10).cdf(3.33)=", f.cdf(3.33)) # 0.9501687242532277 |
| 150 | + |
| 151 | + # --- Descriptive statistics ---------------------------------------------- |
| 152 | + var data: List[Float64] = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] |
| 153 | + print("mean =", mean(data)) # 5.0 |
| 154 | + print("variance=", variance(data, ddof=0)) # 4.0 |
| 155 | + print("std =", std(data, ddof=0)) # 2.0 |
| 156 | + print("median =", median(data)) # 4.5 |
| 157 | + print("Q(0.25) =", quantile(data, 0.25)) # 4.0 |
| 158 | + print("skewness=", skewness(data)) # 0.8184875533567997 |
| 159 | + print("kurtosis=", kurtosis(data)) # 0.940625 |
| 160 | + |
| 161 | + # --- Correlation --------------------------------------------------------- |
| 162 | + var x: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0] |
| 163 | + var y: List[Float64] = [2.1, 3.8, 6.0, 7.9, 10.1] |
| 164 | + var pr = pearsonr(x, y) |
| 165 | + print("pearsonr r=", pr[0], " p=", pr[1]) # 0.9991718425080479, 2.8605484175113625e-05 |
| 166 | + var sr = spearmanr(x, y) |
| 167 | + print("spearmanr ρ=", sr[0], " p=", sr[1]) # 1.0, 0.0 |
| 168 | + var kt = kendalltau(x, y) |
| 169 | + print("kendalltau τ=", kt[0], " p=", kt[1]) # 1.0, 0.014305878435429659 |
| 170 | + |
| 171 | + # --- Hypothesis tests ---------------------------------------------------- |
| 172 | + var sample: List[Float64] = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2] |
| 173 | + var res = ttest_1samp(sample, 5.0) |
| 174 | + print("ttest_1samp t=", res[0], " p=", res[1]) # 0.654653670707975, 0.5416045608507769 |
| 175 | + |
| 176 | + var a: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0] |
| 177 | + var b: List[Float64] = [4.0, 5.0, 6.0, 7.0, 8.0] |
| 178 | + var res2 = ttest_ind(a, b) |
| 179 | + print("ttest_ind t=", res2[0], " p=", res2[1]) # -3.0, 0.0170716812337895 |
| 180 | + |
| 181 | + var obs: List[Float64] = [16.0, 18.0, 16.0, 14.0, 12.0, 14.0] |
| 182 | + var exp: List[Float64] = [15.0, 15.0, 15.0, 15.0, 15.0, 15.0] |
| 183 | + var res3 = chi2_gof(obs, exp) |
| 184 | + print("chi2_gof χ²=", res3[0], " p=", res3[1]) # 1.4666666666666666, 0.9168841203537823 |
| 185 | + |
| 186 | + var g1: List[Float64] = [3.0, 4.0, 5.0] |
| 187 | + var g2: List[Float64] = [6.0, 7.0, 8.0] |
| 188 | + var g3: List[Float64] = [9.0, 10.0, 11.0] |
| 189 | + var groups = List[List[Float64]]() |
| 190 | + groups.append(g1^) |
| 191 | + groups.append(g2^) |
| 192 | + groups.append(g3^) |
| 193 | + var res4 = f_oneway(groups) |
| 194 | + print("f_oneway F=", res4[0], " p=", res4[1]) # 27.0, 0.0010000000005315757 |
| 195 | +``` |
| 196 | + |
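
Several of the values quoted in the comments above can be cross-checked by hand. The sketch below uses only the Python standard library, not SciPy and not the project's own test suite. It recomputes the descriptive statistics, the standard normal values, and the test statistics (not the p-values) for the same inputs. The pooled-variance form of the two-sample t statistic is an assumption about what `ttest_ind` computes by default; it is the form that reproduces the `-3.0` shown above.

```python
from math import sqrt
from statistics import NormalDist, mean, median, pstdev, pvariance, variance

# Descriptive statistics with ddof=0 (population formulas).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
desc = {
    "mean": mean(data),
    "variance": pvariance(data),
    "std": pstdev(data),
    "median": median(data),
}

# Standard normal PDF, CDF, and PPF.
std_normal = NormalDist(0.0, 1.0)
normal_vals = {
    "pdf(0)": std_normal.pdf(0.0),
    "cdf(1.96)": std_normal.cdf(1.96),
    "ppf(0.975)": std_normal.inv_cdf(0.975),
}

# One-sample t statistic: (sample mean - mu0) / (s / sqrt(n)).
sample = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
t_1samp = (mean(sample) - 5.0) / sqrt(variance(sample) / len(sample))

# Two-sample t statistic with pooled variance (assumed default).
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [4.0, 5.0, 6.0, 7.0, 8.0]
pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
    len(a) + len(b) - 2
)
t_ind = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# Chi-squared goodness-of-fit statistic: sum((observed - expected)^2 / expected).
obs = [16.0, 18.0, 16.0, 14.0, 12.0, 14.0]
exp = [15.0] * 6
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# One-way ANOVA F statistic: between-group mean square / within-group mean square.
groups = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
grand_mean = mean(x for g in groups for x in g)
n_total = sum(len(g) for g in groups)
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
f_stat = (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))

print(desc)
print(normal_vals)
print("t_1samp =", t_1samp, " t_ind =", t_ind, " chi2 =", chi2, " F =", f_stat)
```

Reproducing the p-values would additionally need the Student's t, chi-squared, and F CDFs, which the Python standard library does not provide.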
| 197 | +## Architecture |
| 198 | + |
| 199 | +```txt |
| 200 | +src/stamojo/ |
| 201 | +├── __init__.mojo # Package root (re-exports distributions & stats) |
| 202 | +├── prelude.mojo # Convenient re-exports |
| 203 | +├── special/ # Special mathematical functions (cf. scipy.special) |
| 204 | +│ ├── __init__.mojo |
| 205 | +│ ├── _gamma.mojo # gammainc, gammaincc |
| 206 | +│ ├── _beta.mojo # beta, lbeta, betainc |
| 207 | +│ └── _erf.mojo # erfinv, ndtri |
| 208 | +├── distributions/ # Probability distributions |
| 209 | +│ ├── __init__.mojo |
| 210 | +│ ├── normal.mojo # Normal (Gaussian) — PDF, logPDF, CDF, SF, PPF, rvs |
| 211 | +│ ├── t.mojo # Student's t |
| 212 | +│ ├── chi2.mojo # Chi-squared |
| 213 | +│ └── f.mojo # F-distribution |
| 214 | +├── stats/ # Descriptive stats & hypothesis tests |
| 215 | +│ ├── __init__.mojo |
| 216 | +│ ├── descriptive.mojo # mean, variance, std, median, quantile, skewness, kurtosis |
| 217 | +│ ├── correlation.mojo # pearsonr, spearmanr, kendalltau |
| 218 | +│ └── tests.mojo # ttest_1samp, ttest_ind, ttest_rel, chi2_gof, chi2_ind, ks_1samp, f_oneway |
| 219 | +└── models/ # Statistical models (planned) |
| 220 | + ├── __init__.mojo |
| 221 | + └── ols.mojo # Ordinary Least Squares (stub) |
| 222 | +tests/ |
| 223 | +├── test_all.sh # Run all test suites |
| 224 | +├── test_special.mojo # tests — special functions |
| 225 | +├── test_distributions.mojo # tests — Normal, t, χ², F |
| 226 | +├── test_stats.mojo # tests — descriptive statistics |
| 227 | +└── test_hypothesis.mojo # tests — hypothesis tests, correlation, ANOVA |
| 228 | +``` |
| 229 | + |
| 230 | +## Roadmap |
| 231 | + |
| 232 | +The project is organized into **Part I** (scipy.stats-equivalent, no external dependencies) and **Part II** (statsmodels-equivalent, requires NuMojo + MatMojo). Phases 0–2 are complete; see the [full roadmap](docs/roadmap.md) for all planned phases. |
| 233 | + |
| 234 | +| | Phase | Status | |
| 235 | +| ----------- | ------------------------------------------------ | ------------------------- | |
| 236 | +| **Part I** | Phase 0 — Special Functions | ✓ | |
| 237 | +| | Phase 1 — Core Distributions & Descriptive Stats | ✓ | |
| 238 | +| | Phase 2 — Hypothesis Testing & Correlation | ✓ | |
| 239 | +| | Phase 3 — Extended Distributions | planned | |
| 240 | +| | Phase 4 — Extended Tests & Utilities | planned | |
| 241 | +| **Part II** | Phase 5 — OLS Regression | awaiting NuMojo / MatMojo | |
| 242 | +| | Phase 6 — Generalized Linear Models | awaiting NuMojo / MatMojo | |
| 243 | +| | Phase 7 — Extended Models | planned | |
| 244 | +| | Phase 8 — Advanced Topics | planned | |
| 245 | + |
| 246 | +## License |
| 247 | + |
| 248 | +This repository and its contributions are licensed under the Apache License, Version 2.0. |