Commit 37ba334

Add stamojo
1 parent 70a6842 commit 37ba334

3 files changed: +285 -0 lines changed

recipes/stamojo/README.md

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
# StaMojo <!-- omit in toc -->

![icon](stamojo.png)

A statistical computing library for [Mojo](https://www.modular.com/mojo), inspired by `scipy.stats` and `statsmodels` in Python.

**[Repository on GitHub»](https://github.com/mojomath/stamojo)** | **[Discord channel»](https://discord.gg/3rGH87uZTk)**

- [Overview](#overview)
- [Status](#status)
- [Background](#background)
- [Installation](#installation)
- [Examples](#examples)
- [Architecture](#architecture)
- [Roadmap](#roadmap)
- [License](#license)

## Overview

StaMojo (Statistics + Mojo) brings statistical computing to the Mojo ecosystem. The library is organized into two parts:

| Part | Scope | Dependencies |
| --- | --- | --- |
| **Part I: Statistical Computing Foundation** | Special functions, distributions, descriptive statistics, hypothesis tests, correlation | Mojo stdlib only |
| **Part II: Statistical Modeling** | OLS, GLM, logistic regression, and model diagnostics | [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) + [MatMojo](https://github.com/mojomath/matmojo) |

Part I is **available now** with zero external dependencies. Work on Part II will begin once the upstream linear-algebra ecosystem stabilizes on a compatible Mojo release.

### Why a separate library? <!-- omit in toc -->

In the Python ecosystem, `scipy` bundles statistics, optimization, signal processing, integration, interpolation, and more into a single large package. For Mojo, a modular approach is more appropriate:

| Python package | Mojo equivalent | Focus |
| --- | --- | --- |
| `numpy` | [**NuMojo**](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) | N-dimensional arrays, basic math |
| `decimal` / `mpmath` | [**DeciMojo**](https://github.com/mojomath/decimojo) | Arbitrary-precision arithmetic |
| `numpy.linalg` / `scipy.linalg` | [**MatMojo**](https://github.com/mojomath/matmojo) | Linear algebra |
| `scipy.stats` | **StaMojo** (distributions + tests) | Statistical distributions & tests |
| `statsmodels` | **StaMojo** (models) | Statistical models & econometrics |

Placing `scipy.stats`-like functionality and `statsmodels`-like regression in **one library** is intentional: regression models inherently depend on distribution functions (for p-values, confidence intervals, etc.), so co-locating them avoids circular dependencies, simplifies versioning, and provides a cohesive API.

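To illustrate the dependency described above, the following Python sketch turns a regression coefficient and its standard error into a two-sided p-value and a 95% confidence interval using nothing but a distribution's CDF and quantile function. The coefficient and standard error are made-up numbers, `wald_summary` is a hypothetical helper, and the normal approximation stands in for the Student's t reference distribution that an OLS routine would normally use. None of this is StaMojo code.

```python
from statistics import NormalDist


def wald_summary(coef, std_err, level=0.95):
    """Return (z, two-sided p-value, (lower, upper)) under a normal approximation."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = coef / std_err
    # Two-sided p-value: probability mass in both tails beyond |z|.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value from the quantile function (PPF), e.g. about 1.96 for 95%.
    crit = std_normal.inv_cdf(0.5 + level / 2.0)
    return z, p_value, (coef - crit * std_err, coef + crit * std_err)


# Hypothetical estimate and standard error, for illustration only.
z, p, (lo, hi) = wald_summary(coef=1.2, std_err=0.5)
print(f"z={z:.3f}  p={p:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Both the p-value and the interval come straight from distribution functions, which is why a modeling layer needs them available in the same package or in a dependency.
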
## Status

**v0.1**: the core of Part I (Phases 0–2 of the roadmap) is complete and ready for use. The current release provides:

| Category | Functions |
| --- | --- |
| Special functions | `gammainc`, `gammaincc`, `beta`, `lbeta`, `betainc`, `erfinv`, `ndtri` |
| Distributions | `Normal`, `StudentT`, `ChiSquared`, `FDist`, each with PDF, log-PDF, CDF, SF, PPF, `rvs` |
| Descriptive stats | `mean`, `variance`, `std`, `median`, `quantile`, `skewness`, `kurtosis`, `data_min`, `data_max` |
| Correlation | `pearsonr`, `spearmanr`, `kendalltau` (with p-values) |
| Hypothesis tests | `ttest_1samp`, `ttest_ind`, `ttest_rel`, `chi2_gof`, `chi2_ind`, `ks_1samp`, `f_oneway` |

All 30 items (26 functions and 4 distribution types) are self-contained (Mojo stdlib only) and covered by unit tests validated against SciPy reference values.

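As a rough illustration of what such a reference check can look like, the standard normal values quoted in the Examples section can be reproduced with Python's standard library. This is only a sketch: it uses `statistics.NormalDist` rather than SciPy, and the `1e-12` tolerance is an assumption, not the tolerance used by StaMojo's test suite.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Values printed by the README example for Normal(0, 1).
readme_values = {
    "pdf(0)": 0.3989422804014327,
    "cdf(1.96)": 0.9750021048517795,
    "ppf(0.975)": 1.9599639845400543,
    "sf(1.96)": 0.02499789514822043,
}

# Independent reference values from the Python standard library.
reference = {
    "pdf(0)": std_normal.pdf(0.0),
    "cdf(1.96)": std_normal.cdf(1.96),
    "ppf(0.975)": std_normal.inv_cdf(0.975),
    "sf(1.96)": 1.0 - std_normal.cdf(1.96),
}

# Compare each quoted value with its reference using an absolute tolerance.
matches = {
    name: math.isclose(reference[name], expected, rel_tol=0.0, abs_tol=1e-12)
    for name, expected in readme_values.items()
}
for name, ok in matches.items():
    print(f"{name:12s} reference={reference[name]:.16f} match={ok}")
```
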
> **What about Part II (statistical models)?**
> OLS regression and GLMs require matrix operations that depend on [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) and [MatMojo](https://github.com/mojomath/matmojo). These upstream libraries are still evolving rapidly. Work on Part II will resume once the ecosystem catches up. See the [full roadmap](docs/roadmap.md) for details.

## Background

Because of my academic and professional background, I work with hypothesis testing and regression models on a daily basis, and I have been a long-time user of Stata and `statsmodels`. It has been two years since Mojo first appeared, and [NuMojo](https://github.com/Mojo-Numerics-and-Algorithms-group/NuMojo) now has its core functionality in place. Driven by my enthusiasm for Mojo, I felt it was time to start migrating some of my personal research projects to the Mojo ecosystem, and that is how StaMojo was born.

The library is designed around two pillars:

1. **Part I: Statistical computing foundation** (self-contained): special functions, probability distributions, descriptive statistics, hypothesis tests, and correlation.
2. **Part II: Statistical modeling** (depends on NuMojo and MatMojo): OLS, GLM, logistic regression, and related diagnostics.

At the moment I am still building out the project scaffolding and solidifying the core functionality. Because Mojo has not yet reached v1.0, breaking changes are frequent across compiler releases, so **pull requests are not preferred at this time**. If you have any suggestions, questions, or feedback, please feel free to open an [issue](https://github.com/mojomath/stamojo/issues), start a [discussion](https://github.com/mojomath/stamojo/discussions), or reach out on our [Discord channel](https://discord.gg/3rGH87uZTk). Thank you for your understanding!

## Installation

StaMojo is available in the modular-community package repository (`https://repo.prefix.dev/modular-community`). To use this repository, add it to the `channels` list in your `pixi.toml` file:

```toml
channels = ["https://conda.modular.com/max", "https://repo.prefix.dev/modular-community", "conda-forge"]
```

Then, you can install StaMojo using any of these methods:

1. From the `pixi` CLI, run the command `pixi add stamojo`. This fetches the latest version and makes it immediately available for import.

1. In the `mojoproject.toml` file of your project, add the following dependency:

    ```toml
    stamojo = "==0.1.0"
    ```

    Then run `pixi install` to download and install the package.

1. For the latest development version in the `main` branch, clone [this GitHub repository](https://github.com/mojomath/stamojo) and build the package locally using the command `pixi run package`.

    ```bash
    git clone https://github.com/mojomath/stamojo.git
    cd stamojo
    pixi install
    pixi run package
    ```

The following table summarizes the package versions and their corresponding Mojo versions:

| `stamojo` | `mojo` | package manager |
| --------- | -------- | --------------- |
| v0.1.0 | ==0.26.1 | pixi |

## Examples

The file [`examples/examples.mojo`](examples/examples.mojo) demonstrates the key APIs available in Part I. Run it with:

```bash
mojo run -I src examples/examples.mojo
```

```mojo
from stamojo.special import gammainc, gammaincc, beta, lbeta, betainc, erfinv, ndtri
from stamojo.distributions import Normal, StudentT, ChiSquared, FDist
from stamojo.stats import (
    mean, variance, std, median, quantile, skewness, kurtosis,
    pearsonr, spearmanr, kendalltau,
    ttest_1samp, ttest_ind, ttest_rel,
    chi2_gof, chi2_ind, ks_1samp, f_oneway,
)


fn main() raises:
    # --- Special functions ---------------------------------------------------
    print("gammainc(1, 2) =", gammainc(1.0, 2.0))  # 0.8646647167628346
    print("gammaincc(1, 2) =", gammaincc(1.0, 2.0))  # 0.13533528323716537
    print("beta(2, 3) =", beta(2.0, 3.0))  # 0.08333333333323925
    print("betainc(2, 3, 0.5) =", betainc(2.0, 3.0, 0.5))  # 0.6875000000000885
    print("erfinv(0.5) =", erfinv(0.5))  # 0.4769362762044701
    print("ndtri(0.975) =", ndtri(0.975))  # 1.9599639845400543

    # --- Distributions -------------------------------------------------------
    var n = Normal(0.0, 1.0)
    print("Normal(0,1).pdf(0) =", n.pdf(0.0))  # 0.3989422804014327
    print("Normal(0,1).cdf(1.96)=", n.cdf(1.96))  # 0.9750021048517795
    print("Normal(0,1).ppf(0.975)=", n.ppf(0.975))  # 1.9599639845400543
    print("Normal(0,1).sf(1.96) =", n.sf(1.96))  # 0.02499789514822043

    var t = StudentT(10.0)
    print("StudentT(10).cdf(2.0)=", t.cdf(2.0))  # 0.9633059826444078
    print("StudentT(10).ppf(0.975)=", t.ppf(0.975))  # 2.2281388540534057

    var c = ChiSquared(5.0)
    print("ChiSquared(5).cdf(11.07)=", c.cdf(11.07))  # 0.9499903814759155

    var f = FDist(5.0, 10.0)
    print("FDist(5,10).cdf(3.33)=", f.cdf(3.33))  # 0.9501687242532277

    # --- Descriptive statistics ----------------------------------------------
    var data: List[Float64] = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print("mean =", mean(data))  # 5.0
    print("variance=", variance(data, ddof=0))  # 4.0
    print("std =", std(data, ddof=0))  # 2.0
    print("median =", median(data))  # 4.5
    print("Q(0.25) =", quantile(data, 0.25))  # 4.0
    print("skewness=", skewness(data))  # 0.8184875533567997
    print("kurtosis=", kurtosis(data))  # 0.940625

    # --- Correlation ---------------------------------------------------------
    var x: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0]
    var y: List[Float64] = [2.1, 3.8, 6.0, 7.9, 10.1]
    var pr = pearsonr(x, y)
    print("pearsonr r=", pr[0], " p=", pr[1])  # 0.9991718425080479, 2.8605484175113625e-05
    var sr = spearmanr(x, y)
    print("spearmanr ρ=", sr[0], " p=", sr[1])  # 1.0, 0.0
    var kt = kendalltau(x, y)
    print("kendalltau τ=", kt[0], " p=", kt[1])  # 1.0, 0.014305878435429659

    # --- Hypothesis tests ----------------------------------------------------
    var sample: List[Float64] = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
    var res = ttest_1samp(sample, 5.0)
    print("ttest_1samp t=", res[0], " p=", res[1])  # 0.654653670707975, 0.5416045608507769

    var a: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0]
    var b: List[Float64] = [4.0, 5.0, 6.0, 7.0, 8.0]
    var res2 = ttest_ind(a, b)
    print("ttest_ind t=", res2[0], " p=", res2[1])  # -3.0, 0.0170716812337895

    var obs: List[Float64] = [16.0, 18.0, 16.0, 14.0, 12.0, 14.0]
    var exp: List[Float64] = [15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
    var res3 = chi2_gof(obs, exp)
    print("chi2_gof χ²=", res3[0], " p=", res3[1])  # 1.4666666666666666, 0.9168841203537823

    var g1: List[Float64] = [3.0, 4.0, 5.0]
    var g2: List[Float64] = [6.0, 7.0, 8.0]
    var g3: List[Float64] = [9.0, 10.0, 11.0]
    var groups = List[List[Float64]]()
    groups.append(g1^)
    groups.append(g2^)
    groups.append(g3^)
    var res4 = f_oneway(groups)
    print("f_oneway F=", res4[0], " p=", res4[1])  # 27.0, 0.0010000000005315757
```

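The descriptive-statistics and correlation outputs quoted in the example above can be cross-checked by hand. The Python sketch below recomputes them with the standard library only. The quoted `skewness` and `kurtosis` values are consistent with the bias-corrected sample estimators (the ones SciPy returns with `bias=False`); this is inferred from the numbers, not from StaMojo's source, so treat the formulas as an assumption about what the library computes.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = statistics.fmean(data)


def central_moment(k):
    """k-th central moment with a 1/n divisor (the biased moment)."""
    return sum((v - mean) ** k for v in data) / n


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

# Biased (population) skewness and excess kurtosis.
g1 = m3 / m2 ** 1.5
g2 = m4 / m2 ** 2 - 3.0

# Bias-corrected sample versions (assumed to be what the README values reflect).
skew_adj = math.sqrt(n * (n - 1)) / (n - 2) * g1
kurt_adj = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

# Pearson correlation for the x/y data from the example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.8, 6.0, 7.9, 10.1]
mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

print("mean             ", mean)
print("variance (ddof=0)", statistics.pvariance(data))
print("std (ddof=0)     ", statistics.pstdev(data))
print("median           ", statistics.median(data))
print("skewness (adj.)  ", skew_adj)
print("kurtosis (adj.)  ", kurt_adj)
print("pearson r        ", r)
```

The uncorrected values for the same data would be `g1 = 0.65625` and `g2 = -0.21875`, so the choice of estimator matters when comparing against other tools.
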
## Architecture

```txt
src/stamojo/
├── __init__.mojo               # Package root (re-exports distributions & stats)
├── prelude.mojo                # Convenient re-exports
├── special/                    # Special mathematical functions (cf. scipy.special)
│   ├── __init__.mojo
│   ├── _gamma.mojo             # gammainc, gammaincc
│   ├── _beta.mojo              # beta, lbeta, betainc
│   └── _erf.mojo               # erfinv, ndtri
├── distributions/              # Probability distributions
│   ├── __init__.mojo
│   ├── normal.mojo             # Normal (Gaussian): PDF, logPDF, CDF, SF, PPF, rvs
│   ├── t.mojo                  # Student's t
│   ├── chi2.mojo               # Chi-squared
│   └── f.mojo                  # F-distribution
├── stats/                      # Descriptive stats & hypothesis tests
│   ├── __init__.mojo
│   ├── descriptive.mojo        # mean, variance, std, median, quantile, skewness, kurtosis
│   ├── correlation.mojo        # pearsonr, spearmanr, kendalltau
│   └── tests.mojo              # ttest_1samp, ttest_ind, ttest_rel, chi2_gof, chi2_ind, ks_1samp, f_oneway
└── models/                     # Statistical models (planned)
    ├── __init__.mojo
    └── ols.mojo                # Ordinary Least Squares (stub)
tests/
├── test_all.sh                 # Run all test suites
├── test_special.mojo           # tests: special functions
├── test_distributions.mojo     # tests: Normal, t, χ², F
├── test_stats.mojo             # tests: descriptive statistics
└── test_hypothesis.mojo        # tests: hypothesis tests, correlation, ANOVA
```

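The files under `tests/` compare library output with reference values. As a hedged illustration of how such references can be derived independently, the Python sketch below recomputes the test statistics for the data used in the Examples section from their textbook definitions. The helper names are hypothetical, the two-sample test is assumed to use the pooled (equal-variance) form, and the p-value is computed only for the one-way ANOVA, where a numerator degree of freedom of 2 gives the closed form `(1 + 2F/d2) ** (-d2/2)`.

```python
import math
import statistics


def t_one_sample(sample, popmean):
    """One-sample t statistic: (mean - popmean) / (s / sqrt(n))."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - popmean) / se


def t_pooled(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def chi2_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def f_oneway_statistic(groups):
    """One-way ANOVA F statistic with its degrees of freedom."""
    values = [v for g in groups for v in g]
    grand = statistics.fmean(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


t1 = t_one_sample([5.1, 4.8, 5.3, 5.0, 4.9, 5.2], 5.0)
t2 = t_pooled([1.0, 2.0, 3.0, 4.0, 5.0], [4.0, 5.0, 6.0, 7.0, 8.0])
chi2 = chi2_statistic([16.0, 18.0, 16.0, 14.0, 12.0, 14.0], [15.0] * 6)
f_stat, d1, d2 = f_oneway_statistic(
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
)

# Closed-form survival function of the F distribution, valid only when d1 == 2.
assert d1 == 2
f_pvalue = (1.0 + 2.0 * f_stat / d2) ** (-d2 / 2.0)

print("ttest_1samp t =", t1)
print("ttest_ind   t =", t2)
print("chi2_gof   χ² =", chi2)
print("f_oneway    F =", f_stat, " p =", f_pvalue)
```

For this data the closed-form ANOVA p-value is 0.001 up to floating-point rounding, which agrees with the quoted `0.0010000000005315757` to roughly twelve significant digits.
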
## Roadmap

The project is organized into **Part I** (scipy.stats-equivalent, no external dependencies) and **Part II** (statsmodels-equivalent, requires NuMojo + MatMojo). Phases 0–2 are complete; see the [full roadmap](docs/roadmap.md) for all planned phases.

| | Phase | Status |
| ----------- | ------------------------------------------------ | ------------------------- |
| **Part I** | Phase 0: Special Functions | complete |
| | Phase 1: Core Distributions & Descriptive Stats | complete |
| | Phase 2: Hypothesis Testing & Correlation | complete |
| | Phase 3: Extended Distributions | planned |
| | Phase 4: Extended Tests & Utilities | planned |
| **Part II** | Phase 5: OLS Regression | awaiting NuMojo / MatMojo |
| | Phase 6: Generalized Linear Models | awaiting NuMojo / MatMojo |
| | Phase 7: Extended Models | planned |
| | Phase 8: Advanced Topics | planned |

## License

This repository and its contributions are licensed under the Apache License v2.0.

recipes/stamojo/recipe.yaml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
context:
  version: "0.1.0"
  mojo_version: "=0.26.1"

package:
  name: "stamojo"
  version: ${{ version }}

source:
  - git: https://github.com/mojomath/stamojo.git
    rev: 86d5b3ca37c90cab5e0c7c691422856d259d9aed

build:
  number: 0
  script:
    - mkdir -p ${PREFIX}/lib/mojo
    - mojo package src/stamojo -o ${{ PREFIX }}/lib/mojo/stamojo.mojopkg

requirements:
  host:
    - mojo-compiler ${{ mojo_version }}
  build:
    - mojo-compiler ${{ mojo_version }}
  run:
    - mojo-compiler ${{ mojo_version }}

about:
  homepage: https://github.com/mojomath/stamojo
  license: Apache-2.0
  license_file: LICENSE
  summary: A statistical computing library for Mojo
  repository: https://github.com/mojomath/stamojo.git

extra:
  project_name: stamojo
  maintainers:
    - forfudan

recipes/stamojo/stamojo.png

99.6 KB