reionemu

A modular Python package for building machine-learning emulators of the kinetic Sunyaev-Zel'dovich (kSZ) angular power spectrum from kSZ 2LPT reionization simulations. It includes tools to condense simulation outputs, compute flat-sky power spectra, assemble training datasets, train neural networks that predict binned rescaled kSZ power spectra from reionization parameters, and save lightweight experiment artifacts for reproducibility.

The goal is to learn a fast surrogate model that maps reionization parameters → binned kSZ power spectrum, enabling rapid exploration of cosmological parameter space without re-running expensive simulations.


Installation

pip install reionemu

Or from source (editable):

git clone https://github.com/RobertxPearce/reionization-emulator.git
cd reionization-emulator
python -m pip install -e .

Requirements: Python 3.10+, NumPy, HDF5, PyTorch, and Ray Tune.


Quick start

After installing, you can load a processed HDF5 training dataset, create dataloaders, and train the baseline deterministic 4-parameter emulator:

from pathlib import Path
import torch
import reionemu

# Path to a condensed HDF5 that already has /training (X, Y, ell)
h5_path = Path("path/to/condensed.h5")

# Dataloaders with train/val split and optional normalization
loaders, normalizers, ell = reionemu.make_dataloaders(
    h5_path,
    split={"train": 0.8, "val": 0.2},
    config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
)

# Baseline 4-parameter model, optimizer, loss
model = reionemu.FourParamEmulator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

# Train for a few epochs
history = reionemu.fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    loss_fn,
    config=reionemu.FitConfig(epochs=10, device="cpu"),
)

# Validation loss per epoch
print(history["val_loss"])

# Save a lightweight experiment artifact
artifact_dir = reionemu.save_artifact(
    "baseline_four_param",
    Path("artifacts"),
    dataset_path=h5_path,
    dataloader_config=reionemu.DataLoaderConfig(batch_size=32, seed=42),
    fit_config=reionemu.FitConfig(epochs=10, device="cpu"),
    model_config={
        "class_name": "FourParamEmulator",
        "input_dim": 4,
        "output_dim": 5,
    },
    optimizer_config={"name": "Adam", "lr": 1e-3},
    history=history,
    normalizers=normalizers,
    checkpoint=model.state_dict(),
)

For MC-dropout experiments, use MCDropoutEmulator with the MC evaluation path:

model = reionemu.MCDropoutEmulator(dropout_rate=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = reionemu.fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    torch.nn.MSELoss(),
    config=reionemu.FitConfig(epochs=10, device="cpu"),
    evaluation="evaluate_mc_metrics",
    n_mc_samples=50,
)

print(history["val_mean_predictive_std"])
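
Conceptually, MC dropout keeps the dropout layers stochastic at evaluation time and aggregates many forward passes into a predictive mean and spread. The following is a minimal sketch of that idea in plain PyTorch, not the package's internal implementation:

import torch

def mc_predict(model, x, n_samples=50):
    # Train mode keeps dropout active, so repeated passes differ.
    model.train()
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    # Per-output predictive mean and standard deviation across the draws.
    return draws.mean(dim=0), draws.std(dim=0)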

If you want to tune the four-parameter architecture with Ray Tune before training a final model, you can work directly with the loaded arrays:

from pathlib import Path

import reionemu
from ray import tune

h5_path = Path("path/to/condensed.h5")
X, Y, ell = reionemu.load_training_arrays(h5_path)

split_idx = int(0.8 * len(X))
X_train, X_val = X[:split_idx], X[split_idx:]
Y_train, Y_val = Y[:split_idx], Y[split_idx:]

param_space = {
    "hidden_dim": tune.choice([20, 32, 64]),
    "num_hidden_layers": tune.choice([1, 2, 3]),
    "activation": tune.choice(["relu", "silu", "tanh"]),
    "optimizer": tune.choice(["adam", "adamw"]),
    "lr": tune.loguniform(3e-4, 2e-3),
    "weight_decay": tune.loguniform(1e-8, 1e-4),
    "batch_size": tune.choice([16, 32, 64]),
    "epochs": 150,
    "early_stopping_patience": tune.choice([10, 15]),
    "gradient_clipping": tune.choice([None, 0.5, 1.0]),
    "normalize_X": True,
    "normalize_Y": False,
}

results = reionemu.run_tune_four_param(
    X_train=X_train,
    Y_train=Y_train,
    X_val=X_val,
    Y_val=Y_val,
    param_space=param_space,
    num_samples=20,
    max_concurrent_trials=2,
    device="cpu",
    storage_path="ray_results",
    experiment_name="four_param_search",
)

best = results.get_best_result(metric="val_loss", mode="min")
print(best.config)
print(best.metrics["best_val_loss"])
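
To retrain a final model with the winning configuration, one option is to reuse the Quick-start pipeline, carrying over the tuned batch size and learning rate. This sketch keeps the default FourParamEmulator architecture; wiring hidden_dim, num_hidden_layers, and the other searched settings into a model is what build_four_param_model and build_optimizer in the public API are for:

import torch

cfg = best.config

loaders, normalizers, ell = reionemu.make_dataloaders(
    h5_path,
    split={"train": 0.8, "val": 0.2},
    config=reionemu.DataLoaderConfig(batch_size=cfg["batch_size"], seed=42),
)

model = reionemu.FourParamEmulator()  # default architecture; see note above
optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])

history = reionemu.fit(
    model,
    loaders["train"],
    loaders["val"],
    optimizer,
    torch.nn.MSELoss(),
    config=reionemu.FitConfig(epochs=150, device="cpu"),
)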

For a full pipeline example (condense → compute power spectra → build training data → tune/train/evaluate), scientific context, and complete usage examples, see the full documentation linked from the repository homepage.


Scientific context

The kinetic Sunyaev-Zel'dovich (kSZ) effect arises from the scattering of CMB photons by free electrons with bulk motion, generating secondary temperature anisotropies. The kSZ angular power spectrum carries information about the timing, duration, and structure of reionization. This emulator provides a fast surrogate that maps reionization parameters (zmean_zre, alpha_zre, kb_zre, b0_zre) to binned, rescaled kSZ power spectra, making parameter-space exploration much faster than rerunning the full simulations.
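
Once trained, the surrogate reduces parameter-space exploration to a cheap forward pass: a 4-vector of parameters in, a 5-bin rescaled spectrum out. A minimal inference sketch, assuming a checkpoint saved as in the Quick start; the path and parameter values below are placeholders, and if the training inputs were normalized, the saved X normalizer should be applied first:

import torch
import reionemu

model = reionemu.FourParamEmulator()
model.load_state_dict(torch.load("artifacts/baseline_four_param/checkpoint.pt"))  # placeholder path
model.eval()

# One sample: (zmean_zre, alpha_zre, kb_zre, b0_zre) -- placeholder values.
params = torch.tensor([[8.0, 0.9, 0.1, 0.6]])
with torch.no_grad():
    cl_binned = model(params)  # shape (1, 5): binned, rescaled kSZ power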


Repository structure

Path                     Description
src/reionemu/            Core library (pip-installable package)
src/reionemu/simio/      Simulation I/O, power spectrum computation, training-array building
src/reionemu/data/       Dataloaders, normalization
src/reionemu/artifact/   JSON experiment manifests, config/results saving, normalizer and checkpoint sidecars
src/reionemu/models/     Baseline and experimental emulator architectures
src/reionemu/training/   Training loop, K-fold cross-validation, metrics, and model builders
src/reionemu/tuning/     Ray Tune integration for hyperparameter search
scripts/                 Dataset builder, HPC runners, sampling (environment-specific)
notebooks/               Analysis and training examples
docs/                    Documentation source code
datasets/                Raw and processed datasets (not tracked)
results/                 Visualizations for simulation checks, parameter-space validation, and model evaluation

The core API is in src/reionemu/. Scripts under scripts/hpc/ and scripts/sampling/ are for cluster and sampling workflows and may use machine-specific paths; the library itself is portable.


Main public API

Import from the top-level package after pip install reionemu:

  • Simulation I/O: condense_sim_root, CondenseConfig, add_cl_to_condensed_h5, ClConfig, build_and_write_training, build_training_arrays, BuildXYConfig, BuildStats, CondenseStats
  • Data: make_dataloaders, load_training_arrays, DataLoaderConfig, Normalizer
  • Artifacts: create_artifact_dir, save_artifact, save_configs, save_results, save_info, save_normalizers, load_normalizers, save_model_checkpoint, dataset_summary, file_fingerprint, read_json
  • Models: FourParamEmulator, MCDropoutEmulator (experimental variants live in reionemu.models.experimental)
  • Training: fit, FitConfig, train_one_epoch, evaluate, evaluate_metrics, evaluate_mc_metrics, kfold_cross_validate, KFoldConfig
  • Training helpers: build_four_param_model, build_mc_dropout_model, build_optimizer, mse, rmse, mean_relative_error, physical_mean_relative_error
  • Tuning: train_four_param_tune, default_param_space, run_tune_four_param

For the full API reference, module documentation, and usage guides, see the documentation linked from the repository homepage.


Typical workflow

  1. Parameter sampling - Latin Hypercube Sampling over the 4D reionization parameter space (see the sampling sketch after this list).
  2. Simulation (HPC) - Run Zreion (or compatible) simulations; outputs per sim in HDF5.
  3. Dataset construction - Use condense_sim_root → add_cl_to_condensed_h5 → build_and_write_training to produce a single condensed HDF5 with /sims and /training (see the pipeline sketch after this list).
  4. Hyperparameter search (optional) - Use load_training_arrays and run_tune_four_param to search over model and optimizer settings with Ray Tune.
  5. Training and evaluation - Use make_dataloaders and fit (or kfold_cross_validate) to train and evaluate the selected emulator configuration.
  6. Artifact saving - Use save_artifact to record JSON configs/results plus optional .npz normalizers and .pt model checkpoints.
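
For step 1, a common way to draw the design points is scipy's quasi-Monte Carlo module. Note that scipy is not among the listed package requirements, and the parameter bounds here are illustrative placeholders, not recommended priors:

import numpy as np
from scipy.stats import qmc

# Latin Hypercube Sampling over (zmean_zre, alpha_zre, kb_zre, b0_zre).
lower = np.array([6.0, 0.1, 0.05, 0.1])   # placeholder lower bounds
upper = np.array([10.0, 1.5, 1.0, 1.0])   # placeholder upper bounds

sampler = qmc.LatinHypercube(d=4, seed=42)
unit = sampler.random(n=100)              # (100, 4) points in [0, 1)^4
design = qmc.scale(unit, lower, upper)    # rescale to the physical bounds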

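For step 3, the three calls chain into a single condensed HDF5. The sketch below uses assumed call signatures; the positional arguments and config defaults are illustrative, so consult the full documentation for the real interfaces:

from pathlib import Path
import reionemu

sim_root = Path("datasets/raw")                      # per-simulation HDF5 outputs
condensed = Path("datasets/processed/condensed.h5")  # single condensed file

# Assumed shapes: each stage is driven by its config dataclass and writes
# into the condensed file (/sims first, then /training).
reionemu.condense_sim_root(sim_root, condensed, config=reionemu.CondenseConfig())
reionemu.add_cl_to_condensed_h5(condensed, config=reionemu.ClConfig())
reionemu.build_and_write_training(condensed, config=reionemu.BuildXYConfig())
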
Acknowledgments

This research is conducted in the LEADS Lab at the University of Nevada, Las Vegas, under Dr. Paul La Plante, with computing resources from the Pittsburgh Supercomputing Center (Bridges-2).
