A deep learning framework for model-agnostic anomaly detection in particle physics collider data
ADDF-HEP implements unsupervised anomaly detection methods designed to identify potential new physics signatures in high-energy particle collision data. The framework trains on known Standard Model backgrounds and flags events that deviate from learned patterns—enabling discovery without prior signal hypotheses.
- Reconstruction-based Detection: Deep Autoencoders and Variational Autoencoders learn compressed representations of "normal" physics
- Physics-validated Metrics: Evaluation includes signal efficiency at fixed false-positive rates, common in HEP analyses (see the standalone sketch after this list)
- Production-ready: Modular architecture with configurable hyperparameters, checkpointing, and early stopping
- Benchmark Dataset Support: Compatible with LHCO 2020 R&D dataset for standardized evaluation
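As a standalone illustration of the efficiency-at-fixed-FPR metric (independent of the framework's `Evaluator`), the working point can be read directly off a ROC curve with scikit-learn. The toy scores and labels below are placeholders, not framework outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def efficiency_at_fpr(labels, scores, fpr_target=0.01):
    """Signal efficiency (TPR) at a fixed background false-positive rate."""
    fpr, tpr, _ = roc_curve(labels, scores)  # labels: 1 = signal, 0 = background
    # Interpolate the ROC curve at the requested FPR working point
    return float(np.interp(fpr_target, fpr, tpr))

# Placeholder arrays standing in for real anomaly scores and truth labels
rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(1000), np.ones(100)])
scores = np.concatenate([rng.normal(0, 1, 1000), rng.normal(2, 1, 100)])

print("ROC-AUC:", roc_auc_score(labels, scores))
print("Eff @ 1% FPR:", efficiency_at_fpr(labels, scores, 0.01))
```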
```bash
git clone https://github.com/GauravG-Work/addf-hep.git
cd addf-hep
pip install -e .
```

Requirements:

- Python ≥ 3.9
- PyTorch ≥ 2.0
- NumPy, Pandas, scikit-learn, matplotlib
- h5py (for HDF5 data loading)
```python
from src.data.loader import SyntheticGenerator, create_dataloaders
from src.data.preprocessing import FeatureScaler
from src.models.autoencoder import VariationalAutoencoder
from src.engine.trainer import Trainer
from src.engine.evaluator import Evaluator

# Generate synthetic collision data
data = SyntheticGenerator.generate(n_background=10000, n_signal=500)

# Preprocess
scaler = FeatureScaler(method="robust")
data.train = scaler.fit_transform(data.train)
data.val = scaler.transform(data.val)
data.test = scaler.transform(data.test)

# Create dataloaders
loaders = create_dataloaders(data, batch_size=256)

# Initialize and train VAE
model = VariationalAutoencoder(input_dim=16, latent_dim=4)
trainer = Trainer(model)
history = trainer.fit(loaders["train"], loaders["val"])

# Evaluate
scores = trainer.score(loaders["test"]).numpy()
evaluator = Evaluator(fpr_targets=[0.01, 0.001])
results = evaluator.evaluate(scores, data.labels_test)
print(f"ROC-AUC: {results.roc_auc:.4f}")
print(f"Signal Efficiency @ 1% FPR: {results.efficiency_at_fpr[0.01]:.2%}")
```

```bash
# Download LHCO 2020 R&D dataset
python scripts/download_lhco_data.py
# Train on real data
python scripts/train_on_lhco.py --model vae --epochs 50
```

```
Input Features → Encoder → Latent Space (z) → Decoder → Reconstruction
                                                              ↓
                                            Anomaly Score = ||x - x̂||²
```
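A minimal sketch of this scoring rule in PyTorch (independent of the framework's `Trainer.score`; that the model's forward pass returns only a reconstruction is an assumption made for illustration):

```python
import torch

def reconstruction_scores(model, batch):
    """Per-event anomaly score: squared reconstruction error ||x - x_hat||^2."""
    model.eval()
    with torch.no_grad():
        x_hat = model(batch)              # assumes forward() returns the reconstruction
        return ((batch - x_hat) ** 2).sum(dim=1)

# Toy example with a trivial identity "model" standing in for an autoencoder
x = torch.randn(4, 16)                    # 4 events, 16 input features
scores = reconstruction_scores(torch.nn.Identity(), x)
print(scores.shape)                       # torch.Size([4]) -> one score per event
```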
| Model | Description | Use Case |
|---|---|---|
| `DeepAutoencoder` | Deterministic MLP encoder-decoder | Fast baseline |
| `VariationalAutoencoder` | Probabilistic with KL regularization | Better generalization |
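For reference, a generic β-weighted VAE objective (reconstruction error plus KL regularization, as in the β=1 benchmark below) can be sketched as follows; this is an independent illustration, not the framework's `VariationalAutoencoder` implementation:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """beta-VAE objective: reconstruction error + beta * KL(q(z|x) || N(0, I))."""
    recon = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + beta * kl

# Toy tensors: a batch of 8 events with 16 features and a 4-dim latent space
x = torch.randn(8, 16)
x_hat, mu, logvar = torch.randn(8, 16), torch.randn(8, 4), torch.randn(8, 4)
print(vae_loss(x, x_hat, mu, logvar, beta=1.0))
```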
```
addf-hep/
├── src/
│   ├── data/        # Data loading, preprocessing
│   ├── models/      # Neural network architectures
│   └── engine/      # Training and evaluation
├── configs/         # YAML hyperparameter configs
├── scripts/         # Training and utility scripts
├── notebooks/       # Jupyter/Colab notebooks
└── tests/           # Unit tests
```
Models were trained on 800k QCD dijet background events and evaluated on 100k background + 100k W'→XY signal events.
| Model | ROC-AUC | Efficiency @ 1% FPR | Efficiency @ 0.1% FPR |
|---|---|---|---|
| Deep AE | 0.84 | 15.8% | 4.6% |
| VAE (β=1) | 0.93 | 39.0% | 16.4% |
Training hyperparameters are managed via YAML:

```yaml
model:
  type: "vae"
  latent_dim: 8
  encoder_layers: [128, 64, 32]

training:
  epochs: 100
  learning_rate: 1.0e-3
  early_stopping:
    patience: 10
```

GPU-accelerated training is also supported.
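A config like the one above could be consumed roughly as follows; the exact constructor arguments accepted by `VariationalAutoencoder` and `Trainer`, and the config file path, are assumptions here rather than the framework's documented API:

```python
import yaml
from src.models.autoencoder import VariationalAutoencoder
from src.engine.trainer import Trainer

with open("configs/vae.yaml") as f:   # hypothetical config path
    cfg = yaml.safe_load(f)

# Assumed keyword arguments; adjust to the actual constructor signatures
model = VariationalAutoencoder(input_dim=16, latent_dim=cfg["model"]["latent_dim"])
trainer = Trainer(model)  # settings from cfg["training"] would be passed here
```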
Run the unit tests with:

```bash
pytest tests/ -v
```

- LHC Olympics 2020 - Anomaly detection challenge
- LHCO R&D Dataset - Zenodo dataset
- Kingma & Welling (2014) - Auto-Encoding Variational Bayes
MIT License - see LICENSE for details.
If you use this code in your research, please cite:
```bibtex
@software{addf_hep,
  title = {ADDF-HEP: Anomaly Detection Framework for High-Energy Physics},
  year = {2025},
  url = {https://github.com/YOUR_USERNAME/addf-hep}
}
```