ADDF-HEP: Anomaly Detection Framework for High-Energy Physics

Python PyTorch License

A deep learning framework for model-agnostic anomaly detection in particle physics collider data

Installation · Quick Start · Documentation · Results


Overview

ADDF-HEP implements unsupervised anomaly detection methods designed to identify potential new physics signatures in high-energy particle collision data. The framework trains on known Standard Model backgrounds and flags events that deviate from learned patterns—enabling discovery without prior signal hypotheses.

Key Features

  • Reconstruction-based Detection: Deep Autoencoders and Variational Autoencoders learn compressed representations of "normal" physics
  • Physics-validated Metrics: Evaluation includes signal efficiency at fixed false-positive rates, a standard working point in HEP analyses (see the sketch after this list)
  • Production-ready: Modular architecture with configurable hyperparameters, checkpointing, and early stopping
  • Benchmark Dataset Support: Compatible with LHCO 2020 R&D dataset for standardized evaluation
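
As referenced in the metrics bullet above, here is a minimal sketch of the "signal efficiency at fixed false-positive rate" working point. It illustrates the metric only; the efficiency_at_fpr helper below is not part of the framework's API (the framework exposes this through its Evaluator class).

import numpy as np
from sklearn.metrics import roc_curve

def efficiency_at_fpr(scores, labels, fpr_target=0.01):
    # scores: anomaly scores (higher = more anomalous)
    # labels: 1 = signal, 0 = background
    fpr, tpr, _ = roc_curve(labels, scores)
    # roc_curve returns FPR in increasing order, so interpolation is valid
    return float(np.interp(fpr_target, fpr, tpr))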

Installation

git clone https://github.com/GauravG-Work/addf-hep.git
cd addf-hep
pip install -e .

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 2.0
  • NumPy, Pandas, scikit-learn, matplotlib
  • h5py (for HDF5 data loading)

Quick Start

Using Synthetic Data

from src.data.loader import SyntheticGenerator, create_dataloaders
from src.data.preprocessing import FeatureScaler
from src.models.autoencoder import VariationalAutoencoder
from src.engine.trainer import Trainer
from src.engine.evaluator import Evaluator

# Generate synthetic collision data
data = SyntheticGenerator.generate(n_background=10000, n_signal=500)

# Preprocess
scaler = FeatureScaler(method="robust")
data.train = scaler.fit_transform(data.train)
data.val = scaler.transform(data.val)
data.test = scaler.transform(data.test)

# Create dataloaders
loaders = create_dataloaders(data, batch_size=256)

# Initialize and train VAE
model = VariationalAutoencoder(input_dim=16, latent_dim=4)
trainer = Trainer(model)
history = trainer.fit(loaders["train"], loaders["val"])

# Evaluate
scores = trainer.score(loaders["test"]).numpy()
evaluator = Evaluator(fpr_targets=[0.01, 0.001])
results = evaluator.evaluate(scores, data.labels_test)

print(f"ROC-AUC: {results.roc_auc:.4f}")
print(f"Signal Efficiency @ 1% FPR: {results.efficiency_at_fpr[0.01]:.2%}")

Using LHCO Dataset

# Download LHCO 2020 R&D dataset
python scripts/download_lhco_data.py

# Train on real data
python scripts/train_on_lhco.py --model vae --epochs 50

Architecture

Input Features → Encoder → Latent Space (z) → Decoder → Reconstruction
                                                              ↓
                                   Anomaly Score = ||x - x̂||²
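
A minimal sketch of this score in PyTorch, assuming a model whose forward pass returns the reconstruction (the framework's VariationalAutoencoder may return additional outputs such as the latent mean and log-variance):

import torch

@torch.no_grad()
def reconstruction_scores(model, x):
    # Higher score = event reconstructed poorly = more anomalous
    model.eval()
    x_hat = model(x)                       # reconstruction x̂
    return ((x - x_hat) ** 2).sum(dim=1)   # per-event ||x - x̂||²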

Models

| Model                  | Description                          | Use Case              |
|------------------------|--------------------------------------|-----------------------|
| DeepAutoencoder        | Deterministic MLP encoder-decoder    | Fast baseline         |
| VariationalAutoencoder | Probabilistic with KL regularization | Better generalization |
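
The practical difference between the two models is the training objective: the VAE adds a KL divergence term that regularizes the latent space toward a standard normal prior. A sketch of the β-VAE objective consistent with the "β=1" row in the results below (the framework's actual loss may use a different reduction or weighting):

import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term: squared error between input and reconstruction
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl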

Directory Structure

addf-hep/
├── src/
│   ├── data/           # Data loading, preprocessing
│   ├── models/         # Neural network architectures
│   └── engine/         # Training and evaluation
├── configs/            # YAML hyperparameter configs
├── scripts/            # Training and utility scripts
├── notebooks/          # Jupyter/Colab notebooks
└── tests/              # Unit tests

Results

LHCO 2020 R&D Dataset

Trained on 800k QCD dijet background events, evaluated on 100k background + 100k W'→XY signal events.

| Model     | ROC-AUC | Efficiency @ 1% FPR | Efficiency @ 0.1% FPR |
|-----------|---------|---------------------|-----------------------|
| Deep AE   | 0.84    | 15.8%               | 4.6%                  |
| VAE (β=1) | 0.93    | 39.0%               | 16.4%                 |

Configuration

Training hyperparameters are managed via YAML:

model:
  type: "vae"
  latent_dim: 8
  encoder_layers: [128, 64, 32]

training:
  epochs: 100
  learning_rate: 1.0e-3
  early_stopping:
    patience: 10
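
A hypothetical sketch of consuming such a config in a script; the key names follow the example above, but the file path and loading code are illustrative rather than the documented API:

import yaml

with open("configs/vae.yaml") as f:               # path assumed for illustration
    cfg = yaml.safe_load(f)

latent_dim = cfg["model"]["latent_dim"]           # 8
learning_rate = cfg["training"]["learning_rate"]  # 1.0e-3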

Google Colab

For GPU-accelerated training:

Open In Colab

Testing

pytest tests/ -v

License

MIT License - see LICENSE for details.

Citation

If you use this code in your research, please cite:

@software{addf_hep,
  title = {ADDF-HEP: Anomaly Detection Framework for High-Energy Physics},
  year = {2025},
  url = {https://github.com/GauravG-Work/addf-hep}
}
