Note: This project is ongoing and subject to continuous updates.
This repository presents E-PCN — the Explainable Particle Chebyshev Network — a graph neural network for jet tagging in high-energy physics. Jet tagging refers to the task of identifying and classifying collimated sprays of particles (jets) produced in high-energy collisions and associating them with their originating particles or decay processes.
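E-PCN extends the base PCN architecture by constructing four parallel graph representations per jet, each weighted by a distinct physics-motivated kinematic variable: angular separation ($\Delta$), relative transverse momentum ($k_T$), momentum fraction ($z$), and invariant mass squared ($m^2$).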
This is a project of the Center for Computational and Data Sciences (CCDS), Independent University, Bangladesh, in collaboration with the Department of Theoretical Physics, University of Dhaka.
| Metric | PCN (baseline) | E-PCN (ours) | Relative improvement |
|---|---|---|---|
| Macro-Accuracy | 0.9249 | 0.9467 | +2.36% |
| Macro-AUC | 0.9294 | 0.9678 | +4.13% |
| Macro-AUPR | 0.6599 | 0.8241 | +24.88% |
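The last column of the table reports change relative to the PCN baseline, not a difference in percentage points. A minimal sketch that reproduces it from the two score columns (the function name `relative_improvement` is illustrative, not part of the repository):

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# (PCN, E-PCN) scores copied from the table above.
rows = {
    "Macro-Accuracy": (0.9249, 0.9467),
    "Macro-AUC": (0.9294, 0.9678),
    "Macro-AUPR": (0.6599, 0.8241),
}

for name, (pcn, epcn) in rows.items():
    print(f"{name}: {relative_improvement(pcn, epcn):+.2f}%")
```

By contrast, a comparison such as 94.67% versus 93.12% accuracy is an absolute difference of 1.55 percentage points.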
E-PCN achieves the highest classification accuracy among all compared models, surpassing the Particle Transformer (ParT) at 93.12% by 1.55 percentage points. The most dramatic gains are in heavy-flavor channels: AUPR for
| Model | Macro-Accuracy | Macro-AUC |
|---|---|---|
| PFN | 0.8521 | 0.9103 |
| P-CNN | 0.8847 | 0.9312 |
| ParticleNet | 0.9015 | 0.9521 |
| ParT | 0.9312 | 0.9687 |
| PCN (baseline) | 0.9249 | 0.9294 |
| E-PCN (ours) | 0.9467 | 0.9678 |
Results on the Aspen Open Jets dataset of real CMS proton-proton collision data, evaluated with unsupervised clustering metrics (↓ lower is better, ↑ higher is better):
| Metric | PCN | E-PCN | Relative change |
|---|---|---|---|
| Davies-Bouldin Index ↓ | 0.8395 | 0.4017 | −52.15% |
| Dunn Index ↑ | 0.0189 | 0.0269 | +42.33% |
- Physics-informed multi-graph architecture: Four parallel GNN branches, each processing a graph weighted by one of $\Delta$, $k_T$, $z$, or $m^2$, enabling the network to learn specialized representations for complementary aspects of QCD jet dynamics simultaneously.
- Grad-CAM explainability for multi-graph GNNs: Angular separation ($\Delta$, 40.72%) and relative transverse momentum ($k_T$, 35.67%) together account for ~76% of classification decisions, consistent with soft-collinear factorization in perturbative QCD.
- State-of-the-art classification performance on the JetClass benchmark across 9 signal classes.
- Generalization to real collider data: Evaluated on the Aspen Open Jets dataset of real CMS collision data, demonstrating robust representations under detector effects, pileup, and reconstruction uncertainties.
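For each pair of connected particles $(i, j)$, four edge features are computed:

| Feature | Formula | Physical Meaning |
|---|---|---|
| $\Delta$ | $\sqrt{(y_i - y_j)^2 + \Delta\phi_{ij}^2}$ | Angular separation; encodes angular ordering and collinear emissions |
| $k_T$ | $\min(p_{T,i},\, p_{T,j})\,\Delta$ | Relative transverse momentum; sets the soft radiation scale |
| $z$ | $\min(p_{T,i},\, p_{T,j}) / (p_{T,i} + p_{T,j})$ | Momentum fraction; quantifies energy sharing via the DGLAP splitting functions |
| $m^2$ | $(E_i + E_j)^2 - \lVert \vec{p}_i + \vec{p}_j \rVert^2$ | Lorentz-invariant mass squared; provides mass-scale sensitivity essential for heavy-flavor jet identification |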
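
| Symbol | Definition |
|---|---|
| $y_i$ | Rapidity: $y_i = \frac{1}{2}\ln\frac{E_i + p_{z,i}}{E_i - p_{z,i}}$ |
| $\phi_i$ | Azimuthal angle of particle $i$ |
| $\Delta\phi_{ij}$ | Azimuthal angle difference wrapped to $[-\pi, \pi]$ |
| $p_{T,i}$ | Transverse momentum: $p_{T,i} = \sqrt{p_{x,i}^2 + p_{y,i}^2}$ |
| $\vec{p}_i$ | Three-momentum of particle $i$ |
| $E_i$ | Energy of particle $i$ |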
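
The four edge features can be sketched in a few lines of plain Python. This is an illustration, not the repository's code: it assumes the standard Lund-plane definitions ($\Delta$ from rapidity and wrapped azimuth, $k_T = \min(p_T)\,\Delta$, $z = \min(p_T)/\sum p_T$, $m^2$ from the summed four-momentum), and the names `pair_features` and `wrap_phi` and the `(px, py, pz, E)` tuple layout are hypothetical.

```python
import math


def wrap_phi(dphi: float) -> float:
    """Wrap an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi


def pair_features(a, b):
    """Edge features for two particles given as (px, py, pz, E) tuples."""
    def pt(p):
        return math.hypot(p[0], p[1])

    def rapidity(p):
        return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

    def phi(p):
        return math.atan2(p[1], p[0])

    # Angular separation in the rapidity-azimuth plane.
    delta = math.hypot(rapidity(a) - rapidity(b), wrap_phi(phi(a) - phi(b)))
    pt_min = min(pt(a), pt(b))
    k_t = pt_min * delta                 # relative transverse momentum
    z = pt_min / (pt(a) + pt(b))         # momentum fraction of the softer particle
    # Invariant mass squared of the pair: (E_a + E_b)^2 - |p_a + p_b|^2.
    e_sum = a[3] + b[3]
    p_sum_sq = sum((a[k] + b[k]) ** 2 for k in range(3))
    m2 = e_sum ** 2 - p_sum_sq
    return {"delta": delta, "kT": k_t, "z": z, "m2": m2}


# Two massless particles at right angles in the transverse plane.
print(pair_features((1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 1.0)))
```

Note that `wrap_phi` returns values in the half-open interval $[-\pi, \pi)$; only the magnitude of the difference enters $\Delta$, so the choice of endpoint does not affect the result.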
The emission probability in perturbative QCD factorizes as:
This factorization makes (
- Dreyer, Salam & Soyez (2018). The Lund Jet Plane. arXiv:1807.04758
- Dreyer & Qu (2021). Jet tagging in the Lund plane with graph networks. arXiv:2012.08526
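
For reference, the leading-order soft-collinear emission density usually quoted for the Lund plane is sketched below. Whether this is exactly the expression intended in this section is an assumption; $C_R$ denotes the colour factor of the emitter and $\alpha_s$ the strong coupling evaluated at the scale $k_T$.

$$
d^2P \simeq \frac{2\,\alpha_s(k_T)\,C_R}{\pi}\,\frac{d\Delta}{\Delta}\,\frac{dk_T}{k_T}
$$

In this limit emissions are approximately uniform in $\ln(1/\Delta)$ and $\ln k_T$, up to the running of $\alpha_s$.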
E-PCN processes each jet through four parallel GNN branches — one per kinematic variable — each consisting of alternating Chebyshev graph convolutions (ChebConv) and edge convolutions (EdgeConv) (ChebConv → EdgeConv → ChebConv → EdgeConv → ChebConv). Each branch produces a 64-dimensional jet-level embedding via mean pooling. The four embeddings are stacked into a 4×64 matrix and combined by a 1×1 convolution, which learns to weight the kinematic representations adaptively. Two fully connected layers with dropout (rate 0.1) produce the final class probabilities via softmax.
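
The fusion step can be illustrated without any deep learning framework. A 1×1 convolution over the stacked 4×64 matrix, taken here with a single output channel (an assumption, since the output channel count is not stated above), reduces to a learned weighted sum of the four branch embeddings. The function name `fuse_branches` and the weight values are hypothetical.

```python
def fuse_branches(embeddings, weights, bias=0.0):
    """Weighted sum over the branch axis: out[d] = sum_b w[b] * emb[b][d] + bias.

    `embeddings` is a list of equal-length vectors, one per graph branch;
    `weights` holds one scalar per branch (the 1x1 convolution kernel).
    """
    dim = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) + bias
        for d in range(dim)
    ]


# Toy 64-dimensional embeddings for the Delta, kT, z and m^2 branches.
branch_embeddings = [[1.0] * 64, [2.0] * 64, [3.0] * 64, [4.0] * 64]
branch_weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values, not trained ones

fused = fuse_branches(branch_embeddings, branch_weights)
print(len(fused), round(fused[0], 6))
```

Because the same four weights are shared across all 64 embedding dimensions, the kernel itself indicates how strongly each kinematic branch contributes to the fused representation.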
We adapt Grad-CAM to the multi-graph setting by computing, for each graph branch, the product of gradient magnitude and embedding magnitude averaged over the 64 embedding dimensions. This yields a scalar importance score per kinematic variable, normalized to percentages summing to 100%.
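
A minimal sketch of this scoring rule follows. It assumes the product is taken element-wise before averaging, which is one reading of the description above, and that the gradients of the class score with respect to each branch embedding are already available. The function name `branch_importance` and the toy numbers are hypothetical.

```python
def branch_importance(grads, embeddings):
    """Per-branch importance in percent.

    `grads` and `embeddings` map a branch name to a vector of equal length
    (64 in E-PCN). The raw score is the mean of |gradient| * |embedding|.
    """
    raw = {
        name: sum(abs(g) * abs(e) for g, e in zip(grads[name], emb)) / len(emb)
        for name, emb in embeddings.items()
    }
    total = sum(raw.values())
    return {name: 100.0 * score / total for name, score in raw.items()}


# Toy 2-dimensional example; real embeddings have 64 dimensions.
toy_grads = {"delta": [1.0, 1.0], "kT": [1.0, -1.0], "z": [0.5, 0.5], "m2": [0.5, -0.5]}
toy_embs = {"delta": [2.0, 2.0], "kT": [1.0, 1.0], "z": [1.0, 1.0], "m2": [1.0, 1.0]}

scores = branch_importance(toy_grads, toy_embs)
print(scores)
```

Using magnitudes rather than signed values means a branch counts as important whether it pushes the class score up or down.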
Global feature importance (averaged over all jet classes):
| Variable | Importance | Role |
|---|---|---|
| $\Delta$ | 40.72% | Dominant; encodes collinear structure |
| $k_T$ | 35.67% | Strong; encodes soft radiation scale |
| $z$ | 14.06% | Moderate; encodes energy splitting |
| $m^2$ | 9.54% | Lowest global; elevated for heavy flavor |
The 76% combined importance of $\Delta$ and $k_T$ is consistent with soft-collinear factorization in perturbative QCD.
A large-scale benchmark comprising 100M jets across 10 classes (9 signal + 1 background), generated with Pythia 8.230. Signal classes include Higgs boson decays (
Approximately 178 million high-$p_T$ jets from the CMS 2016 JetHT proton-proton collision Open Data are used.
Since ground-truth class labels are not publicly available, representation quality is assessed using unsupervised clustering metrics, including the Davies-Bouldin Index (DBI) and Dunn Index, after training with the DeepCluster algorithm.
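
Both metrics can be computed directly from cluster assignments. The sketch below uses plain Python with Euclidean distances. It assumes the original Dunn definition (smallest distance between points of different clusters divided by the largest cluster diameter); other variants exist and the one used in the paper is not stated here. The function names are illustrative.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def dunn(clusters):
    """Smallest inter-cluster point distance over largest intra-cluster diameter."""
    min_between = min(
        math.dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b
    )
    max_diameter = max(
        math.dist(p, q) for c in clusters for p, q in combinations(c, 2)
    )
    return min_between / max_diameter


# Two tight, well-separated toy clusters (each needs at least two points).
toy_clusters = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
print(davies_bouldin(toy_clusters), dunn(toy_clusters))
```

Compact, well-separated clusters give a low Davies-Bouldin Index and a high Dunn Index, which is the direction of the change reported for E-PCN.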
```bash
git clone https://github.com/ccdsiub/E-PCN.git
cd E-PCN
pip install -r requirements.txt
```

Key hyperparameters from the paper:
| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 1e-3 |
| LR Scheduler | OneCycleLR |
| Batch Size | 256 |
| Hidden Dimension | 64 |
| Graph Branches | 4 |
| k-NN neighbors | 3 |
| Conv. Layers | 5 (per branch) |
| Dropout Rate | 0.1 |
| Max Epochs | 500 |
| Early Stop patience | 10 epochs |
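
For scripting experiments, the table can be mirrored in a small configuration object. This is only a sketch: the class `TrainConfig` and its field names are hypothetical and do not correspond to any file in the repository; only the values come from the table above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; field names are illustrative."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-3
    lr_scheduler: str = "OneCycleLR"
    batch_size: int = 256
    hidden_dim: int = 64
    graph_branches: int = 4
    knn_neighbors: int = 3
    conv_layers_per_branch: int = 5
    dropout: float = 0.1
    max_epochs: int = 500
    early_stop_patience: int = 10


config = TrainConfig()
print(asdict(config))
```

If the training code is written in PyTorch (an assumption), the optimizer and scheduler names correspond to `torch.optim.AdamW` and `torch.optim.lr_scheduler.OneCycleLR`.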
```
E-PCN/
├── raqib-pcn-experiments/                             # Main experiment scripts and notebooks
├── pythia-data-gen.md                                 # Data generation tutorial for Pythia
├── pythia-installation.md                             # Pythia installation guide
├── pythia-jet-tagging-data-generation-tutorial.md     # Jet tagging data generation walkthrough
├── pythia-python-guide.md                             # Python interface guide for Pythia
├── requirements.txt                                   # Python dependencies
└── README.md                                          # This file
```
| # | Title | Venue |
|---|---|---|
| 1 | E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks | arXiv:2512.07420 |
| 2 | PCN: A Deep Learning Approach to Jet Tagging Using Chebyshev Graph Convolutions | JHEP 2024 |
| 3 | The Lund Jet Plane | JHEP 2018 |
| 4 | Jet Tagging in the Lund Plane with Graph Networks | JHEP 2021 |
| 5 | Particle Transformer for Jet Tagging (ParT) | ICML 2022 |
| 6 | ParticleNet: Jet Tagging via Particle Clouds | Phys. Rev. D 2020 |
| 7 | JetClass Dataset | Zenodo |
| 8 | Aspen Open Jets | ML: Sci. Tech. 2025 |
| 9 | Grad-CAM | ICCV 2017 |
We thank the CERN Open Data Portal for providing high-quality collision data, and the original PCN authors for the base architecture. This research is partially supported by research grants from Independent University, Bangladesh (IUB).
This project is licensed under the MIT License. See LICENSE for details.
If you use this work, please cite:
```bibtex
@article{islam2025epcn,
  title   = {E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features},
  author  = {Islam, Md Raqibul and Khan, Adrita and Hossain, Mir Sazzat and Siddiqui, Choudhury Ben Yamin and Hossain, Md. Zakir and Khan, Tanjib and Momen, M. Arshad and Ali, Amin Ahsan and Rahman, AKM Mahbubur},
  journal = {arXiv preprint arXiv:2512.07420},
  year    = {2025}
}
```