This repository provides tools and scripts for implementing and comparing various machine learning models—RetNet, XGBoost (XGB), Long Short-Term Memory (LSTM), and Multi-Layer Perceptron (MLP)—for b-jet tagging in high-energy physics experiments.
B-tagging is a technique used in particle physics to identify jets originating from bottom quarks (b-quarks). Accurate b-tagging is crucial for analyses involving processes like Higgs boson decays and top quark studies. This repository explores the implementation of several machine learning models to enhance b-jet tagging performance.
- Clone the repository:
    git clone https://github.com/asugu/B-Tagging.git
    cd B-Tagging
- Set up a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the required packages:
    pip install -r requirements.txt
Ensure that your dataset is in the appropriate format for training and evaluation. The jet_processor.py script provides functions to preprocess raw data into a suitable format for model training. Additionally, concat_pickle.py can be used to concatenate multiple pickle files containing processed jet data.
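As an illustration of the concatenation step, the sketch below merges several pickle files into one using only the standard library. It assumes each file holds a plain Python list of per-jet records; the layout actually used by concat_pickle.py may differ, and the function name `concat_pickles` is hypothetical.

```python
import pickle
from pathlib import Path


def concat_pickles(input_paths, output_path):
    """Merge pickle files that each contain a list into one pickled list.

    Assumption: every input file stores a plain Python list of jet records.
    Returns the number of records written.
    """
    merged = []
    for path in input_paths:
        with open(path, "rb") as f:
            chunk = pickle.load(f)
        if not isinstance(chunk, list):
            raise TypeError(f"{path} does not contain a list")
        merged.extend(chunk)

    # Create the output directory if it does not exist yet.
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "wb") as f:
        pickle.dump(merged, f)
    return len(merged)
```

Only unpickle files from sources you trust, since `pickle.load` can execute arbitrary code while reading a file.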
The repository includes scripts for training different models:
- XGBoost: Use XGB_train.py to train an XGBoost model.
- RetNet, LSTM, MLP: The model_trainer.py script facilitates training these models. Model architectures are defined in models.py.
Configure hyperparameters and training settings within the respective scripts before execution.
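One way to keep such settings in a single place is a small configuration object near the top of a script. The sketch below is illustrative only: the class name `TrainConfig`, its fields, and all default values are assumptions, not the settings the repository's scripts actually use.

```python
from dataclasses import dataclass, asdict


@dataclass
class TrainConfig:
    # All fields and defaults are illustrative, not the repository's values.
    model_name: str = "mlp"  # one of: "retnet", "lstm", "mlp"
    learning_rate: float = 1e-3
    batch_size: int = 256
    epochs: int = 20
    seed: int = 42

    def __post_init__(self):
        # Fail early on obviously invalid settings.
        allowed = {"retnet", "lstm", "mlp"}
        if self.model_name not in allowed:
            raise ValueError(f"model_name must be one of {sorted(allowed)}")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


# Override only the settings that differ from the defaults.
config = TrainConfig(model_name="lstm", epochs=5)
print(asdict(config))
```

Validating values when the object is created surfaces a typo in a model name before any training time is spent.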
After training, evaluate model performance using the histograms.ipynb Jupyter Notebook, which provides tools to visualize metrics such as accuracy, precision, recall, and ROC curves.
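To make these metrics concrete, here is a minimal pure-Python sketch that computes them from true labels and classifier scores. It is not the notebook's code: the function names are hypothetical, labels are assumed to be 1 for b-jets and 0 otherwise, and both classes are assumed to be present.

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, and recall for binary labels (1 = b-jet)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest score down to the lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy inputs for illustration only; not real jet data.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]
print(classification_metrics(labels, scores))
print(roc_points(labels, scores))
```

In b-tagging terms, recall corresponds to the b-jet tagging efficiency and the false positive rate to the mistag rate, so the ROC curve traces the trade-off between the two as the threshold moves.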
- Preprocess the data:
    python jet_processor.py --input data/raw_data.csv --output data/processed_data.pkl
- Train a model (e.g., XGBoost):
    python XGB_train.py --data data/processed_data.pkl --model_output models/xgb_model.pkl
- Evaluate the model: Open histograms.ipynb in Jupyter Notebook and follow the instructions to load the trained model and visualize performance metrics.
Contributions are welcome! Please fork the repository and create a pull request with your enhancements. Ensure that your code adheres to the existing style and includes appropriate tests.
This project is licensed under the MIT License. See the LICENSE file for details.
For more information on b-tagging and its significance in particle physics, refer to the B-tagging Wikipedia page.