Search for two boosted (high transverse momentum) Higgs bosons (H) decaying to two beauty quarks (b) and two tau leptons.
- HHbbtautau
First, create a virtual environment (micromamba is recommended):
# Clone the repository
git clone --recursive https://github.com/LPC-HH/bbtautau.git
cd bbtautau
# Download the micromamba setup script (change if needed for your machine https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html)
# Install: (the micromamba directory can end up taking O(1-10GB) so make sure the directory you're using allows that quota)
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
# You may need to restart your shell
micromamba env create -f environment.yaml
micromamba activate hh
Remember to install this in your mamba environment.
# Clone the repository as above if you haven't already
# Perform an editable installation
pip install -e .
# for committing to the repository
pip install pre-commit
pre-commit install
# Install as well the common HH utilities
cd boostedhh
pip install -e .
cd ..
- If your default `python` in your environment is not Python 3, make sure to use the `pip3` and `python3` commands instead.
- You may also need to upgrade `pip` to perform the editable installation:
python3 -m pip install -e .
For submitting to condor, all you need is python >= 3.7.
For running locally, follow the same virtual environment setup instructions above and activate the environment.
micromamba activate hh
Clone the repository:
git clone https://github.com/LPC-HH/bbtautau/
pip install -e .
For testing, e.g.:
python src/run.py --samples HHbbtt --subsamples GluGlutoHHto2B2Tau_kl-1p00_kt-1p00_c2-0p00 --starti 0 --endi 1 --year 2022 --processor skimmer
A single sample / subsample:
python src/condor/submit.py --analysis bbtautau --git-branch BRANCH-NAME --site ucsd --save-sites ucsd lpc --processor skimmer --samples HHbbtt --subsamples GluGlutoHHto2B2Tau_kl-1p00_kt-1p00_c2-0p00 --files-per-job 5 --tag 24Nov7Signal [--submit]
Or from a YAML:
python src/condor/submit.py --yaml src/condor/submit_configs/25Apr5All.yaml --analysis bbtautau --git-branch addmc --site lpc --save-sites ucsd lpc --processor skimmer --tag 25Apr5AddVars --year 2022 [--submit]
Check jobs, e.g.:
python boostedhh/condor/check_jobs.py --analysis bbtautau --tag 25Apr24_v12_private_signal --processor skimmer --check-running --year 2022EE
Trigger efficiency studies can be performed using the `src/bbtautau/postprocessing/TriggerStudy.py` script. The main execution logic is within the `if __name__ == "__main__"` block, where you can configure the years and signal samples to process.
The script will:
- Load the specified signal samples.
- Define trigger sets and tagger configurations.
- Calculate and plot trigger efficiencies for different channels (`hh`, `hm`, `he`).
- Generate N-1 efficiency tables to study the impact of individual triggers.
To run the study, configure the desired years and SIGNALS inside the script and then execute it:
python src/bbtautau/postprocessing/TriggerStudy.py
Output plots and tables will be saved in the `plots/TriggerStudy/` directory.
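To make the N-1 idea concrete, here is a minimal sketch in plain Python. It assumes that "N-1" means the efficiency of the logical OR of all triggers with one trigger removed at a time; the trigger names and the toy event list are hypothetical, and the real script may define and compute these tables differently.

```python
def or_efficiency(fired, triggers):
    """Fraction of events that pass at least one of the given triggers."""
    n_events = len(next(iter(fired.values())))
    n_passed = sum(any(fired[t][i] for t in triggers) for i in range(n_events))
    return n_passed / n_events


def n_minus_one_table(fired):
    """Efficiency of the full trigger OR, and of the OR with each trigger dropped."""
    names = list(fired)
    table = {"all": or_efficiency(fired, names)}
    for dropped in names:
        kept = [t for t in names if t != dropped]
        table[f"without {dropped}"] = or_efficiency(fired, kept)
    return table


# Toy input: one boolean per event and per (hypothetical) trigger.
fired = {
    "trig_a": [True, False, False, True, False],
    "trig_b": [False, True, False, True, False],
    "trig_c": [False, False, True, False, False],
}

table = n_minus_one_table(fired)
for label, eff in table.items():
    print(f"{label:>16}: {eff:.2f}")
```

A trigger whose removal barely changes the efficiency contributes little beyond the others; a large drop marks a trigger that recovers events no other trigger catches.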
python SensitivityStudy.py --actions compute_rocs plot_mass sensitivity --years 2022 2023 --channels hh hm
Arguments
--years (list, default: 2022 2022EE 2023 2023BPix): List of years to include in the analysis.
--channels (list, default: hh hm he): List of channels to run (default: all).
--test-mode (flag, default: False): Run in test mode (reduced data size).
--use-bdt (flag, default: False): Use BDT model for sensitivity study.
--modelname (str, default: 28May25_baseline): Name of the BDT model to use.
--at-inference (flag, default: False): Compute BDT predictions at inference time.
--actions (list, required): Actions to perform. Choose one or more: compute_rocs, plot_mass, sensitivity, time-methods.
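The argument list above maps naturally onto an `argparse` parser. The following is only a sketch reconstructed from the documented names, types, and defaults; the actual parser in `SensitivityStudy.py` may be organised differently, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented SensitivityStudy.py arguments."""
    parser = argparse.ArgumentParser(description="Sensitivity study (illustrative)")
    parser.add_argument("--years", nargs="+",
                        default=["2022", "2022EE", "2023", "2023BPix"])
    parser.add_argument("--channels", nargs="+", default=["hh", "hm", "he"])
    parser.add_argument("--test-mode", action="store_true")
    parser.add_argument("--use-bdt", action="store_true")
    parser.add_argument("--modelname", default="28May25_baseline")
    parser.add_argument("--at-inference", action="store_true")
    parser.add_argument("--actions", nargs="+", required=True,
                        choices=["compute_rocs", "plot_mass", "sensitivity", "time-methods"])
    return parser


# Equivalent to: --actions sensitivity --years 2022 --channels hh --test-mode
args = build_parser().parse_args(
    ["--actions", "sensitivity", "--years", "2022", "--channels", "hh", "--test-mode"]
)
print(args)
```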
Example Commands
Run an optimization analysis for all years and all channels, with the GloParT tautau tagger:
python SensitivityStudy.py --actions sensitivity
Run a full analysis for all years and all channels, using the BDT for the tautau jet:
python SensitivityStudy.py --actions compute_rocs plot_mass sensitivity --use-bdt
Run only on selected years/channels in test mode:
--test-mode will reduce the data loading time significantly. Practical for testing.
python SensitivityStudy.py --actions sensitivity --years 2022 --channels hh --test-mode
Notes:
- by default uses the ABCD background estimation method, and FOM = $\sqrt{b+\sigma_b}/s$
- by default uses parallel-thread data loading and optimization
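As a rough illustration of the default figure of merit, the sketch below combines a textbook ABCD estimate with the FOM formula quoted above. The region labels, the Poisson error propagation used for $\sigma_b$, and both function names are assumptions made for this example; the analysis code may estimate $b$ and $\sigma_b$ differently.

```python
import math


def abcd_background(n_b, n_c, n_d):
    """Textbook ABCD estimate of the background in signal region A.

    Assumes b = N_B * N_C / N_D, with the uncertainty obtained by propagating
    Poisson errors on the three control-region counts (an assumption).
    """
    if min(n_b, n_c, n_d) <= 0:
        raise ValueError("control-region counts must be positive")
    b = n_b * n_c / n_d
    sigma_b = b * math.sqrt(1 / n_b + 1 / n_c + 1 / n_d)
    return b, sigma_b


def figure_of_merit(s, b, sigma_b):
    """FOM = sqrt(b + sigma_b) / s, as quoted in the notes above."""
    return math.sqrt(b + sigma_b) / s


b, sigma_b = abcd_background(n_b=100, n_c=50, n_d=200)
fom_low_signal = figure_of_merit(s=2.0, b=b, sigma_b=sigma_b)
fom_high_signal = figure_of_merit(s=4.0, b=b, sigma_b=sigma_b)
print(b, sigma_b, fom_low_signal, fom_high_signal)
```

With this definition the FOM shrinks as the signal yield grows or the background estimate tightens, so smaller values correspond to better expected sensitivity.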
@Billy - convert into script and add instructions here
src/bbtautau/postprocessing/bdt.py trains, evaluates, compares, and studies multiclass BDT models. Model settings are normally looked up by name under src/bbtautau/postprocessing/bdt_configs/, but training can also be driven by an explicit config file via --config-file.
Basic invocation:
python src/bbtautau/postprocessing/bdt.py [mode] --model <modelname>
or, for explicit config injection:
python src/bbtautau/postprocessing/bdt.py [mode] --config-file path/to/config_my_model.py
Important flags:
- `--years`: years to use, e.g. `--years 2022 2023` or `--years all`
- `--model`: model configuration name to load from `bdt_configs/`
- `--config-file`: explicit Python config file defining `CONFIG`; if passed together with `--model`, the two model names must match
- `--output-dir`: output directory for trained models, plots, comparisons, or prediction files
- `--data-path`: input data directory key/path
- `--force-reload`: force reloading the input data
- `--train` / `--load`: train a new model or load an existing one
- `--study-rescaling`: run the balance/rescaling study workflow
- `--eval-bdt-preds`: write BDT predictions for selected samples
- `--compare-models`: compare trained models using ROC overlays and CSV outputs
- `--compare-light`: compare trained models using only existing `metrics_summary.csv` files
- `--inputs`: paths to model JSON files and/or directories for comparison modes
- `--samples`: samples to evaluate with `--eval-bdt-preds`
One of --model or --config-file must be provided.
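The two rules above (at least one of `--model` / `--config-file`, and matching names when both are given) can be sketched as a small check. This is not the logic in `bdt.py`: `resolve_model_name` is a hypothetical helper, and `config_model` stands for whatever model name the config file declares, which is not specified here.

```python
def resolve_model_name(model=None, config_model=None):
    """Return the model name to use, enforcing the documented rules.

    model        -- value passed via --model (or None)
    config_model -- model name declared by the --config-file config (or None);
                    how it is read from the file is an assumption left out here
    """
    if model is None and config_model is None:
        raise ValueError("one of --model or --config-file must be provided")
    if model is not None and config_model is not None and model != config_model:
        raise ValueError(
            f"--model '{model}' does not match config model '{config_model}'"
        )
    return model if model is not None else config_model


print(resolve_model_name(model="26Mar26_optimized"))
print(resolve_model_name(model="26Mar26_optimized", config_model="26Mar26_optimized"))
```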
Example: train a new model by name
python src/bbtautau/postprocessing/bdt.py \
--train \
--years all \
--model 26Mar26_optimized
Example: train from an explicit config file
python src/bbtautau/postprocessing/bdt.py \
--train \
--years all \
--config-file src/bbtautau/postprocessing/bdt_configs/standard/config_26Mar26_optimized.py
Evaluate predictions
python src/bbtautau/postprocessing/bdt.py \
--eval-bdt-preds \
--years 2022 \
--samples dyjets qcd ttbarhad ttbarll ttbarsl \
--model 26Mar26_optimized \
--output-dir /writable/output
This writes `BDT_predictions/<year>/<sample>/<model>_preds.npy` under `--output-dir` (or the default `DATA_DIR`).
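For downstream scripts it can help to build the prediction file path from the layout described above. The helper below is a hypothetical convenience function, not part of the repository; it only encodes the documented `BDT_predictions/<year>/<sample>/<model>_preds.npy` pattern.

```python
from pathlib import PurePosixPath


def preds_path(output_dir, year, sample, model):
    """Build the expected location of a BDT prediction file (illustrative helper)."""
    return (
        PurePosixPath(output_dir)
        / "BDT_predictions"
        / str(year)
        / sample
        / f"{model}_preds.npy"
    )


path = preds_path("/writable/output", 2022, "ttbarhad", "26Mar26_optimized")
print(path)
```

A file at that location could then be read with `numpy.load`.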
Compare multiple trained models
python src/bbtautau/postprocessing/bdt.py \
--compare-models \
--years 2022 \
--inputs \
/bbtautauvol/bdt/training/no_presel/model_a \
/bbtautauvol/bdt/training/no_presel/model_b \
--output-dir comparison_out
This produces:
- Overlay ROC plots per signal in `comparison_out/rocs/`
- A consolidated CSV `comparison_out/comparison_metrics.csv`
- An index JSON `comparison_out/comparison_index.json`
Notes:
- Headless/containers: plotting uses a non-interactive backend (`Agg`), so no display server is needed.
- The model config determines the signal setup (`signals` field); that is no longer configured via a separate CLI flag.
- If Python cannot resolve internal modules, run from the repo root inside the project environment.
Use src/bbtautau/kubernetes/jobs/make_from_template.py to generate Kubernetes job YAMLs for training, comparison, lightweight comparison, or rescaling studies. It fills one of the templates in src/bbtautau/kubernetes/jobs/ and writes YAMLs under src/bbtautau/kubernetes/bdt_trainings/<job_type>/<presel>/<tag>/<job_name>.yml.
Key flags:
- `--modelname`: training or rescaling model name
- `--config-file`: optional local config file to embed into a training job and pass to `bdt.py --config-file`
- `--compare-models`: switch to comparison mode (uses `template_compare.yaml`)
- `--compare-light`: switch to lightweight metrics-only comparison mode
- `--study-rescaling`: switch to rescaling-study mode
- `--inputs`: model JSON files and/or directories to compare
- `--years`: years to use for training/comparison (space-separated)
- `--samples`: samples to use in comparison/evaluation
- `--datapath`: data subdirectory on the PVC (joined to `/bbtautauvol`)
- `--train-args`: extra CLI args forwarded to `bdt.py` (quote this string)
- `--tt-preselection`: append flag into `train-args`
- `--job-name`: override auto-generated name (auto-generated names are lowercased)
- `--tag`: grouping tag used in the output path
- `--overwrite`: allow overwriting an existing YAML
- `--submit`: immediately `kubectl create -f <yaml>` in namespace `cms-ml`
- `--from-json`: load all args from a JSON file (keys match the CLI flags)
Training mode example:
python src/bbtautau/kubernetes/jobs/make_from_template.py \
--modelname 26Mar26_optimized \
--config-file src/bbtautau/postprocessing/bdt_configs/standard/config_26Mar26_optimized.py \
--tag kfold5 \
--datapath 26Mar5All_v12_private_signal \
--train-args "--years 2022 2023" \
--submit
This writes a training job YAML under `src/bbtautau/kubernetes/bdt_trainings/training/no_presel/kfold5/` and submits it. When `--config-file` is provided, the config is embedded in the generated job and passed to `bdt.py` inside the pod, so the config does not need to exist in the cloned repo checkout.
Comparison mode example:
python src/bbtautau/kubernetes/jobs/make_from_template.py \
--compare-models \
--inputs \
training/no_presel/model_a \
training/no_presel/model_b \
--compare-tag july_vs_aug \
--job-name cmp_july_aug_nopresel \
--submit
The script auto-generates `job_name` when not provided:
- Training: based on `modelname`
- Comparison/light comparison: based on the compared input names or `compare_tag`
- Rescaling: `rescaling_<modelname>`
Generated YAMLs are grouped by job type and preselection state:
- `training/no_presel/<tag>/...`
- `training/tt_presel/<tag>/...`
- `comparisons/no_presel/<tag>/...`
- `rescaling/no_presel/<tag>/...`
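The output layout described above can be summarised in a few lines of Python. This is a sketch of the documented `<job_type>/<presel>/<tag>/<job_name>.yml` pattern only; `job_yaml_path` is a hypothetical helper, and the example job name is an assumption about what the auto-generated (lowercased) name would look like.

```python
from pathlib import PurePosixPath

# Base directory quoted in the text above.
BASE_DIR = PurePosixPath("src/bbtautau/kubernetes/bdt_trainings")


def job_yaml_path(job_type, tag, job_name, tt_presel=False):
    """Where a generated job YAML is expected to land (illustrative helper)."""
    presel = "tt_presel" if tt_presel else "no_presel"
    return BASE_DIR / job_type / presel / tag / f"{job_name}.yml"


# Assumed auto-generated name: the model name, lowercased.
yaml_path = job_yaml_path("training", "kfold5", "26Mar26_optimized".lower())
print(yaml_path)
```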
You can also place all arguments in a JSON file and run:
python src/bbtautau/kubernetes/jobs/make_from_template.py --from-json my_job.json --submit
Where `my_job.json` can contain fields matching the CLI flags, such as `modelname`, `config_file`, `inputs`, `compare_models`, `compare_light`, `study_rescaling`, `years`, `tag`, `samples`, `datapath`, and `train_args`.
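As an illustration, the snippet below produces JSON text that mirrors the training-mode example, using only field names from the list above. The value types (for instance `train_args` as a single string) are assumptions; check them against `make_from_template.py` before relying on them.

```python
import json

# Field names come from the documented list; the value types are assumptions.
job = {
    "modelname": "26Mar26_optimized",
    "config_file": "src/bbtautau/postprocessing/bdt_configs/standard/config_26Mar26_optimized.py",
    "tag": "kfold5",
    "datapath": "26Mar5All_v12_private_signal",
    "train_args": "--years 2022 2023",
}

job_json = json.dumps(job, indent=2)
print(job_json)  # save this text as my_job.json
```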
Templates are made using the `postprocessing/postprocessing.py` script with the `--templates` option.
See postprocessing/bash_scripts/MakeTemplates.sh for an example.
Foreword: when dealing with multiple signals and signal regions:
- to specify one or more signal processes to be included in the cards (e.g. ggf + SM vbf or just BSM vbf), use the argument `--sigs [ggfbbtt, vbfbbtt, vbfbbttk2v0]`
- to specify the strategy according to what we do in the `SensitivityStudy.py` step, i.e. using one signal region per channel (ggf) or using two regions per channel (ggf and vbf), we use the `--do-vbf` argument in `run_blinded_bbtt.sh` when running combine.
These two items are independent: with either strategy, one can choose the signal samples to consider freely. (One should clearly not mix SM with BSM samples in the cards.)
Warning: this should be done outside of your conda/mamba environment!
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_14_1_0_pre4
cd CMSSW_14_1_0_pre4/src
cmsenv
scram-venv
cmsenv
git clone -b v10.1.0 https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b v3.0.0-pre1 https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
# Important: this scram has to be run from src dir
scramv1 b clean; scramv1 b
pip3 install --upgrade rhalphalib
Then, install this repo as well:
```bash
cd /path/to/your/local/bbtautau/repo
pip3 install -e .
```
After activating the above CMSSW environment (go inside the CMSSW folder and do `cmsenv`), you can use the `CreateDatacard.py` script as follows (from your `src/bbtautau` folder):
python3 postprocessing/CreateDatacard.py --sigs ggfbbtt --templates-dir postprocessing/templates/25Apr25LudoCuts --model-name 25Apr25PassFix
By default, this will create datacards for all three channels summed across years in the `cards/model-name` directory.
As always, do the following to see a full list of options.
python3 postprocessing/CreateDatacard.py --help
All combine commands while blinded can be run via the `src/bbtautau/combine/run_blinded_bbtt.sh` script.
e.g. (always from inside the cards folders), this will combine the cards, create a workspace, do a background-only fit, and calculate expected limits:
run_blinded_bbtt.sh --workspace --bfit --limits
Another script, `src/bbtautau/combine/run_blinded_bbtt_frzAllConstrainedNuisances.sh`, can be used to fit with all constrained nuisances frozen.
See more comments inside the file.
I also add this to my .bashrc for convenience:
export PATH="$PATH:/home/user/rkansal/bbtautau/src/bbtautau/combine"
Run the following to run FitDiagnostics and save FitShapes:
run_blinded_bbtt.sh --workspace --dfit
Then see `postprocessing/PlotFits.ipynb` for plotting. TODO: convert into script!
Set up Rucio following the Twiki. Then:
rucio add-rule cms:/Tau/Run2022F-22Sep2023-v1/MINIAOD 1 T1_US_FNAL_Disk --activity "User AutoApprove" --lifetime 15552000 --ask-approval
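The `--lifetime` value is given in seconds, which is easy to misread. A quick conversion, assuming only that unit, shows what the number in the command corresponds to; `days_to_seconds` is a hypothetical helper for choosing a different lifetime.

```python
from datetime import timedelta

lifetime_seconds = 15552000  # value used in the rucio command above
lifetime_days = timedelta(seconds=lifetime_seconds).days
print(lifetime_days)  # 180


def days_to_seconds(days):
    """Convert a desired rule lifetime in days to seconds (illustrative helper)."""
    return int(timedelta(days=days).total_seconds())


print(days_to_seconds(180))  # 15552000
```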