feat: introduce Particle Transformer inputs for ULv2 #48

Closed
wants to merge 4 commits into from
64 changes: 44 additions & 20 deletions README.md
Expand Up @@ -2,7 +2,7 @@

This is a [NanoAOD](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD) framework for advanced developments of jet algorithms.

The repository consists of multiple branches which are each dedicated to specific releases of [CMSSW](https://github.com/cms-sw/cmssw). If you came here to run over Run3 samples, please checkout the most up-to-date 12_4_8 branch (e.g. from the dropdown menu above). The master branch you are viewing right now is optimized to run over Run2 samples, using the 106X release cycle.
The repository consists of multiple branches, each dedicated to a specific release of [CMSSW](https://github.com/cms-sw/cmssw). If you came here to run over Run3 samples, please check out the most up-to-date 12_4_8 branch (e.g. from the dropdown menu above). The branch you are viewing right now is optimized to run over Run2 UL samples, using the 106X release cycle.


The current full content of this development branch can be seen [here](https://annika-stein.web.cern.ch/PFNano/AddDeepJetTagInfo_desc.html) and the size [here](https://annika-stein.web.cern.ch/PFNano/AddDeepJetTagInfo_size.html).
Expand All @@ -13,32 +13,52 @@ This format can be used with [fastjet](http://fastjet.fr) directly.

**THIS IS A DEVELOPMENT BRANCH**

For **UL** 2016, 2017 and 2018 data and MC **NanoAODv8** according to the [XPOG](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Releases/NanoAODv8) and [PPD](https://twiki.cern.ch/twiki/bin/view/CMS/PdmVRun2LegacyAnalysisSummaryTable) recommendations:
For **UL** 2016, 2017 and 2018 data and MC **NanoAODv8/v9** according to the [XPOG](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/home) and [PPD](https://twiki.cern.ch/twiki/bin/view/CMS/PdmVRun2LegacyAnalysisSummaryTable) recommendations:

### Prerequisites
If not already done in your `.bashrc` or similar:
```
cmsrel CMSSW_10_6_20 # in principle not a constraint
cd CMSSW_10_6_20/src
source /cvmfs/grid.desy.de/etc/profile.d/grid-ui-env.sh # alternatively: source /cvmfs/grid.cern.ch/centos7-umd4-ui-4_200423/etc/profile.d/setup-c7-ui-example.sh
source /cvmfs/cms.cern.ch/common/crab-setup.sh prod
source /cvmfs/cms.cern.ch/cmsset_default.sh
```
### Setup PFNano
```
cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src
cmsenv
git cms-rebase-topic andrzejnovak:614nosort
git clone https://github.com/cms-jet/PFNano.git PhysicsTools/PFNano
git clone https://github.com/AnnikaStein/PFNano.git PhysicsTools/PFNano #change once it's officially included: git clone https://github.com/cms-jet/PFNano.git PhysicsTools/PFNano
cd PhysicsTools/PFNano
git fetch
git switch ParT_106X
cd ../..
scram b -j 10
cd PhysicsTools/PFNano/test
voms-proxy-init --voms cms:/cms/dcms --valid 192:00 # use cms:/cms/dcms if you have a German grid certificate (gives priority at German sites); otherwise use --voms cms
```
Note: When running over a new dataset you should check with [the nanoAOD workbook twiki](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD#Running_on_various_datasets_from) to see if the era modifiers in the CRAB configuration files are correct. The jet correction versions are taken from the global tag.

## Local Usage:

There are python config files ready to run in `PhysicsTools/PFNano/test/` for the UL campaign of nanoAODv8, named `nano106Xv8_on_mini106X_201*_data_NANO.py`. Notice that the current version can create different types of files depending on the PF candidates content.
There are python config files ready to run in `PhysicsTools/PFNano/test/` for the UL campaign of nanoAODv8 (v9), named `nano106Xv8_on_mini106X_201*_data_NANO.py` (`nano_data_2017_ULv2_allPF_ParT_NANO.py`, similar for MC). Note that the current version can produce different types of files depending on which PF candidates are stored. Run a configuration via `cmsRun nano_data_2017_ULv2_allPF_ParT_NANO.py` to test on one file locally. If that works, configure your crab submission (more below).

### Different use cases:

New since Pull Request [#39](https://github.com/cms-jet/PFNano/pull/39): Examples to include or exclude the input features for the DeepJet tagger are given in `nano106Xv8_on_mini106X_2017_mc_NANO.py`. Now the list of options that are currently implemented inside `pfnano_cff.py` (e.g. for MC) looks like that:
New since December 2022: Particle Transformer (ParT) inputs. The ParT variables can be added by selecting one of the matching customizations `*add_DeepJet_ParT*`.

The list of options currently implemented inside `pfnano_cff.py` (e.g. for MC) looks like this:
```
process = PFnano_customizeMC(process)
#process = PFnano_customizeMC_add_DeepJet(process) ##### DeepJet inputs are added to the Jet collection
#process = PFnano_customizeMC_allPF(process) ##### PFcands will content ALL the PF Cands
#process = PFnano_customizeMC_allPF_add_DeepJet(process) ##### PFcands will content ALL the PF Cands; + DeepJet inputs for Jets
#process = PFnano_customizeMC_AK4JetsOnly(process) ##### PFcands will content only the AK4 jets PF cands
#process = PFnano_customizeMC_AK4JetsOnly_add_DeepJet(process) ##### PFcands will content only the AK4 jets PF cands; + DeepJet inputs for Jets
#process = PFnano_customizeMC_AK8JetsOnly(process) ##### PFcands will content only the AK8 jets PF cands
#process = PFnano_customizeMC_add_DeepJet_and_Truth(process) ##### DeepJet inputs as well as a truth branch with fine-grained labels
#process = PFnano_customizeMC_add_DeepJet_ParT_and_Truth(process) ##### DeepJet & ParT inputs as well as a truth branch with fine-grained labels
#process = PFnano_customizeMC_allPF(process) ##### PFcands will contain ALL the PF Cands
#process = PFnano_customizeMC_allPF_add_DeepJet(process) ##### PFcands will contain ALL the PF Cands; + DeepJet inputs for Jets
#process = PFnano_customizeMC_allPF_add_DeepJet_and_Truth(process) ##### PFcands will contain ALL the PF Cands; + DeepJet inputs + truth labels for Jets
#process = PFnano_customizeMC_allPF_add_DeepJet_ParT_and_Truth(process) ##### PFcands will contain ALL the PF Cands; + DeepJet & ParT inputs + truth labels for Jets
#process = PFnano_customizeMC_AK4JetsOnly(process) ##### PFcands will contain only the AK4 jets PF cands
#process = PFnano_customizeMC_AK4JetsOnly_add_DeepJet(process) ##### PFcands will contain only the AK4 jets PF cands; + DeepJet inputs for Jets
#process = PFnano_customizeMC_AK8JetsOnly(process) ##### PFcands will contain only the AK8 jets PF cands
#process = PFnano_customizeMC_noInputs(process) ##### No PFcands but all the other content is available.
```
In general, whenever `_add_DeepJet` is specified (does not apply to `AK8JetsOnly` and `noInputs`), the DeepJet inputs are added to the Jet collection. For all other cases that involve adding tagger inputs, only DeepCSV and / or DDX are taken into account by default (= the old behaviour when `keepInputs=True`). Internally, this is handled by selecting a list of taggers, namely choosing from `DeepCSV`, `DeepJet`, and `DDX` (or an empty list for the `noInputs`-case, formerly done by setting `keepInputs=False`, now set `keepInputs=[]`). This refers to a change of the logic inside `pfnano_cff.py` and `addBTV.py`. If one wants to use this new flexibility, one can also define new customization functions with other combinations of taggers. Currently, configurations exist to reproduce all the ones that were available previously, as well as configurations that extend the old ones by adding DeepJet inputs. DeepJet outputs, on top of the discriminators already present in NanoAOD, are added in any case where AK4Jets are added, i.e. there is no need to require the full set of inputs to get the individual output nodes / probabilities. The updated description using `PFnano_customizeMC_add_DeepJet` can be viewed [here](https://annika-stein.web.cern.ch/PFNano/AddDeepJetTagInfo_desc.html) and the size [here](https://annika-stein.web.cern.ch/PFNano/AddDeepJetTagInfo_size.html).
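
The tagger selection described above can be pictured with a small stand-alone sketch. It is not the code from `pfnano_cff.py` or `addBTV.py`: the helper name `normalize_keep_inputs` is hypothetical, and the mapping of the legacy boolean values (`True` to both DeepCSV and DDX, `False` to an empty list) is an assumption based only on the description in this README.

```python
# Hypothetical helper for illustration only; NOT taken from pfnano_cff.py or addBTV.py.
ALLOWED_TAGGERS = ("DeepCSV", "DeepJet", "DDX")


def normalize_keep_inputs(keep_inputs):
    """Return the list of taggers whose inputs should be stored."""
    # Legacy boolean flags (assumed mapping, following the README text):
    # True -> old default (DeepCSV and DDX), False -> no inputs at all.
    if keep_inputs is True:
        return ["DeepCSV", "DDX"]
    if keep_inputs is False:
        return []
    unknown = [t for t in keep_inputs if t not in ALLOWED_TAGGERS]
    if unknown:
        raise ValueError("Unknown tagger(s): " + ", ".join(unknown))
    # Keep a fixed order and drop duplicates.
    return [t for t in ALLOWED_TAGGERS if t in keep_inputs]


print(normalize_keep_inputs(["DeepJet", "DeepCSV"]))  # list with DeepJet added
print(normalize_keep_inputs([]))                      # the noInputs case
```

A list-valued argument keeps the old behaviours expressible while letting new customization functions request any combination of the known taggers.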
Expand All @@ -47,10 +67,12 @@ The latest addition before moving to the Run3 recipe was the inclusion of a fine

### How to create python files using cmsDriver

(You can skip this step, if the existing configurations are sufficient for your use case.)

All python config files were produced with `cmsDriver.py`.

Two important parameters that one needs to verify in the central nanoAOD documentation are `--conditions` and `--era`.
- `--era` options from [WorkBookNanoAOD](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD) or [XPOG](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/Releases/NanoAODv8)
- `--era` options from [WorkBookNanoAOD](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD) or [XPOG](https://gitlab.cern.ch/cms-nanoAOD/nanoaod-doc/-/wikis/home)
- `--conditions` can be found here [PdMV](https://twiki.cern.ch/twiki/bin/view/CMS/PdmV)

Pre UL `cmsRun` python config files are generated by running `make_configs_preUL.sh`
Expand Down Expand Up @@ -79,14 +101,14 @@ submission yaml card `card_example.yml` are provided.
python crabby.py -c card.yml --make --submit
```
- `--make` and `--submit` calls are independent, allowing manual inspection of submit configs
- Add `--test` to disable publication on otherwise publishable config and produce a single file per dataset
- Add `--test True` to disable publication on otherwise publishable config and produce a single file per dataset

<details>
<summary>If experiencing problems with crab submission using the above instructions, e.g. on NAF-DESY</summary>


```
source /cvmfs/grid.cern.ch/centos7-umd4-ui-4_200423/etc/profile.d/setup-c7-ui-example.sh
source /cvmfs/grid.desy.de/etc/profile.d/grid-ui-env.sh # alternatively: source /cvmfs/grid.cern.ch/centos7-umd4-ui-4_200423/etc/profile.d/setup-c7-ui-example.sh
source /cvmfs/cms.cern.ch/common/crab-setup.sh prod
source /cvmfs/cms.cern.ch/cmsset_default.sh
< navigate to CMSSW_X_Y_Z/src >
Expand All @@ -101,11 +123,13 @@ submission yaml card `card_example.yml` are provided.
<details>
<summary>Useful commands to get paths to individual processed files</summary>

This is to get a list of files stored at the respective site.
```
xrdfs [insert redirector to site] ls /store/path/to/your/crab/output/serialnumber > filelist.txt
( if there is more than one serial number (more than 1k files processed) repeat command but append to textfile using >> instead of > )
( clean textfile for log entries )
( then append the redirector (needs modification by you for specific site) using this helper )
# (example for T2_DE_RWTH: redirector would be grid-cms-xrootd.physik.rwth-aachen.de)
# ( if there is more than one serial number (more than 1k files processed) repeat command but append to textfile using >> instead of > )
# ( clean textfile for log entries )
# ( then append the redirector (needs modification by you for specific site) using this helper: )
python dataset_paths.py name_of_txt_file T2_DE_RWTH
```
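
As a rough illustration of the last two steps (cleaning the text file and prepending the redirector), here is a minimal sketch. It is not the code of `dataset_paths.py`: the function name `build_xrootd_urls`, the `/store/` and `.root` filter, and the sample path are assumptions made for this example.

```python
def build_xrootd_urls(lines, redirector):
    """Turn raw `xrdfs ... ls` output lines into full XRootD URLs."""
    urls = []
    for line in lines:
        path = line.strip()
        # Skip blank lines and anything that is not a ROOT file
        # (for example log directories listed next to the outputs).
        if not path.startswith("/store/") or not path.endswith(".root"):
            continue
        # The path already starts with "/", which yields the usual
        # root://<host>//store/... form.
        urls.append("root://" + redirector + "/" + path)
    return urls


listing = [
    "/store/user/someone/sample/0000/nano_1.root",  # made-up path
    "/store/user/someone/sample/0000/log",
    "",
]
for url in build_xrootd_urls(listing, "grid-cms-xrootd.physik.rwth-aachen.de"):
    print(url)
```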

Expand Down Expand Up @@ -144,9 +168,9 @@ When processing data, a lumi mask should be applied. The so called golden JSON s

* Golden JSON, UL
```
# 2016: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions16/13TeV/Legacy_2016/Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt
# 2017: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions17/13TeV/Legacy_2017/Cert_294927-306462_13TeV_UL2017_Collisions17_GoldenJSON.txt
# 2018: /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions18/13TeV/Legacy_2018/Cert_314472-325175_13TeV_Legacy2018_Collisions18_JSON.txt
#
```

* Golden JSON, pre-UL