The following folders and scripts are useful.
| Folder / Script | Description |
|---|---|
| crab_setup/ | Processing NanoAOD files using the RDataFrame-based framework. Follow the instructions in readme.txt inside. |
| input/ | Input sample list from CRAB job outputs in .py or .json format, organized by campaign. |
| createInputSample.py | Automates creation of input sample configs from CRAB output for running the analysis. |
| createJson.py | Automates creation of JSON files from DAS dataset strings using dasgoclient. |
| database.py, database.yaml | Sample database in Python or YAML format containing sample and dataset metadata, including DAS strings. |
| printDatabase.py | Prints the database.yaml file in a nice, colorful format on the terminal. |
| nanoRDF.py | Main NanoAOD analysis script using RDataFrame. |
| submitCondor.py | Condor submission script for launching jobs on the CERN Condor system. |
CRAB output path (used as --crabdir below): /eos/user/a/alaha/nanoRDFjobs/
The database is a YAML file containing campaign and sample information, along with a few additional fields required for the analysis. Typically you only need to update this file when new samples or campaigns are needed or released.
Follow this structure to create the sample database:
DYM50_AMC:
- dasno: 1010
  samplename: DYto2L-M50-amc
  das:
  - Run3Summer22: /DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
  - Run3Summer22EE: None
- dasno: 1011
  samplename: DYto2L-M50-amc_ext
  das:
  - Run3Summer22: /DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v1/NANOAODSIM
  - Run3Summer22EE: None
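For reference, here is a minimal sketch of how a database with this structure can be read in Python, assuming PyYAML is available (the actual parsing in database.py and printDatabase.py may differ):

```python
import yaml

# Minimal sketch, assuming database.yaml follows the structure shown above.
# The real database.py / printDatabase.py may read it differently.
with open("database.yaml") as f:
    db = yaml.safe_load(f)

for group, samples in db.items():               # e.g. 'DYM50_AMC'
    for sample in samples:                      # one block per dasno
        for campaign_entry in sample["das"]:    # one {campaign: DAS string} per item
            for campaign, das in campaign_entry.items():
                if das not in (None, "None"):   # a bare 'None' is loaded as the string "None"
                    print(sample["dasno"], group, sample["samplename"], campaign, das)
```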
Print the database in a nice format and use it to look up information about the samples:
python3 printDatabase.py

Output of printDatabase.py on the terminal:
No DASno Group Sample Campaign events/files/size DAS
1 1010 DYM50_AMC DYto2L-M50-amc Run3Summer22 65997137/115/90.5GB /DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
2 1011 DYM50_AMC DYto2L-M50-amc_ext Run3Summer22 97500666/259/134.0GB /DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v1/NANOAODSIM
3 1020 TTto2L2Nu_POW TTto2L2Nu-pow Run3Summer22 23778148/67/54.3GB /TTto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
4 1030 TTtoLNu2Q_POW TTtoLNu2Q-pow Run3Summer22 76955324/716/180.0GB /TTtoLNu2Q_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v2/NANOAODSIM
5 1040 WtoLNu_AMC WtoLNu-2Jets_amc Run3Summer22 84739011/148/95.8GB /WtoLNu-2Jets_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
6 1040 WZto3LNu_POW WZto3LNu-pow Run3Summer22 2776339/25/4.3GB /WZto3LNu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
7 1041 WZto3LNu_POW WZto3LNu-pow_ext Run3Summer22 8876662/84/13.7GB /WZto3LNu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v3/NANOAODSIM
8 1050 WWto2L2Nu_POW WWto2L2Nu-pow Run3Summer22 6135192/40/9.6GB /WWto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
9 1061 ZZto4L_AMC ZZto4L-amc Run3Summer22 3554880/45/5.6GB /ZZto4L-1Jets_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v2/NANOAODSIM
10 1070 ZZto4L_POW ZZto4L-pow Run3Summer22 14629101/82/22.0GB /ZZto4L_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
11 1071 ZZto4L_POW ZZto4L-pow_ext Run3Summer22 14458880/270/22.3GB /ZZto4L_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v2/NANOAODSIM
12 1080 ZZto2L2Nu_POW ZZto2L2Nu-pow Run3Summer22 14555802/118/20.7GB /ZZto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM
13 1081 ZZto2L2Nu_POW ZZto2L2Nu-pow_ext Run3Summer22 16858024/277/24.4GB /ZZto2L2Nu_TuneCP5_13p6TeV_powheg-pythia8/Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5_ext1-v2/NANOAODSIM
14 100 SingleMuon Run3Summer22_EraC_SingleMuon Run3Summer22 20162441/35/16.9GB /SingleMuon/Run2022C-22Sep2023-v1/NANOAOD
15 110 Muon Run3Summer22_EraC_Muon Run3Summer22 138427345/124/113.7GB /Muon/Run2022C-22Sep2023-v1/NANOAOD
16 120 Muon Run3Summer22_EraD_Muon Run3Summer22 75468381/82/61.8GB /Muon/Run2022D-22Sep2023-v1/NANOAOD
dataset: {'DYM50_AMC', 'WWto2L2Nu_POW', 'TTto2L2Nu_POW', 'ZZto4L_POW', 'WtoLNu_AMC', 'SingleMuon', 'TTtoLNu2Q_POW', 'ZZto4L_AMC', 'Muon', 'ZZto2L2Nu_POW', 'WZto3LNu_POW'}
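The events/files/size column above corresponds to dataset-level metadata available from DAS. As a hedged sketch (not part of the framework), the same numbers can be queried with dasgoclient, the tool used by createJson.py; this assumes dasgoclient is in your PATH with a valid grid proxy, and the exact fields returned by the summary query may differ:

```python
import subprocess

# Hedged sketch: query DAS for per-dataset metadata with dasgoclient.
# Assumes dasgoclient is in PATH and a valid grid proxy is available.
das = ("/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/"
       "Run3Summer22NanoAODv12-130X_mcRun3_2022_realistic_v5-v2/NANOAODSIM")

# Dataset summary (typically includes event count, file count and size);
# inspect the raw output, since the exact field names may differ.
print(subprocess.check_output(["dasgoclient", "-query", f"summary dataset={das}"], text=True))

# List of files in the dataset.
files = subprocess.check_output(["dasgoclient", "-query", f"file dataset={das}"], text=True).splitlines()
print(len(files), "files on DAS")
```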
Condor jobs at CERN use these input sample config files, which carry the information required for submission.
Example file: input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py
samplegroup = 'DYM50_AMC'
samplename = 'DYto2L-M50-amc'
sampletype = 'MC'
dasno = 1010
campaign = 'Run3Summer22'
fileN = 58
files = [
'/eos/user/a/alaha/nanoRDFjobs/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/nanoRDF_DYto2L-M50-amc/250515_205342/0000/ntuple_skim_1.root',
'/eos/user/a/alaha/nanoRDFjobs/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/nanoRDF_DYto2L-M50-amc/250515_205342/0000/ntuple_skim_10.root',
'/eos/user/a/alaha/nanoRDFjobs/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/nanoRDF_DYto2L-M50-amc/250515_205342/0000/ntuple_skim_11.root',
'/eos/user/a/alaha/nanoRDFjobs/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/nanoRDF_DYto2L-M50-amc/250515_205342/0000/ntuple_skim_12.root',
'/eos/user/a/alaha/nanoRDFjobs/DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/nanoRDF_DYto2L-M50-amc/250515_205342/0000/ntuple_skim_13.root',
.
.
.
]
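Such a config can be sanity-checked by importing it the same way submitCondor.py does (see the snippet further below); here is a small illustrative sketch, not part of the framework:

```python
import importlib
import os

# Illustrative sketch (not part of the framework): import a sample config like
# submitCondor.py does and run a few basic consistency checks.
config_path = "input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py"
cfg = importlib.import_module(config_path.replace('/', '.').split('.py')[0])

assert len(cfg.files) == cfg.fileN, "fileN does not match the length of the file list"
missing = [f for f in cfg.files if not os.path.exists(f)]   # works where /eos is mounted
print(f"{cfg.samplename} ({cfg.campaign}): {len(cfg.files)} files, {len(missing)} missing")
```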
To create the input sample configs:

python3 createInputSample.py --crabdir /eos/user/a/alaha/nanoRDFjobs/ --campaign Run3Summer22 --database database.yaml --save True

Setting up the environment: add the following to your execution file to set up all the required packages:
source /cvmfs/sft.cern.ch/lcg/views/LCG_106/x86_64-el8-gcc11-opt/setup.sh

The arguments of the submitCondor.py macro are:
#Mandatory
parser.add_argument('--jobname' ,type=str,required=True ,default="condorjob" ,help='AnalysisName: Such as VLL2018_Mar1_v0')
parser.add_argument('--script' ,type=str,required=True ,help='Give path to your VLLAna.C directory')
parser.add_argument('--config' ,type=str,required=True ,help='Input sample config file')
#Optional
parser.add_argument('--treeN' ,type=int ,required=False,default=10000 ,help='Mention no of trees for simple test run')
parser.add_argument('--bunch' ,type=int ,required=False,default=1 ,help='No of root files per job')
parser.add_argument('--dryrun' ,type=bool,required=False ,help='Check before submitting jobs')
parser.add_argument('--cmsdir' ,type=str ,required=False ,help='Give path to your CMSSW_x/src/ directory')
parser.add_argument('--chainAdd' ,type=str ,required=False ,help='Chain any other config file to extend the filelist')

To submit jobs on all input files:
python3 submitCondor.py --jobname test_condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py

To submit jobs on a few input files:
python3 submitCondor.py --jobname test_condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py --treeN 5

To submit jobs in chunks:
python3 submitCondor.py --jobname test_condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py --bunch 10

In each Condor job, 10 files will be processed (use --bunch to control this).
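The grouping implied by --bunch can be pictured with a short sketch; this is an assumption about how submitCondor.py splits the file list, and the actual implementation may differ:

```python
# Assumed --bunch behaviour: split the file list into chunks of at most `bunch`
# files, one Condor job per chunk. The real submitCondor.py may differ.
def bunch_files(files, bunch):
    return [files[i:i + bunch] for i in range(0, len(files), bunch)]

# Example: 58 files (fileN in the config above) with --bunch 10 -> 6 jobs,
# the last one processing the remaining 8 files.
jobs = bunch_files([f"ntuple_skim_{i}.root" for i in range(1, 59)], 10)
print(len(jobs), [len(j) for j in jobs])   # 6 [10, 10, 10, 10, 10, 8]
```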
Add additional config files using --chainAdd (comma-separated):
python3 submitCondor.py --jobname test_condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py --chainAdd input/Run3Summer22/DYto2L-M50-amc_ext_Run3Summer22.py,input/Run3Summer22/DYto2L-M50-amc_ext_ext_Run3Summer22.py

Check the parameters without submitting using the --dryrun option:
python3 submitCondor.py --jobname test_condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py --chainAdd input/Run3Summer22/DYto2L-M50-amc_ext_Run3Summer22.py --dryrun True

Inside submitCondor.py, the sample config passed via --config is imported as a Python module and its parameters are read:

inputConfig = importlib.import_module(f"{args.config.replace('/','.').split('.py')[0]}")
samplename = inputConfig.samplename # Unique sample name
sampletype = inputConfig.sampletype # Data or MC
dasno = inputConfig.dasno # dasno: an integer ID assigned to each sample for later bookkeeping (similar to a PdgId). >1000: MC, <1000: Data
campaign = inputConfig.campaign # Campaign: Run3Summer22, Run3Summer22EE, Run3Summer23, Run3Summer23BPix, 2018, 2017, 2016preVFP, 2016postVFP
files = inputConfig.files # List of input files with full path

NB: Extra sample config files added via --chainAdd have no influence on the parameters above; those are always taken from --config. Only the file list is extended by the --chainAdd arguments.
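The behaviour described in this note can be sketched as follows (illustrative only, not the actual submitCondor.py code):

```python
import importlib

# Illustrative sketch of the --config / --chainAdd behaviour described above:
# all metadata is taken from --config, and only the file list is extended by
# the chained configs. Not the actual submitCondor.py code.
def load_configs(config, chain_add=""):
    def _import(path):
        return importlib.import_module(path.replace('/', '.').split('.py')[0])

    main = _import(config)
    files = list(main.files)
    for extra in filter(None, chain_add.split(',')):
        files += list(_import(extra).files)      # only 'files' is chained
    # samplename, sampletype, dasno, campaign always come from the --config module
    return main.samplename, main.dasno, main.campaign, files
```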
Each Condor job then executes the following command:
>> python3 nanoRDF.py $1 $2 $3 $4 $5
>> argumentString = f"{inputFiles} {outputFileName} {dasno} {samplename} {campaign}"
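For reference, here is a minimal sketch of how the five positional arguments could be unpacked inside nanoRDF.py; this is an assumption, and the real script may parse them differently (for instance if {inputFiles} expands to more than one path):

```python
import sys

# Sketch, assuming nanoRDF.py receives exactly the five arguments built in
# argumentString above. The real script may parse them differently.
if __name__ == "__main__":
    inputFiles, outputFileName, dasno, samplename, campaign = sys.argv[1:6]
    dasno = int(dasno)    # >1000: MC, <1000: Data (dasno convention described above)
    print(f"Running {samplename} ({campaign}), dasno={dasno} -> {outputFileName}")
```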
To create a bash file that submits Condor jobs in bulk for all configs from the Run3Summer22/ campaign, as an example:

ls input/Run3Summer22/*.py | awk '{printf "python3 submitCondor.py --jobname condorjob --script nanoRDF.py --config %s --bunch 10\n", $1}' > submit_all.sh

Output:
python3 submitCondor.py --jobname condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_Run3Summer22.py --bunch 10
python3 submitCondor.py --jobname condorjob --script nanoRDF.py --config input/Run3Summer22/DYto2L-M50-amc_ext_Run3Summer22.py --bunch 10
python3 submitCondor.py --jobname condorjob --script nanoRDF.py --config input/Run3Summer22/Run3Summer22_EraC_Muon_Run3Summer22.py --bunch 10
python3 submitCondor.py --jobname condorjob --script nanoRDF.py --config input/Run3Summer22/ZZto4L-pow_ext_Run3Summer22.py --bunch 10