Commit 1d403b3

Add 20260511 - Icarus Production Meeting
1 parent 6c67ed7 commit 1d403b3

9 files changed

Lines changed: 181 additions & 0 deletions

@@ -0,0 +1,181 @@
1+
## May 11, 2026 09:00 GMT-5 | ICARUS Production Meeting
2+
3+
### Attendees
4+
5+
Alessandro Maria Ricci, Daniel Carber, Tracy Usher, Giuseppe Cerati, Vito Di Benedetto, Promita Roy
6+
7+
### Monitoring resource usage
8+
9+
| User Grid Usage History of the *Running Jobs by User* for the last 7 days: [link](https://fifemon.fnal.gov/monitor/d/000000053/experiment-batch-details?orgId=1&viewPanel=9&from=now-7d&to=now&var-experiment=icarus&var-pool=dune-global&var-pool=fifebatch) ![](images/image1.png) | User Job Efficiency History of the User Job Efficiency for the last 7 days: [link](https://fifemon.fnal.gov/monitor/d/000000022/experiment-efficiency-details?from=now-7d&to=now&var-experiment=icarus&var-pool=dune-global&var-pool=fifebatch&orgId=1&viewPanel=2) ![](images/image2.png) |
10+
| ----- | ----- |
11+
| **Icaruspro Jobs Exit Code** History of the icaruspro job exit code for the last 7 days: [link](https://landscape.fnal.gov/kibana/app/kibana#/dashboard/ba047b90-b8ca-11e7-989a-91951b87e80a?_g=\(refreshInterval:\(pause:!t,value:0\),time:\(from:now-4d,mode:relative,to:now\)\)&_a=\(description:'View%20jobs%20exit%20code,%20where%20they%20ran,%20and%20logs',filters:!\(\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:pool,negate:!f,params:\(query:fifebatch,type:phrase\),type:phrase,value:fifebatch\),query:\(match:\(pool:\(query:fifebatch,type:phrase\)\)\)\),\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:User,negate:!f,params:\(query:'icaruspro@fnal.gov',type:phrase\),type:phrase,value:'icaruspro@fnal.gov'\),query:\(match:\(User:\(query:'icaruspro@fnal.gov',type:phrase\)\)\)\),\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:Jobsub_Group,negate:!f,params:\(query:icarus,type:phrase\),type:phrase,value:icarus\),query:\(match:\(Jobsub_Group:\(query:icarus,type:phrase\)\)\)\)\),fullScreenMode:!f,options:\(darkTheme:!f\),panels:!\(\(embeddableConfig:\(vis:\(colors:\(Cancelled:%23967302,Fail:%23BF1B00,Success:%23629E51\),legendOpen:!t\)\),gridData:\(h:15,i:'1',w:40,x:0,y:0\),id:'2f40f420-b8ca-11e7-989a-91951b87e80a',panelIndex:'1',type:visualization,version:'6.8.23'\),\(gridData:\(h:10,i:'2',w:24,x:24,y:15\),id:'569cca30-b8ca-11e7-989a-91951b87e80a',panelIndex:'2',type:visualization,version:'6.8.23'\),\(gridData:\(h:10,i:'3',w:24,x:0,y:15\),id:'65759a00-b8ca-11e7-989a-91951b87e80a',panelIndex:'3',type:visualization,version:'6.8.23'\),\(embeddableConfig:\(columns:!\(JobsubJobId,Owner,ExitCode,ExitSignal,MATCH_GLIDEIN_Site,MachineAttrMachine0,stdout,stderr\),sort:!\('@timestamp',desc\)\),gridData:\(h:30,i:'4',w:48,x:0,y:25\),id:'7e94c3c0-b8cb-11e7-989a-91951b87e80a',panelIndex:'4',type:search,version:'6.8.23'\),\(gridData:\(h:15,i:'5',w:8,x:40,y:0\),id:AWZpvkXbLj3wKbt0N_Vp,panelIndex:'5',type:visualization,version:'6.8.23'\)\),query:\(language:lucene,query:\(match_all:\(\)\)\),timeRestore:!f,title:'Fifebatch%20History',viewMode:view\)) | **SBN Data Pools** [link](https://fifemon.fnal.gov/monitor/d/rflbgV-iz/dcache-by-poolgroup?orgId=1&var-PoolGroup=SbnData2Pools&from=now-3h&to=now&refresh=5m) |
12+
| ![](images/image3.png) | ![](images/image4.png) |
13+
| dCache persistent usage per user: [link](https://fifemon.fnal.gov/monitor/d/000000175/dcache-persistent-usage-by-vo?orgId=1&var-VO=icarus). Total: 114 TiB; used space: 96.6 TiB (87.2%) | |
14+
| ![](images/image5.png) | |
15+
16+
### **Production requests**
17+
18+
| 2025 Ongoing/Pending Production Requests |
19+
| ----- |
20+
| ![](images/image6.png) |
21+
| **2026 Ongoing/Pending Production Requests** |
22+
| ![](images/image7.png) |
23+
24+
Link to [spreadsheet](https://docs.google.com/spreadsheets/d/1ffBp475tEzlRilFs7xLhbevSZHjsuk1Dm5FGFIPWsFM/edit?gid=1567393491#gid=1567393491)
25+
Link to [github project](https://github.com/orgs/SBNSoftware/projects/49)
26+
27+
###
28+
29+
POMS active campaigns [here](https://pomsgpvm02.fnal.gov/poms/show_campaigns/icarus/production)
30+
31+
### Notes
32+
33+
*
34+
35+
### Requests
36+
37+
* Assigned:
38+
* Request \#86 \[Manuel\]:
39+
1. See \#75. stage0 \-\> 100% complete
40+
2. Stage1\_caf stopped \-\> 100% complete
41+
3. Wrong FHiCL used for g4.
42+
4. Larcv saved in SBNDataPool: 12 TB, the rest in scratch
43+
5. 6% larcv lost
44+
6. TRANSFERRING TO POLARIS
45+
46+
* Request \#123 \[Fatima\]:
47+
1. 97% is complete
48+
49+
* Request \#6 \[Manuel\]:
50+
1. Reprocess 68% of the larcv
51+
2. TRANSFERRING TO POLARIS
52+
53+
54+
55+
* Request \#8 \[Alessandro\]: Perlmutter and FermiGrid
56+
1. Stage0: 74% complete
57+
2. Larcv: 56% complete
58+
3. TRANSFERRING TO S3DF
59+
60+
61+
* Request \#17 \[Promita\]: 90% complete
62+
63+
64+
* Request \#32-33 \[Fatima\]:
65+
1. 32: running
66+
2. 33: submitted
67+
68+
69+
* Request \#38 \[Manuel\]: testing
70+
71+
72+
73+
* Request \#47 \[Thomas\]: Aurora
74+
75+
* Request \#49 \[Alessandro\]: started
76+
1. Larcv TRANSFERRED TO POLARIS
77+
78+
* Request \#50 \[Antonio\]: test complete
79+
80+
### Action Items and Open Issues
81+
82+
* Link to [action items](https://github.com/orgs/SBNSoftware/projects/32)
83+
84+
* **Storage:** 438 TiB free on SBNDataPool.
85+
86+
* \[Matheus/Giuseppe\] SBND is using some space in SBNDataPool. Some SBND datasets can be deleted \-\> 6 TiB can still be recovered. In total, we have recovered **22 TiB**.
87+
88+
* \[Vito/Antonio\] **Transfer of Run2 compressed files to Tape** **(420 TB), some TBs in DataPool2 as well** 100% complete \-\> Deleting on disk ongoing
89+
The transfer to tape was split by data stream, with the selection based on origin path. We can update the config to delete the BNB data streams selectively. Current status:
90+
run2\_compressed\_bnbmajority\_SBNDATA \-\> DELETED
run2\_compressed\_bnbmajority\_SBNDATA2 \-\> DELETED
91+
run2\_compressed\_bnbminbias\_SBNDATA \-\> DELETED
92+
run2\_compressed\_bnbminbias\_SBNDATA2 \-\> DELETED
93+
run2\_compressed\_offbeambnbmajority\_SBNDATA \-\> DELETED
94+
run2\_compressed\_offbeambnbmajority\_SBNDATA2 \-\> DELETED
95+
run2\_compressed\_offbeambnbminbias\_SBNDATA \-\> DELETED
96+
run2\_compressed\_offbeambnbminbias\_SBNDATA2 \-\> DELETED
97+
The SBNDATA/SBNDATA2 suffix selects files from SBNDataPools or SBNData2Pools, respectively.
98+
**Keep a subset of bnbmajority compressed raw data (run 9435\)**
99+
(25 files show a mismatch between the tape and disk versions; they have not been deleted)
100+
101+
* \[Alessandro\] Transfer of stage1 run2 to tape:
102+
* Icaruspro\_2024\_Run2\_production\_Reproc\_Run2\_v09\_89\_01\_01p03\_bnbmajority\_stage1 (90 TB) \-\> COMPLETED
103+
* Icaruspro\_2024\_Run2\_production\_Reproc\_Run2\_v09\_89\_01\_01p03\_offbeambnbmajority\_stage1 (70 TB) \-\> COMPLETED
104+
* icaruspro\_production\_v09\_89\_01\_01\_2024A\_ICARUS\_MC\_Sys\_NuCos\_2024A\_MC\_Sys\_NuCos\_CV\_2ndV\_stage1 (51 TB) \-\> COMPLETED
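
The eight Run2 compressed dataset names in the deletion list above follow a regular pattern: one name per data stream and pool suffix. As a minimal sketch, assuming the names are always formed as `run2_compressed_<stream>_<suffix>` (the helper `dataset_names` is hypothetical and not part of any production tool), the list can be regenerated for cross-checking like this:

```python
from itertools import product

# Data streams and pool suffixes as they appear in the notes above.
STREAMS = ["bnbmajority", "bnbminbias", "offbeambnbmajority", "offbeambnbminbias"]
POOL_SUFFIXES = ["SBNDATA", "SBNDATA2"]


def dataset_names(streams, suffixes):
    """Build one dataset name per (stream, suffix) pair.

    Hypothetical helper: the naming pattern is inferred from the list
    in the notes, not taken from the actual transfer configuration.
    """
    return [f"run2_compressed_{stream}_{suffix}"
            for stream, suffix in product(streams, suffixes)]


names = dataset_names(STREAMS, POOL_SUFFIXES)
for name in names:
    print(name)
```

Comparing this generated list against the datasets actually marked as DELETED is a cheap way to spot a stream or pool that was skipped.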
105+
106+
### CNAF
107+
108+
* **RUN3 Processing**:
109+
**Valerio and his team:** they have processed 100% of the on- and off-beam data, both bnbmajority and bnbminbias. The Italian team is now processing the calibration; afterwards, stage1 and caf will be reprocessed. **CNAF is 99% full. Calibration is ongoing.**
110+
111+
* STORAGE:
112+
113+
* Production:
114+
115+
\=====================================================
116+
\== /storage/gpfs\_data/icarus/plain/data
117+
\=====================================================
118+
test : 0.112 TB
119+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_test : 0.000 TB
120+
mc\_from\_list\_test : 0.005 TB
121+
mc-v10\_06\_00\_01p01-20260409-cnaf-dnu-test\_standard : 0.000 TB
122+
\-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-testvar : 0.000 TB
123+
\-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 0.000 TB
124+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_variations : 0.000 TB
125+
\-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 0.000 TB
126+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 0.000 TB
127+
mc-v10\_06\_00\_01p01-20260409-cnaf-dnu\_m100 : 0.000 TB
128+
\-processing-cnaf-1025-v10\_06\_00\_04p03 : 0.000 TB
129+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 0.107 TB
130+
mc : 174.763 TB
131+
mc-v0989-extendedCV-BNB : 1.050 TB
132+
prodcorsika\_proton\_intime\_icarus\_bnb\_sce\_1d\_drift\_on\_MC-v09\_87\_00-042024-cnaf : 10.290 TB
133+
mc-v09\_84\_00\_01-202412-cnaf-corrsce : 2.814 TB
134+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_variations : 97.188 TB
135+
mc-v09\_84\_00\_01-202403-cnaf-corrsce : 3.020 TB
136+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 7.479 TB
137+
mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 52.921 TB
138+
prod : 727.291 TB
139+
run2-v09\_84\_00\_01-202403-cnaf : 95.381 TB
140+
run2-v09\_72\_00\_06-202312-cnaf : 5.691 TB
141+
run2-v09\_83\_01-202402-cnaf : 0.000 TB
142+
run3-processing-cnaf-1025-v10\_06\_00\_04p03 : 611.608 TB
143+
run1-v09\_72\_00\_05p03-202311-cnaf : 3.217 TB
144+
run9435-v09\_84\_00\_01-202403-cnaf : 10.793 TB
145+
run2-v09\_89\_01\_01p03-202412-fnal : 0.602 TB
146+
all : 902.167 TB
147+
148+
* Rucio:
149+
![](images/image8.png)
150+
151+
* \[Valerio\] deletion of Run 2 raw data
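
As a quick consistency check on the production storage listing above, the sizes can be summed per group. This is a minimal sketch, assuming that `test`, `mc` and `prod` are parent directories whose entries follow them in the listing (the indentation was lost, so this grouping is an assumption), and that a mismatch of a few GB is only rounding in the three-decimal output. The tolerance value and the helper name are hypothetical.

```python
# Sizes in TB, copied from the listing above.
# Each group maps to (reported total, sizes of the entries listed under it).
GROUPS = {
    "test": (0.112, [0.000, 0.005, 0.000, 0.000, 0.000, 0.000,
                     0.000, 0.000, 0.000, 0.000, 0.107]),
    "mc": (174.763, [1.050, 10.290, 2.814, 97.188, 3.020, 7.479, 52.921]),
    "prod": (727.291, [95.381, 5.691, 0.000, 611.608, 3.217, 10.793, 0.602]),
}
REPORTED_ALL = 902.167

# Hypothetical allowance for rounding of the three-decimal values.
TOLERANCE_TB = 0.005


def totals_agree(reported, parts, tol=TOLERANCE_TB):
    """Return True if the parts sum to the reported total within tol."""
    return abs(sum(parts) - reported) <= tol


results = {name: totals_agree(total, parts)
           for name, (total, parts) in GROUPS.items()}
# The grand total should match the sum of the three group totals.
results["all"] = totals_agree(REPORTED_ALL,
                              [total for total, _ in GROUPS.values()])

for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

Under these assumptions every group agrees with its entries to within about 1 GB, which supports reading the listing as three parent directories plus a grand total.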
152+
153+
### Keepup Manager \[Nobody\]
154+
155+
### Data Manager \[Nobody\]
156+
157+
* \[Promita\]: update the available samples in the SBN Production wiki.
158+
* Investigate:
159+
* \[Alessandro\] /data\_stage1 TO BE DELETED
160+
* \[Alessandro\] /icarus\_keepup, ask for calibration ntuples of run3-run5 because we have multiple copies
161+
* \[Giuseppe\] /mc/2025A\_ICARUS\_NuGraph2
162+
* \[Manuel\] BNB Overlay campaign: check if we can remove some versions
163+
* \[Promita\] run3 specific runs with PMT waveforms?
164+
165+
### Infrastructure
166+
167+
* \[Matheus/Fatima\]: **ICARUS data available on the SBN SAM instance.** SBND has developed scripts to help with the migration, so it might be good to coordinate with them on how to move forward.
168+
169+
### Software
170+
171+
* \[Matteo\]: *icaruscode* reproducibility: ongoing. Details [here](https://shortbaseline.slack.com/docs/T7P7C3UAK/F0A0K0PRR16). Matteo is checking with Jacob Smith, the release manager. They found that the issue was related to the initialization of some variables. We are waiting for a new Production release. The fix is not present is in icaruscode v10\_06\_00\_06p03.
172+
173+
### Computing
174+
175+
* \[Vito\]:
176+
* Token in FTS tested but not used in production for the moment.
177+
* Files must be transferred manually to NERSC. Rucio is being set up to transfer files with NERSC. Rucio also still uses a proxy; it needs to use tokens.
178+
* Updated SAM configuration to run jobs with input files at NERSC \-\> TO BE TESTED
179+
* Files in Resilient are automatically deleted after 30 days if they are not used.
180+
* My test of the Rucio RSE FNAL\_LARCV doesn't seem to work: the test file shows up only on FNAL dCache. I'm not sure whether the SLAC Rucio RSE is working; I reached out to Francois last week but have had no answer so far.
181+
* Split campaigns into slices that run for at most one week, so that files saved in scratch are not lost.