|
| 1 | +## May 11, 2026 09:00 GMT-5 | ICARUS Production Meeting |
| 2 | + |
| 3 | +### Attendees |
| 4 | + |
| 5 | +Alessandro Maria Ricci, Daniel Carber, Tracy Usher, Giuseppe Cerati, Vito Di Benedetto, Promita Roy |
| 6 | + |
| 7 | +### Monitoring resource usage |
| 8 | + |
| 9 | +| **User Grid Usage** History of the *Running Jobs by User* for the last 7 days: [link](https://fifemon.fnal.gov/monitor/d/000000053/experiment-batch-details?orgId=1&viewPanel=9&from=now-7d&to=now&var-experiment=icarus&var-pool=dune-global&var-pool=fifebatch)  | **User Job Efficiency** History of the User Job Efficiency for the last 7 days: [link](https://fifemon.fnal.gov/monitor/d/000000022/experiment-efficiency-details?from=now-7d&to=now&var-experiment=icarus&var-pool=dune-global&var-pool=fifebatch&orgId=1&viewPanel=2)  | |
| 10 | +| ----- | ----- | |
| 11 | +| **Icaruspro Jobs Exit Code**History of the icaruspro job exit code for the last 7 days: [link](https://landscape.fnal.gov/kibana/app/kibana#/dashboard/ba047b90-b8ca-11e7-989a-91951b87e80a?_g=\(refreshInterval:\(pause:!t,value:0\),time:\(from:now-4d,mode:relative,to:now\)\)&_a=\(description:'View%20jobs%20exit%20code,%20where%20they%20ran,%20and%20logs',filters:!\(\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:pool,negate:!f,params:\(query:fifebatch,type:phrase\),type:phrase,value:fifebatch\),query:\(match:\(pool:\(query:fifebatch,type:phrase\)\)\)\),\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:User,negate:!f,params:\(query:'icaruspro@fnal.gov',type:phrase\),type:phrase,value:'icaruspro@fnal.gov'\),query:\(match:\(User:\(query:'icaruspro@fnal.gov',type:phrase\)\)\)\),\('$state':\(store:appState\),meta:\(alias:!n,disabled:!f,index:'fifebatch-history-*',key:Jobsub_Group,negate:!f,params:\(query:icarus,type:phrase\),type:phrase,value:icarus\),query:\(match:\(Jobsub_Group:\(query:icarus,type:phrase\)\)\)\)\),fullScreenMode:!f,options:\(darkTheme:!f\),panels:!\(\(embeddableConfig:\(vis:\(colors:\(Cancelled:%23967302,Fail:%23BF1B00,Success:%23629E51\),legendOpen:!t\)\),gridData:\(h:15,i:'1',w:40,x:0,y:0\),id:'2f40f420-b8ca-11e7-989a-91951b87e80a',panelIndex:'1',type:visualization,version:'6.8.23'\),\(gridData:\(h:10,i:'2',w:24,x:24,y:15\),id:'569cca30-b8ca-11e7-989a-91951b87e80a',panelIndex:'2',type:visualization,version:'6.8.23'\),\(gridData:\(h:10,i:'3',w:24,x:0,y:15\),id:'65759a00-b8ca-11e7-989a-91951b87e80a',panelIndex:'3',type:visualization,version:'6.8.23'\),\(embeddableConfig:\(columns:!\(JobsubJobId,Owner,ExitCode,ExitSignal,MATCH_GLIDEIN_Site,MachineAttrMachine0,stdout,stderr\),sort:!\('@timestamp',desc\)\),gridData:\(h:30,i:'4',w:48,x:0,y:25\),id:'7e94c3c0-b8cb-11e7-989a-91951b87e80a',panelIndex:'4',type:search,version:'6.8.23'\),\(gridData:\(h:15,i:'5',w:8,x:40,y:0\
),id:AWZpvkXbLj3wKbt0N_Vp,panelIndex:'5',type:visualization,version:'6.8.23'\)\),query:\(language:lucene,query:\(match_all:\(\)\)\),timeRestore:!f,title:'Fifebatch%20History',viewMode:view\)) | **SBN Data Pools**[link](https://fifemon.fnal.gov/monitor/d/rflbgV-iz/dcache-by-poolgroup?orgId=1&var-PoolGroup=SbnData2Pools&from=now-3h&to=now&refresh=5m) | |
| 12 | +|  |  | |
| 13 | +| **dCache Persistent Usage per User** Total: 114 TiB ([link](https://fifemon.fnal.gov/monitor/d/000000175/dcache-persistent-usage-by-vo?orgId=1&var-VO=icarus)); used space: 96.6 TiB (87.2%) | | |
| 14 | +|  | | |
| 15 | + |
| 16 | +### **Production requests** |
| 17 | + |
| 18 | +| 2025 Ongoing/Pending Production Requests | |
| 19 | +| ----- | |
| 20 | +|  | |
| 21 | +| **2026 Ongoing/Pending Production Requests** | |
| 22 | +|  | |
| 23 | + |
| 24 | +Link to [spreadsheet](https://docs.google.com/spreadsheets/d/1ffBp475tEzlRilFs7xLhbevSZHjsuk1Dm5FGFIPWsFM/edit?gid=1567393491#gid=1567393491) |
| 25 | +Link to [github project](https://github.com/orgs/SBNSoftware/projects/49) |
| 26 | + |
| 27 | +### |
| 28 | + |
| 29 | +POMS active campaigns [here](https://pomsgpvm02.fnal.gov/poms/show_campaigns/icarus/production) |
| 30 | + |
| 31 | +### Notes |
| 32 | + |
| 33 | +* |
| 34 | + |
| 35 | +### Requests |
| 36 | + |
| 37 | +* Assigned: |
| 38 | + * Request \#86 \[Manuel\]: |
| 39 | + 1. See \#75. stage0 \-\> 100% complete |
| 40 | + 2. Stage1\_caf stopped \-\> 100% complete |
| 41 | + 3. g4 used the wrong FHiCL.
| 42 | + 4. Larcv saved in SBNDataPool: 12 TB, the rest in scratch |
| 43 | + 5. 6% larcv lost |
| 44 | + 6. TRANSFERRING TO POLARIS |
| 45 | + |
| 46 | + * Request \#123 \[Fatima\]: |
| 47 | + 1. 97% is complete |
| 48 | + |
| 49 | + * Request \#6 \[Manuel\]: |
| 50 | + 1. Reprocess 68% of the larcv |
| 51 | + 2. TRANSFERRING TO POLARIS |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | + * Request \#8 \[Alessandro\]: Perlmutter and FermiGrid |
| 56 | + 1. Stage0: 74% complete |
| 57 | + 2. Larcv: 56% complete |
| 58 | + 3. TRANSFERRING TO S3DF
| 59 | + |
| 60 | + |
| 61 | + * Request \#17 \[Promita\]: 90% complete |
| 62 | + |
| 63 | + |
| 64 | + * Request \#32-33 \[Fatima\]: |
| 65 | + 1. 32: running |
| 66 | + 2. 33: submitted |
| 67 | + |
| 68 | + |
| 69 | + * Request \#38 \[Manuel\]: testing |
| 70 | + |
| 71 | + |
| 72 | + |
| 73 | + * Request \#47 \[Thomas\]: Aurora |
| 74 | + |
| 75 | + * Request \#49 \[Alessandro\]: started |
| 76 | + 1. Larcv TRANSFERRED TO POLARIS |
| 77 | + |
| 78 | + * Request \#50 \[Antonio\]: test complete |
| 79 | + |
| 80 | +### Action Items and Open Issues
| 81 | + |
| 82 | +* Link to [action items](https://github.com/orgs/SBNSoftware/projects/32) |
| 83 | + |
| 84 | +* **Storage:** 438 TiB free on SBNDataPool. |
| 85 | + |
| 86 | +* \[Matheus/Giuseppe\] SBND is using some space in SBNDataPool. Some SBND datasets can be deleted \-\> 6 TiB can still be recovered. In total, we have recovered **22 TiB**.
| 87 | + |
| 88 | +* \[Vito/Antonio\] **Transfer of Run2 compressed files to tape** **(420 TB, plus some TB in DataPool2):** 100% complete \-\> deletion from disk ongoing.
| 89 | + The transfer to tape was split by data stream, with the selection based on origin path, so we can update the config to delete the BNB data streams selectively. Status per dataset:
| 90 | + run2\_compressed\_bnbmajority\_SBNDATA \-\> DELETED run2\_compressed\_bnbmajority\_SBNDATA2 \-\> DELETED |
| 91 | + run2\_compressed\_bnbminbias\_SBNDATA \-\> DELETED |
| 92 | + run2\_compressed\_bnbminbias\_SBNDATA2 \-\> DELETED |
| 93 | + run2\_compressed\_offbeambnbmajority\_SBNDATA \-\> DELETED |
| 94 | + run2\_compressed\_offbeambnbmajority\_SBNDATA2 \-\> DELETED |
| 95 | + run2\_compressed\_offbeambnbminbias\_SBNDATA \-\> DELETED |
| 96 | + run2\_compressed\_offbeambnbminbias\_SBNDATA2 \-\> DELETED |
| 97 | + The SBNDATA/SBNDATA2 suffix selects files from SBNDataPools or SBNData2Pools, respectively.
| 98 | + **Keep a subset of bnbmajority compressed raw data (run 9435\)**
| 99 | + (25 files show a mismatch between the tape and disk versions; they have not been deleted)
| 100 | + |
| 101 | +* \[Alessandro\] Transfer of stage1 run2 to tape: |
| 102 | + * Icaruspro\_2024\_Run2\_production\_Reproc\_Run2\_v09\_89\_01\_01p03\_bnbmajority\_stage1 (90 TB) \-\> COMPLETED |
| 103 | + * Icaruspro\_2024\_Run2\_production\_Reproc\_Run2\_v09\_89\_01\_01p03\_offbeambnbmajority\_stage1 (70 TB) \-\> COMPLETED |
| 104 | + * icaruspro\_production\_v09\_89\_01\_01\_2024A\_ICARUS\_MC\_Sys\_NuCos\_2024A\_MC\_Sys\_NuCos\_CV\_2ndV\_stage1 (51 TB) \-\> COMPLETED |
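As a small illustration of the naming scheme of the Run2 compressed datasets listed above: the eight names are the product of four data streams and two pool suffixes. The sketch below only rebuilds those names and groups them by pool. The helper and variable names are hypothetical, and this is not the actual tape-transfer or deletion configuration.

```python
from itertools import product

# Data streams and pool suffixes as they appear in the dataset names in these notes.
STREAMS = ["bnbmajority", "bnbminbias", "offbeambnbmajority", "offbeambnbminbias"]
POOL_SUFFIXES = ["SBNDATA", "SBNDATA2"]


def dataset_name(stream, suffix):
    """Hypothetical helper: rebuild a dataset name from its stream and pool suffix."""
    return f"run2_compressed_{stream}_{suffix}"


datasets = [dataset_name(s, p) for s, p in product(STREAMS, POOL_SUFFIXES)]

# Group the names by pool suffix, e.g. to handle one pool group at a time.
by_pool = {p: [d for d in datasets if d.endswith("_" + p)] for p in POOL_SUFFIXES}

for name in datasets:
    print(name)
```

Grouping by suffix mirrors the note that the suffix identifies which pool group the files came from.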
| 105 | + |
| 106 | +### CNAF |
| 107 | + |
| 108 | +* **RUN3 Processing**: |
| 109 | + **Valerio and his team:** they have processed 100% of the on- and off-beam data, both bnbmajority and bnbminbias. The Italian team is now processing the calibration; afterwards, stage1 and caf will be reprocessed. **CNAF is 99% full. Calibration is ongoing.**
| 110 | + |
| 111 | +* STORAGE: |
| 112 | + |
| 113 | +* Production: |
| 114 | + |
| 115 | + \===================================================== |
| 116 | + \== /storage/gpfs\_data/icarus/plain/data |
| 117 | + \===================================================== |
| 118 | + test : 0.112 TB
| 119 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_test : 0.000 TB
| 120 | +   mc\_from\_list\_test : 0.005 TB
| 121 | +   mc-v10\_06\_00\_01p01-20260409-cnaf-dnu-test\_standard : 0.000 TB
| 122 | +   \-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-testvar : 0.000 TB
| 123 | +   \-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 0.000 TB
| 124 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_variations : 0.000 TB
| 125 | +   \-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 0.000 TB
| 126 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 0.000 TB
| 127 | +   mc-v10\_06\_00\_01p01-20260409-cnaf-dnu\_m100 : 0.000 TB
| 128 | +   \-processing-cnaf-1025-v10\_06\_00\_04p03 : 0.000 TB
| 129 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 0.107 TB
| 130 | + mc : 174.763 TB
| 131 | +   mc-v0989-extendedCV-BNB : 1.050 TB
| 132 | +   prodcorsika\_proton\_intime\_icarus\_bnb\_sce\_1d\_drift\_on\_MC-v09\_87\_00-042024-cnaf : 10.290 TB
| 133 | +   mc-v09\_84\_00\_01-202412-cnaf-corrsce : 2.814 TB
| 134 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap\_variations : 97.188 TB
| 135 | +   mc-v09\_84\_00\_01-202403-cnaf-corrsce : 3.020 TB
| 136 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv-nueonly : 7.479 TB
| 137 | +   mc-v10\_06\_00\_01p01-202603-cnaf-numi-nue-disap-cv : 52.921 TB
| 138 | + prod : 727.291 TB
| 139 | +   run2-v09\_84\_00\_01-202403-cnaf : 95.381 TB
| 140 | +   run2-v09\_72\_00\_06-202312-cnaf : 5.691 TB
| 141 | +   run2-v09\_83\_01-202402-cnaf : 0.000 TB
| 142 | +   run3-processing-cnaf-1025-v10\_06\_00\_04p03 : 611.608 TB
| 143 | +   run1-v09\_72\_00\_05p03-202311-cnaf : 3.217 TB
| 144 | +   run9435-v09\_84\_00\_01-202403-cnaf : 10.793 TB
| 145 | +   run2-v09\_89\_01\_01p03-202412-fnal : 0.602 TB
| 146 | + all : 902.167 TB
| 147 | + |
| 148 | +* Rucio: |
| 149 | +  |
| 150 | + |
| 151 | +* \[Valerio\] Deletion of Run 2 raw data
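The sizes in the CNAF production listing above can be cross-checked against their subtotals. The sketch below is a minimal consistency check, assuming that the entries following `test`, `mc`, and `prod` are their sub-directories (this grouping is inferred from the numbers, not stated in the notes) and that small differences come from rounding to three decimals. The function name and the tolerance are hypothetical.

```python
# Sizes in TB, copied from the listing. The grouping of entries under
# test / mc / prod is an assumption inferred from the subtotals.
GROUPS = {
    "test": (0.112, [0.000, 0.005, 0.000, 0.000, 0.000, 0.000,
                     0.000, 0.000, 0.000, 0.000, 0.107]),
    "mc": (174.763, [1.050, 10.290, 2.814, 97.188, 3.020, 7.479, 52.921]),
    "prod": (727.291, [95.381, 5.691, 0.000, 611.608, 3.217, 10.793, 0.602]),
}
REPORTED_ALL = 902.167
TOLERANCE_TB = 0.005  # hypothetical allowance for rounding to three decimals


def consistent(reported, parts, tol=TOLERANCE_TB):
    """Return True if the reported size matches the sum of its parts within tol."""
    return abs(reported - sum(parts)) <= tol


results = {name: consistent(total, parts) for name, (total, parts) in GROUPS.items()}
results["all"] = consistent(REPORTED_ALL, [total for total, _ in GROUPS.values()])

for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

A check of this kind makes it easy to spot a directory that was double counted or dropped when the listing is regenerated.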
| 152 | + |
| 153 | +### Keepup Manager \[Nobody\] |
| 154 | + |
| 155 | +### Data Manager \[Nobody\] |
| 156 | + |
| 157 | +* \[Promita\]: update the available samples in SBN Production wiki. |
| 158 | +* Investigate: |
| 159 | + * \[Alessandro\] /data\_stage1 TO BE DELETED |
| 160 | + * \[Alessandro\] /icarus\_keepup: ask about the calibration ntuples of run3-run5, because we have multiple copies
| 161 | + * \[Giuseppe\] /mc/2025A\_ICARUS\_NuGraph2
| 162 | + * \[Manuel\] BNB Overlay campaign: check if we can remove some versions
| 163 | + * \[Promita\] run3 specific runs with PMT waveforms?
| 164 | + |
| 165 | +### Infrastructure |
| 166 | + |
| 167 | +* \[Matheus/Fatima\]: **ICARUS data available on the SBN SAM instance.** SBND has developed scripts to help with the migration, so it might be good to coordinate with them on how to move forward.
| 168 | + |
| 169 | +### Software |
| 170 | + |
| 171 | +* \[Matteo\]: *icaruscode* reproducibility: ongoing. Details [here](https://shortbaseline.slack.com/docs/T7P7C3UAK/F0A0K0PRR16). Matteo is checking with Jacob Smith, the release manager. The issue was found to be related to the initialization of some variables. We are waiting for a new production release. The fix is not present in icaruscode v10\_06\_00\_06p03.
| 172 | + |
| 173 | +### Computing |
| 174 | + |
| 175 | +* \[Vito\]: |
| 176 | + * Tokens in FTS have been tested but are not yet used in production.
| 177 | + * Files must be transferred manually to NERSC. Rucio is being set up to transfer files to NERSC. Rucio also uses a proxy and needs to move to tokens.
| 178 | + * Updated SAM configuration to run jobs with input files at NERSC \-\> TO BE TESTED
| 179 | + * Files in Resilient are deleted automatically after 30 days if they are not used.
| 180 | + * My test of the RUCIO RSE FNAL\_LARCV doesn't seem to work: the test file shows up only on FNAL dCache. I'm not sure whether the SLAC RUCIO RSE is working. I reached out to Francois last week, but have had no answer so far.
| 181 | + * Split campaigns into slices that run for at most one week, to avoid losing the files saved in scratch.
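One way to plan the slicing mentioned in the last item is to size each slice from an estimated processing rate. The sketch below is only an illustration under that assumption: `split_into_slices`, `files_per_day`, and the example numbers are hypothetical, and it does not use POMS or any other production tool.

```python
import math


def split_into_slices(n_files, files_per_day, max_days=7):
    """Split file indices [0, n_files) into consecutive (start, stop) ranges.

    Each range holds at most the number of files expected to finish within
    max_days at the assumed rate files_per_day (a hypothetical estimate).
    """
    if n_files < 0 or files_per_day <= 0 or max_days <= 0:
        raise ValueError("n_files must be >= 0; files_per_day and max_days must be > 0")
    max_files = max(1, math.floor(files_per_day * max_days))
    return [(start, min(start + max_files, n_files))
            for start in range(0, n_files, max_files)]


# Example with made-up numbers: 10000 files at an assumed 500 files per day.
for start, stop in split_into_slices(10000, 500):
    print(f"slice covers files {start} to {stop - 1}")
```

Sizing by an estimated rate keeps every slice short enough that its scratch output can be moved before it expires, at the cost of needing a reasonable throughput estimate.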