Impact of the new feature
WM in general
Is your feature request related to a problem? Please describe.
This is a long-requested development, initially reported and briefly discussed in #8134.
This ticket is meant to evaluate and document the current input data placement mechanism and all its dependencies. Once the input data placement logic is fully understood and documented, we can start investigating how to minimize the number of input data rules pinned on disk. That would mean removing input blocks while a workflow is still active, as well as releasing a workflow to start processing while its input data is still being transferred to disk.
All of this needs to take into account the projected needs for HL-LHC, which are to be investigated in #11408.
This information will also be required for the Computing Conceptual Design Report (CDR).
Describe the solution you'd like
A thorough analysis of the current input data placement logic, for primary/parent/pileup data.
Then, a description of a future model that minimizes disk utilization for input data, covering the different workflow types and the systems that would have to be created and/or refactored.
No development is expected to be delivered from this issue, only complete documentation of the required changes and potentially a candidate design for such a system.
Describe alternatives you've considered
This issue may spawn a few more targeted issues to proceed with the required developments.
Additional context
Some investigation has already been performed in #11418.