With #1277 merged, coffea does support opening a file using virtual arrays. However, many things are not optimal.
This issue tracks what is still pending to support virtual arrays well.
- Bring back the coffea 0.7-like executors.
- Currently the schemas create some fake buffers that are functions of other buffers and are used as offsets. These are expensive to calculate, and the calculation happens while opening the file, which makes opening slow. The schema shouldn't create them; instead, those fake buffers should live on the objects themselves.
- Allow the ability to store buffer lengths as part of preprocessing and pass them into `from_buffers` (`from_buffers` should implement that ability in awkward).
- The kernels that are run to create things like `distinctChildrenDeepIdxG` are expensive. If you lose the `weakref` to the original events array without any cuts, those need to be calculated again if you need them. Investigate what coffea should optionally cache regarding the original uncut events.
- Can we do something about objects that are really regular arrays like `LHEPdfWeights` but uproot deserializes them as list offset arrays? That sounds like a useless offsets calculation to me but there may not be another way.
- Investigate the usage of coffea's caches to store deserialized offsets branches when opening up the file to avoid deserializing the same offsets more than once (some objects might share offsets).
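
To make the `weakref` point concrete, here is a minimal stdlib-only sketch of the failure mode. It is a simplified model, not coffea's implementation: `Events`, `expensive_index`, and `cached_index` are hypothetical names, and the "kernel" is a stand-in that just counts how often it runs.

```python
import gc
import weakref


class Events:
    """Hypothetical stand-in for the original, uncut events array."""

    def __init__(self, n):
        self.n = n


calls = {"count": 0}


def expensive_index(events):
    """Stand-in for an expensive kernel; records every invocation."""
    calls["count"] += 1
    return list(range(events.n))


# Results are keyed weakly by the events object, so an entry disappears
# as soon as the events object itself is garbage collected.
_results = weakref.WeakKeyDictionary()


def cached_index(events):
    if events not in _results:
        _results[events] = expensive_index(events)
    return _results[events]


original = Events(3)
cached_index(original)
cached_index(original)  # served from the cache, kernel does not rerun
calls_while_alive = calls["count"]

del original  # the only strong reference to the uncut events is gone
gc.collect()
entries_after_drop = len(_results)

rebuilt = Events(3)
cached_index(rebuilt)  # the kernel has to run again
calls_after_rebuild = calls["count"]
```

An optional strong cache for the uncut events would trade memory for avoiding that second kernel run, which is the trade-off the item above asks to investigate.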
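
As an illustration of why offsets for a regular branch look redundant, the sketch below uses plain Python lists rather than uproot or awkward. The function names and the choice of 4 weights per event are assumptions made only for this example.

```python
def offsets_from_counts(counts):
    """Cumulative sum over per-entry counts: the general (jagged) case."""
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    return offsets


def regular_size(offsets):
    """Return the common sublist length if all entries share one, else None."""
    sizes = {stop - start for start, stop in zip(offsets, offsets[1:])}
    return sizes.pop() if len(sizes) == 1 else None


# Hypothetical branch where every event carries exactly 4 weights.
regular = offsets_from_counts([4, 4, 4])
# Hypothetical genuinely jagged branch for comparison.
jagged = offsets_from_counts([2, 0, 3])

size = regular_size(regular)
# For a regular layout the offsets add nothing beyond (size, number of entries):
implied = [i * size for i in range(len(regular))]
```

If the size were known up front (for example from preprocessing), the offsets buffer could in principle be skipped. Whether uproot can avoid producing it is exactly the open question in the item above.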