Virtual Array TODOs

With https://github.com/scikit-hep/coffea/pull/1277 merged, coffea supports opening a file using virtual arrays. However, several things are still suboptimal.
This issue tracks what remains to be done to support virtual arrays well.
- [ ] Bring back the coffea 0.7-like executors.
- [ ] Currently the schemas create some derived ("fake") buffers that are functions of other buffers and are used as offsets. They are expensive to compute, and the computation happens while opening the file, which makes opening slow. The schema should not create them; these derived buffers should instead live on the objects themselves.
- [ ] Allow buffer lengths to be stored as part of preprocessing and passed into `from_buffers` (awkward's `from_buffers` needs to support this).
- [ ] The kernels that are run to create things like `distinctChildrenDeepIdxG` are expensive. If you lose the `weakref` to the original, uncut events array, they have to be recomputed the next time you need them. Investigate what coffea should optionally cache from the original uncut events.
- [ ] Can we do something about objects that are really regular arrays, like `LHEPdfWeights`, but that uproot deserializes as list-offset arrays? That looks like an unnecessary offsets calculation, but there may be no other way.
- [ ] Investigate using coffea's caches to store deserialized offsets branches when opening the file, to avoid deserializing the same offsets more than once (some objects might share offsets).
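
To make the last two items concrete, here is a minimal pure-Python sketch, not coffea or uproot code. It assumes offsets are built from a per-event counts branch and that several collections can share one counts branch. `OffsetsCache`, `counts_to_offsets`, `regular_size`, and the branch names in `fake_file` are all hypothetical names chosen for illustration.

```python
from itertools import accumulate


def counts_to_offsets(counts):
    """Turn per-event counts into list offsets (length len(counts) + 1)."""
    return [0, *accumulate(counts)]


class OffsetsCache:
    """Hypothetical cache holding one offsets buffer per counts-branch name."""

    def __init__(self, read_counts):
        self._read_counts = read_counts  # callable: branch name -> list of counts
        self._offsets = {}
        self.reads = 0  # number of times a counts branch was actually read

    def offsets(self, counts_name):
        # Read and convert the counts branch only on the first request.
        if counts_name not in self._offsets:
            self.reads += 1
            counts = self._read_counts(counts_name)
            self._offsets[counts_name] = counts_to_offsets(counts)
        return self._offsets[counts_name]


def regular_size(offsets):
    """Return the common list length if all lists have equal length, else None."""
    sizes = {stop - start for start, stop in zip(offsets, offsets[1:])}
    return sizes.pop() if len(sizes) == 1 else None


# Stand-in for a file: made-up counts branches for three events.
fake_file = {"nJet": [2, 0, 3], "nLHEPdfWeight": [4, 4, 4]}
cache = OffsetsCache(fake_file.__getitem__)

jet_pt_offsets = cache.offsets("nJet")   # first request reads the branch
jet_eta_offsets = cache.offsets("nJet")  # second request reuses the cached buffer
pdf_offsets = cache.offsets("nLHEPdfWeight")
```

In this sketch, two collections that share a counts branch get the same offsets object, and `regular_size` flags the case where a jagged layout is in fact regular. Whether coffea's existing caches can play this role is the open question in the item above.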

Virtual Array TODOs #1308
