Virtual Array TODOs #1308

Open
@ikrommyd

Description

With #1277 merged, coffea supports opening a file using virtual arrays. However, several things are still not optimal.
This issue tracks what is still pending before virtual arrays are fully and well supported.

  • Bring back the coffea 0.7-like executors.
  • The schemas currently create some "fake" buffers that are functions of other buffers and are used as offsets. These are expensive to calculate, and the calculation happens while the file is being opened, which makes opening slow. The schema should not create them; these derived buffers should instead live on the objects themselves.
  • Allow buffer lengths to be stored as part of preprocessing and passed into from_buffers (this ability needs to be implemented in awkward's from_buffers).
  • The kernels that are run to create things like distinctChildrenDeepIdxG are expensive. If you lose the weakref to the original events array (the one without any cuts), they have to be recalculated whenever you need them again. Investigate what coffea should optionally cache regarding the original uncut events.
  • Can we do something about objects that are really regular arrays, like LHEPdfWeights, but that uproot deserializes as list-offset arrays? That looks like an unnecessary offsets calculation, but there may not be another way.
  • Investigate using coffea's caches to store deserialized offsets branches when opening the file, to avoid deserializing the same offsets more than once (some objects might share offsets).
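
As a rough illustration of two of the items above (deferring derived offsets until something actually needs them, and reading shared offsets only once), here is a minimal pure-Python sketch. `LazyBufferStore`, `counts_to_offsets`, and the buffer keys are hypothetical names made up for this example. This is not coffea's or awkward's API, and plain lists stand in for real buffers.

```python
from typing import Callable, Dict, List


class LazyBufferStore:
    """Hypothetical store: materialize a buffer on first access, then reuse it."""

    def __init__(self) -> None:
        self._generators: Dict[str, Callable[[], List[int]]] = {}
        self._cache: Dict[str, List[int]] = {}
        self.calls: Dict[str, int] = {}  # how many times each generator really ran

    def register(self, key: str, generator: Callable[[], List[int]]) -> None:
        # Registering is cheap: nothing is read or computed here.
        self._generators[key] = generator
        self.calls[key] = 0

    def __getitem__(self, key: str) -> List[int]:
        if key not in self._cache:
            self.calls[key] += 1
            self._cache[key] = self._generators[key]()
        return self._cache[key]


def counts_to_offsets(counts: List[int]) -> List[int]:
    """Cumulative sum with a leading zero, i.e. list offsets from per-event counts."""
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)
    return offsets


store = LazyBufferStore()

# Stand-in for deserializing a counts branch from the file.
store.register("nJet", lambda: [2, 0, 3])
# Derived buffer that depends on another buffer; only computed on demand.
store.register("Jet/offsets", lambda: counts_to_offsets(store["nJet"]))

# "Opening the file" has only registered generators so far.
calls_after_open = dict(store.calls)

# Two fields of the same collection ask for the same offsets.
jet_pt_offsets = store["Jet/offsets"]
jet_eta_offsets = store["Jet/offsets"]

print(calls_after_open)
print(jet_pt_offsets)
print(store.calls)
```

In this sketch nothing runs at registration time, and the second request for `Jet/offsets` is served from the cache. Whether coffea's existing caches are the right place for this, and how eviction should work, is exactly the open question in the list above.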

Labels: enhancement (New feature or request)