Feature Request/Idea: Meta-MetaData File #1588

@steven-murray

Every UVData file contains metadata, but in real applications you often have MANY such files. For example, HERA has many files per night, each with two integrations (and otherwise identical metadata). On the other hand, our large-scale simulations tend to output many files with one frequency per file.

Sometimes finding the file you really need is quite tricky (sometimes not, of course). For instance, when LST-binning, you need to know the LSTs contained in each file, but this is difficult to encode in the filename. Reading every file in a folder to find the right one can take a LONG time, because just creating file handles for so many files is slow, not to mention actually reading any of the data.

I propose that we create a meta-metadata file format that smoothly handles these situations. I see the benefits as being:

  1. Easier selection of data from a group of files (as mentioned above).
  2. Easier planning for optimal concat operations, potentially including concatenation over multiple axes simultaneously.
  3. Potentially, the ability to run quick "checks" of the integrity of a database of files.

One way to do all this (and probably the most efficient for any given application) would be to create a bona fide database. But this might be overkill in many cases, and scientists often prefer to just deal with files. So I propose we simply create a new bespoke file format based on HDF5.
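To make the idea a bit more concrete, here is a minimal sketch of the kind of per-file summary such an index could hold, and how selection and a quick integrity check might work against it. It is an in-memory stand-in only: the `FileEntry` fields, the function names, and the example filenames are all hypothetical, and the actual on-disk HDF5 layout is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Summary of one data file. All field names are hypothetical."""
    path: str
    lst_min: float    # smallest LST in the file, radians
    lst_max: float    # largest LST in the file, radians
    freq_min: float   # lowest frequency in the file, Hz
    freq_max: float   # highest frequency in the file, Hz
    size_bytes: int   # file size recorded when the index was built


def select_by_lst(index, lst_lo, lst_hi):
    """Return paths of files whose LST range overlaps [lst_lo, lst_hi].

    Simplification: ignores LST wrap-around at 2*pi.
    """
    return [e.path for e in index
            if e.lst_max >= lst_lo and e.lst_min <= lst_hi]


def find_stale(index, current_sizes):
    """Return paths that are missing or whose size no longer matches the index.

    `current_sizes` maps path -> size in bytes as currently seen on disk.
    """
    return [e.path for e in index
            if current_sizes.get(e.path) != e.size_bytes]


# Hypothetical index for three consecutive files from one night.
index = [
    FileEntry("night1.a.uvh5", 0.10, 0.12, 100e6, 200e6, 1024),
    FileEntry("night1.b.uvh5", 0.12, 0.14, 100e6, 200e6, 1024),
    FileEntry("night1.c.uvh5", 0.14, 0.16, 100e6, 200e6, 1024),
]

# Selection needs only the index, so no data file is opened.
matches = select_by_lst(index, 0.125, 0.13)

# Integrity check: one file changed size, one is missing entirely.
stale = find_stale(index, {"night1.a.uvh5": 1024, "night1.b.uvh5": 999})
```

The point of the sketch is that both operations touch only the small index, so their cost does not depend on opening each data file. A real implementation would presumably store these columns as datasets in a single HDF5 file and would need to handle LST wrap-around.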
