Description
Every UVData file contains metadata, but in real applications you often have MANY such files. For example, HERA has many files per night, each with two integrations (and otherwise identical metadata). OTOH, our large-scale simulations tend to output many files with one frequency per file.
Sometimes, finding the file you really need is quite tricky (sometimes not, of course). For instance, when LST-binning, you need to know the LSTs within each file, but this can be difficult to encode in the filename. Reading every file in a folder to find the right one can take a LONG time, because just creating file handles for so many files is slow, not to mention actually reading any of the data.
I propose that we create a meta-metadata file format that smoothly handles these situations. I see the benefits as being:
- Easier selection of data from a group of files (as mentioned above)
- Easier planning for optimal `concat` operations, potentially including concatenation over multiple axes simultaneously.
- Potentially also the ability to do quick "checks" of the integrity of a database of files.
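To make the last two benefits concrete, here is a rough sketch of what an index could enable without opening any data file. Everything in it is hypothetical: the `Entry` fields, `plan_concat`, and `grid_is_complete` are illustrative names, not existing pyuvdata API, and it assumes each file covers one (time chunk, frequency chunk) cell.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical per-file summary held in the index."""
    path: str
    t_start: float  # start of the file's time chunk (any consistent unit)
    f_start: float  # start of the file's frequency chunk (Hz)


def plan_concat(entries):
    """Group files by time chunk and order each group by frequency.

    The result is a list of lists: concatenate each inner list along the
    frequency axis, then concatenate the results along the time axis.
    """
    groups = defaultdict(list)
    for e in entries:
        groups[e.t_start].append(e)
    return [
        [e.path for e in sorted(groups[t], key=lambda e: e.f_start)]
        for t in sorted(groups)
    ]


def grid_is_complete(entries):
    """Quick integrity check: every time chunk has the same frequency chunks."""
    freqs_by_time = defaultdict(set)
    for e in entries:
        freqs_by_time[e.t_start].add(e.f_start)
    sets = list(freqs_by_time.values())
    return all(s == sets[0] for s in sets)


entries = [
    Entry("t0_f1.uvh5", 0.0, 150e6),
    Entry("t0_f0.uvh5", 0.0, 100e6),
    Entry("t1_f0.uvh5", 10.0, 100e6),
    Entry("t1_f1.uvh5", 10.0, 150e6),
]
plan = plan_concat(entries)
complete = grid_is_complete(entries)
```

Both functions work only on the summaries, so their cost scales with the number of files rather than with the data volume.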
One way to do all this (and probably the most efficient for any given application) would be to create a bona fide database. But this might be overkill in many cases, and scientists often prefer to just deal with files. So I propose we simply create a new bespoke HDF5-based format.
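As a starting point for discussion, a minimal sketch of such an index file using `h5py` and `numpy` might look like the following. The layout (one dataset per metadata column, one row per data file) and the names `write_index`, `files_overlapping_lst`, `path`, `lst_min`, and `lst_max` are assumptions made for illustration only; the real format would need to be designed. LST wrap-around at 2π is deliberately ignored here.

```python
import os
import tempfile

import h5py
import numpy as np


def write_index(index_path, paths, lst_min, lst_max):
    """Write a hypothetical index: one row per data file, one dataset per column."""
    with h5py.File(index_path, "w") as f:
        # Fixed-length byte strings keep the path column a plain HDF5 array.
        f.create_dataset("path", data=np.array(paths, dtype="S"))
        f.create_dataset("lst_min", data=np.asarray(lst_min, dtype=float))
        f.create_dataset("lst_max", data=np.asarray(lst_max, dtype=float))


def files_overlapping_lst(index_path, lst_lo, lst_hi):
    """Return paths whose LST range overlaps [lst_lo, lst_hi] (radians)."""
    with h5py.File(index_path, "r") as f:
        paths = f["path"][:]
        lo = f["lst_min"][:]
        hi = f["lst_max"][:]
    mask = (hi >= lst_lo) & (lo <= lst_hi)
    return [p.decode() for p in paths[mask]]


# Toy example: three files, only the index file is ever opened.
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "index.h5")
    write_index(
        index_path,
        paths=["a.uvh5", "b.uvh5", "c.uvh5"],
        lst_min=[0.10, 0.11, 0.50],
        lst_max=[0.11, 0.12, 0.51],
    )
    found = files_overlapping_lst(index_path, 0.105, 0.115)
```

The point of the column layout is that a selection like this reads a few small arrays from a single file handle, instead of creating a handle for every data file in the folder.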