Collections Specification #31
What is an image collection?
A collection of images is a semantic grouping of two or more associated OME-NGFF images and/or image-labels.
This definition could include:
- Images which do not share a physical coordinate space, e.g. a training dataset of images containing bees
- Images which share a physical coordinate space and whose storage specification must support sufficient metadata to determine this positioning, e.g. high-content screening plates and wells
- A hierarchy of image groups of arbitrary depth, which may or may not share physical coordinates
- Other things…?
What workflows should it support?
The specification should support implementations being able to traverse the image collection and, where relevant, map the associated metadata to the physical coordinate space for loading these images.
Ideally, the specification should provide sufficient information at each level of a hierarchical grouping to allow loading both the entire collection and any arbitrary level of the hierarchy. This matters when sharing or viewing partial datasets, or when updating only small parts of the collection.
Where labels or other related data are provided (e.g. meshes, points…), the specification should make it possible to link any member of the image collection to its labels, regardless of its level in the hierarchy.
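To make the traversal requirement concrete, here is a minimal sketch of a nested collection and two helpers: one that walks every image depth-first, and one that picks out a sub-collection so it can be loaded on its own. The metadata layout is purely an assumption for illustration: the keys `name`, `type`, `path`, `labels`, and `members`, and all paths, are hypothetical and not part of any OME-NGFF specification.

```python
from typing import Iterator, Optional

# Hypothetical metadata layout: the keys "name", "type", "path", "labels",
# and "members" are illustrative only and are not defined by OME-NGFF.
collection = {
    "name": "experiment",
    "type": "collection",
    "members": [
        {
            "name": "bee_001",
            "type": "image",
            "path": "bees/001.zarr",
            "labels": ["bees/001.zarr/labels/wings"],
        },
        {
            "name": "plate_A",
            "type": "collection",
            "members": [
                {"name": "well_A1", "type": "image",
                 "path": "plate_A/A1.zarr", "labels": []},
                {"name": "well_A2", "type": "image",
                 "path": "plate_A/A2.zarr", "labels": []},
            ],
        },
    ],
}


def iter_images(node: dict) -> Iterator[dict]:
    """Yield every image entry at or below `node`, depth-first."""
    if node["type"] == "image":
        yield node
        return
    for member in node.get("members", []):
        yield from iter_images(member)


def find(node: dict, name: str) -> Optional[dict]:
    """Return the first node called `name`, or None if there is no match."""
    if node["name"] == name:
        return node
    for member in node.get("members", []):
        found = find(member, name)
        if found is not None:
            return found
    return None
```

Calling `iter_images(collection)` walks the whole hierarchy, while `iter_images(find(collection, "plate_A"))` visits only that sub-collection. Because each image entry carries its own `labels` list in this sketch, labels stay attached to their image at any depth.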
The OME-NGFF spec is close to supporting this functionality through the HCS specification, which arranges wells into the rows and columns of a plate. The main drawbacks of this specification are:
- It is too specific to be easily used for images which ARE physically associated but are not HCS acquisitions
- It may be difficult to understand for researchers who are not working with HCS images but nevertheless wish to store their collection in OME-NGFF format
- It does not support an arbitrary depth of groupings
- It does not support collections which are not physically associated
What should it be called?
- Dataset - this term is already used in various places so may not be the best choice
- Collection - a general enough term which is currently mostly unused
- Hierarchical definition - there is a case for this specification being a hierarchy of specifications, with each one defining a more tightly bound collection, e.g.
  - Bag - associated images with no metadata
  - Stack - associated images which overlap in physical space
  - Panorama - associated images which stitch together in physical space
Ideally, the names used in the base specification would be general enough to support a broad variety of use cases, with tailored use cases demonstrated through examples in the documentation.
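As a rough illustration of the "hierarchy of specifications" option, the sketch below models Bag, Stack, and Panorama as Python dataclasses, where each level adds constraints to the one before. The field names (`images`, `translations`, `unit`), the choice of a per-image translation, and the validation rule are all assumptions made for this example, not proposals for the actual metadata.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bag:
    """Associated images with no positional metadata."""
    # Hypothetical: images are referenced by path.
    images: List[str] = field(default_factory=list)


@dataclass
class Stack(Bag):
    """Images that overlap in physical space."""
    # Hypothetical: one (z, y, x) translation per image, in `images` order.
    translations: List[Tuple[float, float, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A Stack is only valid if every image has been positioned.
        if len(self.translations) != len(self.images):
            raise ValueError("each image needs exactly one translation")


@dataclass
class Panorama(Stack):
    """Images that stitch together in physical space."""
    # Hypothetical: physical unit shared by all translations.
    unit: str = "micrometer"
```

Under this design, any reader that understands a Bag can still list the images in a Stack or Panorama, while a more specialised reader can use the extra positional metadata.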
Reference specifications
BDV XML Files
SVG
TrakEM2
Napari Plugin for image-label collections
mobie grid view of many sources
Related
Image.SC discussion on collections
Live notes from latest community call
HCS Specification
What next?
I think we should first decide whether we want to support arbitrary levels in the hierarchy, and whether we want a general spec which more detailed specs can “inherit” from or one spec to rule them all.
My vote is that we define the most generic collection (a “bag” of images) which works with arbitrary levels of grouping (it’s collections all the way down), and then work to add to it for more complex collections. I will be working on this over the coming week and will post here once I have something working, but of course would love to hear what everyone’s thoughts are on the best way forward.