Description
In #649 I wanted to make the documentation clear for the current schema. Here, I propose changing the dimension ordering in OnePhotonSeries and TwoPhotonSeries from the current data[time, width, height] to data[time, height, width]. To clarify, here "width" refers to the horizontal extent (columns, x-axis), while "height" refers to the vertical extent (rows, y-axis):
                 → x (columns, width)
         0 ╔═══╦═══╦═══╦═══╦═══╗
           ║   ║   ║   ║   ║   ║
         1 ╠═══╬═══╬═══╬═══╬═══╣
y (rows,   ║   ║   ║   ║   ║   ║
height)  2 ╠═══╬═══╬═══╬═══╬═══╣
           ║   ║   ║   ║   ║   ║
    ↓    3 ╚═══╩═══╩═══╩═══╩═══╝
             0   1   2   3   4
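For data already written under the current convention, the proposed change amounts to swapping the last two axes. The following is a minimal sketch with NumPy on a small synthetic array; the array sizes and variable names are illustrative assumptions, not part of the schema or of any NWB API.

```python
import numpy as np

# Synthetic movie: 2 frames, width = 5 columns, height = 3 rows (illustrative sizes).
n_frames, width, height = 2, 5, 3

# Current convention: data[time, width, height]
data_twh = np.arange(n_frames * width * height).reshape(n_frames, width, height)

# Proposed convention: data[time, height, width], obtained by swapping the last two axes.
data_thw = np.transpose(data_twh, (0, 2, 1))

# The same pixel (frame t, column x, row y) is addressed differently in each layout.
t, x, y = 1, 4, 2
same_pixel = bool(data_twh[t, x, y] == data_thw[t, y, x])
```

Note that np.transpose returns a view with swapped strides rather than a copy; np.ascontiguousarray(data_thw) would be needed to actually lay the data out in the new order in memory.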
First, this change aligns NWB with the standard matrix indexing convention [row, column] = [height, width] used in the image processing ecosystem:
- scikit-image: Image processing toolkit for Python, arrays have shape (height, width) with indexing array[row, col]
- OpenCV (Mat.at, array indexing): Computer vision library, matrices indexed as mat.at<type>(row, col)
- imageio: Python library for reading and writing images, returns arrays with shape (height, width)
- tifffile: Python library for reading microscopy TIFF files (including ScanImage), returns arrays indexed as data[y, x] where y=rows/height, x=columns/width
- BioIO: Microscopy file format reader, standardizes all data to 'TCZYX' ordering with spatial dimensions as YX
All of these are analysis and processing libraries where users interact with data through array indexing. Notable exceptions are Pillow (PIL) and ImageJ, which use (width, height) ordering, reflecting their heritage as graphics/plotting-centered tools rather than array-processing libraries (see final paragraph). Switching to the [height, width] convention would reduce friction when NWB data is used with these libraries.
Second, this change should improve performance for raster-scanning microscopy, which represents a significant portion of optical physiology data. In raster-based systems, the width (horizontal direction) is the fastest-changing dimension during acquisition. When width becomes the last dimension and data is stored in C-order (row-major, the default for HDF5, Zarr, and NumPy), the proposed convention aligns the memory layout with the natural acquisition order.
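The memory-layout argument can be illustrated with NumPy strides. This is a sketch on small in-memory arrays with illustrative sizes and variable names; actual read/write performance in HDF5 or Zarr also depends on chunking and compression settings, which it does not model.

```python
import numpy as np

n_frames, height, width = 2, 3, 5

# C-order (row-major) array in the proposed [time, height, width] layout.
proposed = np.zeros((n_frames, height, width), dtype=np.uint16)

# Strides are the byte steps between neighbouring elements along each axis.
# The last axis (width) has the smallest stride: one element.
stride_time, stride_height, stride_width = proposed.strides

# One scan line (all x for a fixed frame and row) is a contiguous block of memory.
scan_line = proposed[0, 1, :]
line_is_contiguous = scan_line.flags["C_CONTIGUOUS"]

# Under the current [time, width, height] layout, the same scan line is strided:
# consecutive x positions are separated by a full column of height values.
current = np.zeros((n_frames, width, height), dtype=np.uint16)
strided_line = current[0, :, 1]
strided_is_contiguous = strided_line.flags["C_CONTIGUOUS"]
```

Here the pixels of one horizontal scan line sit next to each other in memory only in the proposed layout, which matches the order in which a raster scanner produces them.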
Notably, the ndx-microscopy extension, which looks like the future of NWB's core microscopy handling, already implements the data[time, height, width] convention. @alessandratrapani might be in a better position to add the motivation here.
This proposal represents a shift from a Cartesian/plotting-centric indexing convention to a matrix/image-processing-centric convention. The Cartesian convention writes coordinates as (x, y) with x=width first and is common in graphics and plotting contexts (see OpenCV Point discussion). In contrast, the matrix convention indexes arrays as [row, column] = [height, width] and dominates in image processing and array manipulation libraries. Since OnePhotonSeries and TwoPhotonSeries store raw imaging data that users will primarily interact with through array indexing rather than plotting, adopting the matrix convention better serves the typical NWB analysis workflow.
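The relationship between the two conventions can be sketched in a few lines of NumPy. The helper name pixel_at is hypothetical and exists only for illustration; it is not part of NWB or of any library mentioned above.

```python
import numpy as np

def pixel_at(frame, x, y):
    """Return the pixel at Cartesian position (x, y) from a [height, width] frame.

    Hypothetical helper for illustration only.
    """
    # Matrix indexing puts the row (y) first and the column (x) second.
    return frame[y, x]

# Illustrative frame with 3 rows (height) and 5 columns (width).
height, width = 3, 5
frame = np.arange(height * width).reshape(height, width)

# Cartesian coordinates (x=4, y=1) map to the matrix index [1, 4].
value = pixel_at(frame, x=4, y=1)
```

Under the matrix convention, the coordinate swap happens only at the boundary with plotting or graphics code, while all array indexing stays in [row, column] order.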
c.c. @bendichter @alessandratrapani
Links:
Indexing terminology: https://blogs.mathworks.com/loren/2007/06/21/indexing-terminology/