Please describe your wishes and possible alternatives to achieve the desired result.
Somewhat related to #777, but more broadly, pandas has a rich ecosystem of extension arrays that can be baked into indexes (see https://xarray-indexes.readthedocs.io/ for example).
Our current spec is ambiguous as to what a valid on-disk index is: https://anndata.readthedocs.io/en/latest/fileformat-prose.html:
> The group MUST contain an array for the index
But is this an hdf5/zarr array or one of our serializable arrays?
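To make the question concrete, here is a small sketch of two pandas indexes whose values are backed by extension dtypes rather than a plain NumPy array. It uses only public pandas calls; whether the spec's "array" wording is meant to cover indexes like these is exactly the open question, so nothing here should be read as what anndata currently does.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype

# Two indexes whose dtype is a pandas extension dtype.
interval_index = pd.interval_range(start=0, end=3)
categorical_index = pd.CategoricalIndex(["a", "b", "a"])

for idx in (interval_index, categorical_index):
    # Neither maps one-to-one onto a flat hdf5/zarr array of scalars.
    print(type(idx).__name__, idx.dtype, is_extension_array_dtype(idx.dtype))
```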
Also related: #2238 would allow writing anything that is arrow-based
So there are a few TODOs here that probably overlap, so it's not clear whether they should be separate issues:
[…] which means we should bump the spec to allow for […] in index […]. For example, a cross product could store only the min/max/step size per dimension. We should create a dedicated, customizable namespace for this in `attrs` so that users can define their own custom non-standard/explicit indexing strategies. The idea is that even if this user-defined metadata is unusable for some reason, #777 (Feature request - var_names/obs_names as fixed-sized types (integer or bytes)) would step in as a fallback.
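A rough sketch of what the cross-product idea could look like, purely for discussion. The namespace key `custom-index`, the `kind`/`dims` layout, and both helper functions are hypothetical and not part of the current spec. A plain dict stands in for the on-disk `attrs`, and the `fallback` argument stands in for whatever explicit index would otherwise be read.

```python
import numpy as np
import pandas as pd

# Hypothetical namespace key inside `attrs`; an assumption, not in the spec.
INDEX_NAMESPACE = "custom-index"


def encode_grid(dims):
    """Describe a cross product by (min, max, step) per dimension (max inclusive)."""
    return {
        INDEX_NAMESPACE: {
            "kind": "cross-product",
            "dims": {
                name: {"min": lo, "max": hi, "step": step}
                for name, (lo, hi, step) in dims.items()
            },
        }
    }


def decode_index(attrs, fallback):
    """Rebuild the index from the metadata, or use `fallback` if it is unusable."""
    meta = attrs.get(INDEX_NAMESPACE)
    if not isinstance(meta, dict) or meta.get("kind") != "cross-product":
        return pd.Index(fallback)
    try:
        names = list(meta["dims"])
        levels = [
            np.arange(d["min"], d["max"] + d["step"], d["step"])
            for d in meta["dims"].values()
        ]
    except (KeyError, TypeError, AttributeError, ValueError, ZeroDivisionError):
        # Unusable user-defined metadata: fall back to the explicit index.
        return pd.Index(fallback)
    return pd.MultiIndex.from_product(levels, names=names)


attrs = encode_grid({"x": (0, 2, 1), "y": (0, 10, 5)})
grid_index = decode_index(attrs, fallback=[])

broken_attrs = {INDEX_NAMESPACE: {"kind": "cross-product", "dims": {"x": {"min": 0}}}}
fallback_index = decode_index(broken_attrs, fallback=["cell_0", "cell_1"])
```

Here `grid_index` is rebuilt from six numbers instead of nine stored labels, while `fallback_index` shows the intended safety net: metadata that cannot be interpreted is ignored in favor of the explicitly stored index.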