i/o of different index types #2241

@ilan-gold


Please describe your wishes and possible alternatives to achieve the desired result.

Somewhat related to #777, but more broadly, pandas has a rich ecosystem of extension arrays that can be baked into indexes (see https://xarray-indexes.readthedocs.io/ for example).
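For concreteness, here is a minimal sketch of the kinds of pandas indexes this covers. It only uses public pandas constructors and assumes a reasonably recent pandas (extension-dtype-backed `pd.Index` objects are not available in old releases); the variable names are just for illustration.

```python
import pandas as pd

# Three indexes that are not plain string/object arrays.
cat_index = pd.CategoricalIndex(["a", "b", "a"])       # categorical-backed
nullable_index = pd.Index([1, 2, 3], dtype="Int64")    # nullable integer extension dtype
interval_index = pd.interval_range(start=0, end=3)     # IntervalIndex

for idx in (cat_index, nullable_index, interval_index):
    # Each of these reports an extension dtype rather than a NumPy dtype.
    print(type(idx).__name__, idx.dtype, pd.api.types.is_extension_array_dtype(idx.dtype))
```

Whether each of these can be written to disk as an index is exactly what the spec currently leaves open.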

Our current spec is ambiguous as to what a valid on-disk index is: https://anndata.readthedocs.io/en/latest/fileformat-prose.html:

The group MUST contain an array for the index

But is this an hdf5/zarr array or one of our serializable arrays?

Also related: #2238 would allow writing anything that is arrow-based

So there are a few TODOs here that probably overlap, so it's not clear whether they should be separate issues:

  • The index requested in #777 (Feature request - var_names/obs_names as fixed-sized types (integer or bytes)) should probably serialize as None, which means we should bump the spec to allow None for the index
  • It's possible that different non-standard/explicit indices (a genome or genome locations of some sort that isn't loaded into memory? a cross-product index, e.g., for pixels, like in @maltekuehl's https://github.com/complextissue/spatiomic?) will want to make it explicit that they have different semantics but still serialize as None. For example, a cross-product index could store only the min/max/step size per dimension. We should create a specific, customizable namespace in attrs so that users can define their own non-standard/explicit indexing strategies. The idea is that even if this user-defined metadata is unusable for some reason, #777 would step in as a fallback
  • Clarify what is meant by "array" in a spec bump for the index of a dataframe (maybe one spec bump for all three points?). This could, for example, allow categorical indices more clearly
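
To make the second point more concrete, here is a rough sketch of how a cross-product index might be described by min/max/step metadata and rebuilt on read, with a fallback when the metadata is unusable. Everything here is an assumption for illustration: the `attrs` keys (`index-type`, `dims`), the `build_index` helper, and the choice of `pd.RangeIndex` as the in-memory stand-in for a None index are hypothetical and not part of the anndata spec or API.

```python
import numpy as np
import pandas as pd

# Hypothetical attrs layout; none of these keys exist in the anndata spec.
attrs = {
    "index-type": "cross-product",
    "dims": {
        "y": {"min": 0, "max": 2, "step": 1},
        "x": {"min": 0, "max": 3, "step": 1},
    },
}


def build_index(attrs: dict, n_rows: int) -> pd.Index:
    """Rebuild an index from metadata, falling back to a positional index."""
    try:
        if attrs["index-type"] != "cross-product":
            raise ValueError("unknown index type")
        # One coordinate array per dimension; "max" is treated as inclusive.
        levels = [
            np.arange(d["min"], d["max"] + d["step"], d["step"])
            for d in attrs["dims"].values()
        ]
        index = pd.MultiIndex.from_product(levels, names=list(attrs["dims"]))
        if len(index) != n_rows:
            raise ValueError("metadata does not match the stored shape")
        return index
    except (KeyError, TypeError, ValueError):
        # Assumed fallback: behave as if the index had been stored as None.
        return pd.RangeIndex(n_rows)


pixel_index = build_index(attrs, n_rows=12)  # 3 y-values x 4 x-values
fallback_index = build_index({}, n_rows=5)   # unusable metadata
```

The point of the sketch is only that a reader who does not understand the custom namespace can still load the data with a positional index.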
