Expose canonical section-name tuple alongside iter_outer #2401

@katosh

Description

anndata.utils.iter_outer is currently the only way to enumerate AnnData's standard section names (X, obs, var, uns, obsm, varm, obsp, varp, layers, raw). Two usability limitations fall out of that conflation of "name listing" with "value iteration":

  1. Consumers that only need the names pay the cost of fully materialising every section. Each yield runs getattr(adata, name), which reconstructs the aligned mappings and, for backed AnnData, reopens and closes the backing file. Membership checks and layout introspection have to drive the generator just to read the names.

  2. A single section whose attribute access raises terminates the generator mid-iteration, so every subsequent section is silently dropped. This is surprising when the section holds a broken object (corrupt aligned mapping, subclass with a crashing property, validator that raises): the most useful thing to do in that state is inspect what is there, and the iterator does not allow that unless each consumer reimplements the section list locally.
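The first limitation can be illustrated without AnnData at all. The following is a minimal sketch in which `SECTION_NAMES`, `CountingStub` and `iter_outer_like` are hypothetical stand-ins (none of them are anndata APIs) that mimic the getattr-per-yield pattern. It only shows that collecting names through such a generator touches every section, whereas a membership check against a plain tuple touches none.

```python
# Hypothetical stand-ins for illustration; nothing here is part of anndata.
SECTION_NAMES = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


class CountingStub:
    """Records every section access, standing in for costly attribute access."""

    def __init__(self):
        self.accessed = []

    def __getattr__(self, name):
        # Only invoked for attributes not found normally, i.e. the section names.
        if name in SECTION_NAMES:
            self.accessed.append(name)
            return f"<{name}>"
        raise AttributeError(name)


def iter_outer_like(obj):
    """Yield (name, value) pairs, so each name costs one attribute access."""
    for name in SECTION_NAMES:
        yield name, getattr(obj, name)


# Reading the names through the generator touches every section.
stub = CountingStub()
names_via_generator = [name for name, _ in iter_outer_like(stub)]
accesses_via_generator = len(stub.accessed)

# A membership check against the tuple touches none.
stub = CountingStub()
is_known = "obsm" in SECTION_NAMES
accesses_via_tuple = len(stub.accessed)
```

With a real backed AnnData the per-access cost would include the file reopen described above; the stub only counts accesses.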

Wish

Expose the canonical section order as a public tuple next to iter_outer and refactor the generator to iterate it:

# src/anndata/utils.py
STANDARD_SECTIONS: tuple[AnnDataElem, ...] = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)

def iter_outer(adata):
    """Iterate over key-value pairs of the parent "elems" in
    :data:`STANDARD_SECTIONS` order.
    """
    for attr_name in STANDARD_SECTIONS:
        was_closed = adata.isbacked and not adata.file.is_open
        yield (attr_name, getattr(adata, attr_name))
        if was_closed:
            adata.file.close()

Name-only consumers read the constant directly:

from anndata.utils import STANDARD_SECTIONS
if section in STANDARD_SECTIONS: ...

No behavioural change for existing callers of iter_outer — same yield order, same exception semantics.
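A check along the following lines could back the "same yield order" claim. This is only a sketch: `fake` is a hypothetical in-memory stand-in built from `SimpleNamespace`, and `STANDARD_SECTIONS` and `iter_outer` are copied from the proposal above rather than imported from anndata. The actual tests in the PR may look different.

```python
from types import SimpleNamespace

# Copied from the proposal; not yet importable from anndata.utils.
STANDARD_SECTIONS = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


def iter_outer(adata):
    """Yield (name, value) pairs in STANDARD_SECTIONS order, as proposed."""
    for attr_name in STANDARD_SECTIONS:
        was_closed = adata.isbacked and not adata.file.is_open
        yield (attr_name, getattr(adata, attr_name))
        if was_closed:
            adata.file.close()


# Hypothetical stand-in: not backed, so the file-handling branch is never taken.
fake = SimpleNamespace(
    isbacked=False,
    **{name: f"<{name}>" for name in STANDARD_SECTIONS},
)

pairs = list(iter_outer(fake))
yielded_names = [name for name, _ in pairs]

# The generator and the constant must agree on both membership and order.
assert yielded_names == list(STANDARD_SECTIONS)
```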

Why this helps the broken-object case

Callers that want to keep rendering/inspecting when one section raises can iterate STANDARD_SECTIONS directly and wrap each getattr in try/except with an error policy tailored to their context (render an error placeholder, log and continue, degrade gracefully). Callers that want strict semantics (to_memory, _reduce, serialization) keep using iter_outer unchanged.
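One possible shape for such a tolerant caller is sketched below. `iter_sections_tolerant` and `FakeAnnData` are hypothetical names introduced for illustration, and `STANDARD_SECTIONS` is redefined locally because the constant is only proposed here. The "yield the exception alongside the name" policy is one choice among several; a repr might render a placeholder instead.

```python
# Proposed constant, redefined locally; not yet part of anndata.
STANDARD_SECTIONS = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


def iter_sections_tolerant(obj, sections=STANDARD_SECTIONS):
    """Yield (name, value, error) for every section without stopping early."""
    for name in sections:
        try:
            yield name, getattr(obj, name), None
        except Exception as exc:  # call-site policy: record the error, continue
            yield name, None, exc


class FakeAnnData:
    """Hypothetical stand-in whose ``varm`` access raises."""

    def __init__(self):
        for name in STANDARD_SECTIONS:
            if name != "varm":
                setattr(self, name, f"<{name}>")

    @property
    def varm(self):
        raise TypeError("shape is not defined for this object")


results = list(iter_sections_tolerant(FakeAnnData()))
visited = [name for name, _, _ in results]
failed = [name for name, _, err in results if err is not None]
```

Here every section is visited and only `varm` is reported as failed. The sketch leaves out the backed-file open/close handling that `iter_outer` performs; a real caller working with backed AnnData would need to decide how to handle that.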

Concrete reproduction of the broken-section case:

import numpy as np
from anndata import AnnData
from anndata.utils import iter_outer

class BrokenShape:
    @property
    def shape(self):
        raise TypeError("shape is not defined for this object")

adata = AnnData(np.zeros((10, 5)))
adata.obs["batch"] = list("abcdeabcde")
adata.varm._data["broken"] = BrokenShape()  # insert past validation

yielded = []
try:
    for name, _ in iter_outer(adata):
        yielded.append(name)
except TypeError:
    pass
# yielded == ['X', 'obs', 'var', 'uns', 'obsm']
# varm raised → obsp, varp, layers, raw never visited.

Alternatives considered

  • Add a strict=False parameter to iter_outer that catches per-section access errors and yields the exception in place of the value. Works, but expands iter_outer's API surface and embeds an error-handling policy in a generator that should stay single-purpose. Exposing the constant keeps the iterator minimal and lets error policy live at the call site.
  • Reorder the existing AnnDataElem Literal in _types.py so get_literal_members(AnnDataElem) yields the display order. Risky — the Literal's declaration order is already used by ANNDATA_ELEMS in experimental/backed/_io.py and downstream tests; a targeted STANDARD_SECTIONS tuple avoids the cross-module coupling.
  • Keep each consumer hardcoding its own section list. This is today's status quo. The copies drift as soon as sections change, which is the problem the recent iter_outer adoption in the HTML repr aimed to eliminate.

Follow-ups this unblocks

  • Rich HTML repr (feat: Add HTML representation #2236): _collect_all_field_names becomes a pure membership/order check against STANDARD_SECTIONS; _render_all_sections walks the constant and isolates per-section access failures so one broken object can't blank the whole repr.
  • Ecosystem packages introspecting AnnData layout no longer need their own copies of the section list.

A PR implementing this is ready on settylab:feat/standard-sections (1 commit, ~50 lines including tests in tests/test_utils.py).
