anndata.utils.iter_outer is currently the only way to enumerate AnnData's standard section names (X, obs, var, uns, obsm, varm, obsp, varp, layers, raw). Two usability limitations fall out of that conflation of "name listing" with "value iteration":
Consumers that only need the names pay the cost of fully materialising every section. Each yield runs getattr(adata, name), which reconstructs the aligned mappings and, for backed AnnData, reopens and closes the backing file. Membership checks and layout introspection have to drive the generator just to read the names.
One section whose attribute access raises terminates the generator mid-iteration, silently dropping every subsequent section. This is surprising when the section holds a broken object (corrupt aligned mapping, subclass with a crashing property, validator that raises) — the most useful thing to do in that state is inspect what's there, and the iterator doesn't let you without each consumer reimplementing the section list locally.
Wish
Expose the canonical section order as a public tuple next to iter_outer and refactor the generator to iterate it:
# src/anndata/utils.py
STANDARD_SECTIONS: tuple[AnnDataElem, ...] = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)

def iter_outer(adata):
    """Iterate over key-value pairs of the parent "elems" in :data:`STANDARD_SECTIONS` order."""
    for attr_name in STANDARD_SECTIONS:
        was_closed = adata.isbacked and not adata.file.is_open
        yield (attr_name, getattr(adata, attr_name))
        if was_closed:
            adata.file.close()
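Name-only consumers would read the constant directly. A minimal sketch, assuming the proposed tuple exists: it is redefined locally here as a hypothetical stand-in for anndata.utils.STANDARD_SECTIONS so the snippet runs without the change, and is_standard_section and section_rank are illustrative helpers, not part of anndata.

```python
# Hypothetical stand-in for the proposed anndata.utils.STANDARD_SECTIONS,
# defined locally so this runs without the proposed change.
STANDARD_SECTIONS: tuple[str, ...] = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


def is_standard_section(name: str) -> bool:
    """Membership check that never touches an AnnData object."""
    return name in STANDARD_SECTIONS


def section_rank(name: str) -> int:
    """Position of a section in the canonical order, usable as a sort key."""
    return STANDARD_SECTIONS.index(name)


# Sort an arbitrary list of section names into canonical order.
unordered = ["layers", "obs", "X", "varm"]
ordered = sorted(unordered, key=section_rank)
print(ordered)  # ['X', 'obs', 'varm', 'layers']
```

Neither helper calls getattr on an AnnData object, so nothing is materialised and no backing file is opened.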
No behavioural change for existing callers of iter_outer — same yield order, same exception semantics.
Why this helps the broken-object case
Callers that want to keep rendering/inspecting when one section raises can iterate STANDARD_SECTIONS directly and wrap each getattr in try/except with an error policy tailored to their context (render an error placeholder, log and continue, degrade gracefully). Callers that want strict semantics (to_memory, _reduce, serialization) keep using iter_outer unchanged.
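A tolerant walk of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the PR's code: STANDARD_SECTIONS is redefined locally, FakeAnnData is a hypothetical stand-in whose varm access raises, and collect_sections with its "record and continue" policy is just one possible choice.

```python
# Hypothetical stand-in for the proposed anndata.utils.STANDARD_SECTIONS.
STANDARD_SECTIONS = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


class FakeAnnData:
    """Hypothetical stand-in for an AnnData whose `varm` access raises."""

    def __getattr__(self, name):
        if name == "varm":
            raise TypeError("shape is not defined for this object")
        if name in STANDARD_SECTIONS:
            return f"<{name} value>"
        raise AttributeError(name)


def collect_sections(adata):
    """Access every section, recording failures instead of stopping."""
    values, errors = {}, {}
    for name in STANDARD_SECTIONS:
        try:
            values[name] = getattr(adata, name)
        except Exception as exc:  # call-site policy: record and continue
            errors[name] = exc
    return values, errors


values, errors = collect_sections(FakeAnnData())
print(list(values))
# ['X', 'obs', 'var', 'uns', 'obsm', 'obsp', 'varp', 'layers', 'raw']
print({name: type(exc).__name__ for name, exc in errors.items()})
# {'varm': 'TypeError'}
```

The sketch leaves out the backed-file bookkeeping that iter_outer performs. A caller working with backed AnnData would need to handle that itself.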
Concrete reproduction of the broken-section case:
import numpy as np
from anndata import AnnData
from anndata.utils import iter_outer

class BrokenShape:
    @property
    def shape(self):
        raise TypeError("shape is not defined for this object")

adata = AnnData(np.zeros((10, 5)))
adata.obs["batch"] = list("abcdeabcde")
adata.varm._data["broken"] = BrokenShape()  # insert past validation

yielded = []
try:
    for name, _ in iter_outer(adata):
        yielded.append(name)
except TypeError:
    pass

# yielded == ['X', 'obs', 'var', 'uns', 'obsm']
# varm raised → obsp, varp, layers, raw never visited.
Alternatives considered
Add a strict=False parameter to iter_outer that catches per-section access errors and yields the exception in place of the value. Works, but expands iter_outer's API surface and embeds an error-handling policy in a generator that should stay single-purpose. Exposing the constant keeps the iterator minimal and lets error policy live at the call site.
Reorder the existing AnnDataElem Literal in _types.py so get_literal_members(AnnDataElem) yields the display order. Risky — the Literal's declaration order is already used by ANNDATA_ELEMS in experimental/backed/_io.py and downstream tests; a targeted STANDARD_SECTIONS tuple avoids the cross-module coupling.
Keep each consumer hardcoding its own section list. This is today's status quo. The copies drift as soon as the set of sections changes, which is the problem the recent iter_outer adoption in the HTML repr aimed to eliminate.
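For contrast, the first alternative (a strict=False parameter) could be sketched roughly as follows. This is not code from anndata or from the PR: iter_sections is a hypothetical name, and STANDARD_SECTIONS and FakeAnnData are local stand-ins so the snippet runs on its own.

```python
# Hypothetical stand-in for the proposed anndata.utils.STANDARD_SECTIONS.
STANDARD_SECTIONS = (
    "X", "obs", "var", "uns", "obsm", "varm", "obsp", "varp", "layers", "raw",
)


class FakeAnnData:
    """Hypothetical stand-in for an AnnData whose `varm` access raises."""

    def __getattr__(self, name):
        if name == "varm":
            raise TypeError("shape is not defined for this object")
        if name in STANDARD_SECTIONS:
            return f"<{name} value>"
        raise AttributeError(name)


def iter_sections(obj, *, strict=True):
    """Yield (name, value); with strict=False, yield the exception as the value."""
    for name in STANDARD_SECTIONS:
        try:
            value = getattr(obj, name)
        except Exception as exc:
            if strict:
                raise
            value = exc  # the error policy now lives inside the generator
        yield name, value


lenient = dict(iter_sections(FakeAnnData(), strict=False))
# Every caller has to type-check each value to tell data from failures.
failed = [name for name, value in lenient.items() if isinstance(value, Exception)]
print(failed)  # ['varm']
```

Under this design each consumer of the non-strict mode must check every value with isinstance, which is the policy coupling that exposing the constant avoids.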
Follow-ups this unblocks
Rich HTML repr (feat: Add HTML representation #2236): _collect_all_field_names becomes a pure membership/order check against STANDARD_SECTIONS; _render_all_sections walks the constant and isolates per-section access failures so one broken object can't blank the whole repr.
Ecosystem packages introspecting AnnData layout no longer need their own copies of the section list.
A PR implementing this is ready on settylab:feat/standard-sections (1 commit, ~50 lines including tests in tests/test_utils.py).