@ilan-gold commented Jul 29, 2025

Fixed a bug reported by Jan: read_lazy did not handle non-string indices. We do have a warning about setting non-string indices, but it is not disallowed, in non-lazy AnnData. This fix makes read_lazy match that behavior.

  • Closes #
  • Tests added
  • Release note added (or unnecessary)

@ilan-gold added this to the 0.12.2 milestone Jul 29, 2025

codecov bot commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.57%. Comparing base (c72dfff) to head (2e14d4f).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #2062      +/-   ##
==========================================
- Coverage   87.60%   85.57%   -2.03%     
==========================================
  Files          46       46              
  Lines        7052     7059       +7     
==========================================
- Hits         6178     6041     -137     
- Misses        874     1018     +144     
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/anndata/_io/specs/lazy_methods.py | 96.31% <100.00%> (+0.16%) ⬆️ |

... and 7 files with indirect coverage changes

@ilan-gold requested a review from flying-sheep July 29, 2025 10:50

@flying-sheep left a comment

> We do have a warning about setting non-string indices, but it is not disallowed, in non-lazy AnnData

which isn’t a great thing. do we raise an error when writing that or …?

As I said: a UUIDIndex or (barring that) an AnonymousIndex makes sense; a RangeIndex or an Index(dtype=np.int*) doesn’t.
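`UUIDIndex` and `AnonymousIndex` are not pandas classes, so the following is only a rough sketch, assuming a "UUID index" would simply be a string-valued `pd.Index` of random UUIDs; `make_uuid_index` is a hypothetical helper. It shows why such labels fit a string-only rule while a `RangeIndex` does not: pandas' `infer_dtype` reports `"string"` for the former and `"integer"` for the latter.

```python
import uuid

import pandas as pd
from pandas.api.types import infer_dtype


def make_uuid_index(n: int) -> pd.Index:
    """Hypothetical helper: n unique, string-valued labels built from random UUIDs."""
    return pd.Index([str(uuid.uuid4()) for _ in range(n)])


uuid_index = make_uuid_index(3)
print(infer_dtype(uuid_index))        # string
print(infer_dtype(pd.RangeIndex(3)))  # integer
```

The labels carry no meaning of their own, which is the point of an "anonymous" index: they identify rows without pretending to be positions.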

@ilan-gold

> which isn’t a great thing. do we raise an error when writing that or …?

We have a warning about it:

```python
if (
    len(value) > 0
    and not isinstance(value, pd.RangeIndex)
    and infer_dtype(value) not in {"string", "bytes"}
):
    sample = list(value[: min(len(value), 5)])
    msg = dedent(
        f"""
        AnnData expects .{attr}.index to contain strings, but got values like:
        {sample}
        Inferred to be: {infer_dtype(value)}
        """
    )
    warnings.warn(msg, stacklevel=2)
return value
```
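To see which indices trip that check, here is a standalone copy of the quoted logic wrapped in a hypothetical function, `warn_if_non_string_index` (the name and signature are illustrative, not anndata's). Note that, as written, the check exempts `pd.RangeIndex` and only warns for other non-string dtypes.

```python
import warnings
from textwrap import dedent

import pandas as pd
from pandas.api.types import infer_dtype


def warn_if_non_string_index(value: pd.Index, attr: str) -> pd.Index:
    """Hypothetical standalone copy of the quoted check: warn, but never raise."""
    if (
        len(value) > 0
        and not isinstance(value, pd.RangeIndex)  # RangeIndex is exempt as written
        and infer_dtype(value) not in {"string", "bytes"}
    ):
        sample = list(value[: min(len(value), 5)])
        msg = dedent(
            f"""
            AnnData expects .{attr}.index to contain strings, but got values like:
            {sample}
            Inferred to be: {infer_dtype(value)}
            """
        )
        warnings.warn(msg, stacklevel=2)
    # The index is returned unchanged in every case.
    return value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_non_string_index(pd.Index(["a", "b"]), "obs")  # strings: no warning
    warn_if_non_string_index(pd.RangeIndex(2), "obs")      # RangeIndex: skipped
    warn_if_non_string_index(pd.Index([10, 20]), "obs")    # integers: warns
print(len(caught))  # 1
```

Only the integer `pd.Index` warns, and even then the index is kept as-is, which is why such data can end up on disk.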

I think this fix falls under the "people can in theory write it to disk, so we should read it" category. If we want to start disallowing this behavior in some scope (writing? in-memory? both?), I think that's a separate conversation.

@ilan-gold requested a review from flying-sheep July 29, 2025 12:18

@flying-sheep left a comment

I see, one more instance of “some old stuff where we weren’t strict enough”: this dates back to #300.

I say we shouldn’t finish enshrining this as a feature. We say we don’t support numeric indices, so we shouldn’t betray that by allowing more and more support to creep in.

Instead we should do something like converting to string on read (for old, incorrectly written indices) and on write (so we no longer write such indices), with a warning each time. Something that keeps things from breaking but doesn’t allow an unsupported index type to survive a full write-read cycle.
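A minimal sketch of that proposal, assuming a single hypothetical helper `coerce_index_to_str` that would be called on both the read and the write path (nothing in this thread says such a function exists in anndata). Unlike the quoted warning check, it does not exempt `pd.RangeIndex`, in line with the point that a `RangeIndex` doesn't make sense as an index.

```python
import warnings

import pandas as pd
from pandas.api.types import infer_dtype


def coerce_index_to_str(index: pd.Index, *, context: str) -> pd.Index:
    """Hypothetical helper: convert a non-string index to str, with a warning.

    `context` is a free-form label (e.g. "read" or "write") used in the message.
    """
    inferred = infer_dtype(index)
    # Empty indices and string/bytes labels are left untouched.
    if len(index) == 0 or inferred in {"string", "bytes"}:
        return index
    warnings.warn(
        f"Converting non-string index (inferred {inferred!r}) to str on {context}.",
        UserWarning,
        stacklevel=2,
    )
    return index.astype(str)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    on_read = coerce_index_to_str(pd.RangeIndex(3), context="read")
    on_write = coerce_index_to_str(pd.Index(["a", "b"]), context="write")
print(list(on_read))  # ['0', '1', '2']
print(len(caught))    # 1
```

Because the same conversion runs on write, a numeric index could still exist in memory, but it would come back as strings after one write-read cycle.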

@ilan-gold modified the milestones: 0.12.2, 0.12.3, 0.12.4 Oct 15, 2025