fix: allow categorical index in `read_lazy` #2062

ilan-gold · 2025-07-29T10:17:01Z

Fixes a bug reported by Jan. In-memory (non-lazy) AnnData warns when a non-string index is set but does not disallow it, so this fix makes `read_lazy` match that behavior.

Closes #
Tests added
Release note added (or unnecessary)

codecov · 2025-07-29T10:22:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.57%. Comparing base (c72dfff) to head (2e14d4f).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2062      +/-   ##
==========================================
- Coverage   87.60%   85.57%   -2.03%     
==========================================
  Files          46       46              
  Lines        7052     7059       +7     
==========================================
- Hits         6178     6041     -137     
- Misses        874     1018     +144

Files with missing lines	Coverage Δ
src/anndata/_io/specs/lazy_methods.py	`96.31% <100.00%> (+0.16%)`	⬆️

... and 7 files with indirect coverage changes

flying-sheep

> We do have a warning about setting non-string indices, but it is not disallowed, in non-lazy AnnData

which isn’t a great thing. do we raise an error when writing that or …?

As said: UUIDIndex or (barring that) AnonymousIndex makes sense, RangeIndex or Index(dtype=np.int*) don’t.

ilan-gold · 2025-07-29T12:18:00Z

> which isn’t a great thing. do we raise an error when writing that or …?

We have a warning about it:

anndata/src/anndata/_core/anndata.py

Lines 785 to 800 in c72dfff

    
    if (
        len(value) > 0
        and not isinstance(value, pd.RangeIndex)
        and infer_dtype(value) not in {"string", "bytes"}
    ):
        sample = list(value[: min(len(value), 5)])
        msg = dedent(
            f"""
            AnnData expects .{attr}.index to contain strings, but got values like:
                {sample}
                Inferred to be: {infer_dtype(value)}
            """
        )
        warnings.warn(msg, stacklevel=2)
    return value

I think this fix falls under the "people can in theory write it to disk, so we should read it" category. If we want to start disallowing this behavior in some scope (writing? in-memory? both?), I think that's a separate conversation.
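To make it concrete which index types trip that warning, here is a minimal standalone sketch of the same check using only pandas. The function names `check_index` and `warns` are hypothetical and exist only for this illustration; the condition itself is copied from the snippet above.

```python
import warnings
from textwrap import dedent

import pandas as pd
from pandas.api.types import infer_dtype


def check_index(value: pd.Index, attr: str = "obs") -> pd.Index:
    """Standalone copy of the quoted check (hypothetical helper name)."""
    if (
        len(value) > 0
        and not isinstance(value, pd.RangeIndex)
        and infer_dtype(value) not in {"string", "bytes"}
    ):
        sample = list(value[: min(len(value), 5)])
        msg = dedent(
            f"""
            AnnData expects .{attr}.index to contain strings, but got values like:
                {sample}
                Inferred to be: {infer_dtype(value)}
            """
        )
        warnings.warn(msg, stacklevel=2)
    # The index is returned unchanged: the check warns but never rejects.
    return value


def warns(index: pd.Index) -> bool:
    """Return True if check_index emits a warning for this index."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check_index(index)
    return len(caught) > 0


for idx in (
    pd.Index(["a", "b"]),             # plain string index
    pd.CategoricalIndex(["a", "b"]),  # categorical, even with string categories
    pd.Index([1, 2]),                 # integer index
    pd.RangeIndex(3),                 # explicitly exempted by the check
):
    print(type(idx).__name__, infer_dtype(idx), warns(idx))
```

Under this check a categorical index warns even when its categories are strings, because `infer_dtype` reports it as `categorical`, while a `RangeIndex` is skipped explicitly.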

flying-sheep

I see, one more instance of “some old stuff where we weren’t strict enough”. This one dates back to #300.

I say we shouldn’t finish enshrining this as a feature. We say we don’t support numeric indices, so we shouldn’t betray that by allowing more and more support to creep in.

Instead we should do something like converting to string on read (for old wrong-written indices) and write (to no longer write wrong indices) – with a warning each. Something that keeps things from breaking but doesn’t allow an unsupported index type to survive a full write-read cycle.
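A rough sketch of what such a conversion could look like, assuming a single helper that both the read and the write path call. The name `coerce_index_to_str` and its `context` parameter are hypothetical and not part of anndata; treating `RangeIndex` like any other non-string index is also an assumption of this sketch.

```python
import warnings

import pandas as pd
from pandas.api.types import infer_dtype


def coerce_index_to_str(index: pd.Index, *, context: str) -> pd.Index:
    """Hypothetical helper: return a string index, warning if a conversion happened.

    `context` is only used in the message, e.g. "read" or "write".
    """
    # Empty indices and indices that already hold strings/bytes pass through untouched.
    if len(index) == 0 or infer_dtype(index) in {"string", "bytes"}:
        return index
    warnings.warn(
        f"Converting {type(index).__name__} "
        f"(inferred as {infer_dtype(index)!r}) to strings on {context}.",
        UserWarning,
        stacklevel=2,
    )
    # astype(str) stringifies each label; the index name is carried over.
    return pd.Index(index.astype(str), name=index.name)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = coerce_index_to_str(pd.Index([1, 2], name="cell"), context="write")

print(list(converted), converted.name, len(caught))
```

Calling the same helper on both paths would mean an unsupported index type gets stringified at whichever boundary it hits first, so it could not survive a full write-read cycle.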

ilan-gold added 2 commits July 29, 2025 11:55

fix: allow categorical index in read_lazy

2bac6cf

chore: add warning

4a1c49e

ilan-gold added the skip-gpu-ci label Jul 29, 2025

ilan-gold added this to the 0.12.2 milestone Jul 29, 2025

ilan-gold added 2 commits July 29, 2025 12:18

chore: relnote

2d77e89

Merge branch 'main' into ig/allow_different_indexes

2e14d4f

ilan-gold requested a review from flying-sheep July 29, 2025 10:50

flying-sheep reviewed Jul 29, 2025


ilan-gold requested a review from flying-sheep July 29, 2025 12:18

flying-sheep requested changes Jul 29, 2025


ilan-gold mentioned this pull request Aug 28, 2025

fix: explicit error for pandas.MultiIndex #2096

Merged

3 tasks

ilan-gold modified the milestones: 0.12.2, 0.12.3, 0.12.4 Oct 15, 2025
