-
Notifications
You must be signed in to change notification settings - Fork 175
fix: allow categorical index in read_lazy
#2062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2062 +/- ##
==========================================
- Coverage 87.60% 85.57% -2.03%
==========================================
Files 46 46
Lines 7052 7059 +7
==========================================
- Hits 6178 6041 -137
- Misses 874 1018 +144
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have a warning about setting non-string indices, but it is not disallowed, in non-lazy AnnData
which isn’t a great thing. do we raise an error when writing that or …?
As said: UUIDIndex
or (barring that) AnonymousIndex
makes sense, RangeIndex
or Index(dtype=np.int*)
don’t.
We have a warning about it: anndata/src/anndata/_core/anndata.py Lines 785 to 800 in c72dfff
I think this fix falls under the "people can in theory write it to disk, so we should read it" category. If we want to start disallowing this behavior in some scope (writing? in-memory? both?), I think that's a separate conversation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, one more instance of “some old stuff where we weren’t strict enough”. This is some old stuff from #300.
I say we shouldn’t finish enshrining this as a feature. We say we don’t support numeric indices, so we shouldn’t betray that by allowing more and more support to creep in.
Instead we should do something like converting to string on read (for old wrong-written indices) and write (to no longer write wrong indices) – with a warning each. Something that keeps things from breaking but doesn’t allow an unsupported index type to survive a full write-read cycle.
Fixed a bug reported by Jan. We do have a warning about setting non-string indices, but it is not disallowed, in non-lazy
AnnData
. So this fix matches the behavior forread_lazy
.