Skip to content

feat: Unify X and layers#1707

Draft
flying-sheep wants to merge 35 commits intomainfrom
x-layers-unification
Draft

feat: Unify X and layers#1707
flying-sheep wants to merge 35 commits intomainfrom
x-layers-unification

Conversation

@flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Oct 9, 2024

TODO:

@flying-sheep flying-sheep changed the title WIP Unify X and layers Oct 9, 2024
@codecov
Copy link

codecov bot commented Oct 9, 2024

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@ilan-gold ilan-gold modified the milestones: 0.12.0, 0.12.1, 0.13.0 Jul 16, 2025
@ilan-gold ilan-gold mentioned this pull request Jul 29, 2025
4 tasks
@ilan-gold
Copy link
Contributor

ilan-gold commented Aug 22, 2025

So re: the backed question, I think the answer is not to get rid of it, but to clarify its behavior. To do that, I think we should merge #1927 first, which may include a breaking change. If we stop allowing overriding of on-disk stores (which currently relies on inheritance form scipy.sparse private methods), then this question becomes much more simplified.

The next step after #1927 would probably be to simplify it. From what I can tell, the reaosn we need to track if the file is open or not is because if writing. If everything were purely read-only, I think it would be less of an issue.

My vision there would be in "backed" mode (after the two above points are implemented), we would simply take the X attribute on-disk, and read it in using ad.io.sparse_dataset with mode="r" and call it a day. That's how layers and other elements handle sparse_dataset so X shouldn't be different


@X.setter
def X(self, value: XDataType | None): # noqa: PLR0912
if value is None:
Copy link
Contributor

@ilan-gold ilan-gold Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of this deleted logic contains things that layers can't currently do. For example, overriding the backing X; currently, if you try to override a part of a view of a layers, it will just actualize and not override the old data. But if you do adata_view.X = 1 it will fill ones in to the parent AnnData object. I'm not sure what the history is there, but it accounts for a pretty big discrepancy in functionality

ilan-gold and others added 2 commits August 29, 2025 13:35
Co-authored-by: Philipp A. <flying-sheep@web.de>
@ilan-gold
Copy link
Contributor

ilan-gold commented Feb 4, 2026

My rough plan here:

  1. chore: add anndata.settings.copy_on_write_X #2327
  2. Turn off the above behavior by simply removing the setting on main (i.e., 0.13) - hopefully this doesn't make merges into 0.12.x impossible
  3. feat: support array-api #2071 is in-flight and relevant to setting X so likely would want this to go in first
  4. read_lazy kwarg for not using dask #2147 in parallel to 3. will force us to start thinking about managing open-or-closed hdf5 files outside of X which is what will be needed here for backed mode (and backed mode would also probably be superseded by read_lazy kwarg for not using dask #2147 but not until 0.14 at least)
  5. Merge this PR, which is in a mostly complete state except for the above

@ilan-gold ilan-gold changed the title Unify X and layers feat: Unify X and layers Feb 12, 2026
@ilan-gold
Copy link
Contributor

Now it seems like we've hit a point where the blocking factor is the interaction between layers and backed mode. This PR at this point got rid of _mutated_copy which previously handled detaching a CoW-ed view of AnnData from its file handle so my_backed[subset].X = None would not error out but it does as of the current state of things because it passes through to layers which then calls copy. So we likely need to bring this function back in some capacity but this gets at the intricacies of backed mode + layers.

@ilan-gold
Copy link
Contributor

ilan-gold commented Feb 27, 2026

The main open question here (raised in #2357 and asked here on zulip) is what even backed mode is. Already was the case that backed mode was this weird "if it's set, open the file on access to .X thing" but now that is turbo apparent since nothing really happens in X anymore because there is no _X anymore.

In general, I think figuring out what backed mode should be now that X is copy-on-write (i.e., adata[subset].X doesn't write through) is the open question (asked in the description of #2357).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

concat when some values of X are None Unification of X and layers

2 participants