fix: merge strategies for `concat_on_disk` #2122

ilan-gold · 2025-09-16T15:38:28Z

I'm putting this in the soonest upcoming milestone because

a) I think it's a bug that we didn't have any sort of warning or anything about this and
b) It's in experimental and this isn't even an API change

As for the nature of the changes, I still needed to look into why `read_elem_lazy` produces different tokenization results for dense dask arrays than `read_elem` on the off-axis; otherwise, this could be fully lazy. Update: bug resolved.

Closes #2110 ("Concatenating on disk doesn't respect the merge argument.") and part of #1854 ("concat_on_disk fails to write alternative axis mapping and uns")
Tests added
Release note added (or unnecessary)
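
For readers unfamiliar with the `merge` argument, here is a minimal sketch of the four strategies (`"same"`, `"unique"`, `"first"`, `"only"`) as I understand them from the `anndata.concat` documentation, applied to flat dictionaries. The function names and the sample data are hypothetical and this is not the code in `merge.py`; the real implementation works on nested mappings and array-like values.

```python
from typing import Any, Mapping, Sequence

Maps = Sequence[Mapping[str, Any]]


def _union_keys(ds: Maps) -> list[str]:
    """All keys across the mappings, in order of first appearance."""
    seen: dict[str, None] = {}
    for d in ds:
        for k in d:
            seen.setdefault(k, None)
    return list(seen)


def merge_same(ds: Maps) -> dict[str, Any]:
    """Keep keys present in every mapping whose values are all equal."""
    if not ds:
        return {}
    first = ds[0]
    return {
        k: first[k]
        for k in first
        if all(k in d and d[k] == first[k] for d in ds)
    }


def merge_unique(ds: Maps) -> dict[str, Any]:
    """Keep keys for which only one distinct value exists where present."""
    out: dict[str, Any] = {}
    for k in _union_keys(ds):
        values = [d[k] for d in ds if k in d]
        if all(v == values[0] for v in values):
            out[k] = values[0]
    return out


def merge_first(ds: Maps) -> dict[str, Any]:
    """Keep the first value seen for every key."""
    return {k: next(d[k] for d in ds if k in d) for k in _union_keys(ds)}


def merge_only(ds: Maps) -> dict[str, Any]:
    """Keep keys that show up in exactly one mapping."""
    out: dict[str, Any] = {}
    for k in _union_keys(ds):
        values = [d[k] for d in ds if k in d]
        if len(values) == 1:
            out[k] = values[0]
    return out


# Hypothetical off-axis annotations from two files being concatenated.
a = {"gene_symbol": "x", "highly_variable": True, "batch_a": 1}
b = {"gene_symbol": "x", "highly_variable": False, "batch_b": 2}

for strategy in (merge_same, merge_unique, merge_first, merge_only):
    print(strategy.__name__, strategy([a, b]))
```

With `merge=None`, none of these off-axis elements are carried over, which is why a `concat_on_disk` that ignored `merge` looked correct as long as the tests only used the default.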


codecov · 2025-09-23T14:43:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.62%. Comparing base (5080856) to head (c95898f).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2122      +/-   ##
==========================================
- Coverage   85.70%   85.62%   -0.08%     
==========================================
  Files          46       46              
  Lines        7078     7083       +5     
==========================================
- Hits         6066     6065       -1     
- Misses       1012     1018       +6

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/anndata/_core/merge.py | `85.11% <100.00%> (+0.08%)` | ⬆️ |
| src/anndata/experimental/merge.py | `87.73% <100.00%> (+0.53%)` | ⬆️ |

... and 3 files with indirect coverage changes


ilan-gold · 2025-09-23T15:01:09Z

src/anndata/experimental/merge.py

-    if not alt_mapping:
-        alt_df = pd.DataFrame(index=alt_indices)
-        write_elem(output_group, alt_axis_name, alt_df)


So you might wonder why completely ignoring an argument and also writing to the wrong key wouldn't have produced an error previously. Crazily enough, it came down to three things:

1. `alt_mapping` was always `{}` here because `merge` is always `None` in the tests.
2. We then wrote a dataframe (because `not {}` is `True`) to the wrong key, `alt_axis_name` instead of `f"{alt_axis_name}m"`, but its index was correct.
3. We never actually checked the off-axis dataframe in the tests, because `merge` was `None` in the ground-truth `anndata.concat` call as well, so all of the other columns were dropped in the ground-truth concatenation.
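
To make the failure mode concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for the output group. `write_alt_axis_old` and the dictionary layout are hypothetical; only the `if not alt_mapping:` condition and the key names mirror the removed lines in the diff above.

```python
def write_alt_axis_old(store, alt_axis_name, alt_indices, alt_mapping):
    """Hypothetical stand-in for the removed branch in the diff above."""
    # An empty dict is falsy, so `not {}` is True and this branch always ran
    # whenever merge=None produced an empty alt_mapping.
    if not alt_mapping:
        # Written under `alt_axis_name` (e.g. "var"), not f"{alt_axis_name}m".
        store[alt_axis_name] = {"index": list(alt_indices), "columns": {}}


store = {}
write_alt_axis_old(store, "var", ["g1", "g2"], alt_mapping={})

# Ground truth built with merge=None also has the index but no columns,
# so an equality check on the off-axis dataframe could not notice the bug.
ground_truth = {"index": ["g1", "g2"], "columns": {}}
print(store["var"] == ground_truth)
print("varm" in store)
```

The write only looked right because the index was the single thing that mattered when no columns survive the merge; a test that passes a non-`None` `merge` is what exposes it.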


flying-sheep · 2025-09-29T11:30:24Z

src/anndata/experimental/merge.py

+        This is False by default, since the resulting arrays are often not meaningful, and is ignored when True.
+        If you are interested in this feature, please open an issue.


I’m confused: isn’t the behavior of this PR right now that it

- raises `NotImplementedError` when set to `True`, and
- merges `{obs,var}p` when it’s set to `False`?

That doesn’t seem to make sense to me!

Shouldn’t this PR instead

- document this as working,
- remove the `NotImplementedError`, and
- wrap the call to `_write_alt_pairwise` in `if pairwise`?

ilan-gold · 2025-09-30

No, I wanted to match the behavior of `anndata.concat` here, so this PR raises `NotImplementedError` only for the pairwise mappings along the concatenation axis but, just like `anndata.concat`, automatically merges the off-axis.

flying-sheep · 2025-09-30

Ah I see, `_write_alt_pairwise` is about the off-axis. Makes sense!

But the rest is still valid:

anndata/src/anndata/experimental/merge.py, lines 580 to 582 in 1d4cb87:

    if pairwise:
        msg = "pairwise concatenation not yet implemented"
        raise NotImplementedError(msg)

… means that “[pairwise] is ignored when True.” is a lie. It’s not ignored, it throws an error.

ilan-gold · 2025-10-01

I've updated the message now to hopefully be clearer.
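
The behavior settled on in this thread can be summarized with a small sketch. `plan_pairwise` is a hypothetical helper, not a function in anndata; it only encodes the rule described above, namely that `pairwise=True` raises while the off-axis pairwise elements are merged either way, and it assumes that pairwise elements along the concatenation axis are simply not written when `pairwise=False`.

```python
def plan_pairwise(axis: str, *, pairwise: bool = False) -> dict[str, str]:
    """Describe what happens to `obsp`/`varp` when concatenating along `axis`.

    Hypothetical illustration of the rule discussed in this review thread.
    """
    if axis not in ("obs", "var"):
        msg = f"axis must be 'obs' or 'var', got {axis!r}"
        raise ValueError(msg)
    if pairwise:
        # Same message as the guard quoted from merge.py above.
        msg = "pairwise concatenation not yet implemented"
        raise NotImplementedError(msg)
    off_axis = "var" if axis == "obs" else "obs"
    return {
        f"{axis}p": "skipped",     # pairwise along the concatenation axis
        f"{off_axis}p": "merged",  # off-axis, handled by the merge strategy
    }


print(plan_pairwise("obs"))
try:
    plan_pairwise("obs", pairwise=True)
except NotImplementedError as err:
    print(f"NotImplementedError: {err}")
```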


Co-authored-by: Miloš Mičík <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: amalia-k510 <[email protected]>
(cherry picked from commit e88a6c2)

milos7250 and others added 11 commits February 12, 2025 17:22

Fix concat_on_disk

d31a982

[pre-commit.ci] auto fixes from pre-commit.com hooks

68b899a

for more information, see https://pre-commit.ci

Merge branch 'main' into main

8c3efb2

same-strict option

51e86ca

another try to fix the merge issue

06c033d

remove comment

e85109a

Merge branch 'main' into ig/fix_merge

ac4fc6a

wip: begin cleaning up

d5c0668

feat: add in merge for var{m,p}

298e115

Merge branch 'main' into ig/fix_merge

9f8920d

[pre-commit.ci] auto fixes from pre-commit.com hooks

59da975

for more information, see https://pre-commit.ci

ilan-gold added this to the 0.12.2 milestone Sep 23, 2025

ilan-gold added 2 commits September 23, 2025 16:45

fix: merge_strategy arg in test

96ef00a

Merge branch 'ig/fix_merge' of github.com:scverse/anndata into ig/fix_merge

c5a400c

ilan-gold added the skip-gpu-ci label Sep 23, 2025

ilan-gold commented Sep 23, 2025


ilan-gold added 7 commits September 23, 2025 17:03

fix: extra merge_strategy

a96b6b7

chore: relnote

16ba557

fix: remove uns writing for now

7bc7890

Merge branch 'main' into ig/fix_merge

4fffdc8

Merge branch 'ig/fix_merge' of github.com:scverse/anndata into ig/fix_merge

a3c5052

fix: lazy chunk comparison

bde7109

Merge branch 'main' into ig/fix_merge

10ea516

ilan-gold added run-gpu-ci and removed skip-gpu-ci labels Sep 26, 2025

ilan-gold added 2 commits September 26, 2025 18:27

fix: use *

0429ae5

Merge branch 'ig/fix_merge' of github.com:scverse/anndata into ig/fix_merge

7eb868d

ilan-gold commented Sep 26, 2025


github-actions bot removed the run-gpu-ci label Sep 26, 2025

ilan-gold marked this pull request as ready for review September 27, 2025 20:42

ilan-gold requested a review from flying-sheep September 29, 2025 09:40

flying-sheep requested changes Sep 29, 2025


flying-sheep reviewed Sep 29, 2025


ilan-gold added 6 commits September 29, 2025 17:10

fix: use &

aa4d5bc

remove uns write

7a4a1af

Merge branch 'main' into ig/fix_merge

1d4cb87

fix: merge warning

bce364b

Merge branch 'ig/fix_merge' of github.com:scverse/anndata into ig/fix_merge

91234d0

Merge branch 'main' into ig/fix_merge

cfc9369

ilan-gold added the skip-gpu-ci label Oct 1, 2025

ilan-gold added 3 commits October 1, 2025 11:59

fix: message

da2950f

Merge branch 'main' into ig/fix_merge

14d827a

fix: don't read into memory dense dask arrays if chunks are not identical

c95898f

flying-sheep approved these changes Oct 2, 2025


ilan-gold merged commit e88a6c2 into main Oct 2, 2025
18 checks passed

ilan-gold deleted the ig/fix_merge branch October 2, 2025 09:34

lumberbot-app bot added the Still Needs Manual Backport label Oct 2, 2025

flying-sheep removed the Still Needs Manual Backport label Oct 2, 2025

scverse deleted a comment from lumberbot-app bot Oct 2, 2025

ilan-gold modified the milestones: 0.12.2, 0.12.3 Oct 15, 2025
