
Contributor

@ilan-gold ilan-gold commented Sep 16, 2025

I'm putting this in the soonest upcoming milestone because

a) I think it's a bug that we didn't have any sort of warning or anything about this and
b) It's in experimental and this isn't even an API change

As for the nature of the changes, I still need to look into why read_elem_lazy produces different tokenization results than read_elem for dense dask arrays on the off-axis. Otherwise, this could be fully lazy. Bug resolved.

@ilan-gold ilan-gold added this to the 0.12.2 milestone Sep 23, 2025

codecov bot commented Sep 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.62%. Comparing base (5080856) to head (c95898f).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2122      +/-   ##
==========================================
- Coverage   85.70%   85.62%   -0.08%     
==========================================
  Files          46       46              
  Lines        7078     7083       +5     
==========================================
- Hits         6066     6065       -1     
- Misses       1012     1018       +6     
Files with missing lines Coverage Δ
src/anndata/_core/merge.py 85.11% <100.00%> (+0.08%) ⬆️
src/anndata/experimental/merge.py 87.73% <100.00%> (+0.53%) ⬆️

... and 3 files with indirect coverage changes

Comment on lines -382 to -384
    if not alt_mapping:
        alt_df = pd.DataFrame(index=alt_indices)
        write_elem(output_group, alt_axis_name, alt_df)
Contributor Author

@ilan-gold ilan-gold Sep 23, 2025


So you might wonder why completely ignoring an argument, and also writing to the wrong key, didn't produce an error previously. Crazily enough, it was because:

  1. alt_mapping was always {} here because merge is always None in the tests.
  2. We then wrote a dataframe (because not {} is True) to the wrong key, alt_axis_name instead of f"{alt_axis_name}m", but its index was correct.
  3. We never actually checked the off-axis dataframe in the tests because merge was also None for the ground truth from anndata.concat, so all of the other columns were dropped in the ground-truth concatenation.
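To make the first two points concrete, here is a minimal pure-Python sketch of the removed branch. The function name `written_keys_before_fix` and the list of "written" keys are hypothetical stand-ins for the real `write_elem` call and output group; only the `not alt_mapping` check and the two key names come from the comment above.

```python
def written_keys_before_fix(alt_mapping: dict, alt_axis_name: str) -> list[str]:
    """Return the keys the removed branch would have written (illustrative only)."""
    written = []
    # An empty dict is falsy, so `not {}` is True and this branch runs
    # whenever no off-axis mapping was produced (merge=None in the tests).
    if not alt_mapping:
        # The old code wrote to the axis key itself (e.g. "var") ...
        written.append(alt_axis_name)
    return written


alt_axis_name = "var"
old_key = written_keys_before_fix({}, alt_axis_name)[0]
# ... whereas the key for the off-axis mapping would be the axis name plus "m".
intended_key = f"{alt_axis_name}m"
print(old_key, intended_key)
```

Because the dataframe written under the axis key still carried the correct index, nothing downstream failed, which is why the mistake stayed hidden.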

@ilan-gold ilan-gold marked this pull request as ready for review September 27, 2025 20:42
Comment on lines 533 to 534
This is False by default, since the resulting arrays are often not meaningful, and is ignored when True.
If you are interested in this feature, please open an issue.
Member

@flying-sheep flying-sheep Sep 29, 2025


I’m confused: isn’t the behavior of this PR right now

  • raises NotImplementedError when set to True
  • {obs,var}p are merged when it’s set to false?

doesn’t seem to make sense to me!

Shouldn’t this PR instead

  • document this as working
  • remove the NotImplementedError
  • wrap the call to _write_alt_pairwise in if pairwise?

Contributor Author


No, I wanted to match the behavior of anndata.concat here, so this PR raises NotImplementedError only for the pairwise mappings along the concatenation axis but, just like anndata.concat, automatically merges the off-axis ones.
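A rough sketch of the control flow described here, in plain Python rather than the actual anndata code. The names `merge_same` and `concat_sketch` are hypothetical, and the "keep only entries identical in every input" rule is an assumed placeholder for whatever merge strategy is requested.

```python
def merge_same(mappings: list[dict]) -> dict:
    """Placeholder merge: keep keys present in every mapping with equal values."""
    if not mappings:
        return {}
    common = set(mappings[0])
    for m in mappings[1:]:
        common &= set(m)
    first = mappings[0]
    return {
        k: first[k]
        for k in sorted(common)
        if all(m[k] == first[k] for m in mappings)
    }


def concat_sketch(off_axis_pairwise: list[dict], *, pairwise: bool = False) -> dict:
    """Raise for pairwise along the concatenation axis; always merge the off-axis."""
    if pairwise:
        msg = "pairwise concatenation not yet implemented"
        raise NotImplementedError(msg)
    # The off-axis mappings are merged regardless of the `pairwise` flag.
    return merge_same(off_axis_pairwise)


merged = concat_sketch([{"a": 1, "b": 2}, {"a": 1, "b": 3}])
print(merged)
```

In this sketch, passing `pairwise=True` raises instead of being silently skipped, while the off-axis merge needs no flag at all.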

Member


Ah I see _write_alt_pairwise is about the off-axis. Makes sense!

But the rest is still valid:

    if pairwise:
        msg = "pairwise concatenation not yet implemented"
        raise NotImplementedError(msg)

… means that “[pairwise] is ignored when True.” is a lie. It’s not ignored, it throws an error.

Contributor Author


I've updated the message now to hopefully be clearer.

@ilan-gold ilan-gold merged commit e88a6c2 into main Oct 2, 2025
18 checks passed
@ilan-gold ilan-gold deleted the ig/fix_merge branch October 2, 2025 09:34
ilan-gold added a commit that referenced this pull request Oct 2, 2025
Co-authored-by: Miloš Mičík <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: amalia-k510 <[email protected]>
(cherry picked from commit e88a6c2)
@scverse scverse deleted a comment from lumberbot-app bot Oct 2, 2025
ilan-gold added a commit that referenced this pull request Oct 2, 2025
Co-authored-by: Miloš Mičík <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: amalia-k510 <[email protected]>
@ilan-gold ilan-gold modified the milestones: 0.12.2, 0.12.3 Oct 15, 2025

Development

Successfully merging this pull request may close these issues.

Concatenating on disk doesn't respect the merge argument.

4 participants