
feat(concat): aligned_axis_key_join for on-axis key joining#2416

Open
Ekin-Kahraman wants to merge 9 commits into scverse:main from Ekin-Kahraman:feat/aligned-axis-key-join



Ekin-Kahraman commented May 3, 2026

Closes #2374. Adds aligned_axis_key_join to concat() for controlling on-axis annotation-key joining (obs columns, obsm/obsp keys, symmetric over var, and layers keys) independently of the off-axis index join.

API

  • New parameter: aligned_axis_key_join: Literal["inner", "outer"] | None = None
  • Default None falls back to join, preserving existing behaviour
  • Validates against "inner", "outer", or None only
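To make the split between the two knobs concrete, here is a minimal pure-Python sketch of the semantics described above. It assumes a toy object model (a dict holding `obs_names`, an `obs` dict of columns, and `var_names`) rather than real AnnData objects, and `toy_concat_obs` / `join_keys` are hypothetical names. The only point is that `join` decides the off-axis var names, while `aligned_axis_key_join`, falling back to `join` when it is None, decides which obs columns survive.

```python
from __future__ import annotations

from typing import Literal, Optional

JoinT = Literal["inner", "outer"]


def join_keys(key_lists: list[list[str]], how: JoinT) -> list[str]:
    """Union ("outer") or intersection ("inner") of key lists, in first-seen order."""
    seen: list[str] = []
    for keys in key_lists:
        for key in keys:
            if key not in seen:
                seen.append(key)
    if how == "inner":
        return [key for key in seen if all(key in keys for keys in key_lists)]
    return seen


def toy_concat_obs(
    objs: list[dict],
    join: JoinT = "inner",
    aligned_axis_key_join: Optional[JoinT] = None,
    fill_value=None,
) -> dict:
    """Concatenate toy objects along obs (hypothetical helper, not anndata.concat)."""
    # None keeps the old single-knob behaviour: on-axis keys follow `join`.
    key_join = join if aligned_axis_key_join is None else aligned_axis_key_join

    # Off-axis index (var names): decided by `join` alone.
    var_names = join_keys([o["var_names"] for o in objs], join)

    # On-axis annotation keys (obs columns): decided by the resolved key join.
    columns = join_keys([list(o["obs"]) for o in objs], key_join)

    obs: dict[str, list] = {}
    for col in columns:
        values: list = []
        for o in objs:
            n_obs = len(o["obs_names"])
            # Objects lacking the column contribute fill_value for each of their rows.
            values.extend(o["obs"].get(col, [fill_value] * n_obs))
        obs[col] = values

    obs_names = [name for o in objs for name in o["obs_names"]]
    return {"obs_names": obs_names, "obs": obs, "var_names": var_names}


# Two toy objects: `b` lacks the "cell_type" column and has partly different genes.
a = {
    "obs_names": ["c1", "c2"],
    "obs": {"batch": ["a", "a"], "cell_type": ["T", "B"]},
    "var_names": ["g1", "g2"],
}
b = {"obs_names": ["c3"], "obs": {"batch": ["b"]}, "var_names": ["g2", "g3"]}

split = toy_concat_obs([a, b], join="inner", aligned_axis_key_join="outer")
legacy = toy_concat_obs([a, b], join="inner")
```

With `join="inner"` and `aligned_axis_key_join="outer"`, only `g2` survives on the var axis while `cell_type` is kept and padded with None for `c3`; leaving `aligned_axis_key_join` unset reproduces the single-knob behaviour, where only `batch` survives.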

Scope

  • ✅ obs columns
  • ✅ obsm keys (the on-axis aligned mapping)
  • ✅ obsp keys (pairwise mapping)
  • ✅ Symmetric var/varm/varp when axis="var"
  • ✅ Layers keys (off-axis content alignment of each kept layer still follows join)
  • ✅ Forwards to raw recursive concat
  • concat_on_disk is out of scope (per the #2374 thread, "obs i.e., axis=0 column joining + anndata.concat is underspecified")

Implementation

The existing inner_concat_aligned_mapping and outer_concat_aligned_mapping helpers now take a keys= parameter (defaulting to intersect_keys / union_keys respectively, so existing callers are unchanged). inner_concat_aligned_mapping additionally takes fill_value and handles the missing-key case for the outer-key + inner-content combination, honouring caller-provided reindexers so layers stay aligned to X's alt-axis.
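A rough sketch of the `keys=` override together with the missing-key filler, assuming each mapping is a plain dict of row lists (a stand-in for obsm arrays, not anndata's AlignedMapping) and ignoring off-axis reindexing entirely. `toy_concat_aligned_mapping` is a hypothetical name; the filler width is taken from an object that does have the key, so a key the caller asked for is padded rather than dropped.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence

Rows = list[list[float]]


def toy_concat_aligned_mapping(
    mappings: Sequence[Mapping[str, Rows]],
    n_rows: Sequence[int],
    keys: Iterable[str],
    fill_value: float = float("nan"),
) -> dict[str, Rows]:
    """Stack each requested key's rows across objects; `keys` decides what survives."""
    result: dict[str, Rows] = {}
    for key in keys:
        present = [m[key] for m in mappings if key in m]
        if not present:
            raise KeyError(f"{key!r} is not present in any mapping")
        # Element width, taken from the first present block that has any rows.
        width = next((len(block[0]) for block in present if block), 0)
        stacked: Rows = []
        for mapping, n in zip(mappings, n_rows):
            if key in mapping:
                stacked.extend(list(row) for row in mapping[key])
            else:
                # Missing here: pad this object's n rows with fill_value at that width.
                stacked.extend([fill_value] * width for _ in range(n))
        result[key] = stacked
    return result


m1 = {"X_pca": [[1.0, 2.0], [3.0, 4.0]], "X_umap": [[0.1, 0.2], [0.3, 0.4]]}
m2 = {"X_pca": [[5.0, 6.0]]}

# Union of keys, as a caller would pass for aligned_axis_key_join="outer".
outer = toy_concat_aligned_mapping([m1, m2], [2, 1], keys=["X_pca", "X_umap"], fill_value=0.0)
# Intersection of keys, as for aligned_axis_key_join="inner".
inner = toy_concat_aligned_mapping([m1, m2], [2, 1], keys=["X_pca"])
```

Because the helper only iterates over whatever `keys` it is handed, the same function serves both key-join modes; the choice of key set lives with the caller.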

Tests (15 in the aligned_axis_key_join block)

  • Default fallback to join for both axes (parametrised obs/var)
  • Outer key join with inner content join, and reverse
  • Layers follow aligned_axis_key_join on both axes (keys union under outer + inner content; keys intersect under inner + outer content), with shape and content spot-checks
  • Alt-axis mappings (varm) unaffected when concatenating along obs across all merge strategies
  • obsp pairwise paths
  • DataFrame-content obsm following join
  • Inner content with missing keys (DataFrame + ndarray branches)
  • Invalid value raises ValueError
  • Awkward arrays + missing-key + inner-content raises NotImplementedError with a clear message
  • Direct reproduction of the example from #2374 ("obs i.e., axis=0 column joining + anndata.concat is underspecified")

Ekin-Kahraman and others added 2 commits May 3, 2026 19:39
Add a parameter to ad.concat() that separates how on-axis keys are
joined (obs/var columns; obsm/obsp or varm/varp keys) from how
off-axis indices are aligned. Default None falls back to join
for backward compatibility.

Closes scverse#2374

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.73%. Comparing base (39db433) to head (c47a97f).
⚠️ Report is 10 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
src/anndata/_core/merge.py | 95.65% | 2 Missing ⚠️
...anndata/experimental/multi_files/_anncollection.py | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2416      +/-   ##
==========================================
+ Coverage   85.64%   85.73%   +0.09%     
==========================================
  Files          49       49              
  Lines        7766     7840      +74     
==========================================
+ Hits         6651     6722      +71     
- Misses       1115     1118       +3     
Files with missing lines | Coverage Δ
...anndata/experimental/multi_files/_anncollection.py | 71.22% <50.00%> (-0.15%) ⬇️
src/anndata/_core/merge.py | 85.52% <95.65%> (+0.54%) ⬆️

... and 8 files with indirect coverage changes

Add coverage for the unimplemented branch in _concat_aligned_mapping_split_join
where awkward arrays meet inner content-join with missing keys, lifting
codecov/patch above target.
ilan-gold (Contributor) left a comment


Two things that stick out to me about this:

  1. I don't think I fully considered this before: layers are indeed aligned (along both axes, actually) and would benefit from control over key-based joins, so they should probably be handled here too, in the same way. In other words, aligned_join == "inner" does an inner join of the keys using inner_concat_aligned_mapping, and vice versa for outer.
  2. With or without the above, why can't we just reuse the existing inner/outer functions (the ones bound to concat_aligned_mapping depending on the join) for this?

Ekin-Kahraman and others added 3 commits May 5, 2026 21:27
…_join to layers

Addresses the two review points on scverse#2416:

1. Reuse the existing helpers instead of a separate split helper.
   Adds `keys: Iterable | None = None` to `inner_concat_aligned_mapping`
   and `outer_concat_aligned_mapping`. Default behaviour is unchanged
   (`intersect_keys` / `union_keys` respectively). When the caller
   passes an explicit `keys` set, the iterated key set is overridden;
   `inner_concat_aligned_mapping` additionally takes `fill_value` and
   handles entries missing from a subset of mappings (the
   outer-key + inner-content combination).
   Drops `_concat_aligned_mapping_split_join`; the obsm/varm callsite
   in `concat()` collapses to a single dispatch.

2. Layers now respect aligned_axis_key_join. The on-axis layer-name
   set is controlled by `aligned_axis_key_join`; the off-axis (alt-axis)
   alignment of each kept layer still follows `join` via the precomputed
   X-axis reindexers.

   Subtlety: in the `join="inner"` + `aligned_axis_key_join="outer"`
   path, the missing-key inner branch must honour the caller's
   reindexers rather than regenerating from `present_els` only. The
   present-only path would intersect over the present subset, leaving
   one-sided layers at their original alt-axis width and breaking the
   AnnData invariant that every layer shares X's alt-axis. The helper
   now takes the precomputed `reindexers[i]` for present entries and
   inserts an identity Reindexer for missing ones; the
   `missing_element` filler uses the matching alt-axis size.

Tests: replaces `test_aligned_axis_key_join_does_not_affect_layers`
with two tests asserting layers do follow the new contract:
- `outer` key join + `inner` content: layer names unioned, all kept
  layers shaped to the inner alt-axis intersection (with content
  spot-checked for fill behaviour);
- `inner` key join + `outer` content: layer names intersected, kept
  layer shaped to the outer alt-axis union.

Existing 13 obsm/varm/obsp tests unchanged. Default
`aligned_axis_key_join=None` still routes to the historical
single-knob behaviour. `concat_on_disk` remains out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
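The reindexer subtlety in the commit message above can be illustrated with a toy pure-Python model. It assumes layers are lists of rows aligned to each object's `var_names` and that reindexers are plain functions (not anndata's Reindexer class); `make_reindexer`, `identity`, and `toy_concat_layers` are hypothetical names. With `join="inner"` and `aligned_axis_key_join="outer"`, the one-sided `spliced` layer goes through the same per-object reindexer as X, so it ends up as wide as every other layer.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

Row = list[float]


def make_reindexer(
    old_vars: Sequence[str], new_vars: Sequence[str], fill_value: float
) -> Callable[[Row], Row]:
    """Return a function mapping a row aligned to old_vars onto new_vars."""
    position = {v: i for i, v in enumerate(old_vars)}

    def reindex(row: Row) -> Row:
        return [row[position[v]] if v in position else fill_value for v in new_vars]

    return reindex


def identity(row: Row) -> Row:
    return list(row)


def toy_concat_layers(objs, target_vars, layer_keys, fill_value=0.0):
    # One reindexer per object, built once from the X alignment and reused for every layer.
    reindexers = [make_reindexer(o["var_names"], target_vars, fill_value) for o in objs]
    out = {}
    for key in layer_keys:
        rows = []
        for obj, reindex in zip(objs, reindexers):
            if key in obj["layers"]:
                # Present: reuse the precomputed reindexer so the layer matches X's var axis.
                rows.extend(reindex(r) for r in obj["layers"][key])
            else:
                # Missing: the filler is already target-width, so an identity map suffices.
                filler = [[fill_value] * len(target_vars) for _ in range(obj["n_obs"])]
                rows.extend(identity(r) for r in filler)
        out[key] = rows
    return out


a = {
    "n_obs": 1,
    "var_names": ["g1", "g2", "g3"],
    "layers": {"counts": [[1.0, 2.0, 3.0]], "spliced": [[4.0, 5.0, 6.0]]},
}
b = {"n_obs": 1, "var_names": ["g2", "g3", "g4"], "layers": {"counts": [[7.0, 8.0, 9.0]]}}

# join="inner": X's var axis is the intersection of var names across all objects.
target_vars = [v for v in a["var_names"] if v in b["var_names"]]
# aligned_axis_key_join="outer": keep every layer name seen anywhere.
out = toy_concat_layers([a, b], target_vars, ["counts", "spliced"])
```

Had `spliced` instead been aligned over only the objects that contain it (here just `a`), its rows would stay three wide while X is two wide, which is the invariant violation the commit message describes.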
Captures the user-facing addition of `aligned_axis_key_join` for
towncrier-style aggregation. Mirrors the format of the other
`*.feat.md` fragments under docs/release-notes/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ekin-Kahraman (Author)

Both points addressed, main merged in. Helpers reused: the inner/outer aligned-mapping helpers now take a keys= argument and the split helper is gone. Layers also respect aligned_axis_key_join; the only subtlety was making the missing-key inner path honour the existing reindexers so layers stay aligned to X.

…ey-join

# Conflicts:
#	src/anndata/_core/merge.py
…verse#1707

scverse#1707 (feat!: Unify X and layers) moved X into the layers mapping under
the `None` key. Two aligned_axis_key_join layer-key tests asserted
`sorted(res.layers.keys())` against named keys only, which now fails
with TypeError when None is present. Switched to set comparison and
explicitly included the `None` (X) key in both the outer-union and
inner-intersection expectations.
Ekin-Kahraman (Author)

Rebased on main to pick up #1707. Two small conflicts in merge.py: kept your new Default("inner") signature alongside aligned_axis_key_join, and dropped the old X = concat_Xs(...) line since X lives in layers now.

Two of my layer-key tests broke because sorted() can't order None against strings. Switched them to set comparison and included the None (X) key in the expected sets.

CI green at dac79dd7. Ready for another look.

Comment thread src/anndata/_core/merge.py Outdated
merge = resolve_merge_strategy(merge)
uns_merge = resolve_merge_strategy(uns_merge)

if aligned_axis_key_join is not None and aligned_axis_key_join not in (
Contributor


We have a JoinT Literal you can use; typing.get_args will give you its allowed values.
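A sketch of the suggested validation, assuming a module-level literal like the one referenced here. The function name is hypothetical and the error message is illustrative, not the one in the PR. Deriving the allowed values with `typing.get_args` keeps the check in sync if the literal ever gains a member.

```python
from typing import Literal, Optional, get_args

# Stand-in for the existing literal (spelled JoinT here, Join_T in the follow-up reply).
JoinT = Literal["inner", "outer"]
JOIN_OPTIONS = get_args(JoinT)  # derived from the literal rather than hard-coded


def validate_aligned_axis_key_join(value: Optional[str]) -> Optional[str]:
    """Accept None (fall back to `join`) or one of the JoinT values."""
    if value is not None and value not in JOIN_OPTIONS:
        raise ValueError(
            f"aligned_axis_key_join must be one of {JOIN_OPTIONS} or None, got {value!r}"
        )
    return value
```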

Comment on lines +1941 to +1944
+layer_mappings = [a.layers for a in adatas]
 layers = concat_aligned_mapping(
-    [a.layers for a in adatas], axis=axis, reindexers=reindexers
+    layer_mappings,
+    axis=axis,
Contributor


Suggested change
-layer_mappings = [a.layers for a in adatas]
-layers = concat_aligned_mapping(
-    [a.layers for a in adatas], axis=axis, reindexers=reindexers
-    layer_mappings,
-    axis=axis,
+layers = concat_aligned_mapping(
+    [a.layers for a in adatas],
+    axis=axis,

Comment thread src/anndata/_core/merge.py Outdated
concat_axis=None,
fill_value=None,
force_lazy: bool = False,
keys=None,
Contributor


Instead of adding another kwarg (which can have unintended side effects if the caller doesn't know the behavior), let's make this argument required so callers are explicit and all the "default" logic is lifted up into one place.

Ekin-Kahraman (Author)

Thanks, addressed these review points:

  • validation now uses the existing Join_T literal values via JOIN_OPTIONS
  • inner_concat_aligned_mapping / outer_concat_aligned_mapping now require an explicit keys= argument, so the default key-selection logic is lifted to the call sites
  • simplified the layers/obsm call site and removed the extra explanatory comments
  • updated the AnnCollection obsm call site to pass intersect_keys(...) explicitly after making keys required
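A sketch of the resulting shape, assuming toy stand-ins for intersect_keys and union_keys that operate on plain dicts (the real helpers in merge.py may differ in signature and return type) and a hypothetical helper `gather_by_key`. Making `keys` keyword-only with no default means a call site that forgets it fails loudly with TypeError instead of silently picking a key policy.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence
from typing import Literal, Optional

JoinT = Literal["inner", "outer"]


def union_keys(mappings: Iterable[Mapping]) -> list:
    """Toy stand-in: keys present in any mapping, in first-seen order."""
    out: list = []
    for m in mappings:
        for k in m:
            if k not in out:
                out.append(k)
    return out


def intersect_keys(mappings: Sequence[Mapping]) -> list:
    """Toy stand-in: keys present in every mapping, in the first mapping's order."""
    if not mappings:
        return []
    return [k for k in mappings[0] if all(k in m for m in mappings[1:])]


KEY_SELECTORS = {"inner": intersect_keys, "outer": union_keys}


def gather_by_key(mappings: Sequence[Mapping], *, keys: Iterable) -> dict:
    """Toy helper: `keys` is keyword-only with no default, so every caller must choose."""
    return {k: [m[k] for m in mappings if k in m] for k in keys}


def call_site(
    mappings: Sequence[Mapping], join: JoinT, aligned_axis_key_join: Optional[JoinT] = None
) -> dict:
    # All "which keys?" policy lives here, not inside the helper.
    key_join = join if aligned_axis_key_join is None else aligned_axis_key_join
    keys = KEY_SELECTORS[key_join](mappings)
    return gather_by_key(mappings, keys=keys)


ms = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
```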

Local checks:

python -m pytest tests/test_concatenate.py -k aligned_axis_key_join
# 15 passed

python -m pytest tests/test_anncollection.py
# 5 passed

python -m ruff check src/anndata/_core/merge.py src/anndata/experimental/multi_files/_anncollection.py
# passed

python -m ruff format --check src/anndata/_core/merge.py src/anndata/experimental/multi_files/_anncollection.py
# passed

The remaining red checks on the PR appear to be maintainer-controlled triage gates: labels/milestone and GPU permission.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

obs i.e., axis=0 column joining + anndata.concat is underspecified

2 participants