feat(concat): aligned_axis_key_join for on-axis key joining#2416
feat(concat): aligned_axis_key_join for on-axis key joining#2416Ekin-Kahraman wants to merge 9 commits into
Conversation
Add a parameter to ad.concat() that separates how on-axis keys are joined (obs/var columns; obsm/obsp or varm/varp keys) from how off-axis indices are aligned. Default None falls back to join for backward compatibility. Closes scverse#2374
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2416 +/- ##
==========================================
+ Coverage 85.64% 85.73% +0.09%
==========================================
Files 49 49
Lines 7766 7840 +74
==========================================
+ Hits 6651 6722 +71
- Misses 1115 1118 +3
|
Add coverage for the unimplemented branch in _concat_aligned_mapping_split_join where awkward arrays meet inner content-join with missing keys, lifting codecov/patch above target.
ilan-gold
left a comment
There was a problem hiding this comment.
Two things that stick out to me about this:
- I think I didn't fully consider things - I suppose
layersare aligned indeed (along both axes actually) and would benefit from control over key-based joins. So they should probably be handled here as well identically. In other words,aligned_join == "inner"does an inner join of the keys usinginner_concat_aligned_mappingand vice-versa forouter. - With or without the above, why can't we just reuse the current functions that are renamed to
concat_aligned_mappingfor inner/outer joining?
…_join to layers Addresses the two review points on scverse#2416: 1. Reuse the existing helpers instead of a separate split helper. Adds `keys: Iterable | None = None` to `inner_concat_aligned_mapping` and `outer_concat_aligned_mapping`. Default behaviour is unchanged (`intersect_keys` / `union_keys` respectively). When the caller passes an explicit `keys` set, the iterated key set is overridden; `inner_concat_aligned_mapping` additionally takes `fill_value` and handles entries missing from a subset of mappings (the outer-key + inner-content combination). Drops `_concat_aligned_mapping_split_join`; the obsm/varm callsite in `concat()` collapses to a single dispatch. 2. Layers now respect aligned_axis_key_join. The on-axis layer-name set is controlled by `aligned_axis_key_join`; the off-axis (alt-axis) alignment of each kept layer still follows `join` via the precomputed X-axis reindexers. Subtlety: in the `join="inner"` + `aligned_axis_key_join="outer"` path, the missing-key inner branch must honour the caller's reindexers rather than regenerating from `present_els` only. The present-only path would intersect over the present subset, leaving one-sided layers at their original alt-axis width and breaking the AnnData invariant that every layer shares X's alt-axis. The helper now takes the precomputed `reindexers[i]` for present entries and inserts an identity Reindexer for missing ones; the `missing_element` filler uses the matching alt-axis size. Tests: replaces `test_aligned_axis_key_join_does_not_affect_layers` with two tests asserting layers do follow the new contract: - `outer` key join + `inner` content: layer names unioned, all kept layers shaped to the inner alt-axis intersection (with content spot-checked for fill behaviour); - `inner` key join + `outer` content: layer names intersected, kept layer shaped to the outer alt-axis union. Existing 13 obsm/varm/obsp tests unchanged. Default `aligned_axis_key_join=None` still routes to the historical single-knob behaviour. `concat_on_disk` remains out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the user-facing addition of `aligned_axis_key_join` for towncrier-style aggregation. Mirrors the format of the other `*.feat.md` fragments under docs/release-notes/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Both points addressed, main merged in. Helpers reused — |
…ey-join # Conflicts: # src/anndata/_core/merge.py
…verse#1707 scverse#1707 (feat!: Unify X and layers) moved X into the layers mapping under the `None` key. Two aligned_axis_key_join layer-key tests asserted `sorted(res.layers.keys())` against named keys only, which now fails with TypeError when None is present. Switched to set comparison and explicitly included the `None` (X) key in both the outer-union and inner-intersection expectations.
|
Rebased on main to pick up #1707. Two small conflicts in Two of my layer-key tests broke because CI green at |
| merge = resolve_merge_strategy(merge) | ||
| uns_merge = resolve_merge_strategy(uns_merge) | ||
|
|
||
| if aligned_axis_key_join is not None and aligned_axis_key_join not in ( |
There was a problem hiding this comment.
We have a JoinT: Literal you can use to get the values using typing.get_args
| layer_mappings = [a.layers for a in adatas] | ||
| layers = concat_aligned_mapping( | ||
| [a.layers for a in adatas], axis=axis, reindexers=reindexers | ||
| layer_mappings, | ||
| axis=axis, |
There was a problem hiding this comment.
| layer_mappings = [a.layers for a in adatas] | |
| layers = concat_aligned_mapping( | |
| [a.layers for a in adatas], axis=axis, reindexers=reindexers | |
| layer_mappings, | |
| axis=axis, | |
| layers = concat_aligned_mapping( | |
| [a.layers for a in adatas], | |
| axis=axis, |
| concat_axis=None, | ||
| fill_value=None, | ||
| force_lazy: bool = False, | ||
| keys=None, |
There was a problem hiding this comment.
Instead of adding another kwarg (which can have unintended side effects if the caller doesn't know the behavior), let's make this argument required by the caller so things are explicit and all the "default" logic is lifted up into one place
|
Thanks, addressed these review points:
Local checks: python -m pytest tests/test_concatenate.py -k aligned_axis_key_join
# 15 passed
python -m pytest tests/test_anncollection.py
# 5 passed
python -m ruff check src/anndata/_core/merge.py src/anndata/experimental/multi_files/_anncollection.py
# passed
python -m ruff format --check src/anndata/_core/merge.py src/anndata/experimental/multi_files/_anncollection.py
# passedThe remaining red checks on the PR appear to be maintainer-controlled triage gates: labels/milestone and GPU permission. |
Closes #2374. Adds
aligned_axis_key_jointoconcat()for controlling on-axis annotation-key joining (obs columns, obsm/obsp keys, symmetric over var, and layers keys) independently of the off-axis indexjoin.API
aligned_axis_key_join: Literal["inner", "outer"] | None = NoneNonefalls back tojoin, preserving existing behaviour"inner","outer", orNoneonlyScope
axis="var"join)rawrecursive concatconcat_on_diskskipped (perobsi.e.,axis=0column joining +anndata.concatis underspecified #2374 thread)Implementation
The existing
inner_concat_aligned_mappingandouter_concat_aligned_mappinghelpers now take akeys=parameter (defaulting tointersect_keys/union_keysrespectively, so existing callers are unchanged).inner_concat_aligned_mappingadditionally takesfill_valueand handles the missing-key case for the outer-key + inner-content combination, honouring caller-providedreindexersso layers stay aligned to X's alt-axis.Tests (15 under
aligned_axis_key_joinblock)joinfor both axes (parametrised obs/var)aligned_axis_key_joinon both axes (keys union under outer + inner content; keys intersect under inner + outer content), with shape and content spot-checksmergestrategiesjoinValueErrorNotImplementedErrorwith a clear messageobsi.e.,axis=0column joining +anndata.concatis underspecified #2374 example