-
Notifications
You must be signed in to change notification settings - Fork 175
fix: merge strategies for concat_on_disk
#2122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2122 +/- ##
==========================================
- Coverage 85.70% 85.62% -0.08%
==========================================
Files 46 46
Lines 7078 7083 +5
==========================================
- Hits 6066 6065 -1
- Misses 1012 1018 +6
|
if not alt_mapping: | ||
alt_df = pd.DataFrame(index=alt_indices) | ||
write_elem(output_group, alt_axis_name, alt_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you might wonder why completely ignoring an argument and also writing to the wrong key for a given function wouldn't produce an error previously. Crazily enough it was that
alt_mapping
was always{}
here becausemerge
is alwaysNone
in the tests- We then write a dataframe (because
not {}
isTrue
) to the wrong keyalt_axis_name
instead off"{alt_axis_name}m"
but its index is correct - We never actually checked the off-axis dataframe in the tests because
merge
wasNone
on the groundtruth as well foranndata.concat
so all of the other columns were dropped in the groundtruth concatenation
src/anndata/experimental/merge.py
Outdated
This is False by default, since the resulting arrays are often not meaningful, and is ignored when True. | ||
If you are interested in this feature, please open an issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m confused: isn’t the behavior of this PR right now
- raises NotImplementedError when set to
True
{obs,var}p
are merged when it’s set to false?
doesn’t seem to make sense to me!
Shouldn’t this PR instead
- document this as working
- remove the NotImplementedError
- wrap the call to
_write_alt_pairwise
inif pairwise
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I wanted to match the behavior of anndata.concat
here so this PR raises NotImplementedError
only for the pairwise mappings along the concatenation axis, but just like anndata.concat
automatically merges the off-axis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see _write_alt_pairwise
is about the off-axis. Makes sense!
But the rest is still valid:
anndata/src/anndata/experimental/merge.py
Lines 580 to 582 in 1d4cb87
if pairwise: | |
msg = "pairwise concatenation not yet implemented" | |
raise NotImplementedError(msg) |
… means that “[pairwise] is ignored when True.” is a lie. It’s not ignored, it throws an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the message now to hopefully be clearer
Co-authored-by: Miloš Mičík <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: amalia-k510 <[email protected]> (cherry picked from commit e88a6c2)
Co-authored-by: Miloš Mičík <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: amalia-k510 <[email protected]>
I'm putting this in the soonest upcoming milestone because
a) I think it's a bug that we didn't have any sort of warning or anything about this and
b) It's in
experimental
and this isn't even an API changeAs for the nature of the changes, I still need to look into whyBug resolvedread_elem_lazy
produces different results in tokenization for dask dense thanread_elem
on the off-axis. Otherwise, this could be fully lazymerge
argument. #2110 and part of concat_on_disk fails to write alternative axis mapping and uns #1854