
Conversation


@ilan-gold ilan-gold marked this pull request as draft September 16, 2025 14:59

codecov bot commented Sep 16, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.61%. Comparing base (9da064f) to head (81004c5).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/anndata/_io/specs/lazy_methods.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2121      +/-   ##
==========================================
+ Coverage   85.42%   85.61%   +0.18%     
==========================================
  Files          46       46              
  Lines        7081     7083       +2     
==========================================
+ Hits         6049     6064      +15     
+ Misses       1032     1019      -13     
Files with missing lines Coverage Δ
src/anndata/_core/aligned_df.py 96.66% <100.00%> (+0.17%) ⬆️
src/anndata/_core/merge.py 85.13% <100.00%> (+0.02%) ⬆️
src/anndata/experimental/backed/_lazy_arrays.py 91.59% <100.00%> (ø)
src/anndata/_io/specs/lazy_methods.py 95.45% <75.00%> (-0.70%) ⬇️

... and 4 files with indirect coverage changes

@ilan-gold ilan-gold added this to the 0.12.2 milestone Sep 16, 2025

scverse-benchmark bot commented Sep 16, 2025

Benchmark changes

Change Before [e88a6c2] After [7a35269] Ratio Benchmark (Parameter)
- 3.49±0.01s 629±4ms 0.18 dataset2d.Dataset2D.time_concat(<function Dataset2D.> (0), (-1,))
- 3.50±0.02s 631±0.9ms 0.18 dataset2d.Dataset2D.time_concat(<function Dataset2D.> (0), None)
- 4.12±0.2s 1.52±0.01s 0.37 dataset2d.Dataset2D.time_concat(<function Dataset2D.> (1), (-1,))
- 5.10±0.02s 1.55±0.01s 0.3 dataset2d.Dataset2D.time_concat(<function Dataset2D.> (1), None)
+ 9.98±0.1ms 12.6±0.2ms 1.26 dataset2d.Dataset2D.time_getitem_slice(<function Dataset2D.> (0), (-1,))

Comparison: https://github.com/scverse/anndata/compare/e88a6c2397ccc199eb8265e075e914fc53e8abb1..7a35269b955a90f3d4407c1be3877b3a201c3d16
Last changed: Thu, 2 Oct 2025 14:24:41 +0000

More details: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=51800151862

@ilan-gold ilan-gold marked this pull request as ready for review October 1, 2025 15:53
@ilan-gold ilan-gold requested a review from flying-sheep October 2, 2025 14:00
@ilan-gold ilan-gold modified the milestones: 0.12.2, 0.12.3 Oct 15, 2025
Member

@flying-sheep flying-sheep left a comment


Can you explain the idea behind the uuid? Why is this the right thing to do for this use case and not other uses of map_blocks? Is there anything in the dask docs that recommends this pattern for certain use cases?

@ilan-gold
Contributor Author

Can you explain the idea behind the uuid?

I just want to be 100% sure the name I create is unique. From the docs:

The key name to use for the output array. Note that this fully specifies the output key name, and must be unique. If not provided, will be determined by a hash of the arguments.

Why is this the right thing to do for this use case

I want to avoid tokenization of the input function, which appears to be expensive in certain cases. I had originally added `name` everywhere we call `map_blocks`, but that seems to have been overkill: adding it elsewhere does not appear to have affected performance.
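For illustration, here is a minimal sketch (not anndata's actual code) of the pattern under discussion: supplying an explicit, unique `name` to `dask.array.map_blocks` so dask does not have to tokenize (hash) the function and its arguments to derive the output key. The function and array here are placeholders.

```python
import uuid

import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))

def add_one(block):
    return block + 1

# Without `name`, dask derives the output key by hashing the function and
# its arguments, which can be expensive for functions closing over large objects.
y = x.map_blocks(add_one)

# With an explicit `name`, no hashing is needed; per the dask docs, `name`
# fully specifies the output key and must be unique, hence the uuid.
z = x.map_blocks(add_one, name=f"add-one-{uuid.uuid4()}")

assert (z.compute() == 2).all()
```

Note that reusing the same `name` for two different computations would make dask treat them as the same task, so uniqueness is essential.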

Member

@flying-sheep flying-sheep left a comment


I see! Weird that adding it everywhere didn’t affect performance if adding it here does. Maybe we can form an intuition on where it’s expensive so we don’t have to guess/measure as much?
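One possible way to build that intuition (a sketch, not anything from this PR): `dask.base.tokenize` is the hashing step that an explicit `name` bypasses, and it can be probed directly. A function that closes over a large array forces dask to hash that array's contents, while a self-contained function tokenizes cheaply; both names here are illustrative.

```python
import numpy as np
from dask.base import tokenize

big = np.zeros((1000, 1000))  # a large closed-over object

def cheap_fn(block):
    return block + 1

def expensive_fn(block):
    return block + big  # closure over `big` must be hashed by tokenize

# tokenize() returns a deterministic hash string for each object;
# timing these calls would show where the cost comes from.
t_cheap = tokenize(cheap_fn)
t_expensive = tokenize(expensive_fn)
assert t_cheap != t_expensive
```
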

@ilan-gold
Contributor Author

Requested a re-review based on the last non-merge commit, @flying-sheep, because I added an optimization for something I noticed in #2156, although I don't know why it fails for Selman (it worked for me but was slow, hence the additional commit after your approval).

@ilan-gold ilan-gold force-pushed the ig/accelerate_map_blocks branch from 017b829 to c29d7f1 Compare October 19, 2025 10:21
@ilan-gold ilan-gold force-pushed the ig/accelerate_map_blocks branch from c29d7f1 to 3bc4ee2 Compare October 19, 2025 10:27


Development

Successfully merging this pull request may close these issues.

ad.concat is slow on lazy data on account of tokenize
