perf: use `name` in `map_blocks` to bypass tokenization #2121
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2121      +/-   ##
==========================================
+ Coverage   85.42%   85.61%   +0.18%
==========================================
  Files          46       46
  Lines        7081     7083       +2
==========================================
+ Hits         6049     6064      +15
+ Misses       1032     1019      -13
Benchmark changes
Comparison: https://github.com/scverse/anndata/compare/e88a6c2397ccc199eb8265e075e914fc53e8abb1..7a35269b955a90f3d4407c1be3877b3a201c3d16
More details: https://github.com/scverse/anndata/pull/2121/checks?check_run_id=51800151862
Can you explain the idea behind the `uuid`? Why is this the right thing to do for this use case and not other uses of `map_blocks`? Is there anything in the dask docs that recommends this pattern for certain use cases?
I just want to be 100% sure the name I create is unique. From the docs:
I want to avoid the tokenization of the input function, which appears to be expensive in certain cases. I had originally added
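For readers following the thread, here is a minimal sketch of the pattern being discussed, not the code from this PR. It assumes that `dask.array.map_blocks` skips hashing the function and its arguments when an explicit `name` is passed, and that the name must then be unique, which is why a `uuid` is appended. The helpers `unique_name`, `_double`, and `double_blocks` are hypothetical names used only for illustration.

```python
import uuid


def unique_name(prefix: str) -> str:
    """Build an output key name that is unique for every call (hypothetical helper)."""
    return f"{prefix}-{uuid.uuid4()}"


def _double(block):
    """Per-block function; stands in for an expensive-to-tokenize callable."""
    return block * 2


def double_blocks(arr):
    """Apply `_double` to each block of a dask array under an explicit name.

    Assumption: with `name` given, dask uses it as the output key name as-is
    instead of deriving one from a hash of `_double` and `arr`.
    """
    import dask.array as da  # imported lazily so the helper above has no dask dependency

    return da.map_blocks(
        _double,
        arr,
        name=unique_name("double"),
        dtype=arr.dtype,
    )
```

The trade-off is that two otherwise identical calls no longer share a key, so dask cannot deduplicate them within one graph; that is the price of not hashing the inputs.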
I see! Weird that adding it everywhere didn’t affect performance if adding it here does. Maybe we can form an intuition on where it’s expensive so we don’t have to guess/measure as much?
@flying-sheep I requested a review based on the last non-merge commit because I added an optimization for something I noticed in #2156, although I don't know why it fails for Selman (it worked for me but was slow, hence the additional commit after your approval).
017b829 to c29d7f1
c29d7f1 to 3bc4ee2
`ad.concat` is slow on lazy data on account of `tokenize` #1989