fix: temporarily use unary_transform instead of segmented_reduce #3814
maxymnaumchyk wants to merge 17 commits into scikit-hep:main from
Temporary replacement until NVIDIA/cccl#6171 is fixed. A related PR with discussion: #3763 |
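For readers unfamiliar with the two approaches, here is a minimal host-side sketch of the idea behind the workaround: instead of one segmented reduction over all sublists, a function is applied once per segment index and scans that segment itself. This is only an illustration of the semantics in pure Python. It is not the code in this PR and does not use the `cuda.compute` API. The names `segment_argmax` and `argmax_by_transform` are hypothetical, and the flat data plus offsets layout is an assumption.

```python
from typing import List, Optional, Sequence


def segment_argmax(data: Sequence[float], start: int, stop: int) -> Optional[int]:
    """Index (relative to `start`) of the first maximum in data[start:stop].

    Returns None for an empty segment.
    """
    if start == stop:
        return None
    best = start
    for i in range(start + 1, stop):
        if data[i] > data[best]:  # strict '>' keeps the first maximum on ties
            best = i
    return best - start


def argmax_by_transform(data: Sequence[float], offsets: Sequence[int]) -> List[Optional[int]]:
    """Apply `segment_argmax` once per segment index (a unary transform over segments)."""
    return [
        segment_argmax(data, offsets[i], offsets[i + 1])
        for i in range(len(offsets) - 1)
    ]


# Jagged array [[1], [2, 3], [4, 5], [6, 7, 1, 8], [], [9]] stored flat with offsets.
data = [1, 2, 3, 4, 5, 6, 7, 1, 8, 9]
offsets = [0, 1, 3, 5, 9, 9, 10]
result = argmax_by_transform(data, offsets)
print(result)
```

On a GPU each segment index would map to one unit of work. That is simple, but it can balance poorly when segment lengths vary widely, which is one reason a dedicated segmented reduction is normally preferable.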
|
The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3814 |
|
Hello @shwina! I'm running into another error related to the NVIDIA/cccl#7121 fix. This time it's in the The easiest way to reproduce this: Interestingly, I can't reproduce this error directly on the main branch of cccl. Has it already been fixed? Please check it out. Here is the full error: |
|
Fix in NVIDIA/cccl#7321. We'll push out a release today with this fix so that you don't have to work off of |
|
and thanks to you too! |
|
Hello @shwina! I'm getting another error :( Do you know what might cause it? I'm using Full error code if you need it: |
|
@maxymnaumchyk -- thanks, could you tell me how to reproduce what you're seeing? I tried the following (on your branch): and it seemed to complete without errors. |
|
I can see I'm clearly missing a step since CI is failing :) |
|
Oh - I see the problem. CI is pulling in Can you try with the constraint edit: In the meantime, I'll update our own constraints. |
|
Yes, thanks Ashwin! It was indeed a problem with the versions of my packages. I'll take a deeper look into it tomorrow~ |
|
@maxymnaumchyk for now, can you bypass It's cheating, but unfortunately RAPIDS won't be relaxing their numba-cuda pins until their next release :( Alternatively, if you think this is more appropriate, that's fine with me too:
|
|
Right now, this implementation works, but very slowly. Running this script: Shows:
|
|
The issue here is that the One fix would be to pass the same We have a better solution for this in |
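The object named in the comment above did not survive in this copy of the thread, so the following is a general illustration only. It assumes the slowdown comes from an expensive build step that is cached per callable object, in which case creating a new callable on every call defeats the cache. The names `get_compiled`, `slow_call` and `fast_call` are hypothetical and are not part of any library API.

```python
compile_count = 0
_cache = {}


def get_compiled(op):
    """Hypothetical stand-in for an expensive build step, cached by the callable object."""
    global compile_count
    if op not in _cache:
        compile_count += 1  # pretend this is a costly compilation
        _cache[op] = op
    return _cache[op]


def slow_call(x):
    # A new function object is created on every call, so the cache never hits.
    op = lambda v: v + 1
    return get_compiled(op)(x)


def _add_one(v):
    return v + 1


def fast_call(x):
    # The same module-level function object is passed every time, so the cache hits.
    return get_compiled(_add_one)(x)


for _ in range(3):
    slow_call(1)
misses_after_slow = compile_count

for _ in range(3):
    fast_call(1)
misses_after_fast = compile_count

print(misses_after_slow, misses_after_fast)
```

Under that assumption, hoisting the callable to module scope (or otherwise reusing one object) turns a rebuild on every call into a single build.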
|
Thanks @shwina! That's good to know. Meanwhile, I'll try to figure out how to pass |
|
With the latest
import timeit

import awkward as ak
import cupy as cp

awkward_array = ak.Array([[1], [2, 3], [4, 5], [6, 7, 1, 8], [], [9]], backend="cuda")

# first time, ak.argmax:
_ = ak.argmax(awkward_array, axis=1)  # warmup
start_time = timeit.default_timer()
for i in range(10):
    expect = ak.argmax(awkward_array, axis=1)
# wait for all queued GPU work to finish before stopping the timer
cp.cuda.Device().synchronize()
end_time = timeit.default_timer()
print(f"Time taken for ak.argmax: {(end_time - start_time) / 10} seconds")
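The timing pattern in this script (one warmup call, a timed loop, a device synchronization before reading the clock) can be wrapped in a small helper so other reducers are measured the same way. This is a sketch using only the standard library. The name `average_time` and its parameters are hypothetical, and the `sync` hook is an assumption about how one would plug in a GPU synchronization call.

```python
import timeit
from typing import Callable, Optional


def average_time(
    fn: Callable[[], object],
    repeats: int = 10,
    sync: Optional[Callable[[], object]] = None,
) -> float:
    """Return the mean wall-clock seconds per call of `fn` over `repeats` calls.

    `fn` is called once beforehand as a warmup. If `sync` is given, it is called
    after the warmup and again after the loop, so asynchronous work is finished
    before the timer is started and before it is read.
    """
    fn()  # warmup: excludes one-time costs such as compilation
    if sync is not None:
        sync()
    start = timeit.default_timer()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (timeit.default_timer() - start) / repeats


# Host-only demonstration with a trivial workload.
elapsed = average_time(lambda: sum(range(1000)))
print(f"Time taken: {elapsed} seconds")
```

With the array from the script, the call would look like `average_time(lambda: ak.argmax(awkward_array, axis=1), sync=cp.cuda.Device().synchronize)`.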
|
awesome! do you have a planned release? |
Yes, should be available on pip/conda now! Thanks! |
|
Hello @shwina, there is currently a bug(?) with how argmin will use the precompiled |
|
Thanks @maxymnaumchyk. Yes it's definitely a bug. I'm looking into it. |



