[FEA] Support np.count_nonzero in device functions #831
Open
Description
np.count_nonzero cannot be used in device functions or kernels, whereas other reductions such as sum work fine. Reproducer:
import cupy as cp
import numba
import numba.cuda
import numpy as np
@numba.cuda.jit
def kernel(arr, out):
    i = numba.cuda.grid(1)
    if i < out.shape[0]:
        out[i] = np.count_nonzero(arr)

arr = cp.array([1, 0, 2, 0, 3], dtype=np.int64)
out = cp.zeros(1, dtype=np.int64)
kernel[1, 1](arr, out)
print(out)

Failed in cuda mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
No implementation of function Function(<function _ol_array_allocate at 0x7b8a8c3c1ea0>) found for signature:
>>> _ol_array_allocate(typeref[<class 'numba.core.types.npytypes.Array'>], int64, uint32)
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function '_ol_array_allocate': File: numba/cuda/np/arrayobj.py: Line 4760.
With argument(s): '(typeref[<class 'numba.core.types.npytypes.Array'>], int64, uint32)':
Rejected as the implementation raised a specific error:
NumbaRuntimeError: Failed in cuda mode pipeline (step: cuda native lowering)
NRT required but not enabled
During: lowering "$8call_function.3 = call $2load_global.0(allocsize, align, func=$2load_global.0, args=[Var(allocsize, arrayobj.py:4766), Var(align, arrayobj.py:4766)], kws=(), vararg=None, varkwarg=None, target=None)" at /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py (4767)
During: Pass cuda_native_lowering
raised from /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/memory_management/nrt_context.py:52
- Resolution failure for non-literal arguments:
None
During: resolving callee type: BoundFunction((typeref[<class 'numba.core.types.npytypes.Array'>], '_allocate') for typeref[<class 'numba.core.types.npytypes.Array'>])
During: typing of call at /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py (4774)
File ".venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py", line 4774:
def _call_allocator(arrtype, size, align):
<source elided>
"""Trampoline to call the intrinsic used for allocation"""
return arrtype._allocate(size, align)
^
Can count_nonzero be supported as a device function?
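In the meantime, one possible workaround is to count with an explicit loop, which needs no temporary array and so should not hit the "NRT required but not enabled" path seen in the traceback. The sketch below is an assumption, not something taken from numba-cuda: the name count_nonzero_loop is hypothetical, it only handles 1-D arrays, and it is shown as plain Python so it can be checked on the host against the reproducer's input.

```python
import numpy as np


def count_nonzero_loop(arr):
    """Count nonzero elements of a 1-D array with an explicit loop.

    Hypothetical helper: no temporaries are allocated, so nothing here
    should require the Numba runtime (NRT) when compiled for the device.
    """
    count = 0
    for i in range(arr.shape[0]):
        if arr[i] != 0:
            count += 1
    return count


# Host-side check using the same input as the reproducer.
sample = np.array([1, 0, 2, 0, 3], dtype=np.int64)
host_count = count_nonzero_loop(sample)
```

Inside a kernel, the same function could be wrapped with numba.cuda.jit(device=True) and called in place of np.count_nonzero(arr). That has not been verified against the numba-cuda version in this report.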
Labels
feature request