[FEA] Support np.count_nonzero in device functions #831

@shwina

Description

np.count_nonzero fails to compile in kernels and device functions, whereas other reductions such as sum work fine:

import cupy as cp
import numba
import numba.cuda
import numpy as np


@numba.cuda.jit
def kernel(arr, out):
    i = numba.cuda.grid(1)
    if i < out.shape[0]:
        out[i] = np.count_nonzero(arr)


arr = cp.array([1, 0, 2, 0, 3], dtype=np.int64)
out = cp.zeros(1, dtype=np.int64)

kernel[1, 1](arr, out)
print(out)

Compilation fails with:

Failed in cuda mode pipeline (step: nopython frontend)
- Resolution failure for literal arguments:
No implementation of function Function(<function _ol_array_allocate at 0x7b8a8c3c1ea0>) found for signature:

 >>> _ol_array_allocate(typeref[<class 'numba.core.types.npytypes.Array'>], int64, uint32)

There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function '_ol_array_allocate': File: numba/cuda/np/arrayobj.py: Line 4760.
    With argument(s): '(typeref[<class 'numba.core.types.npytypes.Array'>], int64, uint32)':
   Rejected as the implementation raised a specific error:
     NumbaRuntimeError: Failed in cuda mode pipeline (step: cuda native lowering)
   NRT required but not enabled
   During: lowering "$8call_function.3 = call $2load_global.0(allocsize, align, func=$2load_global.0, args=[Var(allocsize, arrayobj.py:4766), Var(align, arrayobj.py:4766)], kws=(), vararg=None, varkwarg=None, target=None)" at /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py (4767)
   During: Pass cuda_native_lowering
  raised from /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/memory_management/nrt_context.py:52

- Resolution failure for non-literal arguments:
None

During: resolving callee type: BoundFunction((typeref[<class 'numba.core.types.npytypes.Array'>], '_allocate') for typeref[<class 'numba.core.types.npytypes.Array'>])
During: typing of call at /home/ashwin/workspace/cccl/.venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py (4774)


File ".venv/lib/python3.10/site-packages/numba_cuda/numba/cuda/np/arrayobj.py", line 4774:
def _call_allocator(arrtype, size, align):
    <source elided>
    """Trampoline to call the intrinsic used for allocation"""
    return arrtype._allocate(size, align)
    ^

Can count_nonzero be supported as a device function?
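
In the meantime, a possible workaround is to count with an explicit loop. The traceback above suggests the current overload tries to allocate an intermediate array, which requires NRT; a plain loop needs no allocation. The sketch below is only an illustration, not a proposed implementation: the name count_nonzero_1d is hypothetical, it assumes a 1-D input, and the device-function wrapping shown in the comments has not been tested against this environment.

```python
def count_nonzero_1d(arr):
    """Count the nonzero entries of a 1-D sequence with a plain loop.

    Hypothetical helper: it allocates nothing, so it should not need NRT.
    """
    count = 0
    for i in range(len(arr)):
        if arr[i] != 0:
            count += 1
    return count


# Assumed usage inside a kernel (requires a CUDA-capable GPU, so it is
# left as a comment here):
#
#   import numba.cuda
#   count_nonzero_dev = numba.cuda.jit(device=True)(count_nonzero_1d)
#
#   @numba.cuda.jit
#   def kernel(arr, out):
#       i = numba.cuda.grid(1)
#       if i < out.shape[0]:
#           out[i] = count_nonzero_dev(arr)

# Host-side sanity check with the same data as the reproducer.
print(count_nonzero_1d([1, 0, 2, 0, 3]))
```

As in the reproducer, each thread would walk the whole array serially, so this only stands in for np.count_nonzero semantically; it is not a parallel reduction.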
