Description
With cupy/cupy#4322 merged, we can begin profiling and benchmarking the performance of running more workers per GPU.
Setup
- Build latest cupy
python -m pip install .
- CUPY_CUDA_PER_THREAD_DEFAULT_STREAM=1 dask-cuda-worker --nthreads=2
- Use at least 4 GPUs
- Test nthreads: 1, 2, 4, 8, 32
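One way to keep the sweep over nthreads consistent is to generate the launch environment and command for each value. This is only a sketch: worker_launch and sweep are hypothetical helper names, and the scheduler address and any other worker arguments are left out, just as in the command above.

```python
import os

# Thread counts listed in the setup steps above.
NTHREADS_VALUES = [1, 2, 4, 8, 32]


def worker_launch(nthreads, base_env=None):
    """Return (env, argv) for one worker launch (hypothetical helper).

    Nothing is executed here; the pair can be handed to subprocess.Popen
    once the remaining arguments (e.g. the scheduler address) are appended.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable taken from the setup step above.
    env["CUPY_CUDA_PER_THREAD_DEFAULT_STREAM"] = "1"
    argv = ["dask-cuda-worker", f"--nthreads={nthreads}"]
    return env, argv


def sweep(values=NTHREADS_VALUES):
    """Return the argv list for every thread count in the sweep."""
    return [worker_launch(n, base_env={})[1] for n in values]


if __name__ == "__main__":
    for argv in sweep():
        print("CUPY_CUDA_PER_THREAD_DEFAULT_STREAM=1", " ".join(argv))
```

Building the commands from one list makes it harder to accidentally change anything other than nthreads between runs.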
Test Operations
- sum
- mean
- svd
- Matrix Multiply
- Array Slicing
- transpose + sum: (x + x.T).sum()
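The operations above could be collected into a small timing harness along these lines. This is a rough sketch, not the benchmark from pybench: it uses NumPy on a small array as a CPU stand-in so it runs anywhere, the names OPERATIONS and time_operations are hypothetical, and the slice pattern and array size are arbitrary choices. A real run would use CuPy-backed Dask arrays and would need to make sure GPU work has finished before stopping the clock, since CuPy launches kernels asynchronously.

```python
import time

import numpy as np

# The test operations from the list above, as callables on a 2-D array.
# The slice pattern is an arbitrary illustrative choice.
OPERATIONS = {
    "sum": lambda x: x.sum(),
    "mean": lambda x: x.mean(),
    "svd": lambda x: np.linalg.svd(x),
    "matmul": lambda x: x @ x,
    "slicing": lambda x: x[::2, ::2],
    "transpose_sum": lambda x: (x + x.T).sum(),
}


def time_operations(x, repeats=3):
    """Return the best wall-clock time in seconds for each operation."""
    results = {}
    for name, op in OPERATIONS.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            op(x)  # on a GPU backend, synchronize here before timing
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Small CPU array as a stand-in for the GPU-backed array.
rng = np.random.default_rng(0)
x = rng.normal(10, 1, size=(256, 256))
timings = time_operations(x)

if __name__ == "__main__":
    for name, seconds in timings.items():
        print(f"{name:>14}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from one-off costs such as memory allocation on the first call.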
Note: @pentschev previously tested these operations on a single GPU:
- https://github.com/pentschev/pybench/blob/89d65a6c418a1fee39d447bd11b8a999835b74a9/pybench/benchmarks/benchmark_array.py
- https://github.com/pentschev/pybench/blob/master/examples/plot_array_example.ipynb
Example of CuPy with Dask
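import cupy
import dask.array as da

# Back Dask's random state with CuPy so each chunk is generated as a CuPy array
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
# 500000 x 500000 array split into a 50 x 50 grid of 10000 x 10000 chunks
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))
# Trigger the computation and return the total
x.sum().compute()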