
PTDS Benchmarks #517

Open
@quasiben


With cupy/cupy#4322 merged in, we can begin profiling and benchmarking the performance of more workers per GPU.

Setup

  • Build the latest CuPy from source: python -m pip install .
  • CUPY_CUDA_PER_THREAD_DEFAULT_STREAM=1 dask-cuda-worker --nthreads=2
  • Use at least 4 GPUs
  • Test nthreads: 1, 2, 4, 8, 32
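
The sweep over thread counts above could be scripted roughly as follows. This is only a sketch, not part of the original setup: the scheduler address `tcp://127.0.0.1:8786` and the helper names `worker_command` and `as_shell` are placeholders, and the commands are printed rather than executed.

```python
import shlex

# Thread counts listed in the setup above.
NTHREADS = [1, 2, 4, 8, 32]

# Hypothetical scheduler address; replace with the address of your scheduler.
SCHEDULER = "tcp://127.0.0.1:8786"


def worker_command(nthreads, scheduler=SCHEDULER):
    """Return (env, argv) for one dask-cuda-worker launch with PTDS enabled."""
    env = {"CUPY_CUDA_PER_THREAD_DEFAULT_STREAM": "1"}
    argv = ["dask-cuda-worker", scheduler, f"--nthreads={nthreads}"]
    return env, argv


def as_shell(env, argv):
    """Render the launch as a single shell line for copy and paste."""
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    for n in NTHREADS:
        print(as_shell(*worker_command(n)))
```

Keeping the environment variable and the argument list separate makes it straightforward to hand them to `subprocess.Popen(argv, env={**os.environ, **env})` later, if launching from Python turns out to be more convenient than copying the printed lines.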

Test Operations

  • sum
  • mean
  • svd
  • Matrix Multiply
  • Array Slicing
  • transpose + sum: (x + x.T).sum()
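
One possible way to express the operations above as a small timing harness is sketched below. The names `build_operations` and `time_call` are illustrative only. The strided slice `x[::3]`, the `x @ x.T` form of the matrix multiply, and the rechunk before `svd` are assumptions made for the sketch, not choices stated in this issue.

```python
import time


def build_operations():
    """Map each test operation to a function of a 2-D dask array.

    dask is imported lazily so that time_call can be reused without it.
    """
    import dask.array as da

    return {
        "sum": lambda x: x.sum(),
        "mean": lambda x: x.mean(),
        # Assumption: da.linalg.svd wants chunks along one axis only, so the
        # columns are merged into a single chunk first; keep singular values.
        "svd": lambda x: da.linalg.svd(x.rechunk({1: -1}))[1],
        # Assumption: multiply by the transpose so any 2-D shape is valid.
        "matmul": lambda x: x @ x.T,
        # Assumption: a strided slice stands in for "Array Slicing".
        "slicing": lambda x: x[::3],
        # Requires a square array, as in the expression listed above.
        "transpose + sum": lambda x: (x + x.T).sum(),
    }


def time_call(fn, repeats=3):
    """Run fn() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

Given a CuPy-backed dask array `x`, a run might look like `{name: time_call(lambda: op(x).compute()) for name, op in build_operations().items()}`, repeated once per `nthreads` setting. The `svd` entry will probably need a smaller, tall-and-skinny input than the other operations, since merging the column chunks makes each block much larger.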

Note: @pentschev previously tested these operations on a single GPU.

Example of CuPy with Dask

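import cupy
import dask.array as da

# Generate random chunks with CuPy so that each chunk lives in GPU memory.
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
# 500000 x 500000 array (mean 10, standard deviation 1) in 10000 x 10000 chunks.
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))
# Trigger the computation and return the scalar sum.
x.sum().compute()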

cc @charlesbluca
