Can't run cugraph benchmarks on 16 & 32GB cards #3541

Open
@randerzander

Description

Version

23.06 nightly

Which installation method(s) does this occur on?

Conda

Describe the bug.

When running cugraph's benchmarks on 16GB T4s and 32GB V100s, several of the benchmarks fail with out-of-memory (OOM) errors in cudf.read_csv calls. Could these calls be replaced with dask_cudf.read_csv instead?

cd /repos/cugraph/benchmarks/cugraph/pytest-based
pytest -sv bench_algos.py

Sample trace:

>   ???  
E   MemoryError: std::bad_alloc: out_of_memory: RMM failure at:/opt/conda/envs/rapids/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded

csv.pyx:426: MemoryError
_ ERROR at setup of bench_bfs[ds=/datasets/csv/undirected/hollywood.csv,mm=False,pa=True] _

request = <SubRequest 'anyGraphWithAdjListComputed' for <Function bench_bfs[ds=/datasets/csv/undirected/hollywood.csv,mm=False,pa=True]>>

    @pytest.fixture(scope="module",
                    params=fixture_params)
    def anyGraphWithAdjListComputed(request):
        """
        Create a Graph (directed or undirected) obj based on the param, compute the
        adjacency list and return it.
        """
        setFixtureParamNames(request, ["dataset", "managed_mem", "pool_allocator"])
        csvFileName = request.param[0]
        reinitRMM(request.param[1], request.param[2])
    
>       G = createGraph(csvFileName)

bench_algos.py:159: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
bench_algos.py:69: in createGraph
    gdf = utils.read_csv_file(csvFileName)
/opt/conda/envs/rapids/lib/python3.10/site-packages/cugraph/testing/utils.py:228: in read_csv_file
    return cudf.read_csv(
/opt/conda/envs/rapids/lib/python3.10/site-packages/nvtx/nvtx.py:101: in inner
    result = func(*args, **kwargs)
/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/io/csv.py:88: in read_csv
    df = libcudf.csv.read_csv(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  

>   ???  
E   MemoryError: std::bad_alloc: out_of_memory: RMM failure at:/opt/conda/envs/rapids/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded
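
To make the proposed replacement concrete, here is a minimal sketch of a reader that can switch between cudf.read_csv and dask_cudf.read_csv. The helper names (`csv_read_kwargs`, `read_edgelist`) and the column layout (space-delimited source, destination, weight) are assumptions for illustration and are not taken from cugraph's own utilities.

```python
# Assumed column layout for a space-delimited edge list; adjust to the dataset.
EDGELIST_COLUMNS = {"src": "int32", "dst": "int32", "weight": "float32"}


def csv_read_kwargs(delimiter=" "):
    """Build keyword arguments accepted by both cudf.read_csv and dask_cudf.read_csv."""
    return {
        "delimiter": delimiter,
        "names": list(EDGELIST_COLUMNS),
        "dtype": dict(EDGELIST_COLUMNS),
        "header": None,
    }


def read_edgelist(csv_path, distributed=False):
    """Hypothetical helper (not part of cugraph): read an edge list CSV.

    With distributed=True the file is read lazily in partitions through
    dask_cudf instead of being loaded into a single GPU in one call.
    """
    kwargs = csv_read_kwargs()
    if distributed:
        import dask_cudf  # imported lazily; requires a RAPIDS install

        return dask_cudf.read_csv(csv_path, **kwargs)
    import cudf  # imported lazily; requires a RAPIDS install

    return cudf.read_csv(csv_path, **kwargs)
```

Note that a dask_cudf DataFrame only helps if graph creation also accepts a distributed frame (for example through `Graph.from_dask_cudf_edgelist`); calling `.compute()` on it would bring the full table back onto one device and likely hit the same limit.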

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

Labels

bug (Something isn't working), graph-devops (Issues for the graph-devops team)
