[BUG] failed cudf JNI test: testCreateAdaptors cudaErrorInvalidValue invalid argument in cuda12.8+driver 535.xx #3044

Open
@pxLi

Description

Describe the bug
First seen in spark-rapids-jni_nightly-dev, run 1045.

The failure currently occurs only in the cuda12.8-arm64 test (the CUDA 12.8 runtime image runs on an Arm instance with a 535 driver).
cudf sha: rapidsai/cudf@cf5edd0
rmm sha: rapidsai/rmm@7f0cead

[2025-03-19T04:46:13.327Z] [ERROR] testCreateAdaptors  Time elapsed: 0.027 s  <<< ERROR!
[2025-03-19T04:46:13.327Z] ai.rapids.cudf.CudfException: CUDA error at: /home/jenkins/agent/workspace/spark-rapids-jni_nightly-dev/target/libcudf/cmake-build/_deps/rmm-src/include/rmm/mr/device/cuda_async_memory_resource.hpp:120: cudaErrorInvalidValue invalid argument
[2025-03-19T04:46:13.327Z] 	at ai.rapids.cudf.Rmm.newCudaAsyncMemoryResource(Native Method)
[2025-03-19T04:46:13.327Z] 	at ai.rapids.cudf.RmmCudaAsyncMemoryResource.<init>(RmmCudaAsyncMemoryResource.java:46)
[2025-03-19T04:46:13.327Z] 	at ai.rapids.cudf.RmmCudaAsyncMemoryResource.<init>(RmmCudaAsyncMemoryResource.java:33)
[2025-03-19T04:46:13.327Z] 	at ai.rapids.cudf.RmmTest.testCreateAdaptors(RmmTest.java:61)
[2025-03-19T04:46:13.327Z] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2025-03-19T04:46:13.327Z] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2025-03-19T04:46:13.327Z] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2025-03-19T04:46:13.327Z] 	at java.lang.reflect.Method.invoke(Method.java:498)
[2025-03-19T04:46:13.327Z] 	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
