
[BUG] JNI unit tests are experiencing significant perf regression with cuda 13.1.0 w/ -DUSE_SANITIZER=ON #4127

@pxLi


Describe the bug
As a follow-up to rapidsai/build-planning#236

Starting with the 26.02 release, CUDA 13 nightly artifacts will statically link the 13.1.0 runtime instead of CUDA 13.0.x.

Refer to: https://developer.nvidia.com/blog/nvidia-cuda-13-1-powers-next-gen-gpu-programming-with-nvidia-cuda-tile-and-performance-gains/#compile-time_patching

On both x86 and ARM64, we observed a significant performance regression with the sanitizer memcheck tool.
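
For context, the memcheck tool instruments every kernel launch and device memory access, so its per-launch overhead dominates test wall time. A minimal sketch of how such a comparison could be run outside the JNI test harness is below; `./my_cuda_binary` is a hypothetical standalone CUDA binary, not something from this repo, and only the `compute-sanitizer --tool memcheck` invocation itself is standard.

```bash
# Hypothetical sketch: run the same binary under memcheck inside each image
# to isolate the sanitizer's overhead from the library code under test.
# Inside the cuda13.0.1 image:
time compute-sanitizer --tool memcheck ./my_cuda_binary
# Inside the cuda13.1.0 image:
time compute-sanitizer --tool memcheck ./my_cuda_binary
```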

Cases with a large difference (CUDA 13.1.0 is slower):

| Test Class | CUDA 13.0.1 (same as 12.9.1) | CUDA 13.1.0 | Slowdown |
| --- | --- | --- | --- |
| NvcompTest | 2.86s | 51.26s | ~18x slower |
| ScalarTest | 0.66s | 7.24s | ~11x slower |
| CompiledExpressionTest | 1.72s | 17.19s | ~10x slower |
| IfElseTest | 1.34s | 7.30s | ~5.5x slower |
| TableTest | 153.69s | 689.43s | ~4.5x slower |
| ColumnVectorTest | 16.74s | 49.02s | ~3x slower |
| ReductionTest | 2.43s | 6.52s | ~2.7x slower |
| KudoGpuSerializerTest | 121.38s | 281.59s | ~2.3x slower |
| BinaryOpTest | 0.88s | 1.90s | ~2.2x slower |
| KudoSerializerTest | 860.79s | 1142.61s | ~1.3x slower |

Total duration change

| Version | Duration |
| --- | --- |
| CUDA 13.0.1 | ~21 min |
| CUDA 13.1.0 | ~41 min |

- CUDA: 13.0.1 vs 13.1.0
- Driver: verified with both 580.65.06 and 590.44.01
- Commit: 3c49e7b
- Image: https://github.com/NVIDIA/spark-rapids-jni/blob/main/ci/Dockerfile; pre-built ones:
  - artifactory_URL/sw-spark-docker/plugin-jni:rockylinux8-cuda13.0.1-blossom
  - artifactory_URL/sw-spark-docker/plugin-jni:rockylinux8-cuda13.1.0-blossom

Steps/Code to reproduce bug
This reproduces on both x86 and ARM64 in the nightly build (-DUSE_SANITIZER=ON):

source build/env.sh && ${sclCMD} "ci/nightly-build.sh"
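
To narrow the regression to one of the slow classes above, a per-class run could look like the sketch below. The `-Dtest=` filter is standard Maven Surefire; forwarding `-DUSE_SANITIZER=ON` through Maven to the CMake configure step is an assumption about this repo's build plumbing, not a confirmed invocation.

```bash
# Hedged sketch: time a single slow test class (e.g. NvcompTest) under the
# sanitizer on each CUDA image and compare against the table above.
time mvn -B verify -DUSE_SANITIZER=ON -Dtest=NvcompTest
```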

