Description
Describe the bug
As a follow-up to rapidsai/build-planning#236
Starting with the 26.02 release, cuda13 nightly artifacts will statically link the CUDA 13.1.0 runtime instead of CUDA 13.0.x.
On both x86 and ARM64, we observed a significant performance regression when running the tests under the sanitizer memcheck tool.
Test classes with the largest differences (CUDA 13.1.0 is slower):
| Test Class | CUDA 13.0.1 (same as 12.9.1) | CUDA 13.1.0 | Slowdown |
|---|---|---|---|
| NvcompTest | 2.86s | 51.26s | ~18x slower |
| ScalarTest | 0.66s | 7.24s | ~11x slower |
| CompiledExpressionTest | 1.72s | 17.19s | ~10x slower |
| IfElseTest | 1.34s | 7.30s | ~5.5x slower |
| TableTest | 153.69s | 689.43s | ~4.5x slower |
| ColumnVectorTest | 16.74s | 49.02s | ~3x slower |
| ReductionTest | 2.43s | 6.52s | ~2.7x slower |
| KudoGpuSerializerTest | 121.38s | 281.59s | ~2.3x slower |
| BinaryOpTest | 0.88s | 1.90s | ~2.2x slower |
| KudoSerializerTest | 860.79s | 1142.61s | ~1.3x slower |
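
For anyone re-checking the numbers, the slowdown column is simply the CUDA 13.1.0 time divided by the CUDA 13.0.1 time. The snippet below is a minimal sketch that recomputes it from the timings listed above. The `timings` dict and the `slowdown` helper are illustrative names and are not part of the repo or the CI scripts.

```python
# Per-class timings in seconds, copied from the table in this issue:
# (CUDA 13.0.1, CUDA 13.1.0). Names below are illustrative only.
timings = {
    "NvcompTest": (2.86, 51.26),
    "ScalarTest": (0.66, 7.24),
    "CompiledExpressionTest": (1.72, 17.19),
    "IfElseTest": (1.34, 7.30),
    "TableTest": (153.69, 689.43),
    "ColumnVectorTest": (16.74, 49.02),
    "ReductionTest": (2.43, 6.52),
    "KudoGpuSerializerTest": (121.38, 281.59),
    "BinaryOpTest": (0.88, 1.90),
    "KudoSerializerTest": (860.79, 1142.61),
}


def slowdown(before_s: float, after_s: float) -> float:
    """Return how many times slower the newer run is than the older one."""
    return after_s / before_s


ratios = {name: slowdown(old, new) for name, (old, new) in timings.items()}

# Print the classes from worst to least regressed.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {ratio:5.1f}x slower")
```

The same helper can be pointed at timings from a fresh nightly run to see whether the regression is still present.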
Total duration change
| Version | Duration |
|---|---|
| CUDA 13.0.1 | ~21 min |
| CUDA 13.1.0 | ~41 min |
CUDA: 13.0.1 vs 13.1.0
Driver: Verified with both 580.65.06 and 590.44.01
Commit: 3c49e7b
Image: built from https://github.com/NVIDIA/spark-rapids-jni/blob/main/ci/Dockerfile; pre-built images:
artifactory_URL/sw-spark-docker/plugin-jni:rockylinux8-cuda13.0.1-blossom
artifactory_URL/sw-spark-docker/plugin-jni:rockylinux8-cuda13.1.0-blossom
Steps/Code to reproduce bug
This reproduces on both x86 and arm64 in the nightly build (-DUSE_SANITIZER=ON):
source build/env.sh && ${sclCMD} "ci/nightly-build.sh"