nvauto commented Oct 9, 2025

Auto-merge triggered by GitHub Actions on branch-25.10 to create a PR keeping branch-25.12 up to date. If this PR cannot be merged because of conflicts, it will remain open until the conflicts are fixed manually.

Rebases #701 to
spark-rapids-ml 25.08 and cupy 13.6

Description from #701 :
Allows running benchmarks using System Allocated Memory (SAM) on HMM
and Grace Hopper systems. This depends on
cupy/cupy#8442.

Some additional changes:
- moved the custom aligned numpy allocator config to the core library and into a
separate module (invoking it in a function leads to a segfault in the cuml import,
likely due to a key object being deallocated, but this has not been root-caused)
- set the `CUPY_ENABLE_UMP` env var automatically in the python worker runtime
if SAM is enabled (i.e. no separate config needed)
- minor patches to benchmarking scripts (e.g. to fix verbose mode)
- for algorithms that concatenate input batches, memadvise numpy allocations
to stay on host. Otherwise, these migrate to device during concatenation
and can lead to OOM with respect to the headroom calculation
- reserve device memory for SAM rmm allocations during cuml fit, in
addition to non-rmm allocations by libraries (this speeds up kmeans cuml
fit by 13x over the original, making it 4x faster than UVM)
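
For readers unfamiliar with the `CUPY_ENABLE_UMP` change, the idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `configure_sam_env`, the `sam_enabled` flag, and the value `"1"` are assumptions, as is the premise that the variable needs to be set before cupy is imported in the worker.

```python
import os
from typing import MutableMapping, Optional


def configure_sam_env(
    sam_enabled: bool,
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Hypothetical helper: set CUPY_ENABLE_UMP when SAM is enabled.

    `env` defaults to the process environment; passing a plain dict
    makes the behavior easy to check without touching os.environ.
    """
    target = os.environ if env is None else env
    if sam_enabled:
        # Assumption: "1" is the enabling value and cupy reads the
        # variable when it is first imported, so this should run
        # before any `import cupy` in the worker process.
        target["CUPY_ENABLE_UMP"] = "1"
    return target


if __name__ == "__main__":
    demo_env: dict = {}
    configure_sam_env(True, demo_env)
    print(demo_env)
```

With a helper along these lines, enabling SAM is the only switch a user flips; the worker derives the cupy setting from it instead of requiring a second config entry.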

---------

Signed-off-by: Rong Ou <[email protected]>
Signed-off-by: Erik Ordentlich <[email protected]>
Co-authored-by: Rong Ou <[email protected]>
nvauto merged commit 8ea7f8a into branch-25.12 Oct 9, 2025
nvauto commented Oct 9, 2025

SUCCESS - auto-merge
