feat(driver): allow selecting pinned host allocation flags by jordan-wu-97 · Pull Request #580 · chelsea0x3b/cudarc

jordan-wu-97 · 2026-05-27T04:23:56Z

Summary

Add CudaContext::alloc_pinned_with_flags() so callers can select CUDA pinned host allocation flags.
Preserve the existing behavior of alloc_pinned() by delegating with CU_MEMHOSTALLOC_WRITECOMBINED.
Add a GPU-backed regression test that compares CPU read performance for default and write-combined pinned allocations.
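
The CPU-read comparison can be pictured with a small timing helper. This is a minimal sketch on an ordinary slice, not the test added by this PR: `time_reads` is a hypothetical name, and a real measurement would pass slices backed by the two kinds of pinned allocation.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: sums every byte of `buf` `passes` times and
/// returns the checksum together with the elapsed wall-clock time.
fn time_reads(buf: &[u8], passes: usize) -> (u64, Duration) {
    let start = Instant::now();
    let mut checksum = 0u64;
    for _ in 0..passes {
        // black_box discourages the optimizer from eliding the reads.
        for &byte in black_box(buf) {
            checksum = checksum.wrapping_add(byte as u64);
        }
    }
    (black_box(checksum), start.elapsed())
}
```

Calling the helper once per allocation kind and comparing the two durations gives the kind of numbers reported under Testing.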

Addresses #579.

Motivation

CU_MEMHOSTALLOC_WRITECOMBINED is useful for host-written buffers, but CPU reads from write-combined memory can be much slower than reads from default pinned host memory. Callers should be able to select flags according to their host access pattern.
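
One way a caller might map an access pattern to flags is sketched below. `HostAccess` and `pinned_flags_for` are hypothetical names for illustration only. The constant value is the one defined in CUDA's `cuda.h`, its `u32` type here is an assumption, and `0` follows the PR's doc comment for default page-locked memory.

```rust
// Value as defined in CUDA's cuda.h; cudarc exposes it as
// `sys::CU_MEMHOSTALLOC_WRITECOMBINED`. The u32 type is an assumption.
const CU_MEMHOSTALLOC_WRITECOMBINED: u32 = 0x04;

/// Hypothetical description of how the host touches a pinned buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HostAccess {
    /// The CPU only writes, e.g. staging data for upload to the device.
    WriteOnly,
    /// The CPU reads the buffer back, e.g. results copied from the device.
    ReadBack,
}

/// Hypothetical helper mapping an access pattern to allocation flags.
fn pinned_flags_for(access: HostAccess) -> u32 {
    match access {
        HostAccess::WriteOnly => CU_MEMHOSTALLOC_WRITECOMBINED,
        // 0 requests plain page-locked memory with no extra flags.
        HostAccess::ReadBack => 0,
    }
}
```

The returned value would be passed as the `flags` argument of `alloc_pinned_with_flags()`.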

Testing

cargo fmt --check
cargo clippy --no-default-features --features cuda-13020,no-std,cudnn,cublas,cublaslt,nvrtc,driver,curand,nccl,dynamic-loading,cufile,cupti,nvtx,cufft --all-targets -- -D warnings
NVIDIA B300 pod, debug test profile: cargo test --lib --no-default-features --features cuda-12080,std,driver,dynamic-loading test_default_pinned_host_reads_are_faster_than_write_combined -- --nocapture
- default pinned host reads: 18.811 ms
- write-combined host reads: 1.287 s
NVIDIA B300 pod, release test profile: cargo test --release --lib --no-default-features --features cuda-12080,std,driver,dynamic-loading test_default_pinned_host_reads_are_faster_than_write_combined -- --nocapture
- default pinned host reads: 1.483 ms
- write-combined host reads: 666.643 ms
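
For scale, the timings above work out to a slowdown of roughly 68x in the debug profile and roughly 450x in the release profile. The arithmetic is sketched below; `slowdown` is a hypothetical helper, not part of the PR.

```rust
/// Hypothetical helper: how many times slower write-combined reads are
/// than default pinned reads, with both timings in milliseconds.
fn slowdown(default_ms: f64, write_combined_ms: f64) -> f64 {
    write_combined_ms / default_ms
}

/// Ratios for the measurements reported above, as (debug, release).
fn reported_slowdowns() -> (f64, f64) {
    // Debug: 18.811 ms vs 1.287 s. Release: 1.483 ms vs 666.643 ms.
    (slowdown(18.811, 1287.0), slowdown(1.483, 666.643))
}
```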

gemini-code-assist

Code Review

This pull request introduces alloc_pinned_with_flags to allow allocating page-locked host memory with custom flags, and refactors alloc_pinned to use it. It also adds tests to verify the performance differences between default and write-combined host memory. A critical security vulnerability was identified where integer overflow in the byte size calculation could lead to an undersized allocation and subsequent out-of-bounds memory access.

gemini-code-assist · 2026-05-27T04:24:44Z

    ) -> Result<PinnedHostSlice<T>, DriverError> {
        self.bind_to_thread()?;
-        let ptr = result::malloc_host(
-            len * std::mem::size_of::<T>(),
-            sys::CU_MEMHOSTALLOC_WRITECOMBINED,
-        )?;
+        let ptr = result::malloc_host(len * std::mem::size_of::<T>(), flags)?;
        let ptr = ptr as *mut T;
        assert!(!ptr.is_null());
        assert!(len * std::mem::size_of::<T>() < isize::MAX as usize);


The multiplication len * std::mem::size_of::<T>() can overflow if len is extremely large. In release builds, where overflow checks are off by default, the product wraps around to a smaller value, so malloc_host allocates an undersized buffer. Because the returned PinnedHostSlice still reports the original len, subsequent reads or writes through the slice access memory out of bounds (a heap buffer overflow), which is a critical security vulnerability.

Using checked_mul and validating that the total size is less than isize::MAX before allocation prevents this vulnerability.

    ) -> Result<PinnedHostSlice<T>, DriverError> {
        self.bind_to_thread()?;
        let num_bytes = len
            .checked_mul(std::mem::size_of::<T>())
            .filter(|&bytes| bytes < isize::MAX as usize)
            .ok_or(DriverError(sys::CUresult::CUDA_ERROR_INVALID_VALUE))?;
        let ptr = result::malloc_host(num_bytes, flags)?;
        let ptr = ptr as *mut T;
        assert!(!ptr.is_null());

jordan-wu-97 · 2026-05-27

Fixed in 671d3d4. The byte count is now computed with checked_mul before calling malloc_host, then the existing isize::MAX slice-size invariant is asserted on that checked value. I kept this as panic-based validation rather than returning DriverError(CUDA_ERROR_INVALID_VALUE), since this is a Rust slice validity invariant and preserves the existing behavior. Added test_alloc_pinned_panics_on_size_overflow; it passes in the CUDA test pod.
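
The panic-based check described in this reply can be sketched in isolation as follows. This is an illustration under assumptions, not the code in 671d3d4: `checked_alloc_bytes` is a hypothetical stand-alone function. As a concrete case, on a 64-bit target a `len` of 2^61 + 1 elements of `u64` would wrap to 8 bytes under unchecked multiplication.

```rust
use std::mem::size_of;

/// Hypothetical stand-alone version of the size check: returns the byte
/// count for `len` elements of `T`, panicking instead of wrapping.
fn checked_alloc_bytes<T>(len: usize) -> usize {
    let num_bytes = len
        .checked_mul(size_of::<T>())
        .expect("pinned allocation size overflows usize");
    // A Rust slice may not span isize::MAX bytes or more under this invariant.
    assert!(num_bytes < isize::MAX as usize);
    num_bytes
}
```

Only a value that passes both checks would then be handed to the allocator.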

chelsea0x3b · 2026-06-18T19:26:49Z

+    /// [CudaContext::alloc_pinned_with_flags()] with `0` for default page
+    /// locked host memory if CPU reads matter.


I don't see 0 as an option for flags, where are you seeing this from the nvidia docs?

jordan-wu-97 marked this pull request as ready for review May 27, 2026 04:24

jordan-wu-97 requested a review from chelsea0x3b as a code owner May 27, 2026 04:24

gemini-code-assist Bot reviewed May 27, 2026

jordan-wu-97 added 2 commits May 27, 2026 04:27

feat(driver): allow selecting pinned host allocation flags

3cd1734

Expose alloc_pinned_with_flags so callers can choose default pinned memory when CPU reads matter while retaining the existing write-combined default.

chore: add pinned host CPU read performance test

2afd12c

Measure default and write-combined pinned allocations on the CPU and guard against choosing read-hostile flags for read-heavy buffers.

jordan-wu-97 force-pushed the jordan/pinned-host-alloc-flag branch from ae2594d to 2afd12c Compare May 27, 2026 04:27

jordan-wu-97 changed the title from "feat: allow selecting pinned host allocation flags" to "feat(driver): allow selecting pinned host allocation flags" May 27, 2026

jordan-wu-97 added 2 commits May 27, 2026 04:35

feat(driver): prevent pinned host allocation size overflow

671d3d4

Compute pinned allocation bytes without wrapping before calling CUDA while preserving the existing panic-based slice-size invariant. Add a regression test for overflowing typed lengths.

chore: document write-combined behavior on default pinned allocation

7a1ff9f

Keep the configurable allocation API neutral while warning users of the CPU-read cost on the convenience method that selects write-combined memory.

chelsea0x3b reviewed Jun 18, 2026

jordan-wu-97 wants to merge 4 commits into chelsea0x3b:main from jordan-wu-97:jordan/pinned-host-alloc-flag
