Allow configuring allocation flags for `PinnedHostSlice`

### Problem

`CudaContext::alloc_pinned()` currently creates `PinnedHostSlice<T>` allocations with
`CU_MEMHOSTALLOC_WRITECOMBINED` unconditionally:

https://github.com/chelsea0x3b/cudarc/blob/3e5d38b5fe5ec81c934bdc2c7207f181772e307d/src/driver/safe/core.rs#L1405-L1428

Write-combined pinned memory can be useful when host memory is primarily written by
the CPU and transferred to the device. However, CPU reads from it perform poorly:
CUDA's documentation notes that reading from write-combined host memory on the CPU
is prohibitively slow.

https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g572ca4011bfcb25034888a14d4e035b9

Because pinned buffers can have different host access patterns, selecting
`CU_MEMHOSTALLOC_WRITECOMBINED` unconditionally makes `PinnedHostSlice` unsuitable
for cases where CPU reads matter.

### Proposal

Provide an API that allows callers to select the flags used for pinned host
allocations while preserving existing behavior for current callers.

For example:

```rust
pub unsafe fn alloc_pinned_with_flags<T: DeviceRepr>(
    self: &Arc<Self>,
    len: usize,
    flags: u32,
) -> Result<PinnedHostSlice<T>, DriverError>;
```

The existing method could continue using write-combined memory for backwards compatibility:

```rust
pub unsafe fn alloc_pinned<T: DeviceRepr>(
    self: &Arc<Self>,
    len: usize,
) -> Result<PinnedHostSlice<T>, DriverError> {
    self.alloc_pinned_with_flags(len, sys::CU_MEMHOSTALLOC_WRITECOMBINED)
}
```
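As a possible refinement of the raw `flags: u32` parameter, here is a minimal sketch of a typed flag wrapper. `PinnedAllocFlags`, `HostAccess`, and `flags_for` are hypothetical names used only for illustration and are not part of cudarc. The numeric values mirror the `CU_MEMHOSTALLOC_*` constants from the CUDA driver API headers.

```rust
use std::ops::BitOr;

/// Hypothetical typed wrapper around the `Flags` argument of `cuMemHostAlloc`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PinnedAllocFlags(u32);

impl PinnedAllocFlags {
    /// No flags: plain page-locked memory, suitable when the CPU reads the buffer.
    pub const NONE: Self = Self(0);
    /// Mirrors `CU_MEMHOSTALLOC_PORTABLE` (0x01).
    pub const PORTABLE: Self = Self(0x01);
    /// Mirrors `CU_MEMHOSTALLOC_DEVICEMAP` (0x02).
    pub const DEVICE_MAP: Self = Self(0x02);
    /// Mirrors `CU_MEMHOSTALLOC_WRITECOMBINED` (0x04).
    pub const WRITE_COMBINED: Self = Self(0x04);

    /// Raw bits, i.e. what would be passed down as `flags: u32`.
    pub const fn bits(self) -> u32 {
        self.0
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for PinnedAllocFlags {
    type Output = Self;

    fn bitor(self, rhs: Self) -> Self {
        Self(self.0 | rhs.0)
    }
}

/// Hypothetical description of how the host will touch the buffer.
pub enum HostAccess {
    /// The CPU only writes; the device consumes the data.
    WriteOnly,
    /// The CPU reads results back from the buffer.
    ReadBack,
}

/// Chooses flags from the expected host access pattern.
pub fn flags_for(access: HostAccess) -> PinnedAllocFlags {
    match access {
        HostAccess::WriteOnly => PinnedAllocFlags::WRITE_COMBINED,
        HostAccess::ReadBack => PinnedAllocFlags::NONE,
    }
}
```

With something like this, a caller could pass `flags_for(HostAccess::ReadBack).bits()` to the proposed `alloc_pinned_with_flags`, while `alloc_pinned` would keep passing the write-combined value. Whether a typed wrapper is worth the extra API surface compared with a plain `u32` is a design choice for the maintainers.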


	impl CudaContext {
	    /// Allocates page locked host memory with [sys::CU_MEMHOSTALLOC_WRITECOMBINED] flags.
	    ///
	    /// See [cuda docs](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g572ca4011bfcb25034888a14d4e035b9)
	    ///
	    /// # Safety
	    /// 1. This is unsafe because the memory is unset after this call.
	    pub unsafe fn alloc_pinned<T: DeviceRepr>(
	        self: &Arc<Self>,
	        len: usize,
	    ) -> Result<PinnedHostSlice<T>, DriverError> {
	        self.bind_to_thread()?;
	        let ptr = result::malloc_host(
	            len * std::mem::size_of::<T>(),
	            sys::CU_MEMHOSTALLOC_WRITECOMBINED,
	        )?;
	        let ptr = ptr as *mut T;
	        assert!(!ptr.is_null());
	        assert!(len * std::mem::size_of::<T>() < isize::MAX as usize);
	        assert!(ptr.is_aligned());
	        let event = self.new_event(Some(sys::CUevent_flags::CU_EVENT_BLOCKING_SYNC))?;
	        Ok(PinnedHostSlice { ptr, len, event })
	    }
	}

Allow configuring allocation flags for `PinnedHostSlice` #579
