Devices

:cpp:any:`cuda::device_ref`

:cpp:any:`cuda::device_ref` is a lightweight, non-owning handle to a CUDA device ordinal. It lets you query information about a device and is passed as an argument to other runtime APIs that are tied to a specific device. It offers:

- ``get()``: native device ordinal
- ``name()``: device name
- ``init()``: initialize the device context
- ``peers()``: list the peers for which peer access can be enabled
- ``has_peer_access_to(cuda::device_ref)``: query whether peer access to the given device can be enabled
- ``attribute(attr)`` / ``attribute<::cudaDeviceAttr>()``: attribute queries

Availability: CCCL 3.1.0 / CUDA 13.1

:cpp:any:`cuda::devices`

:cpp:any:`cuda::devices` is a random-access view of all available CUDA devices in the form of :cpp:any:`cuda::device_ref` objects. It provides indexing, size, and iteration for use in range-based loops.

Availability: CCCL 3.1.0 / CUDA 13.1

Example:

#include <cuda/devices>
#include <iostream>

void print_devices() {
  // cuda::device_ref is a lightweight handle, so iterating by value is cheap.
  for (cuda::device_ref dev : cuda::devices) {
    std::cout << "Device " << dev.get() << ": " << dev.name() << '\n';
  }
}
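
The indexing and size operations of :cpp:any:`cuda::devices` can be combined with ``has_peer_access_to`` to tabulate peer access between every pair of devices. The following is a minimal sketch, not part of the library: ``peer_access_matrix`` and ``fake_peer_device`` are hypothetical names. The helper is a template that relies only on ``size()``, ``operator[]``, and ``has_peer_access_to()`` as described above, and ``fake_peer_device`` is a stand-in that lets the sketch compile and run without CCCL or a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cuda::device_ref so the sketch runs without a GPU.
// It pretends that ordinals 0-1, 2-3, ... are pairwise peers.
struct fake_peer_device {
  int ordinal;

  int get() const { return ordinal; }

  bool has_peer_access_to(fake_peer_device other) const {
    return ordinal != other.ordinal && ordinal / 2 == other.ordinal / 2;
  }
};

// Hypothetical helper: matrix[i][j] is true when device i reports that peer
// access to device j can be enabled. It requires only size(), operator[], and
// has_peer_access_to() from the device range.
template <class DeviceRange>
std::vector<std::vector<bool>> peer_access_matrix(const DeviceRange& devices) {
  const std::size_t count = devices.size();
  std::vector<std::vector<bool>> matrix(count, std::vector<bool>(count, false));
  for (std::size_t i = 0; i < count; ++i) {
    for (std::size_t j = 0; j < count; ++j) {
      if (i != j) {  // the diagonal stays false: a device is not its own peer
        matrix[i][j] = devices[i].has_peer_access_to(devices[j]);
      }
    }
  }
  return matrix;
}
```

With CCCL 3.1.0 or later, the helper would presumably be called as ``peer_access_matrix(cuda::devices)``; that call is an assumption and has not been checked against the library here.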

Device attributes

cuda::device_attributes provides strongly typed attribute query objects that can be passed to :cpp:any:`cuda::device_ref::attribute` or called directly with a device. Selected examples:

- ``compute_capability``
- ``multiprocessor_count``
- ``concurrent_managed_access``
- ``clock_rate``
- ``numa_id``

Availability: CCCL 3.1.0 / CUDA 13.1

Example:

#include <cuda/devices>

int get_max_blocks_on_device(cuda::device_ref dev) {
  // Multiply the multiprocessor count by the per-multiprocessor block attribute.
  return cuda::device_attributes::multiprocessor_count(dev)
       * cuda::device_attributes::blocks_per_multiprocessor(dev);
}
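
Because an attribute query object can be called with a device, as ``multiprocessor_count(dev)`` is in the example above, it can also be handed to generic code. The following is a minimal sketch under that assumption, not part of the library: ``sum_attribute``, ``fake_sm_device``, and ``fake_multiprocessor_count`` are hypothetical names, and the stand-ins exist only so the sketch compiles and runs without CCCL or a GPU.

```cpp
#include <vector>

// Hypothetical stand-in for a device handle; sm_count is made-up data.
struct fake_sm_device {
  int ordinal;
  int sm_count;
};

// Hypothetical stand-in that mimics the call shape of
// cuda::device_attributes::multiprocessor_count(dev).
struct fake_multiprocessor_count_t {
  int operator()(fake_sm_device dev) const { return dev.sm_count; }
};
constexpr fake_multiprocessor_count_t fake_multiprocessor_count{};

// Hypothetical helper: sums an integer-valued attribute over a device range.
template <class DeviceRange, class Attribute>
long long sum_attribute(const DeviceRange& devices, const Attribute& attribute) {
  long long total = 0;
  for (auto dev : devices) {  // device handles are cheap to copy
    total += attribute(dev);
  }
  return total;
}
```

With the library, the analogous call would be ``sum_attribute(cuda::devices, cuda::device_attributes::multiprocessor_count)``, assuming the attribute objects can be passed by reference like any other callable.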

:cpp:any:`cuda::arch_traits`

Per-architecture trait accessors that describe the limits and capabilities shared by all devices of an architecture. Unlike cuda::device_attributes, which are run-time queries of a single characteristic of a specific device, :cpp:any:`cuda::arch_traits` provides a structure, accessible at compile time, that describes the characteristics common to every device of an architecture.

- :cpp:any:`cuda::arch_traits` and :cpp:any:`cuda::arch_traits_for` are the compile-time and run-time forms, respectively.
- Both return a :cpp:any:`cuda::arch_traits_t` with fields such as ``max_threads_per_block``, ``max_shared_memory_per_block``, ``cluster_supported``, and other capability flags.
- Traits for the current architecture can be accessed with :cpp:any:`cuda::device::current_arch_traits`.

Availability: CCCL 3.1.0 / CUDA 13.1

Example:

#include <cuda/devices>

template <cuda::arch_id Arch>
__device__ void fn() {
  // Must be constexpr so that its members can be used in `if constexpr`.
  constexpr auto traits = cuda::arch_traits<Arch>();
  if constexpr (traits.cluster_supported) {
    // cluster specific code
  } else {
    // non-cluster code
  }
}

__global__ void kernel() {
  fn<cuda::arch_id::sm_90>();
}
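
The fields of :cpp:any:`cuda::arch_traits_t` are also useful for launch configuration, for example to keep a requested block size within an architecture's limit. The following is a minimal sketch, not part of the library: ``clamp_block_size`` and ``fake_arch_traits`` are hypothetical names, the helper reads only the ``max_threads_per_block`` field named above, and the limit of 1024 is an illustrative value.

```cpp
// Hypothetical stand-in exposing two of the fields this section lists for
// cuda::arch_traits_t, so the sketch compiles without CCCL.
struct fake_arch_traits {
  int max_threads_per_block;
  bool cluster_supported;
};

// Hypothetical helper: returns the requested block size, reduced if needed to
// the architecture's max_threads_per_block.
template <class ArchTraits>
constexpr int clamp_block_size(const ArchTraits& traits, int requested_threads) {
  return requested_threads < traits.max_threads_per_block
           ? requested_threads
           : traits.max_threads_per_block;
}

// Evaluated at compile time, mirroring the compile-time use of traits.
static_assert(clamp_block_size(fake_arch_traits{1024, true}, 2048) == 1024,
              "requests above the limit are clamped");
```

With the library, the traits object would come from ``cuda::arch_traits<Arch>()`` at compile time or from :cpp:any:`cuda::arch_traits_for` at run time; passing it to a template like this one is an assumption, not a documented usage.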
