2 changes: 2 additions & 0 deletions docs/libcudacxx/Doxyfile
@@ -13,7 +13,9 @@ INPUT = ../../libcudacxx/include/cuda/__algorithm \
../../libcudacxx/include/cuda/__container \
../../libcudacxx/include/cuda/__device \
../../libcudacxx/include/cuda/__event \
../../libcudacxx/include/cuda/__hierarchy \
../../libcudacxx/include/cuda/__iterator \
../../libcudacxx/include/cuda/__launch \
../../libcudacxx/include/cuda/__memory_pool \
../../libcudacxx/include/cuda/__memory_resource \
../../libcudacxx/include/cuda/__stream \
27 changes: 15 additions & 12 deletions docs/libcudacxx/runtime/algorithm.rst
@@ -3,20 +3,23 @@
Algorithm
==========

The ``runtime`` part of the ``cuda/algorithm`` header provides stream-ordered, byte-wise primitives that operate on ``cuda::std::span`` and
``cuda::std::mdspan``-compatible types. They require a ``cuda::stream_ref`` to enqueue work.
The ``runtime`` part of the ``cuda/algorithm`` header provides stream-ordered, byte-wise primitives that operate on
:cpp:any:`cuda::std::span` and :cpp:any:`cuda::std::mdspan`-compatible types. They require a
:cpp:any:`cuda::stream_ref` to enqueue work.

``cuda::copy_bytes``
---------------------
:cpp:any:`cuda::copy_bytes`
-------------------------------
.. _cccl-runtime-algorithm-copy_bytes:

Launch a byte-wise copy from source to destination on the provided stream.

- Signature: ``copy_bytes(stream, src, dst, config = {})``
- Overloads accept ``cuda::std::span``-convertible contiguous ranges or ``cuda::std::mdspan``-convertible multi-dimensional views.
- Signature: :cpp:any:`cuda::copy_bytes`
- Overloads accept :cpp:any:`cuda::std::span`-convertible contiguous ranges or
:cpp:any:`cuda::std::mdspan`-convertible multi-dimensional views.
- Elements must be trivially copyable
- ``cuda::std::mdspan``-convertible types must convert to an mdspan that is exhaustive
- The optional ``config`` argument is a ``cuda::copy_configuration`` that controls source access order and managed-memory location hints
- :cpp:any:`cuda::std::mdspan`-convertible types must convert to an mdspan that is exhaustive
- The optional ``config`` argument is a :cpp:any:`cuda::copy_configuration` that controls source access order and
managed-memory location hints

Availability: CCCL 3.1.0 / CUDA 13.1

@@ -46,15 +49,15 @@ Availability: CCCL 3.1.0 / CUDA 13.1
}


``cuda::fill_bytes``
---------------------
:cpp:any:`cuda::fill_bytes`
-------------------------------
.. _cccl-runtime-algorithm-fill_bytes:

Launch a byte-wise fill of the destination on the provided stream.

- Overloads accept ``cuda::std::span``-convertible or ``cuda::std::mdspan``-convertible destinations.
- Overloads accept :cpp:any:`cuda::std::span`-convertible or :cpp:any:`cuda::std::mdspan`-convertible destinations.
- Elements must be trivially copyable
- ``cuda::std::mdspan``-convertible types must convert to an mdspan that is exhaustive
- :cpp:any:`cuda::std::mdspan`-convertible types must convert to an mdspan that is exhaustive

Availability: CCCL 3.1.0 / CUDA 13.1

31 changes: 20 additions & 11 deletions docs/libcudacxx/runtime/buffer.rst
@@ -1,19 +1,28 @@
.. _cccl-runtime-buffer:

.. |cuda_make_buffer| replace:: ``cuda::make_buffer``
.. _cuda_make_buffer: ../api/namespacecuda_1a8d909070d4cf758e776659b91e473a6f.html

Buffer
======

The buffer API provides a typed container allocated from memory resources. It handles stream-ordered allocation, initialization, and deallocation of memory.

``cuda::buffer``
----------------
:cpp:any:`cuda::buffer`
Review comment (Contributor): Question: Shouldn't those be :cpp:function:?

---------------------------
.. _cccl-runtime-buffer-buffer:

``cuda::buffer`` is a container that manages typed storage allocated from a given :ref:`memory resource <libcudacxx-extended-api-memory-resources-resource>` in stream order using a provided :ref:`stream_ref <cccl-runtime-stream-stream-ref>`. The elements are initialized during construction, which may require a kernel launch. The stream provided during construction is stored and later used for deallocation of the buffer, either explicitly or when the buffer destructor is called.
:cpp:any:`cuda::buffer` is a container that manages typed storage allocated from a given
:ref:`memory resource <libcudacxx-extended-api-memory-resources-resource>` in stream order using a provided
:ref:`stream_ref <cccl-runtime-stream-stream-ref>`. The elements are initialized during construction, which may require
a kernel launch. The stream provided during construction is stored and later used for deallocation of the buffer,
either explicitly or when the buffer destructor is called.

A buffer owns a copy of the memory resource, which means the resource must be copy-constructible. If a resource is not copy-constructible, like memory pool objects, :ref:`shared_resource <libcudacxx-extended-api-memory-resources-shared-resource>` can be used to attach shared ownership to a resource type.

In addition to being typed, ``buffer`` also takes a set of :ref:`properties <libcudacxx-extended-api-memory-resources-properties>` to ensure that memory accessibility and other constraints are checked at compile time.
In addition to being typed, :cpp:any:`cuda::buffer` also takes a set of
:ref:`properties <libcudacxx-extended-api-memory-resources-properties>` to ensure that memory accessibility and other
constraints are checked at compile time.

While the buffer operates in stream order, it can also be constructed with a :ref:`synchronous_resource <libcudacxx-extended-api-memory-resources-synchronous-resource>`, in which case it will automatically use the :ref:`synchronous_resource_adapter <libcudacxx-extended-api-memory-resources-synchronous-adapter>` to wrap the provided resource.

@@ -49,8 +58,8 @@ Type Aliases

Convenience type aliases are provided for common buffer types:

- ``cuda::device_buffer<T>`` - Buffer with ``device_accessible`` property
- ``cuda::host_buffer<T>`` - Buffer with ``host_accessible`` property
- :cpp:any:`cuda::device_buffer` - Buffer with ``device_accessible`` property
- :cpp:any:`cuda::host_buffer` - Buffer with ``host_accessible`` property

Example:

@@ -150,13 +159,13 @@ Example:
// Alternative would be to call buf.destroy(stream2)
}
``cuda::make_buffer``
---------------------
|cuda_make_buffer|_
------------------------------------------------------------------------------------------------
.. _cccl-runtime-buffer-make-buffer:

``cuda::make_buffer()`` is a factory function that creates buffers with automatic property deduction from the memory
resource. It supports the same construction patterns as the buffer constructors, in addition to an overload that sets
all elements of the buffer to the same value.
|cuda_make_buffer|_ is a factory function that
creates buffers with automatic property deduction from the memory resource. It supports the same construction patterns
as the buffer constructors, in addition to an overload that sets all elements of the buffer to the same value.

Example:

11 changes: 7 additions & 4 deletions docs/libcudacxx/runtime/cudart_interactions.rst
@@ -3,7 +3,8 @@
CUDA Runtime interactions
=========================

Some runtime objects have a non-owning ``_ref`` counterpart (for example, ``stream`` and ``stream_ref``). Prefer the
Some runtime objects have a non-owning ``_ref`` counterpart (for example, :cpp:any:`cuda::stream` and
:cpp:any:`cuda::stream_ref`). Prefer the
owning type for lifetime management, and use the ``_ref`` type for code that would otherwise accept a C++ reference but
needs to interoperate with existing CUDA Runtime code.

@@ -42,8 +43,9 @@ Example: handle interop patterns
Device selection
----------------

The Runtime API emphasizes explicit device selection. Most entry points take a ``cuda::device_ref`` or a device-bound
resource (such as ``cuda::stream{device}``) rather than relying on implicit global state like ``cudaSetDevice``. This
The Runtime API emphasizes explicit device selection. Most entry points take a :cpp:any:`cuda::device_ref` or a
device-bound resource (such as :cpp:any:`cuda::stream`) rather than relying on implicit global state like
``cudaSetDevice``. This
makes device ownership and lifetime clearer, especially in multi-GPU code.

The current device can still be set via the CUDA Runtime, but cccl-runtime APIs ignore that global state and require an
@@ -54,7 +56,8 @@ Default stream interop
----------------------

The CUDA default (NULL) stream is not exposed as a first-class runtime object because it is tied to implicit per-device
state and encourages hidden dependencies. Instead, it can be wrapped into ``cuda::stream_ref`` when needed for interop.
state and encourages hidden dependencies. Instead, it can be wrapped into :cpp:any:`cuda::stream_ref` when needed for
interop.

.. note::

33 changes: 18 additions & 15 deletions docs/libcudacxx/runtime/device.rst
@@ -3,27 +3,29 @@
Devices
=======

``cuda::device_ref``
---------------------
:cpp:any:`cuda::device_ref`
-------------------------------
.. _cccl-runtime-device-device-ref:

``cuda::device_ref`` is a lightweight, non-owning handle to a CUDA device ordinal. It can be used to query information about a device and serves as an argument to other runtime APIs that are tied to a specific device.
:cpp:any:`cuda::device_ref` is a lightweight, non-owning handle to a CUDA device ordinal. It can be used to query
information about a device and serves as an argument to other runtime APIs that are tied to a specific device.
It offers:

- ``get()``: native device ordinal
- ``name()``: device name
- ``init()``: initialize the device context
- ``peers()``: list peers for which peer access can be enabled
- ``has_peer_access_to(device_ref)``: query if peer access can be enabled to the given device
- ``has_peer_access_to(cuda::device_ref)``: query if peer access can be enabled to the given device
- ``attribute(attr)`` / ``attribute<::cudaDeviceAttr>()``: attribute queries

Availability: CCCL 3.1.0 / CUDA 13.1

``cuda::devices``
------------------
:cpp:any:`cuda::devices`
----------------------------
.. _cccl-runtime-device-devices:

``cuda::devices`` is a random-access view of all available CUDA devices in the form of ``cuda::device_ref`` objects. It
:cpp:any:`cuda::devices` is a random-access view of all available CUDA devices in the form of
:cpp:any:`cuda::device_ref` objects. It
provides indexing, size, and iteration for use
in range-based loops.

@@ -47,7 +49,7 @@ Device attributes
.. _cccl-runtime-device-attributes:

``cuda::device_attributes`` provides strongly-typed attribute query objects usable with
``device_ref::attribute``. Selected examples:
:cpp:any:`cuda::device_ref::attribute`. Selected examples:

- ``compute_capability``
- ``multiprocessor_count``
@@ -67,18 +69,19 @@ Example:
return cuda::device_attributes::multiprocessor_count(dev) * cuda::device_attributes::blocks_per_multiprocessor(dev);
}

``cuda::arch_traits``
---------------------
:cpp:any:`cuda::arch_traits`
--------------------------------
.. _cccl-runtime-device-arch-traits:

Per-architecture trait accessors providing limits and capabilities common to all devices of an architecture.
Compared to ``device_attributes``, ``cuda::arch_traits`` provides a compile-time accessible structure that describes common characteristics of all devices of an architecture, while attributes are run-time queries of a single characteristic of a specific device.
Compared to ``cuda::device_attributes``, :cpp:any:`cuda::arch_traits` provides a compile-time accessible
structure that describes common characteristics of all devices of an architecture, while attributes are run-time
queries of a single characteristic of a specific device.

- ``cuda::arch_traits<cuda::arch_id::sm_80>()`` (compile-time) or
``cuda::arch_traits_for(cuda::arch_id)`` / ``cuda::arch_traits_for(cuda::compute_capability)`` (run-time).
- Returns a ``cuda::arch_traits_t`` with fields like
- :cpp:any:`cuda::arch_traits` and :cpp:any:`cuda::arch_traits_for` (compile-time and run-time forms).
- Returns a :cpp:any:`cuda::arch_traits_t` with fields like
``max_threads_per_block``, ``max_shared_memory_per_block``, ``cluster_supported`` and other capability flags.
- Traits for the current architecture can be accessed with ``cuda::device::current_arch_traits()``
- Traits for the current architecture can be accessed with :cpp:any:`cuda::device::current_arch_traits`

Availability: CCCL 3.1.0 / CUDA 13.1

23 changes: 13 additions & 10 deletions docs/libcudacxx/runtime/event.rst
@@ -5,17 +5,17 @@ Events

An event is a snapshot of the execution state of a stream. It can be used to synchronize work submitted to a stream up to a certain point, establish a dependency between streams, or measure the time elapsed between two events.

``cuda::event_ref``
:cpp:any:`cuda::event_ref`
--------------------------------------------------
.. _cccl-runtime-event-event-ref:

``cuda::event_ref`` is a non-owning wrapper around a ``cudaEvent_t``. It prevents unsafe implicit constructions from
:cpp:any:`cuda::event_ref` is a non-owning wrapper around a ``cudaEvent_t``. It prevents unsafe implicit constructions from
``nullptr`` or integer literals and provides convenient helpers:

- ``record(cuda::stream_ref)``: record the event on a stream
- ``sync()``: wait for the recorded work to complete
- ``is_done()``: non-blocking completion query
- comparison operators against other ``event_ref`` or ``cudaEvent_t``
- comparison operators against other :cpp:any:`cuda::event_ref` or ``cudaEvent_t``

Availability: CCCL 3.1.0 / CUDA 13.1

@@ -30,13 +30,14 @@ Example:
e.record(stream);
}

``cuda::event``
:cpp:any:`cuda::event`
--------------------------------------------
.. _cccl-runtime-event-event:

``cuda::event`` is an owning wrapper around a ``cudaEvent_t`` (with timing disabled). It inherits from ``event_ref`` and provides all of its functionality.
It also creates and destroys the native event, can be moved (but not copied), and can release ownership via ``release()``. Construction can target a specific
``cuda::device_ref`` or record immediately on a ``cuda::stream_ref``.
:cpp:any:`cuda::event` is an owning wrapper around a ``cudaEvent_t`` (with timing disabled). It inherits from
:cpp:any:`cuda::event_ref` and provides all of its functionality. It also creates and destroys the native event, can be moved (but
not copied), and can release ownership via ``release()``. Construction can target a specific :cpp:any:`cuda::device_ref`
or record immediately on a :cpp:any:`cuda::stream_ref`.

Availability: CCCL 3.1.0 / CUDA 13.1

@@ -56,11 +57,13 @@ Availability: CCCL 3.1.0 / CUDA 13.1

.. _cccl-runtime-event-timed-event:

``cuda::timed_event``
:cpp:any:`cuda::timed_event`
-----------------------------------------------------

``cuda::timed_event`` is an owning wrapper for a timed ``cudaEvent_t``. It inherits from ``event`` and provides all of its functionality.
It also supports elapsed-time queries between two events via ``operator-``, returning ``cuda::std::chrono::nanoseconds``.
:cpp:any:`cuda::timed_event` is an owning wrapper for a timed ``cudaEvent_t``. It inherits from :cpp:any:`cuda::event` and provides
all of its functionality.
It also supports elapsed-time queries between two events via ``operator-``, returning
:cpp:any:`cuda::std::chrono::nanoseconds`.

Availability: CCCL 3.1.0 / CUDA 13.1
