Commit 68f5e41

[libcu++] Add links to API reference docs in runtime section (#7984)
* Add links to API reference docs in runtime section
* Use stricter roles
1 parent cf864f5 commit 68f5e41

11 files changed

+250
-152
lines changed

docs/libcudacxx/Doxyfile

Lines changed: 2 additions & 0 deletions
@@ -13,7 +13,9 @@ INPUT = ../../libcudacxx/include/cuda/__algorithm \
 ../../libcudacxx/include/cuda/__container \
 ../../libcudacxx/include/cuda/__device \
 ../../libcudacxx/include/cuda/__event \
+../../libcudacxx/include/cuda/__hierarchy \
 ../../libcudacxx/include/cuda/__iterator \
+../../libcudacxx/include/cuda/__launch \
 ../../libcudacxx/include/cuda/__memory_pool \
 ../../libcudacxx/include/cuda/__memory_resource \
 ../../libcudacxx/include/cuda/__stream \

docs/libcudacxx/runtime/algorithm.rst

Lines changed: 15 additions & 12 deletions
@@ -3,20 +3,23 @@
 Algorithm
 ==========
 
-The ``runtime`` part of the ``cuda/algorithm`` header provides stream-ordered, byte-wise primitives that operate on ``cuda::std::span`` and
-``cuda::std::mdspan``-compatible types. They require a ``cuda::stream_ref`` to enqueue work.
+The ``runtime`` part of the ``cuda/algorithm`` header provides stream-ordered, byte-wise primitives that operate on
+:cpp:class:`cuda::std::span` and :cpp:class:`cuda::std::mdspan`-compatible types. They require a
+:cpp:class:`cuda::stream_ref` to enqueue work.
 
-``cuda::copy_bytes``
----------------------
+:cpp:func:`cuda::copy_bytes`
+-------------------------------
 .. _cccl-runtime-algorithm-copy_bytes:
 
 Launch a byte-wise copy from source to destination on the provided stream.
 
-- Signature: ``copy_bytes(stream, src, dst, config = {})``
-- Overloads accept ``cuda::std::span``-convertible contiguous ranges or ``cuda::std::mdspan``-convertible multi-dimensional views.
+- Signature: :cpp:func:`cuda::copy_bytes`
+- Overloads accept :cpp:class:`cuda::std::span`-convertible contiguous ranges or
+:cpp:class:`cuda::std::mdspan`-convertible multi-dimensional views.
 - Elements must be trivially copyable
-- ``cuda::std::mdspan``-convertible types must convert to an mdspan that is exhaustive
-- The optional ``config`` argument is a ``cuda::copy_configuration`` that controls source access order and managed-memory location hints
+- :cpp:class:`cuda::std::mdspan`-convertible types must convert to an mdspan that is exhaustive
+- The optional ``config`` argument is a :cpp:struct:`cuda::copy_configuration` that controls source access order and
+managed-memory location hints
 
 Availability: CCCL 3.1.0 / CUDA 13.1
 
@@ -46,15 +49,15 @@ Availability: CCCL 3.1.0 / CUDA 13.1
 }
 
 
-``cuda::fill_bytes``
----------------------
+:cpp:func:`cuda::fill_bytes`
+-------------------------------
 .. _cccl-runtime-algorithm-fill_bytes:
 
 Launch a byte-wise fill of the destination on the provided stream.
 
-- Overloads accept ``cuda::std::span``-convertible or ``cuda::std::mdspan``-convertible destinations.
+- Overloads accept :cpp:class:`cuda::std::span`-convertible or :cpp:class:`cuda::std::mdspan`-convertible destinations.
 - Elements must be trivially copyable
-- ``cuda::std::mdspan``-convertible types must convert to an mdspan that is exhaustive
+- :cpp:class:`cuda::std::mdspan`-convertible types must convert to an mdspan that is exhaustive
 
 Availability: CCCL 3.1.0 / CUDA 13.1

docs/libcudacxx/runtime/buffer.rst

Lines changed: 20 additions & 11 deletions
@@ -1,19 +1,28 @@
 .. _cccl-runtime-buffer:
 
+.. |cuda_make_buffer| replace:: ``cuda::make_buffer``
+.. _cuda_make_buffer: ../api/namespacecuda_1a8d909070d4cf758e776659b91e473a6f.html
+
 Buffer
 ======
 
 The buffer API provides a typed container allocated from memory resources. It handles stream-ordered allocation, initialization, and deallocation of memory.
 
-``cuda::buffer``
-----------------
+:cpp:class:`cuda::buffer`
+---------------------------
 .. _cccl-runtime-buffer-buffer:
 
-``cuda::buffer`` is a container that manages typed storage allocated from a given :ref:`memory resource <libcudacxx-extended-api-memory-resources-resource>` in stream order using a provided :ref:`stream_ref <cccl-runtime-stream-stream-ref>`. The elements are initialized during construction, which may require a kernel launch. The stream provided during construction is stored and later used for deallocation of the buffer, either explicitly or when the buffer destructor is called.
+:cpp:class:`cuda::buffer` is a container that manages typed storage allocated from a given
+:ref:`memory resource <libcudacxx-extended-api-memory-resources-resource>` in stream order using a provided
+:ref:`stream_ref <cccl-runtime-stream-stream-ref>`. The elements are initialized during construction, which may require
+a kernel launch. The stream provided during construction is stored and later used for deallocation of the buffer,
+either explicitly or when the buffer destructor is called.
 
 Buffer owns a copy of the memory resource, which means it must be copy-constructible. If a resource is not copy-constructible, like memory pool objects, :ref:`shared_resource <libcudacxx-extended-api-memory-resources-shared-resource>` can be used to attach shared ownership to a resource type.
 
-In addition to being typed, ``buffer`` also takes a set of :ref:`properties <libcudacxx-extended-api-memory-resources-properties>` to ensure that memory accessibility and other constraints are checked at compile time.
+In addition to being typed, :cpp:class:`cuda::buffer` also takes a set of
+:ref:`properties <libcudacxx-extended-api-memory-resources-properties>` to ensure that memory accessibility and other
+constraints are checked at compile time.
 
 While the buffer operates in stream order, it can also be constructed with a :ref:`synchronous_resource <libcudacxx-extended-api-memory-resources-synchronous-resource>`, in which case it will automatically use the :ref:`synchronous_resource_adapter <libcudacxx-extended-api-memory-resources-synchronous-adapter>` to wrap the provided resource.
 
@@ -49,8 +58,8 @@ Type Aliases
 
 Convenience type aliases are provided for common buffer types:
 
-- ``cuda::device_buffer<T>`` - Buffer with ``device_accessible`` property
-- ``cuda::host_buffer<T>`` - Buffer with ``host_accessible`` property
+- :cpp:any:`cuda::device_buffer` - Buffer with ``device_accessible`` property
+- :cpp:any:`cuda::host_buffer` - Buffer with ``host_accessible`` property
 
 Example:
 
@@ -150,13 +159,13 @@ Example:
 // Alternative would be to call buf.destroy(stream2)
 }
 
-``cuda::make_buffer``
----------------------
+|cuda_make_buffer|_
+------------------------------------------------------------------------------------------------
 .. _cccl-runtime-buffer-make-buffer:
 
-``cuda::make_buffer()`` is a factory function that creates buffers with automatic property deduction from the memory
-resource. It supports the same construction patterns as the buffer constructors, in addition to an overload that sets
-all elements of the buffer to the same value.
+|cuda_make_buffer|_ is a factory function that
+creates buffers with automatic property deduction from the memory resource. It supports the same construction patterns
+as the buffer constructors, in addition to an overload that sets all elements of the buffer to the same value.
 
 Example:

docs/libcudacxx/runtime/cudart_interactions.rst

Lines changed: 7 additions & 4 deletions
@@ -3,7 +3,8 @@
 CUDA Runtime interactions
 =========================
 
-Some runtime objects have a non-owning ``_ref`` counterpart (for example, ``stream`` and ``stream_ref``). Prefer the
+Some runtime objects have a non-owning ``_ref`` counterpart (for example, :cpp:struct:`cuda::stream` and
+:cpp:class:`cuda::stream_ref`). Prefer the
 owning type for lifetime management, and use the ``_ref`` type for code that would otherwise accept a C++ reference but
 needs to interoperate with existing CUDA Runtime code.
 
@@ -42,8 +43,9 @@ Example: handle interop patterns
 Device selection
 ----------------
 
-The Runtime API emphasizes explicit device selection. Most entry points take a ``cuda::device_ref`` or a device-bound
-resource (such as ``cuda::stream{device}``) rather than relying on implicit global state like ``cudaSetDevice``. This
+The Runtime API emphasizes explicit device selection. Most entry points take a :cpp:class:`cuda::device_ref` or a
+device-bound resource (such as :cpp:struct:`cuda::stream`) rather than relying on implicit global state like
+``cudaSetDevice``. This
 makes device ownership and lifetime clearer, especially in multi-GPU code.
 
 The current device can still be set via the CUDA Runtime, but cccl-runtime APIs ignore that global state and require an
@@ -54,7 +56,8 @@ Default stream interop
 ----------------------
 
 The CUDA default (NULL) stream is not exposed as a first-class runtime object because it is tied to implicit per-device
-state and encourages hidden dependencies. Instead, it can be wrapped into ``cuda::stream_ref`` when needed for interop.
+state and encourages hidden dependencies. Instead, it can be wrapped into :cpp:class:`cuda::stream_ref` when needed for
+interop.
 
 .. note::

docs/libcudacxx/runtime/device.rst

Lines changed: 18 additions & 15 deletions
@@ -3,27 +3,29 @@
 Devices
 =======
 
-``cuda::device_ref``
----------------------
+:cpp:class:`cuda::device_ref`
+-------------------------------
 .. _cccl-runtime-device-device-ref:
 
-``cuda::device_ref`` is a lightweight, non-owning handle to a CUDA device ordinal. It allows to query information about a device and serves as an argument to other runtime APIs which are tied to a specific device.
+:cpp:class:`cuda::device_ref` is a lightweight, non-owning handle to a CUDA device ordinal. It allows to query
+information about a device and serves as an argument to other runtime APIs which are tied to a specific device.
 It offers:
 
 - ``get()``: native device ordinal
 - ``name()``: device name
 - ``init()``: initialize the device context
 - ``peers()``: list peers for which peer access can be enabled
-- ``has_peer_access_to(device_ref)``: query if peer access can be enabled to the given device
+- ``has_peer_access_to(cuda::device_ref)``: query if peer access can be enabled to the given device
 - ``attribute(attr)`` / ``attribute<::cudaDeviceAttr>()``: attribute queries
 
 Availability: CCCL 3.1.0 / CUDA 13.1
 
-``cuda::devices``
-------------------
+:cpp:var:`cuda::devices`
+----------------------------
 .. _cccl-runtime-device-devices:
 
-``cuda::devices`` is a random-access view of all available CUDA devices in the form of ``cuda::device_ref`` objects. It
+:cpp:var:`cuda::devices` is a random-access view of all available CUDA devices in the form of
+:cpp:class:`cuda::device_ref` objects. It
 provides indexing, size, and iteration for use
 in range-based loops.
 
@@ -47,7 +49,7 @@ Device attributes
 .. _cccl-runtime-device-attributes:
 
 ``cuda::device_attributes`` provides strongly-typed attribute query objects usable with
-``device_ref::attribute``. Selected examples:
+:cpp:func:`cuda::device_ref::attribute`. Selected examples:
 
 - ``compute_capability``
 - ``multiprocessor_count``
@@ -67,18 +69,19 @@ Example:
 return cuda::device_attributes::multiprocessor_count(dev) * cuda::device_attributes::blocks_per_multiprocessor(dev);
 }
 
-``cuda::arch_traits``
----------------------
+:cpp:any:`cuda::arch_traits`
+--------------------------------
 .. _cccl-runtime-device-arch-traits:
 
 Per-architecture trait accessors providing limits and capabilities common to all devices of an architecture.
-Compared to ``device_attributes``, ``cuda::arch_traits`` provide a compile-time accessible structure that describes common characteristics of all devices of an architecture, while attributes are run-time queries of a single characteristic of a specific device.
+Compared to ``cuda::device_attributes``, :cpp:any:`cuda::arch_traits` provide a compile-time accessible
+structure that describes common characteristics of all devices of an architecture, while attributes are run-time
+queries of a single characteristic of a specific device.
 
-- ``cuda::arch_traits<cuda::arch_id::sm_80>()`` (compile-time) or
-``cuda::arch_traits_for(cuda::arch_id)`` / ``cuda::arch_traits_for(cuda::compute_capability)`` (run-time).
-- Returns a ``cuda::arch_traits_t`` with fields like
+- :cpp:any:`cuda::arch_traits` and :cpp:any:`cuda::arch_traits_for` (compile-time and run-time forms).
+- Returns a :cpp:struct:`cuda::arch_traits_t` with fields like
 ``max_threads_per_block``, ``max_shared_memory_per_block``, ``cluster_supported`` and other capability flags.
-- Traits for the current architecture can be accessed with ``cuda::device::current_arch_traits()``
+- Traits for the current architecture can be accessed with :cpp:func:`cuda::device::current_arch_traits`
 
 Availability: CCCL 3.1.0 / CUDA 13.1

docs/libcudacxx/runtime/event.rst

Lines changed: 13 additions & 10 deletions
@@ -5,17 +5,17 @@ Events
 
 Event is a snapshot of execution state of a stream. It can be used to synchronize work submitted to a stream up to a certain point, establish dependency between streams or measure time passed between two events.
 
-``cuda::event_ref``
+:cpp:class:`cuda::event_ref`
 --------------------------------------------------
 .. _cccl-runtime-event-event-ref:
 
-``cuda::event_ref`` is a non-owning wrapper around a ``cudaEvent_t``. It prevents unsafe implicit constructions from
+:cpp:class:`cuda::event_ref` is a non-owning wrapper around a ``cudaEvent_t``. It prevents unsafe implicit constructions from
 ``nullptr`` or integer literals and provides convenient helpers:
 
 - ``record(cuda::stream_ref)``: record the event on a stream
 - ``sync()``: wait for the recorded work to complete
 - ``is_done()``: non-blocking completion query
-- comparison operators against other ``event_ref`` or ``cudaEvent_t``
+- comparison operators against other :cpp:class:`cuda::event_ref` or ``cudaEvent_t``
 
 Availability: CCCL 3.1.0 / CUDA 13.1
 
@@ -30,13 +30,14 @@ Example:
 e.record(stream);
 }
 
-``cuda::event``
+:cpp:class:`cuda::event`
 --------------------------------------------
 .. _cccl-runtime-event-event:
 
-``cuda::event`` is an owning wrapper around a ``cudaEvent_t`` (with timing disabled). It inherits from ``event_ref`` and provides all of its functionality.
-It also creates and destroys the native event, can be moved (but not copied), and can release ownership via ``release()``. Construction can target a specific
-``cuda::device_ref`` or record immediately on a ``cuda::stream_ref``.
+:cpp:class:`cuda::event` is an owning wrapper around a ``cudaEvent_t`` (with timing disabled). It inherits from
+:cpp:class:`cuda::event_ref` and provides all of its functionality. It also creates and destroys the native event, can be moved (but
+not copied), and can release ownership via ``release()``. Construction can target a specific :cpp:class:`cuda::device_ref`
+or record immediately on a :cpp:class:`cuda::stream_ref`.
 
 Availability: CCCL 3.1.0 / CUDA 13.1
 
@@ -56,11 +57,13 @@ Availability: CCCL 3.1.0 / CUDA 13.1
 
 .. _cccl-runtime-event-timed-event:
 
-``cuda::timed_event``
+:cpp:class:`cuda::timed_event`
 -----------------------------------------------------
 
-``cuda::timed_event`` is an owning wrapper for a timed ``cudaEvent_t``. It inherits from ``event`` and provides all of its functionality.
-It also supports elapsed-time queries between two events via ``operator-``, returning ``cuda::std::chrono::nanoseconds``.
+:cpp:class:`cuda::timed_event` is an owning wrapper for a timed ``cudaEvent_t``. It inherits from :cpp:class:`cuda::event` and provides
+all of its functionality.
+It also supports elapsed-time queries between two events via ``operator-``, returning
+:cpp:class:`cuda::std::chrono::nanoseconds`.
 
 Availability: CCCL 3.1.0 / CUDA 13.1
