Commit fda5671

[Docs, CUDA] Update cuda compilation docs (llvm#191271)
1 parent 5f27de8 commit fda5671

1 file changed: +29 -24 lines changed

llvm/docs/CompileCudaWithLLVM.rst

Lines changed: 29 additions & 24 deletions
@@ -22,22 +22,24 @@ Compiling CUDA Code
 Prerequisites
 -------------

-CUDA is supported since llvm 3.9. Clang currently supports CUDA 7.0 through
-12.1. If clang detects a newer CUDA version, it will issue a warning and will
-attempt to use detected CUDA SDK it as if it were CUDA 12.1.
+CUDA has been supported since LLVM 3.9. Clang typically supports the recent
+major CUDA releases, though the support for the most recent versions may need
+Clang compiled from recent sources. If clang detects a newer CUDA version,
+it will issue a warning and will make a best-effort attempt to use detected
+CUDA SDK as if it were the most recent version supported by Clang.

-Before you build CUDA code, you'll need to have installed the CUDA SDK. See
+Before building CUDA code, you'll need to have installed the CUDA SDK. See
 `NVIDIA's CUDA installation guide
 <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ for
-details. Note that clang `maynot support
+details. Note that clang `may not support
 <https://bugs.llvm.org/show_bug.cgi?id=26966>`_ the CUDA toolkit as installed by
 some Linux package managers. Clang does attempt to deal with specific details of
 CUDA installation on a handful of common Linux distributions, but in general the
 most reliable way to make it work is to install CUDA in a single directory from
 NVIDIA's `.run` package and specify its location via `--cuda-path=...` argument.

-CUDA compilation is supported on Linux. Compilation on macOS and Windows may or
-may not work and currently have no maintainers.
+CUDA compilation is fully supported on Linux. Compilation on Windows should work, but your mileage may vary.
+Compilation on macOS is no longer supported as CUDA support has been dropped by NVIDIA.

 Invoking clang
 --------------
@@ -55,7 +57,7 @@ brackets as described below:

 .. code-block:: console

-  $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
+  $ clang++ axpy.cu -o axpy --offload-arch=<GPU arch> \
       -L<CUDA install path>/<lib64 or lib> \
       -lcudart_static -ldl -lrt -pthread
   $ ./axpy
@@ -64,9 +66,10 @@ brackets as described below:
   y[2] = 6
   y[3] = 8

-On macOS, replace `-lcudart_static` with `-lcudart`; otherwise, you may get
-"CUDA driver version is insufficient for CUDA runtime version" errors when you
-run your program.
+Note that it has to be `clang++` as CUDA headers rely on C++ features.
+
+.. note::
+   macOS is no longer supported for CUDA compilation.

 * ``<CUDA install path>`` -- the directory where you installed CUDA SDK.
   Typically, ``/usr/local/cuda``.
@@ -80,15 +83,18 @@ run your program.

 * ``<GPU arch>`` -- the `compute capability
   <https://developer.nvidia.com/cuda-gpus>`_ of your GPU. For example, if you
-  want to run your program on a GPU with compute capability of 3.5, specify
-  ``--cuda-gpu-arch=sm_35``.
+  want to run your program on a GPU with compute capability of 8.0, specify
+  ``--offload-arch=sm_80``.
+
+  Note: You cannot pass ``compute_XX`` as an argument to ``--offload-arch``;
+  only ``sm_XX`` is currently supported.

-  Note: You cannot pass ``compute_XX`` as an argument to ``--cuda-gpu-arch``;
-  only ``sm_XX`` is currently supported. However, clang always includes PTX in
-  its binaries, so e.g. a binary compiled with ``--cuda-gpu-arch=sm_30`` would be
-  forwards-compatible with e.g. ``sm_35`` GPUs.
+  CUDA compilation no longer includes PTX by default. If you want to enable it,
+  use ``--cuda-include-ptx=all|sm_*``. For example, a binary compiled with
+  ``--offload-arch=sm_80`` would need ``--cuda-include-ptx=sm_80`` (or ``all``)
+  to be forwards-compatible with e.g. ``sm_86`` GPUs.

-  You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.
+  You can pass ``--offload-arch`` multiple times to compile for multiple archs.

 The `-L` and `-l` flags only need to be passed when linking. When compiling,
 you may also need to pass ``--cuda-path=/path/to/cuda`` if you didn't install
@@ -177,11 +183,10 @@ nvcc does not officially support ``std::complex``. It's an error to use
 __device__`` code due to nvcc's interpretation of the "wrong-side rule" (see
 below). However, we have heard from implementers that it's possible to get
 into situations where nvcc will omit a call to an ``std::complex`` function,
-especially when compiling without optimizations.
+especially when compiling without optimizations. Using ``--expt-relaxed-constexpr``
+may help.

-As of 2016-11-16, clang supports ``std::complex`` without these caveats. It is
-tested with libstdc++ 4.8.5 and newer, but is known to work only with libc++
-newer than 2016-11-16.
+Clang supports ``std::complex`` without these caveats.

 ``<algorithm>``
 ---------------
@@ -270,8 +275,8 @@ compilation steps.
 * Invoke ``ptxas`` to generate a SASS file, ``S_arch``. Note that, unlike
   nvcc, clang always generates SASS code.

-* Invoke ``fatbin`` to combine all ``P_arch`` and ``S_arch`` files into a
-  single fat binary file, ``F``.
+* Invoke ``fatbin`` to combine all ``S_arch`` files (and ``P_arch`` files if
+  PTX inclusion was requested) into a single fat binary file, ``F``.

 * Compile ``H`` using clang. ``__device__`` code is parsed and must be
   semantically correct, even though we're not generating code for the device
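
For readers applying this change, the updated flags can be combined roughly as in the sketch below. It only assembles and prints a command line rather than invoking clang, so it runs without a CUDA installation. The values of ``CUDA_PATH`` and ``GPU_ARCH`` and the choice of ``lib64`` are assumptions for illustration; the flags themselves (``--offload-arch``, ``--cuda-include-ptx``, ``--cuda-path``) are the ones named in the updated documentation.

```shell
#!/bin/sh
# Illustrative values only (assumptions): adjust to your installation and GPU.
CUDA_PATH=/usr/local/cuda   # typical install location mentioned in the doc
GPU_ARCH=sm_80              # compute capability 8.0

# Assemble the command line instead of running it.
# --offload-arch replaces the older --cuda-gpu-arch spelling.
# --cuda-include-ptx opts back in to embedding PTX, which is no longer
# the default, so the binary can be forwards-compatible with newer GPUs.
# lib64 is assumed here; some installations use lib instead.
CMD="clang++ axpy.cu -o axpy \
--offload-arch=${GPU_ARCH} \
--cuda-include-ptx=${GPU_ARCH} \
--cuda-path=${CUDA_PATH} \
-L${CUDA_PATH}/lib64 \
-lcudart_static -ldl -lrt -pthread"

echo "${CMD}"
```

To target several architectures, ``--offload-arch`` can be repeated, as the updated text notes.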
