@@ -22,22 +22,24 @@ Compiling CUDA Code
2222Prerequisites
2323-------------
2424
25- CUDA is supported since llvm 3.9. Clang currently supports CUDA 7.0 through
26- 12.1. If clang detects a newer CUDA version, it will issue a warning and will
27- attempt to use detected CUDA SDK it as if it were CUDA 12.1.
25+ CUDA has been supported since LLVM 3.9. Clang typically supports the recent
26+ major CUDA releases, though the support for the most recent versions may need
27+ Clang compiled from recent sources. If clang detects a newer CUDA version,
28+ it will issue a warning and will make a best-effort attempt to use detected
29+ CUDA SDK as if it were the most recent version supported by Clang.
2830
29- Before you build CUDA code, you'll need to have installed the CUDA SDK. See
31+ Before building CUDA code, you'll need to have installed the CUDA SDK. See
3032`NVIDIA's CUDA installation guide
3133<https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html> `_ for
32- details. Note that clang `maynot support
34+ details. Note that clang `may not support
3335<https://bugs.llvm.org/show_bug.cgi?id=26966> `_ the CUDA toolkit as installed by
3436some Linux package managers. Clang does attempt to deal with specific details of
3537CUDA installation on a handful of common Linux distributions, but in general the
3638most reliable way to make it work is to install CUDA in a single directory from
3739NVIDIA's `.run ` package and specify its location via `--cuda-path=... ` argument.
3840
39- CUDA compilation is supported on Linux. Compilation on macOS and Windows may or
40- may not work and currently have no maintainers .
41+ CUDA compilation is fully supported on Linux. Compilation on Windows should work, but your mileage may vary.
42+ Compilation on macOS is no longer supported as CUDA support has been dropped by NVIDIA .
4143
4244Invoking clang
4345--------------
@@ -55,7 +57,7 @@ brackets as described below:
5557
5658.. code-block :: console
5759
58- $ clang++ axpy.cu -o axpy --cuda-gpu -arch=<GPU arch> \
60+ $ clang++ axpy.cu -o axpy --offload -arch=<GPU arch> \
5961 -L<CUDA install path>/<lib64 or lib> \
6062 -lcudart_static -ldl -lrt -pthread
6163 $ ./axpy
@@ -64,9 +66,10 @@ brackets as described below:
6466 y[2] = 6
6567 y[3] = 8
6668
67- On macOS, replace `-lcudart_static ` with `-lcudart `; otherwise, you may get
68- "CUDA driver version is insufficient for CUDA runtime version" errors when you
69- run your program.
69+ Note that it has to be `clang++ ` as CUDA headers rely on C++ features.
70+
71+ .. note ::
72+ macOS is no longer supported for CUDA compilation.
7073
7174* ``<CUDA install path> `` -- the directory where you installed CUDA SDK.
7275 Typically, ``/usr/local/cuda ``.
@@ -80,15 +83,18 @@ run your program.
8083
8184* ``<GPU arch> `` -- the `compute capability
8285 <https://developer.nvidia.com/cuda-gpus> `_ of your GPU. For example, if you
83- want to run your program on a GPU with compute capability of 3.5, specify
84- ``--cuda-gpu-arch=sm_35 ``.
86+ want to run your program on a GPU with compute capability of 8.0, specify
87+ ``--offload-arch=sm_80 ``.
88+
89+ Note: You cannot pass ``compute_XX `` as an argument to ``--offload-arch ``;
90+ only ``sm_XX `` is currently supported.
8591
86- Note: You cannot pass `` compute_XX `` as an argument to `` --cuda-gpu-arch ``;
87- only ``sm_XX `` is currently supported. However, clang always includes PTX in
88- its binaries, so e.g. a binary compiled with ``--cuda-gpu-arch=sm_30 `` would be
89- forwards-compatible with e.g. ``sm_35 `` GPUs.
92+ CUDA compilation no longer includes PTX by default. If you want to enable it,
93+ use ``--cuda-include-ptx=all|sm_* ``. For example, a binary compiled with
94+ `` --offload-arch=sm_80 `` would need ``--cuda-include-ptx=sm_80 `` (or `` all ``)
95+ to be forwards-compatible with e.g. ``sm_86 `` GPUs.
9096
91- You can pass ``--cuda-gpu -arch `` multiple times to compile for multiple archs.
97+ You can pass ``--offload -arch `` multiple times to compile for multiple archs.
9298
9399The `-L ` and `-l ` flags only need to be passed when linking. When compiling,
94100you may also need to pass ``--cuda-path=/path/to/cuda `` if you didn't install
@@ -177,11 +183,10 @@ nvcc does not officially support ``std::complex``. It's an error to use
177183__device__ `` code due to nvcc's interpretation of the "wrong-side rule" (see
178184below). However, we have heard from implementers that it's possible to get
179185into situations where nvcc will omit a call to an ``std::complex `` function,
180- especially when compiling without optimizations.
186+ especially when compiling without optimizations. Using ``--expt-relaxed-constexpr ``
187+ may help.
181188
182- As of 2016-11-16, clang supports ``std::complex `` without these caveats. It is
183- tested with libstdc++ 4.8.5 and newer, but is known to work only with libc++
184- newer than 2016-11-16.
189+ Clang supports ``std::complex `` without these caveats.
185190
186191``<algorithm> ``
187192---------------
@@ -270,8 +275,8 @@ compilation steps.
270275 * Invoke ``ptxas `` to generate a SASS file, ``S_arch ``. Note that, unlike
271276 nvcc, clang always generates SASS code.
272277
273- * Invoke ``fatbin `` to combine all ``P_arch `` and ``S_arch `` files into a
274- single fat binary file, ``F ``.
278+ * Invoke ``fatbin `` to combine all ``S_arch `` files ( and ``P_arch `` files if
279+ PTX inclusion was requested) into a single fat binary file, ``F ``.
275280
276281 * Compile ``H `` using clang. ``__device__ `` code is parsed and must be
277282 semantically correct, even though we're not generating code for the device
0 commit comments