This repository was archived by the owner on Apr 15, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathKokkos_PG_Compilation.tex
More file actions
233 lines (189 loc) · 11.7 KB
/
Kokkos_PG_Compilation.tex
File metadata and controls
233 lines (189 loc) · 11.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
\chapter{Compiling}\label{C:build}
This chapter explains how to compile Kokkos, and how to link your
application against Kokkos. Kokkos supports two build systems:
\begin{itemize}
\item Using the embedded Makefile
\item Trilinos' CMake build system
\end{itemize}
Note that the two explicitly supported build methods should not be
mixed. For example, do not include the embedded Makefile in your
application build process, while explicitly linking against a
pre-compiled Kokkos library in Trilinos. We also include specific
advice for building for NVIDIA GPUs, and for Intel Xeon Phi.
\section{General Information}\label{S:build:gen}
Kokkos consists mainly of header files.
Only a few functions have to be compiled into object files outside of the application's source code.
Those functions are contained in \verb!.cpp! files inside the \lstinline|kokkos/core/src| directory and its subdirectories.
The files are internally protected with macros to prevent compilation if the related execution space is not enabled.
Thus, it is not necessary to create a list of included object files specific to your compilation target.
One may simply compile all \verb!.cpp! files.
The enabled features are controlled via macros which have to be provided in the compilation line or in the \lstinline|KokkosCore_config.h| include file.
A list of macros can be found in Table \ref{TBL:CompileMacros}.
\begin{table}
\caption{Table of configuration Macros}
\label{TBL:CompileMacros}
\begin{small}
\begin{tabular}[t]{p{0.18\textwidth}p{0.3\textwidth}p{0.42\textwidth}}
\hline\hline
Macro & Effect & Comment \\
\hline
{\tiny KOKKOS\_HAVE\_CUDA} & Enable the CUDA execution space. & Requires a compiler capable of understanding CUDA-C. See Section \ref{S:build:CUDA}. \\
{\tiny KOKKOS\_HAVE\_OPENMP} & Enable the OpenMP execution space. & Requires the compiler to support OpenMP (e.g., \verb!-fopenmp!). \\
{\tiny KOKKOS\_HAVE\_PTHREADS} & Enable the Threads execution space. & Requires linking with libpthread.\\
{\tiny KOKKOS\_HAVE\_Serial} & Enable the Serial execution space. & \\
{\tiny KOKKOS\_HAVE\_CXX11} & Enable internal usage of C++11 features. & The code needs to be compile with the C++11 standard. Most compilers accept the \verb!-std=c++11! flag for this.\\
{\tiny KOKKOS\_HAVE\_HWLOC} & Enable thread and memory pinning via hwloc. & Requires linking with libhwloc.\\
\hline\hline
\end{tabular}
\end{small}
\end{table}
In order to compile Kokkos a C++11 compliant compiler is needed.
For an up to date list of compilers which are tested on a nightly basis, please refer to the README on the github repository.
At the time of writing supported compilers include:
\begin{lstlisting}
GCC 4.7.2, 4.8.x, 4.9.x, 5.1.x, 5.3.x
Intel 14.x, 15.x, 16.x, 17.x
Clang 3.5.1, 3.6.x, 3.7.x, 3.8.x, 3.9.x;
Cuda 7.0, 7.5, 8.0;
XL 13.3;
\end{lstlisting}
\section{Using Kokkos' Makefile system}\label{S:build:make}
The base of the build system is the Makefile.kokkos, which is designed to be included by application Makefiles.
It contains logic to (re)generate the KokkosCore\_config.h file if necessary, build the Kokkos library, and provide updated compiler and linker flags.
The system can digest a number of variables which are used to configure Kokkos settings.
Generally the variables are parsed for Keywords.
This allows for multiple options being given for each variable.
The separator doesn't matter as long as it doesn't interact with the Make system.
A list of variables, their meaning and options is given in table \ref{TBL:MakefileOptions}.
\begin{table}
\caption{Variables for the embedded Makefile}
\label{TBL:MakefileOptions}
\begin{small}
\begin{tabular}[t]{p{0.3\textwidth}p{0.65\textwidth}}
\hline\hline
Variable & Description \\
\hline
KOKKOS\_PATH (IN): & Path to the Kokkos root or install directory.
One can either build against an existing install
of Kokkos or use its source directly for an
embedded build. In the former case the "Input variables"
are set inside the embedded Makefile.kokkos and it
is not valid to set them differently in the including
Makefile.\\
\hline
CUDA\_PATH (IN): & Path to the Cuda toolkit root directory.\\
\hline
KOKKOS\_DEVICES (IN): & What Execution and Memory Spaces should be enabled. \\
\hspace{0.5cm}Options: & OpenMP,Serial,Pthreads,Cuda \\
\hspace{0.5cm}Default: & OpenMP \\
\hline
KOKKOS\_ARCH (IN): & What backend Architecture to build for. \\
\hspace{0.5cm}Options: & KNL,KNC,SNB,HSW,BDW,Kepler,Kepler30,Kepler35,Kepler37,Maxwell,
Maxwell50,Pascal60,Pascal61,ARMv8,ARMv81,ARMv8-ThunderX,BGQ,Power7,Power8 \\
\hspace{0.5cm}Default: & None -- this means no particular architecture flags
are set. \\
\hline
KOKKOS\_USE\_TPLS (IN): & Enable optional third party libraries. \\
\hspace{0.5cm}Options: & hwloc,librt,experimental\_memkind \\
\hspace{0.5cm}Default: & \\
\hline
KOKKOS\_OPTIONS (IN): & Enable optional settings \\
\hspace{0.5cm}Options: & aggressive\_vectorization \\
\hspace{0.5cm}Default: & \\
\hline
KOKKOS\_CUDA\_OPTIONS (IN): & Enable optional settings specific to CUDA. \\
\hspace{0.5cm}Options: & force\_uvm,use\_ldg,rdc,enable\_lambda \\
\hspace{0.5cm}Default: & \\
\hline
HWLOC\_PATH (IN): & Path to the hardware locality library if enabled. \\
\hline
KOKKOS\_DEBUG (IN): & Enable debugging. \\
\hspace{0.5cm}Options: & yes,no \\
\hspace{0.5cm}Default: & no \\
\hline
KOKKOS\_CXX\_STANDARD (IN): & Set the C++ standard to be used. \\
\hspace{0.5cm}Options: & c++11 \\
\hspace{0.5cm}Default: & c++11 \\
\hline\hline
KOKKOS\_CPPFLAGS (OUT): & Pre processor flags (include directories and defines).
Add this to applications compiler and preprocessor flags. \\
\hline
KOKKOS\_CXXFLAGS (OUT): & Compiler flags. Add this to the applications compiler flags. \\
\hline
KOKKOS\_LDFLAGS (OUT): & Linker flags. Add this to the applications linker flags. \\
\hline
KOKKOS\_LIBS (OUT): & Libraries required by Kokkos. Add this to the link line after the linker flags. \\
\hline
KOKKOS\_CPP\_DEPENDS (OUT): & Dependencies for compilation units which include any Kokkos header files.
Add this as a dependency to compilation targets including any Kokkos code. \\
\hline
KOKKOS\_LINK\_DEPENDS (OUT): & Dependencies of an application linking in the Kokkos library.
Add this to the dependency list of link targets. \\
\hline\hline
CXXFLAGS (IN): & User provided compiler flags which will be used to compile the Kokkos library. \\
\hline
CXX (IN): & The compiler used to compile the Kokkos library. \\
\hline\hline
\end{tabular}
\end{small}
\end{table}
A word of caution on where to include the Makefile.kokkos:
since the embedded Makefiles defines targets it is usually
better to include it after the first application target
has been defined. Since that target can't use the flags
from the embedded Makefiles it should be a meta target:
\begin{lstlisting}
CXX=g++
default: main
include Makefile.kokkos
main: $(KOKKOS_LINK_DEPENDS) $(KOKKOS_CPP_DEPENDS) main.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) \
$(KOKKOS_LDFLAGS) $(KOKKOS_LIBS) main.cpp -o main
\end{lstlisting}
More example application Makefiles can be found in the tutorial examples under \verb!kokkos/example/tutorial!.
Kokkos provides a script \lstinline|generate_makefile.bash| which can generate a Makefile for building and installing the library
as well as building and running the tests. Please run \lstinline|generate_makefile.bash --help| for options.
Note that paths given to the script must be absolute paths, and the script must be run with the \lstinline|bash| shell (the script will
do it if it is run directly i.e. as \lstinline|./generate_makefile.bash|.
\section{Using Trilinos' CMake build system}\label{S:build:Trilinos}
The Trilinos project (see \url{trilinos.org}) is an effort to develop
algorithms and enabling technologies within an object-oriented
software framework for the solution of large-scale, complex
multiphysics engineering and scientific problems. Trilinos is
organized into packages. Even though Kokkos is a stand-alone software
project, Trilinos uses Kokkos extensively. Thus, Trilinos' source
code includes Kokkos' source code, and builds Kokkos as part of its
build process.
Trilinos' build system uses CMake. Thus, in order to build Kokkos as
part of Trilinos, you must first install CMake (version
\texttt{2.8.12} or newer; CMake \texttt{3.x} works).
To enable Kokkos when building Trilinos, set the CMake option \verb!Trilinos_ENABLE_Kokkos!.
Trilinos' build system lets packages express dependencies on other packages or external libraries.
If you enable any Trilinos package (e.g., Tpetra) that has a required dependency on Kokkos,
Trilinos will enable Kokkos automatically.
Configuration macros are automatically inferred from Trilinos settings.
For example, if the CMake option \lstinline|Trilinos_ENABLE_OpenMP| is ON, Trilinos will define the macro \lstinline|KOKKOS_HAVE_OPENMP|.
Trilinos' build system will autogenerate the previously mentioned \lstinline|KokkosCore_config.h| file that contains those macros.
We refer readers to Trilinos' documentation for details. Also, the
\texttt{kokkos/config} directory includes examples of Trilinos
configuration scripts.
\section{Building for CUDA}\label{S:build:CUDA}
Any Kokkos application compiled for CUDA embeds CUDA code via template meta-programming.
Thus, the whole application must be built with a CUDA-capable compiler.
(At the moment, the only such compilers are NVIDIA's NVCC and Clang 4.0 [not released yet at time of writing].)
More precisely, every compilation unit containing a Kokkos kernel or a function called from a Kokkos kernel has to be compiled with a CUDA-capable compiler.
This includes files containing \lstinline|Kokkos::View| allocations, which call an initialization kernel.
The current version of NVCC has some shortcomings when used as the main compiler for a project, in particular when part of a complex build system.
For example, it does not understand most GCC command-line options, which must be prepended by the \lstinline|-Xcompiler| flag when calling NVCC.
Kokkos comes with a shell script, called \lstinline|nvcc_wrapper|, that wraps NVCC to address these issues.
We intend this as a drop-in replacement for a normal GCC-compatible compiler (e.g., GCC or Intel) in your build system.
It analyzes the provided command-line options and prepends them correctly.
It also adds the correct flags for compiling generic C++ files containing CUDA code (e.g., \verb!*.cpp!, \verb!*.cxx!, or \verb!*.CC!).
By default \lstinline|nvcc_wrapper| calls \verb!g++! as the host compiler.
You may override this by providing NVCC's '\lstinline|-ccbin|' option as a compiler flag.
The default can be set by editing the script itself or by setting the environment variable \lstinline|NVCC_WRAPPER_DEFAULT_COMPILER|.
Many people use a system like Environment Modules (see \\ \url{http://modules.sourceforge.net/}) to manage their shell environment.
When using a module system, it can be useful to provide different versions for different back-end compiler types (e.g., \verb!icpc!, \verb!pgc++!, \verb!g++!, and \verb!clang!).
To use the \lstinline|nvcc_wrapper| in conjunction with MPI wrappers, simply overwrite which C++ compiler is called by the MPI wrapper.
For example, you can reset OpenMPI's C++ compiler by setting the \lstinline|OMPI_CXX| environment variable.
Make sure that \lstinline|nvcc_wrapper| calls the host compiler with which the MPI library was compiled.