**alpaka** - Abstraction Library for Parallel Kernel Acceleration
=================================================================

[Build status (Travis CI)](https://travis-ci.org/ComputationalRadiationPhysics/alpaka)
[Build status (AppVeyor)](https://ci.appveyor.com/project/BenjaminW3/alpaka-vuiya/branch/develop)

The **alpaka** library is a header-only C++11 abstraction library for accelerator development.

Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism.
Accelerator Back-ends
---------------------
|Accelerator Back-end|Lib/API|Devices|Execution strategy grid-blocks|Execution strategy block-threads|
|---|---|---|---|---|
|Serial|n/a|Host CPU (single core)|sequential|sequential (only 1 thread per block)|
|OpenMP 2.0+ blocks|OpenMP 2.0+|Host CPU (multi core)|parallel (preemptive multitasking)|sequential (only 1 thread per block)|
|OpenMP 2.0+ threads|OpenMP 2.0+|Host CPU (multi core)|sequential|parallel (preemptive multitasking)|
|OpenMP 4.0+ (CPU)|OpenMP 4.0+|Host CPU (multi core)|parallel (undefined)|parallel (preemptive multitasking)|
|std::thread|std::thread|Host CPU (multi core)|sequential|parallel (preemptive multitasking)|
|CUDA 7.0+|CUDA 7.0+|NVIDIA GPUs SM 2.0+|parallel (undefined)|parallel (lock-step within warps)|
|TBB 2.2+ blocks|TBB 2.2+|Host CPU (multi core)|parallel (preemptive multitasking)|sequential (only 1 thread per block)|
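The two execution-strategy columns can be illustrated with plain C++. This is NOT alpaka code and `runGrid` is a hypothetical helper; it merely mimics the `std::thread` back-end row above, where grid-blocks are processed sequentially and the threads of one block run in parallel:

```cpp
// Conceptual illustration only (NOT the alpaka API): the "std::thread"
// back-end row executes grid-blocks sequentially and runs the threads
// of each block in parallel via preemptive multitasking.
#include <thread>
#include <vector>

// Runs a toy "kernel" (result[i] = 2 * i) over gridBlocks x blockThreads.
std::vector<int> runGrid(int const gridBlocks, int const blockThreads)
{
    std::vector<int> result(gridBlocks * blockThreads, 0);
    for(int b = 0; b < gridBlocks; ++b)          // sequential over grid-blocks
    {
        std::vector<std::thread> threads;
        for(int t = 0; t < blockThreads; ++t)    // parallel within one block
        {
            threads.emplace_back(
                [&result, b, t, blockThreads]
                {
                    int const i = b * blockThreads + t;
                    result[i] = 2 * i;           // toy per-thread work
                });
        }
        for(auto & th : threads)
            th.join();                           // end-of-block barrier
    }
    return result;
}
```

In alpaka itself this decomposition is expressed through the accelerator types rather than hand-written loops; swapping the back-end swaps the strategy without changing the kernel.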


Supported Compilers
-------------------

This library uses C++11 (or newer when available).

|Accelerator Back-end|gcc 4.9.2|gcc 5.3|gcc 6.1|clang 3.5/3.6|clang 3.7|clang 3.8|MSVC 2015|
|---|---|---|---|---|---|---|---|
|Serial|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
|OpenMP 2.0+ blocks|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
|OpenMP 2.0+ threads|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
|OpenMP 4.0+ (CPU)|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:|:x:|:x:|:x:|
|std::thread|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
|CUDA 7.0+|:white_check_mark: (nvcc 7.0+)|:x:|:x:|:x:|:x:|:white_check_mark: (native)|:x:|
|TBB 2.2+|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:grey_question:|


Dependencies
------------

[Boost](http://boost.org/) 1.59+ is the only mandatory external dependency.
The **alpaka** library itself requires only header-only libraries.
However, some of the accelerator back-end implementations require additional Boost libraries to be built.

When an accelerator back-end using *CUDA* is enabled, version *7.0* of the *CUDA SDK* is the minimum requirement.
*NOTE*: When using *CUDA* 7.0, the *CUDA accelerator back-end* cannot be enabled together with the *std::thread accelerator back-end* or the *Boost.Fiber accelerator back-end* due to bugs in the nvcc compiler.

When an accelerator back-end using *OpenMP* is enabled, the compiler and the platform have to support the corresponding minimum *OpenMP* version.

When an accelerator back-end using *TBB* is enabled, the compiler and the platform have to support the corresponding minimum *TBB* version.


Usage
-----

The library is header-only, so nothing has to be built.
CMake 3.3.0+ is required to provide the correct defines and include paths.
Just call `ALPAKA_ADD_EXECUTABLE` instead of `CUDA_ADD_EXECUTABLE` or `ADD_EXECUTABLE`, and the difficulties of the CUDA nvcc compiler in handling `.cu` and `.cpp` files are automatically taken care of.
Source files do not need any special file ending.
Examples of how to utilize alpaka within CMake can be found in the `example` folder.

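The CMake wiring might look roughly as follows. This is a minimal, hypothetical sketch: the project name, source path, module-path setup, and the `alpaka_LIBRARIES` variable are assumptions, not taken from this README; the `example` folder is the authoritative reference.

```cmake
# Hypothetical minimal CMakeLists.txt (names and find_package setup are
# assumptions; consult the `example` folder for the real wiring).
CMAKE_MINIMUM_REQUIRED(VERSION 3.3.0)
PROJECT(vectorAdd)

# Make alpaka's CMake modules visible, e.g. by pointing at a checkout.
LIST(APPEND CMAKE_MODULE_PATH "${ALPAKA_ROOT}")
FIND_PACKAGE(alpaka REQUIRED)

# ALPAKA_ADD_EXECUTABLE replaces ADD_EXECUTABLE / CUDA_ADD_EXECUTABLE and
# takes care of nvcc's .cu/.cpp handling automatically.
ALPAKA_ADD_EXECUTABLE(vectorAdd "src/vectorAdd.cpp")
TARGET_LINK_LIBRARIES(vectorAdd ${alpaka_LIBRARIES})
```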
The whole alpaka library can be included with: `#include <alpaka/alpaka.hpp>`
Code that is not intended to be utilized by the user is hidden in the `detail` namespace.


Introduction
------------

For a quick introduction, feel free to play back the recording of our presentation at
[GTC 2016](http://mygtc.gputechconf.com/quicklink/858sI36):

 - E. Zenker, R. Widera, G. Juckeland et al.,
   *Porting the Plasma Simulation PIConGPU to Heterogeneous Architectures with Alpaka*,
   [video link (39 min)](http://on-demand.gputechconf.com/gtc/2016/video/S6298.html)


Citing alpaka
-------------

Currently, all authors of **alpaka** are scientists or connected with
research. To help us justify the importance and impact of our work, please
consider citing us accordingly in your derived work and publications:

```latex
% Peer-Reviewed Publication %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Peer reviewed and accepted publication in
% "The Sixth International Workshop on
%  Accelerators and Hybrid Exascale Systems (AsHES)"
% at the
% "30th IEEE International Parallel and Distributed
%  Processing Symposium" in Chicago, IL, USA
@inproceedings{ZenkerAsHES2016,
  author    = {Erik Zenker and Benjamin Worpitz and Ren{\'{e}} Widera
               and Axel Huebl and Guido Juckeland and
               Andreas Kn{\"{u}}pfer and Wolfgang E. Nagel and Michael Bussmann},
  title     = {Alpaka - An Abstraction Library for Parallel Kernel Acceleration},
  archivePrefix = "arXiv",
  eprint    = {1602.08477},
  keywords  = {Computer science; CUDA; Mathematical Software; nVidia; OpenMP;
               Package; performance portability; Portability; Tesla K20; Tesla K80},
  day       = {23},
  month     = {May},
  year      = {2016},
  publisher = {IEEE Computer Society},
  url       = {http://arxiv.org/abs/1602.08477},
}

% Original Work: Benjamin Worpitz' Master Thesis %%%%%%%%%%
%
@MastersThesis{Worpitz2015,
  author = {Benjamin Worpitz},
  title  = {Investigating performance portability of a highly scalable
            particle-in-cell simulation code on various multi-core
            architectures},
  school = {{Technische Universit{\"{a}}t Dresden}},
  month  = {Sep},
  year   = {2015},
  type   = {Master Thesis},
  doi    = {10.5281/zenodo.49768},
  url    = {http://dx.doi.org/10.5281/zenodo.49768}
}
```


Authors
-------

### Maintainers and Core Developers

- Benjamin Worpitz (original author)
- Erik Zenker
- Rene Widera

### Former Members, Contributions and Thanks

- Dr. Michael Bussmann
- Axel Huebl