-
Notifications
You must be signed in to change notification settings - Fork 17
CP016: Sub groups proposal #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
71c71fd
23dc44a
f08fbdc
4497bf8
56d5f4c
c85b168
d962acb
b9afd9a
40cc53c
88d23ea
411d1a0
3f6b332
84390e6
f9b2caa
ea38897
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Basic sub-group extension | ||
|
||
| Proposal ID | CP016 | | ||
|-------------|--------| | ||
| Name | Basic sub group extension | | ||
| Date of Creation | 14 September 2018 | | ||
| Target | SYCL 1.2.1 | | ||
| Current Status | _Work In Progress_ | | ||
| Reply-to | Ruyman Reyes <[email protected]> | | ||
| Original author | Ruyman Reyes <[email protected]> | | ||
| Contributors | Ruyman Reyes <[email protected]>, Gordon Brown <[email protected]>, Victor Lomuller <[email protected]> | | ||
|
||
## Overview | ||
|
||
This vendor extension aims to define an interface to expose sub-group functionality, | ||
as defined in the SYCL 2.2 provisional and the OpenCL 2.2 provisional, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OpenCL 2.2 isn't provisional (could probably also say "OpenCL 2.1" here as that is when subgroups were added to the main spec). |
||
in SYCL 1.2.1. | ||
|
||
The extension is only targeting OpenCL devices that expose | ||
`cl_codeplay_basic_subgroups` vendor extension. | ||
|
||
|
||
## References | ||
|
||
[1] SYCL 1.2.1 specification | ||
https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf | ||
|
||
[2] SYCL 2.2 provisional specification (revision date 2016/02/15) | ||
https://www.khronos.org/registry/SYCL/specs/sycl-2.2.pdf | ||
|
||
[3] OpenCL 2.2 API specification | ||
https://www.khronos.org/registry/OpenCL/specs/2.2/pdf/OpenCL_API.pdf | ||
|
||
[4] OpenCL C++ 1.0 specification | ||
https://www.khronos.org/registry/OpenCL/specs/2.2/pdf/OpenCL_Cxx.pdf | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
# Basic Sub group support | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OpenCL uses "Sub-group", not "Sub group". This is on many lines, I won't comment on them all. |
||
|
||
This proposal aims to define an interface for using OpenCL 2.2 sub groups in | ||
SYCL the provisional SYCL 1.2.1 specification, relying on the underlying | ||
OpenCL implementation supporting the extension `cl_codeplay_basic_subgroups`. | ||
|
||
The extension exposes to programmers the ability to identify sub-groups | ||
on a work-group, count the number of sub-groups available and perform | ||
a broadcast from one work-item on a sub-group to the rest. | ||
|
||
Details of the execution and memory model changes can be found in the | ||
documentation for the Codeplay's OpenCL vendor extension `cl_codeplay_basic_subgroups` | ||
once available. | ||
|
||
## Execution model | ||
|
||
When this vendor extension is available, the execution model of SYCL 1.2.1 | ||
is extended to also include support for sub-groups of threads inside of a | ||
work-group. | ||
Overall, these sub-groups work following the description of the OpenCL 2.2 | ||
sub-groups, with some restrictions: | ||
|
||
* The number of sub-groups available for each work-group is determined | ||
at compile-time and remains the same during the execution of the SYCL application. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not as familiar with the details of the SYCL spec, but looking at it now it seems that current use of "compile-time" means when compiling the SYCL program rather than the SYCL runtime calling I'm just unclear on the intended meaning of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
* The number of threads per sub-group is known at compile-time, and remains the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This use of "threads" seems unusual. "The sub-group size is known ...` perhaps? |
||
same during execution of the SYCL application. | ||
* Only those functions defined in this proposal are available. | ||
In particular, there is no sub-group pipe communication. | ||
|
||
## Memory model | ||
|
||
Sub-groups can access global and local memory, but, given there is no | ||
memory-scope to the atomic or barriers operations in SYCL 1.2.1, there is no | ||
possibility to specify an equivalent of sub-group memory scope. | ||
|
||
## Namespace `basic_sub_group` | ||
|
||
All new functionality is exposed under the `basic_sub_group` namespace | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There doesn't seem to be any |
||
in the `codeplay` vendor extension namespace. | ||
When the vendor extension `basic_sub_group` is available, the macro | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OpenCL subgroup extensions used It may be valid to ignore that given the type |
||
`SYCL_CODEPLAY_BASIC_SUB_GROUP` is defined in the header. | ||
|
||
### Class `sub_group` | ||
|
||
The extension adds a new class template `sub_group` that identifies the | ||
sub group range and the current sub group id. | ||
It also for providing sub group barriers. | ||
|
||
```cpp | ||
namespace cl { | ||
namespace sycl { | ||
namespace codeplay { | ||
|
||
template <int Dimensions> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see that this comes from the SYCL 2.2 provisional specification, but for my own sanity: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or is it important that, for example, with a 3D ND-Range that sub-group also has |
||
class sub_group { | ||
public: | ||
|
||
constexpr range<Dimensions> get_sub_group_range() const; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I find this really restrictive, you basically ban compilation for generic targets. But I also understand the need if you know your underlying target. Could we have a in-between solution ? i.e. constexpr iff you know the underlying target and it properties |
||
|
||
constexpr size_t get_sub_group_range(int dimension) const; | ||
|
||
constexpr size_t get_sub_group_linear_range() const; | ||
|
||
id<Dimensions> get_sub_group_id() const; | ||
|
||
size_t get_sub_group_id(int dimension) const; | ||
|
||
size_t get_sub_group_linear_id() const; | ||
|
||
void barrier(access::fence_space accessSpace = access::fence_space::global_and_local) const; | ||
|
||
/* T is permitted to be int, unsigned int, long, unsigned long, | ||
float, half, double */ | ||
template <typename T> | ||
T broadcast(size_t subGroupId, T value); | ||
|
||
/* Predicate must be a callable type which returns bool */ | ||
template <typename Predicate> | ||
bool all_of(Predicate predicate) const; | ||
|
||
/* Predicate must be a callable type which returns bool */ | ||
template <typename Predicate> | ||
bool any_of(Predicate predicate) const; | ||
}; | ||
|
||
} // namespace codeplay | ||
} // namespace sycl | ||
} // namespace cl | ||
``` | ||
|
||
## Free functions | ||
|
||
```cpp | ||
namespace cl { | ||
namespace sycl { | ||
namespace codeplay { | ||
|
||
template <int Dimensions, T> | ||
T broadcast(sub_group<Dimensions> subGroup, size_t subGroupId, T value); | ||
|
||
template <int Dimensions, typename Predicate> | ||
bool all_of(sub_group<Dimensions> subGroup, Predicate predicate); | ||
|
||
template <int Dimensions, typename Predicate> | ||
bool any_of(sub_group<Dimensions> subGroup, Predicate predicate); | ||
|
||
template <int Dimensions> | ||
void barrier(sub_group<Dimensions> subGroup, access::fence_space accessSpace | ||
= access::fence_space::global_and_local) const; | ||
|
||
} // namespace codeplay | ||
} // namespace sycl | ||
} // namespace cl | ||
``` | ||
|
||
## Extensions to the nd\_item class | ||
|
||
Extensions to the `nd_item` interface will be exposed via the a derived `nd_item` class template in the `codeplay` vendor extension namespace. | ||
|
||
New member function `get_sub_group` for identifying the current sub group and gaining access to sub group operations. | ||
|
||
```cpp | ||
namespace cl { | ||
namespace sycl { | ||
namespace codeplay { | ||
|
||
template <int Dimensions> | ||
class nd_item : public ::cl::sycl::nd_item<Dimensions> { | ||
public: | ||
|
||
sub_group<Dimensions> get_sub_group() const; | ||
|
||
}; | ||
|
||
} // namespace codeplay | ||
} // namespace sycl | ||
} // namespace cl | ||
``` | ||
|
||
## Example | ||
|
||
Below is trivial example showing how you would use `sub_group` to broadcast a value from one work-item within a sub-group to all other work-items in the sub-group. | ||
|
||
```cpp | ||
using namespace cl::sycl; | ||
|
||
template <typename dimT> | ||
void my_subgroup_load(sub_group<dimT> subG, global_ptr<float> myArray) { | ||
|
||
float4 f; | ||
if (subG.get_id() == 0) { | ||
f.load(myArray); | ||
} | ||
barrier(subG, access::fence_space::global_and_local); | ||
float4 res = broadcast(subG, 0, f); | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: Seems like this proposal is missing device info properties?
info::device::max_num_sub_groups
andinfo::device::sub_group_independent_forward_progress