CP016: Sub groups proposal #75

Open: wants to merge 15 commits into base: master

1 change: 1 addition & 0 deletions README.md
@@ -54,3 +54,4 @@ Each proposal in the table below will be tagged with one of the following states
| CP013 | [Supporting Heterogeneous & Distributed Computing Through Affinity](affinity/index.md) | ISO C++ SG1, SG14 | 15 November 2017 | 12 August 2018 | _Work in Progress_ |
| CP014 | [Shared Virtual Memory](svm/index.md) | SYCL 2.2 | 22 January 2018 | 22 January 2018 | _Work in Progress_ |
| CP015 | [Specialization Constant](spec-constant/index.md) | SYCL 1.2.1 extension / SYCL 2.2 | 24 April 2018 | 24 April 2018 | _Work in Progress_ |
| CP016 | [Sub Groups](sub-groups/index.md) | SYCL 1.2.1 extension | 14 September 2018 | 14 September 2018 | _Work in Progress_ |
36 changes: 36 additions & 0 deletions sub-groups/index.md
@@ -0,0 +1,36 @@
# Basic sub-group extension

Review comment: Seems like this proposal is missing device info properties? `info::device::max_num_sub_groups` and `info::device::sub_group_independent_forward_progress`

| Proposal ID | CP016 |
|-------------|--------|
| Name | Basic sub group extension |
| Date of Creation | 14 September 2018 |
| Target | SYCL 1.2.1 |
| Current Status | _Work In Progress_ |
| Reply-to | Ruyman Reyes |
| Original author | Ruyman Reyes |
| Contributors | Ruyman Reyes, Gordon Brown, Victor Lomuller |

## Overview

This vendor extension aims to define an interface to expose sub-group functionality,
as defined in the SYCL 2.2 provisional and the OpenCL 2.2 provisional,
in SYCL 1.2.1.

Review comment (on "the OpenCL 2.2 provisional"): OpenCL 2.2 isn't provisional (could probably also say "OpenCL 2.1" here, as that is when subgroups were added to the main spec).

The extension only targets OpenCL devices that expose the
`cl_codeplay_basic_subgroups` vendor extension.


## References

[1] SYCL 1.2.1 specification
https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf

[2] SYCL 2.2 provisional specification (revision date 2016/02/15)
https://www.khronos.org/registry/SYCL/specs/sycl-2.2.pdf

[3] OpenCL 2.2 API specification
https://www.khronos.org/registry/OpenCL/specs/2.2/pdf/OpenCL_API.pdf

[4] OpenCL C++ 1.0 specification
https://www.khronos.org/registry/OpenCL/specs/2.2/pdf/OpenCL_Cxx.pdf

157 changes: 157 additions & 0 deletions sub-groups/sycl-1.2.1/index.md
@@ -0,0 +1,157 @@
# Basic Sub group support

Review comment: OpenCL uses "Sub-group", not "Sub group". This is on many lines, I won't comment on them all.

This proposal aims to define an interface for using OpenCL 2.2 sub groups in
the SYCL 1.2.1 specification, relying on the underlying
OpenCL implementation supporting the extension `cl_codeplay_basic_subgroups`.

The extension lets programmers identify the sub-groups
in a work-group, count the number of sub-groups available and perform
a broadcast from one work-item in a sub-group to the rest.

Details of the execution and memory model changes can be found in the
documentation for Codeplay's OpenCL vendor extension `cl_codeplay_basic_subgroups`,
once available.

## Execution model

When this vendor extension is available, the execution model of SYCL 1.2.1
is extended to also include support for sub-groups of threads inside of a
work-group.
Overall, these sub-groups work following the description of the OpenCL 2.2
sub-groups, with some restrictions:

* The number of sub-groups available for each work-group is determined
at compile-time and remains the same during the execution of the SYCL application.

  Review comment: I'm not as familiar with the details of the SYCL spec, but looking at it now it seems that the current use of "compile-time" means when compiling the SYCL program rather than the SYCL runtime calling `clCompileProgram` or similar. This information can't be known until `clGetKernelSubGroupInfo` (or `clGetKernelSubGroupInfoCODEPLAY`) can be called. I'm just unclear on the intended meaning of compile-time.

  Review comment (Member): "compile-time" is unclear here: SYCL compile time or OpenCL compile time? The text that follows seems to imply SYCL, because the sub-group range is `constexpr`, but this could be made more explicit here.

* The number of threads per sub-group is known at compile-time, and remains the
same during execution of the SYCL application.

  Review comment: This use of "threads" seems unusual. "The sub-group size is known ..." perhaps?

* Only those functions defined in this proposal are available.
In particular, there is no sub-group pipe communication.
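
The first two restrictions mean that the sub-group layout of a work-group is fixed before any kernel runs. The host-only sketch below illustrates the arithmetic this implies. It assumes "compile-time" means C++ compile time, which the proposal does not state explicitly. The names `work_group_size`, `sub_group_size`, `num_sub_groups` and `sub_group_of` are hypothetical and not part of the extension, and counting a partial trailing sub-group is an assumption carried over from OpenCL.

```cpp
#include <cstddef>

// Hypothetical compile-time constants. A real implementation would obtain
// these from the device compiler, not from user code.
constexpr std::size_t work_group_size = 128;
constexpr std::size_t sub_group_size = 32;

// Number of sub-groups needed to cover a work-group. Rounds up so that a
// partial trailing sub-group is still counted (assumption, as in OpenCL).
constexpr std::size_t num_sub_groups(std::size_t wg, std::size_t sg) {
  return (wg + sg - 1) / sg;
}

// Index of the sub-group containing the work-item with the given linear
// local id, assuming work-items are assigned to sub-groups in order.
constexpr std::size_t sub_group_of(std::size_t local_linear_id,
                                   std::size_t sg) {
  return local_linear_id / sg;
}

static_assert(num_sub_groups(work_group_size, sub_group_size) == 4,
              "128 work-items in sub-groups of 32 give 4 sub-groups");
static_assert(sub_group_of(40, sub_group_size) == 1,
              "work-item 40 falls in the second sub-group");
```

Because both quantities are constant expressions in this model, they can size arrays or select template specializations, which is the kind of use a `constexpr` sub-group range would allow.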

## Memory model

Sub-groups can access global and local memory. However, because the atomic and
barrier operations in SYCL 1.2.1 take no memory scope, there is no
way to specify an equivalent of the sub-group memory scope.

## Namespace `basic_sub_group`

All new functionality is exposed under the `basic_sub_group` namespace
in the `codeplay` vendor extension namespace.
When the vendor extension `basic_sub_group` is available, the macro
`SYCL_CODEPLAY_BASIC_SUB_GROUP` is defined in the header.

Review comment (on the `basic_sub_group` namespace): There doesn't seem to be any `basic_sub_group` namespace used in this extension?

Review comment (on the extension name): OpenCL subgroup extensions used "subgroups" in their names (e.g. `cl_khr_subgroups`). It may be valid to ignore that given the type `cl::sycl::codeplay::sub_group`.
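
A feature-test guard built on the `SYCL_CODEPLAY_BASIC_SUB_GROUP` macro might look like the sketch below. Only the macro name comes from the proposal. The helpers `has_basic_sub_groups` and `reduction_strategy` are hypothetical, and no SYCL header is included, so compiled on its own this snippet takes the fallback branch.

```cpp
// Reports whether the sub-group extension is visible to this translation
// unit. SYCL_CODEPLAY_BASIC_SUB_GROUP is the macro named by the proposal;
// it is undefined here because no SYCL header is included.
constexpr bool has_basic_sub_groups() {
#ifdef SYCL_CODEPLAY_BASIC_SUB_GROUP
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: names the code path an application would take
// depending on whether the extension is available.
constexpr const char* reduction_strategy() {
  return has_basic_sub_groups() ? "sub-group broadcast"
                                : "work-group local memory";
}
```

Wrapping the macro in a `constexpr` function keeps the preprocessor check in one place, so the rest of the code can branch on an ordinary boolean.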

### Class `sub_group`

The extension adds a new class template `sub_group` that identifies the
sub group range and the current sub group id.
It also provides sub group barriers.

```cpp
namespace cl {
namespace sycl {
namespace codeplay {

// Review comment: I see that this comes from the SYCL 2.2 provisional
// specification, but for my own sanity: Dimensions is here as hypothetical
// future proofing and barring some future vendor extension it will always
// have the value 1?
//
// Review comment: Or is it important that, for example, with a 3D ND-Range
// the sub-group also has a Dimensions of 3, even if the actual sizes are
// always of the form {N, 1, 1}?
template <int Dimensions>
class sub_group {
 public:
  constexpr range<Dimensions> get_sub_group_range() const;
  // Review comment (Member): I find this really restrictive, you basically
  // ban compilation for generic targets. But I also understand the need if
  // you know your underlying target. Could we have an in-between solution,
  // i.e. constexpr iff you know the underlying target and its properties?

  constexpr size_t get_sub_group_range(int dimension) const;

  constexpr size_t get_sub_group_linear_range() const;

  id<Dimensions> get_sub_group_id() const;

  size_t get_sub_group_id(int dimension) const;

  size_t get_sub_group_linear_id() const;

  void barrier(access::fence_space accessSpace =
                   access::fence_space::global_and_local) const;

  /* T is permitted to be int, unsigned int, long, unsigned long,
     float, half, double */
  template <typename T>
  T broadcast(size_t subGroupId, T value);

  /* Predicate must be a callable type which returns bool */
  template <typename Predicate>
  bool all_of(Predicate predicate) const;

  /* Predicate must be a callable type which returns bool */
  template <typename Predicate>
  bool any_of(Predicate predicate) const;
};

}  // namespace codeplay
}  // namespace sycl
}  // namespace cl
```
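
The proposal does not spell out the semantics of `broadcast`, `all_of` and `any_of`. The host-only model below sketches one plausible reading, assuming they mirror OpenCL's `sub_group_broadcast`, `sub_group_all` and `sub_group_any`. A sub-group is represented as a `std::vector` with one element per work-item, and the `model_*` names are hypothetical and not part of the extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of broadcast: every work-item in the sub-group receives
// the value held by the work-item at index `source`.
template <typename T>
std::vector<T> model_broadcast(const std::vector<T>& lanes,
                               std::size_t source) {
  assert(source < lanes.size() && "source must be inside the sub-group");
  return std::vector<T>(lanes.size(), lanes[source]);
}

// Host-only model of all_of: true only if the predicate evaluated to true
// for every work-item in the sub-group.
inline bool model_all_of(const std::vector<bool>& predicate_results) {
  return std::all_of(predicate_results.begin(), predicate_results.end(),
                     [](bool b) { return b; });
}

// Host-only model of any_of: true if the predicate evaluated to true for
// at least one work-item in the sub-group.
inline bool model_any_of(const std::vector<bool>& predicate_results) {
  return std::any_of(predicate_results.begin(), predicate_results.end(),
                     [](bool b) { return b; });
}
```

In this model every work-item observes the same result, which is what makes these operations useful for uniform control flow within a sub-group.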

## Free functions

```cpp
namespace cl {
namespace sycl {
namespace codeplay {

template <int Dimensions, typename T>
T broadcast(sub_group<Dimensions> subGroup, size_t subGroupId, T value);

template <int Dimensions, typename Predicate>
bool all_of(sub_group<Dimensions> subGroup, Predicate predicate);

template <int Dimensions, typename Predicate>
bool any_of(sub_group<Dimensions> subGroup, Predicate predicate);

template <int Dimensions>
void barrier(sub_group<Dimensions> subGroup,
             access::fence_space accessSpace =
                 access::fence_space::global_and_local);

} // namespace codeplay
} // namespace sycl
} // namespace cl
```

## Extensions to the nd\_item class

Extensions to the `nd_item` interface will be exposed via a derived `nd_item` class template in the `codeplay` vendor extension namespace.

The new member function `get_sub_group` identifies the current sub group and gives access to sub group operations.

```cpp
namespace cl {
namespace sycl {
namespace codeplay {

template <int Dimensions>
class nd_item : public ::cl::sycl::nd_item<Dimensions> {
 public:
  sub_group<Dimensions> get_sub_group() const;
};

} // namespace codeplay
} // namespace sycl
} // namespace cl
```

## Example

Below is a trivial example showing how you would use `sub_group` to broadcast a value from one work-item within a sub-group to all other work-items in the sub-group.

```cpp
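using namespace cl::sycl;

template <int Dimensions>
void my_subgroup_load(codeplay::sub_group<Dimensions> subG,
                      global_ptr<float> myArray) {
  float f = 0.0f;
  // Only the item whose linear id is 0 reads the value from global memory.
  if (subG.get_sub_group_linear_id() == 0) {
    f = *myArray;
  }
  barrier(subG, access::fence_space::global_and_local);
  // Share the loaded value; float is one of the types broadcast permits.
  float res = broadcast(subG, 0, f);
  (void)res;  // the result is not used further in this trivial example
}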
```