[SYCL] split builder and subgroup layering #21773
Draft
koparasy wants to merge 4 commits into intel:sycl from
Split general-purpose utility umbrella headers into narrow internal headers so that users who need only one helper stop paying for unrelated machinery. These changes save around 30 ms when building `sycl/ext/oneapi/free_function_queries.hpp` by removing 17 header includes (and their transitive dependencies) that `group.hpp` did not actually need.

* detail/assert.hpp: extracted the __SYCL_ASSERT macro from common.hpp into its own minimal header; common.hpp now includes it.
* detail/loop.hpp: extracted detail::loop / loop_impl from helpers.hpp into a standalone header; retargeted accessor.hpp, group_algorithm.hpp, detail/builtins/builtins.hpp, and source/builtins/host_helper_macros.hpp to include the narrow helper directly.
* detail/nd_loop.hpp: extracted NDLoop, NDLoopIterateImpl, and InitializedVal from common.hpp; rewired cg_types.hpp and group.hpp to include nd_loop.hpp rather than the heavier common.hpp.
* detail/device_info_types.hpp: moved uuid_type / luid_type out of the broad type_traits.hpp into a dedicated device-info header; included that header from info/info_desc.hpp and relaxed the runtime check in device_impl.hpp to size + trivially-copyable requirements so the move stays source-compatible.
* group.hpp: replaced common.hpp / generic_type_traits.hpp / type_traits.hpp / item.hpp with the new narrow headers; added a private convertToOpenCLGroupAsyncCopyPtr helper that inlines the OpenCL pointer-conversion logic without pulling in the full generic conversion machinery.
* detail/async_work_group_copy_ptr.hpp: new narrow header providing async_copy_elem_type<T> and convertToOpenCLGroupAsyncCopyPtr. Its dependencies are access/access.hpp, fwd/half.hpp, fwd/multi_ptr.hpp, <stdint.h>, <cstddef>, and <type_traits>; every async_work_group_copy caller already requires all of them, so no transitive cost is added. It uses std::make_signed_t / std::make_unsigned_t instead of hand-rolled fixed-width alias chains.
* detail/type_traits/bool_traits.hpp: new narrow header providing is_scalar_bool, is_vector_bool, is_bool, and change_base_type_t. It depends only on vec_marray_traits.hpp + <type_traits>, so it does not pull in the heavier type_traits.hpp chain. type_traits.hpp includes it and removes its own duplicate definitions, so existing callers are unaffected.
* group.hpp: remove the inline group_async_copy_opencl_type family and convertToOpenCLGroupAsyncCopyPtr; include the new header instead. Drop the now-unnecessary fwd/half.hpp, <cstddef>, and bfloat16 forward declaration (all moved into the new header).
* nd_item.hpp: replace #include <generic_type_traits.hpp> (which pulled in aliases.hpp, bit_cast.hpp, and <limits>) with the new narrow header; replace the ConvertToOpenCLType_t + DestT(ptr.get()) pattern with convertToOpenCLGroupAsyncCopyPtr(ptr) at all four call sites.
* test/include_deps/deps_known.sh: add a sed rule for the unified-runtime/ subdirectory so that ur_api.h and ur_api_funcs.def are stripped to bare filenames rather than emitting absolute build paths.
* test/include_deps/*.cpp: regenerated all golden files to reflect the updated include graphs and the deps_known.sh fix.
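The `detail::loop` helper that moves into `detail/loop.hpp` is a compile-time loop. A minimal sketch of how such a helper typically works, assuming it follows the common `std::index_sequence` + fold-expression pattern (the `sketch` namespace and `sum_indices` are illustrative, not the SYCL API; requires C++17):

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {

// Invokes f once per index in the pack, passing each index as a
// std::integral_constant so the body can use it as a compile-time constant.
template <std::size_t... Is, class F>
constexpr void loop_impl(std::index_sequence<Is...>, F &&f) {
  (f(std::integral_constant<std::size_t, Is>{}), ...);
}

// Calls f(0), f(1), ..., f(N - 1) with compile-time indices.
template <std::size_t N, class F> constexpr void loop(F &&f) {
  loop_impl(std::make_index_sequence<N>{}, std::forward<F>(f));
}

// Illustrative use: sums the indices 0 .. N-1 at compile time.
template <std::size_t N> constexpr std::size_t sum_indices() {
  std::size_t total = 0;
  loop<N>([&](auto i) { total += decltype(i)::value; });
  return total;
}

} // namespace sketch
```

Because the helper depends only on `<utility>` and `<type_traits>`, a header that needs it has no reason to include the rest of `helpers.hpp`.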
Move Builder out of helpers.hpp into detail/builder.hpp:
- Extract the Builder class and declptr helper into a focused header
that declares only the forward types it actually needs (item, group,
h_item, id, nd_item, range). Device-side SPIR-V built-in access
is kept self-contained via spirv_vars.hpp.
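Builder is the friend through which the runtime constructs query objects whose constructors are private. A minimal sketch of that pattern, assuming Builder acts as a friend factory (the `sketch::item` type and this `createItem` signature are simplified and hypothetical; `declptr` is shown in the form it is commonly written, as an assumption):

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

class Builder;

// Query object with a private constructor: user code cannot create one,
// only the runtime can, through the Builder friend.
template <int Dims> class item {
public:
  constexpr std::size_t get_linear_id() const { return linear_id_; }

private:
  constexpr explicit item(std::size_t linear_id) : linear_id_(linear_id) {}
  std::size_t linear_id_;
  friend class Builder;
};

// Single friend that centralizes construction of the query types.
class Builder {
public:
  template <int Dims>
  static constexpr item<Dims> createItem(std::size_t linear_id) {
    return item<Dims>(linear_id);
  }
};

// Yields a T* without constructing a T; useful inside decltype() expressions.
template <typename T> constexpr T *declptr() {
  return static_cast<T *>(nullptr);
}

// User code cannot bypass the Builder.
static_assert(!std::is_constructible_v<item<1>, std::size_t>);

} // namespace sketch
```

Since the factory only needs the declarations of the types it builds, it can live in a focused header without the rest of `helpers.hpp`.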
Move SPIR-V fence helpers out of helpers.hpp into
detail/spirv_memory_semantics.hpp:
- getSPIRVMemorySemanticsMask (memory_order and fence_space overloads)
now lives in a header that only pulls in spirv_types.hpp, access/access.hpp,
and memory_enums.hpp.
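The two `getSPIRVMemorySemanticsMask` overloads translate SYCL ordering and fence-space values into SPIR-V MemorySemantics bits. A hedged sketch of what such a mapping can look like: the bit values are the ones defined by the SPIR-V specification, but the enums are simplified stand-ins and pairing every fence with `SequentiallyConsistent` is an assumption of this sketch, not a statement about the SYCL implementation:

```cpp
#include <cstdint>

namespace sketch {

// Subset of the SPIR-V MemorySemantics bits (values as in the SPIR-V spec).
enum MemorySemanticsMask : std::uint32_t {
  None = 0x0,
  Acquire = 0x2,
  Release = 0x4,
  AcquireRelease = 0x8,
  SequentiallyConsistent = 0x10,
  WorkgroupMemory = 0x100,
  CrossWorkgroupMemory = 0x200,
};

// Simplified stand-ins for sycl::memory_order and access::fence_space.
enum class memory_order { relaxed, acquire, release, acq_rel, seq_cst };
enum class fence_space { global_space, local_space, global_and_local };

// Ordering overload: one SPIR-V ordering bit per memory_order.
constexpr std::uint32_t getSPIRVMemorySemanticsMask(memory_order order) {
  switch (order) {
  case memory_order::acquire: return Acquire;
  case memory_order::release: return Release;
  case memory_order::acq_rel: return AcquireRelease;
  case memory_order::seq_cst: return SequentiallyConsistent;
  case memory_order::relaxed: break;
  }
  return None;
}

// Fence overload: storage-class bits for the fenced space plus an ordering
// bit (sequential consistency is assumed here).
constexpr std::uint32_t getSPIRVMemorySemanticsMask(fence_space space) {
  std::uint32_t storage = None;
  switch (space) {
  case fence_space::global_space: storage = CrossWorkgroupMemory; break;
  case fence_space::local_space: storage = WorkgroupMemory; break;
  case fence_space::global_and_local:
    storage = CrossWorkgroupMemory | WorkgroupMemory;
    break;
  }
  return storage | SequentiallyConsistent;
}

} // namespace sketch
```

The mapping needs only the enum definitions, which is why the real helpers can sit in a header that includes little more than the SPIR-V types and the access/memory enums.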
Make detail/helpers.hpp a thin forwarder:
- Include builder.hpp + spirv_memory_semantics.hpp.
- Retain get_or_store<T> and is_power_of_two in-place.
- Drop the forwarding class/enum declarations now in builder.hpp.
- All existing #include <sycl/detail/helpers.hpp> sites continue to
work without modification.
Split sycl/sub_group.hpp into focused detail headers:
- detail/sub_group_core.hpp: sub_group struct with lightweight query
API (get_local_id, get_group_id, leader, etc.) fully inline, plus
forward declarations for the deprecated load/store and barrier
members. Includes only spirv_vars.hpp, access/access.hpp, the narrow
fwd/multi_ptr.hpp forward header, id.hpp, range.hpp, and
memory_enums.hpp. No spirv_ops.hpp, no bit_cast, no generic_type_traits.
- detail/sub_group_extra.hpp: out-of-line definitions of the deprecated
barrier() and barrier(fence_space) methods. Includes spirv_ops.hpp
and spirv_memory_semantics.hpp.
- detail/sub_group_load_store.hpp: the detail::sub_group namespace
block-load/store helpers and the out-of-line definitions for the
deprecated load/store member templates.
- detail/sub_group.hpp: internal aggregator (core + extra + load_store)
for SYCL runtime and extension headers that need the full type.
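The core/load_store split relies on declaring a member template in the lightweight header and defining it out of line in a heavier one. A single-file sketch of that technique (the `sketch::sub_group` type, its `local_id` member, and the indexed read standing in for a block load are all hypothetical; the real type queries SPIR-V built-ins instead of storing state):

```cpp
#include <cstdint>

namespace sketch {

// ---- would live in sub_group_core.hpp ----
// Lightweight queries are defined inline; the deprecated load() is only
// declared, so this part needs none of the block-load machinery.
struct sub_group {
  constexpr std::uint32_t get_local_linear_id() const { return local_id; }
  constexpr bool leader() const { return get_local_linear_id() == 0; }

  template <typename T> constexpr T load(const T *src) const; // declared only

  std::uint32_t local_id = 0; // stand-in for a SPIR-V built-in query
};

// ---- would live in sub_group_load_store.hpp ----
// Out-of-line definition, seen only by code that includes the heavier header.
// A plain indexed read stands in for the real block-load implementation.
template <typename T> constexpr T sub_group::load(const T *src) const {
  return src[local_id];
}

} // namespace sketch
```

Code that includes only the core part can call `leader()` but gets a link or instantiation error if it calls `load()`, which is exactly the boundary the split is meant to enforce.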
Make sycl/sub_group.hpp a thin aggregator:
- Include detail/sub_group_core.hpp + detail/sub_group_extra.hpp +
detail/sub_group_load_store.hpp + nd_item.hpp.
- Keep the out-of-line nd_item::get_sub_group() definition here.
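Keeping `nd_item::get_sub_group()` out of line works because a member function may be declared with an incomplete return type; only its definition needs the complete `sub_group`. A single-file sketch with hypothetical minimal types, where the comments mark which header each part would correspond to:

```cpp
#include <type_traits>

namespace sketch {

struct sub_group; // forward declaration only

// ---- would live in nd_item.hpp ----
struct nd_item {
  // Declaring a function that returns an incomplete type is fine; only the
  // definition (and callers) need the complete sub_group.
  sub_group get_sub_group() const;
};

// ---- would live in sub_group_core.hpp ----
struct sub_group {
  // query API elided in this sketch
};

// ---- would live in the sycl/sub_group.hpp aggregator ----
// Both types are complete here, so the out-of-line definition compiles.
inline sub_group nd_item::get_sub_group() const { return sub_group{}; }

static_assert(std::is_same_v<decltype(nd_item{}.get_sub_group()), sub_group>);

} // namespace sketch
```

This lets `nd_item.hpp` avoid including any sub-group header while the public aggregator still provides the full member function.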
Narrow ext/oneapi/free_function_queries.hpp:
- Include detail/builder.hpp and detail/sub_group_core.hpp directly
instead of the heavier group.hpp + sub_group.hpp umbrella includes,
avoiding the load/store machinery for a header whose only sub_group
use is constructing a default sub_group().
Update include_deps tests to reflect the new include graph.
No ABI or API changes: all deprecated sub_group load/store and barrier
members are preserved with the same signatures. Existing public entry
points (including free_function_queries.hpp) remain available.
Compile-time impact on ext/oneapi/free_function_queries.hpp
-----------------------------------------------------------
Measured with clang -ftime-trace on a device-only SYCL compilation.
Transitive SYCL headers: 36 -> 32; stdlib headers: 17 -> 10.
Headers removed from the include closure of free_function_queries.hpp:
sycl/sub_group.hpp (replaced by sub_group_core.hpp)
sycl/__spirv/spirv_ops.hpp (was pulled by sub_group load/store)
sycl/detail/generic_type_traits.hpp (was pulled by SelectBlockT)
sycl/detail/address_space_cast.hpp (was pulled by dynamic_address_cast)
sycl/bit_cast.hpp (was pulled by block read/write casting)
+ 7 stdlib headers (utility, limits, initializer_list, and friends)
driven out by the above.
Per-header isolated compile time (device-only, spir64):
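| Header | intel#21762 (baseline) | DEV | Delta |
|-------------------------------------|---------|-------|------------------------------|
| free_function_queries.hpp (whole) | 109 ms | 71 ms | -38 ms (-35%) |
| sycl/sub_group.hpp | 107 ms | n/a | eliminated |
| sycl/__spirv/spirv_ops.hpp | 45 ms | n/a | eliminated |
| sycl/detail/generic_type_traits.hpp | 50 ms | n/a | eliminated |
| sycl/detail/sub_group_core.hpp | n/a | 52 ms | new (replaces sub_group.hpp) |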
The layering also sets up a clean future deletion path: if/when the
deprecated load/store API is removed, the work reduces to deleting
sub_group_load_store.hpp, sub_group_extra.hpp, and ~18 declaration
lines in sub_group_core.hpp — no surgery on a 671-line monolith.
koparasy added a commit to koparasy/llvm that referenced this pull request on Apr 15, 2026.
This change reduces compile-time overhead in SYCL headers by splitting `group.hpp` and `nd_item.hpp` into query-only core headers plus heavier extra layers, and by introducing `access_base.hpp` so that lightweight query paths no longer pull in the full `access/access.hpp` umbrella.

- Add `detail/group_core.hpp` for the `group` class definition and query-only API.
- Add `detail/group_extra.hpp` for heavier functionality:
  - `private_memory`
  - `parallel_for_work_item`
  - async work-group copy helpers and definitions
- Convert `sycl/group.hpp` into a thin public aggregator.
- Add `detail/nd_item_core.hpp` for the `nd_item` class definition and query-only API.
- Add `detail/nd_item_extra.hpp` for heavier functionality:
  - `get_offset()`
  - `get_nd_range()`
  - async work-group copy definitions
  - wait helpers
  - root-group helpers
- Convert `sycl/nd_item.hpp` into a thin public aggregator.
- Add `sycl/access/access_base.hpp` as a lightweight home for:
  - access enums
  - `remove_decoration` utilities
- Keep `sycl/access/access.hpp` as the heavier umbrella.
- Retarget lightweight dependency sites to `access_base.hpp`:
  - `detail/fwd/multi_ptr.hpp`
  - `detail/spirv_memory_semantics.hpp`
  - query-only core headers
- Replace the public `nd_item.hpp` dependency with:
  - `detail/nd_item_core.hpp`
  - `detail/sub_group_core.hpp`
  - `detail/builder.hpp`
- Keep behavior unchanged while reducing the transitive include graph.
- Update `detail/builder.hpp` to match the new stateless `nd_item` core path.
- Adjust subgroup-related layering fallout needed by the new header closure.
- Regenerate include-deps tests and update the `layout_array.cpp` ABI baseline.

Measured with `count_trace_includes.py` against the `intel#21773` baseline.

- Transitive headers: 79 → 67
- SYCL headers benchmarked: 32 → 19
- stdlib headers benchmarked: 10 → 10

| Header | Before | After | Delta |
|-------------------------------------------|---------|---------|-----------------|
| free_function_queries.hpp (whole) | 73.0 ms | 57.7 ms | -15.3 ms (-21%) |
| sycl/nd_item.hpp | 72.0 ms | n/a | eliminated |
| sycl/group.hpp | 61.6 ms | n/a | eliminated |
| sycl/detail/helpers.hpp | 32.2 ms | n/a | eliminated |
| sycl/detail/async_work_group_copy_ptr.hpp | 27.7 ms | n/a | eliminated |
| sycl/pointers.hpp | 24.3 ms | n/a | eliminated |
| sycl/access/access.hpp | 24.4 ms | n/a | eliminated |
| sycl/detail/nd_item_core.hpp | n/a | 55.4 ms | new |
| sycl/detail/group_core.hpp | n/a | 53.5 ms | new |
| sycl/access/access_base.hpp | n/a | 10.4 ms | new |

Headers removed from the query-only include path:

- `sycl/nd_item.hpp` → replaced by `detail/nd_item_core.hpp`
- `sycl/group.hpp` transitively disappears from the query-only path
- `sycl/pointers.hpp`
- `sycl/device_event.hpp`
- `sycl/nd_range.hpp`
- `sycl/detail/async_work_group_copy_ptr.hpp`
- `sycl/access/access.hpp`

Notes:

- No functional changes intended.
- Public `sycl/group.hpp` and `sycl/nd_item.hpp` remain source-compatible.
- The remaining visible fallout is limited to header layering and test baseline updates.
- The `layout_array.cpp` update reflects record-layout dump output for the now-empty `nd_item` core type, not an intended runtime ABI change.

Depends on intel#21773
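The `layout_array.cpp` note hinges on the `nd_item` core being stateless. A minimal sketch of that idea, assuming queries are answered by device built-ins rather than stored members (the `builtin_*` functions and `nd_item_core` are hypothetical host stand-ins that return fixed values, not the SYCL implementation):

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for device built-ins (the real core would read SPIR-V
// built-in variables); fixed values keep the sketch runnable on the host.
constexpr std::size_t builtin_global_id(int dim) { return dim == 0 ? 42 : 0; }
constexpr std::size_t builtin_local_id(int dim) { return dim == 0 ? 2 : 0; }

// Query-only core: each query reads a built-in instead of a stored member,
// so the class carries no data at all.
template <int Dims> class nd_item_core {
public:
  constexpr std::size_t get_global_id(int dim) const {
    return builtin_global_id(dim);
  }
  constexpr std::size_t get_local_id(int dim) const {
    return builtin_local_id(dim);
  }
};

// With no members the type is empty, which is what a record-layout dump
// would then report for it.
static_assert(std::is_empty_v<nd_item_core<1>>);

} // namespace sketch
```

An empty type changes what a record-layout dump prints without changing how kernels obtain their indices at run time, which matches the note that the baseline update is not an intended runtime ABI change.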
This change reduces compile-time overhead in SYCL headers by breaking up
heavy umbrella headers (helpers.hpp, sub_group.hpp) into focused components
and narrowing include dependencies in free_function_queries.hpp.
Refactor helpers.hpp
Split sub_group.hpp
Narrow free_function_queries.hpp
Compile-time impact (device-only, spir64)
Measured with clang -ftime-trace.
Removed headers from free_function_queries.hpp
Notes
No runtime ABI impact is expected.
Future cleanup
This structure enables clean removal of deprecated load/store APIs by
deleting sub_group_load_store.hpp, sub_group_extra.hpp, and a small
number of declarations in sub_group_core.hpp.
Depends on #21762