@rjoursler rjoursler commented Dec 19, 2025

Just in case anyone wants to review this over the holidays. There is still a little work remaining to ensure there are no functional or performance regressions, but the core changes are now implemented.

This PR modifies the generation and encoding used by reusable_dispatcher_t to significantly increase its flexibility. That flexibility is then used to make the reference eltwise kernel fully reusable. The key changes in this PR:

  • Switch to directly encoding expressions for calculations. This enables better compression of the expressions so that more buffers can be registered, allows encoding expressions unrelated to buffer offsets (e.g. gws_overflow and gws_in_padding), and allows optimizing expensive computations (e.g. the additional idiv).
  • Switch buffer offset computation to be in terms of the outer dimensions, i.e. sum(outer_dim_idx * outer_dim_stride) + offset(get_inner_dim()). This enables properly inlining the constants associated with blocked layouts, which avoids expensive divisions.
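To illustrate the second point, here is a minimal sketch of the outer-dimension offset formula. It is not the code from this PR: the names `outer_dim_t`, `kBlock`, `inner_offset`, `offset_outer`, and `offset_logical` are hypothetical, and the layout (a 2D tensor `[A, C]` with an inner block of 16 on `C`) is an assumed example. The point it tries to show is that when the dispatch already provides outer indices and an inner index, the inner-block constants can be compile-time values, so the runtime path has only multiply-adds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed inner block size for the blocked dimension (hypothetical example).
constexpr int64_t kBlock = 16;

// One outer dimension: its index for this work item and its runtime stride.
struct outer_dim_t {
    int64_t idx;
    int64_t stride;
};

// Reference computation from logical coordinates (a, c) for layout
// [A][ceil(C / kBlock)][kBlock]. It needs a division and a modulo by the
// block size at every call.
int64_t offset_logical(int64_t a, int64_t c, int64_t C) {
    const int64_t c_blocks = (C + kBlock - 1) / kBlock;
    return (a * c_blocks + c / kBlock) * kBlock + c % kBlock;
}

// Offset contribution of the inner dimension. The inner stride (1 here) is a
// compile-time constant, so nothing about the block is read at runtime.
constexpr int64_t inner_offset(int64_t inner_idx) {
    return inner_idx;
}

// sum(outer_dim_idx * outer_dim_stride) + offset(inner index):
// only multiply-adds, no division.
int64_t offset_outer(const std::vector<outer_dim_t> &outer, int64_t inner_idx) {
    assert(inner_idx >= 0 && inner_idx < kBlock);
    int64_t off = inner_offset(inner_idx);
    for (const auto &d : outer)
        off += d.idx * d.stride;
    return off;
}
```

For `C = 32`, the outer dimensions would be `a` with stride `2 * kBlock` and the block index of `c` with stride `kBlock`. Both functions then agree for every coordinate, but only `offset_logical` divides.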

Beyond that, there are a few other, more minor optimizations at play:

  • Switch named buffer encoding to a bitset. This reduces the overall structure size, as we no longer need to store buffer names, and normalizes the structure layout, which could prevent non-determinism due to ordering.
  • Use 32-bit values in the runtime params when possible. This reduces the necessary data transfer to the kernel, and also avoids emulated 64-bit arithmetic.
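A rough sketch of both ideas follows, under stated assumptions. The identifiers `buffer_id_t`, `buffer_set_t`, and `fits_in_32_bits`, and the buffer names `src`, `wei`, `dst`, and `bias`, are hypothetical and are not taken from this PR. The first part shows why a bitset gives an order-independent encoding: registering the same buffers in a different order yields an identical value. The second part shows the kind of range check that could decide whether a runtime parameter can be passed as a 32-bit value.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical fixed list of buffer slots; `count` is the number of slots.
enum class buffer_id_t : std::size_t { src = 0, wei, dst, bias, count };

// Set of registered buffers stored as one bit per slot. No names are stored,
// and the encoding does not depend on registration order.
struct buffer_set_t {
    std::bitset<static_cast<std::size_t>(buffer_id_t::count)> bits;

    void add(buffer_id_t id) { bits.set(static_cast<std::size_t>(id)); }

    bool has(buffer_id_t id) const {
        return bits.test(static_cast<std::size_t>(id));
    }

    bool operator==(const buffer_set_t &other) const {
        return bits == other.bits;
    }
};

// True when a 64-bit value can be narrowed to int32_t without loss, i.e. when
// it could be sent to the kernel as a 32-bit runtime parameter.
inline bool fits_in_32_bits(int64_t v) {
    return v >= std::numeric_limits<int32_t>::min()
            && v <= std::numeric_limits<int32_t>::max();
}
```

Two `buffer_set_t` values built by adding `src` then `dst`, or `dst` then `src`, compare equal, which is the normalization property described above.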

@rjoursler rjoursler requested a review from a team as a code owner December 19, 2025 20:26
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Dec 19, 2025
@rjoursler rjoursler changed the title from "xe: enable completely reusable kernels" to "xe: enable resuable_dispatcher_t to handle all buffer layouts" Dec 19, 2025
@rjoursler rjoursler force-pushed the rjoursle/gemmstone_align branch 2 times, most recently from 1f1d521 to 9898aab Compare January 2, 2026 18:56
@rjoursler rjoursler changed the base branch from rjoursle/gemmstone_align to main January 2, 2026 23:17
@rjoursler
Contributor Author

make test
disable test_device_cpu
