xe: enable reusable_dispatcher_t to handle all buffer layouts #4474
JIC anyone wants to review this over the holidays: there is still a little work remaining to ensure there are no functional or performance regressions, but the core changes are now implemented.
This PR modifies the generation and encoding used by `reusable_dispatcher_t` to significantly increase flexibility. This increased flexibility is then used to make the reference eltwise kernel 100% reusable. The key changes in this PR:

- … `gws_overflow` and `gws_in_padding`), and to optimize expensive computations (i.e. the addition of `idiv`).
- … `sum(outer_dim_idx * outer_dim_stride) + offset(get_inner_dim())`. This enables properly inlining constants associated with blocked layouts, which avoids expensive divisions.

Beyond that, there are a few other (but more minor) optimizations at play
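To illustrate the `sum(outer_dim_idx * outer_dim_stride) + offset(...)` form, here is a minimal C++ sketch for a hypothetical nChw16c-style blocked layout. The names `Dims`, `kBlock`, `outer_strides`, and `blocked_offset` are illustrative assumptions, not code from this PR or from `reusable_dispatcher_t`. It only shows why a block size known at compile time turns the inner-block division into cheap operations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: these names are illustrative, not taken from the PR.
// Layout: nChw16c-style blocking, where the logical channel index c is split
// into an outer block index (c / kBlock) and an inner offset (c % kBlock).
constexpr std::size_t kBlock = 16;  // block size known at compile time

struct Dims {
    std::size_t N, C, H, W;  // logical sizes; C assumed to be a multiple of kBlock
};

// Element strides of the outer dimensions, ordered (n, c_outer, h, w).
std::array<std::size_t, 4> outer_strides(const Dims &d) {
    const std::size_t w_stride = kBlock;  // each w position holds one full channel block
    const std::size_t h_stride = d.W * w_stride;
    const std::size_t c_stride = d.H * h_stride;
    const std::size_t n_stride = (d.C / kBlock) * c_stride;
    return {n_stride, c_stride, h_stride, w_stride};
}

// offset = sum(outer_dim_idx * outer_dim_stride) + inner offset
std::size_t blocked_offset(const Dims &d, std::size_t n, std::size_t c,
                           std::size_t h, std::size_t w) {
    assert(n < d.N && c < d.C && h < d.H && w < d.W);
    const auto strides = outer_strides(d);
    // kBlock is a power-of-two constant, so for unsigned operands the compiler
    // can replace this division and modulo with a shift and a mask.
    const std::size_t c_outer = c / kBlock;
    const std::size_t c_inner = c % kBlock;
    const std::array<std::size_t, 4> outer_idx = {n, c_outer, h, w};
    std::size_t offset = 0;
    for (std::size_t i = 0; i < outer_idx.size(); ++i)
        offset += outer_idx[i] * strides[i];
    return offset + c_inner;
}
```

For example, with `Dims d{2, 32, 3, 4}`, `blocked_offset(d, 1, 17, 2, 3)` returns 753 (384 + 192 + 128 + 48 + 1). If the block size were only known at runtime, `c / kBlock` and `c % kBlock` would generally need a real integer division, which is the kind of cost that inlining layout constants avoids.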