[WIP] fused sdpa training #4498

syurkevi · 2026-01-06T22:42:55Z

Description

This is a WIP PR implementing a fused sdpa training kernel. The current bottleneck seems to be the individual gemm ukernels, which in their current setup are ~3-5x slower than the primitive-based gemms of the same size. Tuning produces a large variation in runtime; however, certain tile sizes seem to be inaccessible due to hardcoded FMA values in microkernel_provider.cpp.
Ongoing work is focused on improving gemmstone strategies to more closely match the performance of standalone gemms. Once those are closer, the benefits of the fused kernel can be better realized.
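
For orientation, here is a minimal pure-Python reference of the math an sdpa backward pass typically computes. It is a sketch only and not the kernel in this PR: it assumes the usual logsumexp-based formulation, assumes that "Di" refers to the row-wise sum of dO * O, and all function names (`sdpa_fwd`, `sdpa_bwd`, `matmul`, `transpose`) are hypothetical. Masking, dropout, and batching are left out.

```python
import math

def matmul(a, b):
    # Plain (rows x inner) @ (inner x cols) product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def sdpa_fwd(q, k, v, scale):
    # S = scale * Q K^T; keep only the per-row logsumexp for the backward pass.
    s = [[scale * x for x in row] for row in matmul(q, transpose(k))]
    lse = []
    for row in s:
        m = max(row)  # subtract the row max for numerical stability
        lse.append(m + math.log(sum(math.exp(x - m) for x in row)))
    p = [[math.exp(x - l) for x in row] for row, l in zip(s, lse)]
    return matmul(p, v), lse

def sdpa_bwd(q, k, v, o, lse, d_o, scale):
    # Recompute P from the saved logsumexp instead of storing it.
    s = [[scale * x for x in row] for row in matmul(q, transpose(k))]
    p = [[math.exp(x - l) for x in row] for row, l in zip(s, lse)]
    dv = matmul(transpose(p), d_o)          # dV = P^T dO
    dp = matmul(d_o, transpose(v))          # dP = dO V^T
    # Di = rowsum(dO * O); assumed meaning of "Di" in the commit titles.
    di = [sum(a * b for a, b in zip(ro, rd)) for ro, rd in zip(o, d_o)]
    # dS = P * (dP - Di), applied row by row (softmax backward).
    ds = [[pij * (dpij - d) for pij, dpij in zip(pr, dpr)]
          for pr, dpr, d in zip(p, dp, di)]
    dq = [[scale * x for x in row] for row in matmul(ds, k)]             # dQ = scale * dS K
    dk = [[scale * x for x in row] for row in matmul(transpose(ds), q)]  # dK = scale * dS^T Q
    return dq, dk, dv
```

The backward pass is a chain of several small gemms (Q K^T, P^T dO, dO V^T, dS K, dS^T Q), which is why the per-gemm ukernel performance discussed above matters for the fused kernel.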

syurkevi added 13 commits September 5, 2025 14:39

sdpa: add bwd tests and primitives

49d6a15

xe: sdpa: add workspace intermediates for training

5e10e80

xe: sdpa: working bwd atomic kernel

8734913

xe: sdpa: atomics w/additional MM

064dd75

xe: sdpa: introduce combined logsumexp

e6f6e8b

xe: sdpa: split conf between fwd/bwd

ea07311

xe: sdpa: split Di kernel, share dA SLM

6e5114f

xe: sdpa: clang-format18

91482f6

xe: sdpa: add sg_tile support to micro_bwd

a155308

xe: sdpa: separate preprocess and bwd launch config

fa9fad1

xe: sdpa: enable tile tuning for multiple bwd gemms

0de4840

xe: sdpa: zero dQ for atomics

2291ed9

xe: sdpa: reduce SMEM usage

4dfec42

syurkevi requested review from a team as code owners January 6, 2026 22:42

github-actions bot added platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel component:tests Codeowner: @oneapi-src/onednn-arch component:common labels Jan 6, 2026

syurkevi marked this pull request as draft January 6, 2026 22:44

syurkevi added 3 commits January 7, 2026 15:59

xe: sdpa: add batching to micro_bwd

e0449c7

xe: sdpa: move dv,dk to slm

891656d

xe: sdpa: bwd remainder handling

adf7f0e

syurkevi force-pushed the syurkevi/fused_sdpa_training branch from 2bcb1f5 to adf7f0e Compare January 10, 2026 06:12
