[CK_TILE][FMHA] Integrate FAv2 & FAv3 (WIP) in the single fmha_fwd() API #3153
base: develop
Conversation
…enc/composable_kernel into poyenc/integrate-fmha-fwd-v2-v3-apis
    if short_circuit:
        for rule in rules:
            if not rule(problem_ctx, kernel_ctx):
                return False
        return True
    return all(rule(problem_ctx, kernel_ctx) for rule in rules)
Is there any real difference between the short_circuit path and the all(rule(...)) path?
There should be no difference, since I didn't create a new list as a function argument. I'll remove the short_circuit path.
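To illustrate the point discussed above, here is a minimal standalone sketch (all names are hypothetical, not taken from the actual script) showing that the two paths behave identically: all() over a generator expression stops at the first falsy rule, exactly like the explicit loop, so no extra work is done either way.

```python
def passes_loop(rules, problem_ctx, kernel_ctx):
    # Explicit short-circuit path: stop at the first failing rule.
    for rule in rules:
        if not rule(problem_ctx, kernel_ctx):
            return False
    return True

def passes_all(rules, problem_ctx, kernel_ctx):
    # Equivalent one-liner: the generator argument makes all() lazy,
    # so rules after the first failure are never evaluated.
    return all(rule(problem_ctx, kernel_ctx) for rule in rules)

calls = []

def failing_rule(p, k):
    calls.append("fail")
    return False

def later_rule(p, k):
    calls.append("later")
    return True

rules = [failing_rule, later_rule]
assert passes_loop(rules, None, None) == passes_all(rules, None, None) == False
assert calls == ["fail", "fail"]  # later_rule was never evaluated by either path
```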
    is_v3_dedicated_tile = (
        kernel_ctx.tile.F_bm0 == 256
        and (kernel_ctx.tile.F_rm0 * kernel_ctx.tile.F_rn0 * kernel_ctx.tile.F_rk0) == 8
        and (kernel_ctx.tile.F_rm1 * kernel_ctx.tile.F_rn1 * kernel_ctx.tile.F_rk1) == 8
    )  # fmt: skip
    is_v3_pipeline = kernel_ctx.pipeline.tag == "qr_async_trload_v3"
    return is_v3_dedicated_tile == is_v3_pipeline
This is not a rule that restricts the problem_ctx/kernel_ctx pairing. Could it instead be handled by adding restrictions when constructing the kernel_ctx space?
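The suggestion above can be sketched as follows. This is a hypothetical illustration (the dict-based tiles, is_v3_tile, and kernel_space names are assumptions, not the script's actual structures): instead of a post-hoc rule that cross-checks tile shape against pipeline tag, the constraint is enforced while enumerating kernel contexts, so mismatched (tile, pipeline) pairs never enter the search space.

```python
from itertools import product

def is_v3_tile(tile):
    # v3-dedicated tiles per the rule above: bm0 == 256 and an 8-warp
    # layout in both gemm stages.
    return (tile["bm0"] == 256
            and tile["rm0"] * tile["rn0"] * tile["rk0"] == 8
            and tile["rm1"] * tile["rn1"] * tile["rk1"] == 8)

# Toy tile/pipeline spaces for illustration only.
tiles = [
    {"bm0": 256, "rm0": 2, "rn0": 2, "rk0": 2, "rm1": 2, "rn1": 2, "rk1": 2},
    {"bm0": 128, "rm0": 4, "rn0": 1, "rk0": 1, "rm1": 4, "rn1": 1, "rk1": 1},
]
pipelines = ["qr_async_trload_v3", "qr_async_trload"]

# Pair each tile only with its matching pipeline while building the space,
# so the cross-check rule becomes unnecessary.
kernel_space = [
    (tile, pipe)
    for tile, pipe in product(tiles, pipelines)
    if is_v3_tile(tile) == (pipe == "qr_async_trload_v3")
]
```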
    (problem_ctx.hdim, problem_ctx.hdim_v) != (128, 128)
    and kernel_ctx.tile.F_bm0 != 128
This restriction makes no sense! (bm0=64 should be usable with hdim values other than 128.)
This is pre-existing check logic. If you'd like to remove it, we can create another PR for that purpose.
    if (problem_ctx.hdim, problem_ctx.hdim_v) == (192, 128):
        if (
            kernel_ctx.pipeline.F_bias != "no"
            or kernel_ctx.pipeline.F_dropout == "t"
        ):
            return False
    return True
This rule makes no sense! Whether a pipeline can use bias or dropout should have nothing to do with the hdim sizes.
This is pre-existing check logic for the qr_async_trload pipeline. If you'd like to remove it, we can create another PR for that purpose.
    if not (
        (
            kernel_ctx.pipeline.F_logits == "t"
            and kernel_ctx.pipeline.F_bias == "no"
        )
        or kernel_ctx.pipeline.F_logits == "f"
    ):
Can this rule be solved inside the kernel_ctx space, since it does not involve problem_ctx?
That would be another type of check that only considers the kernel_ctx attributes. We can separate it if we later encounter more checks like this.
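The separation mentioned above could look roughly like this: kernel-only checks get their own rule list and run once per kernel_ctx, independent of any problem_ctx. This is a hypothetical sketch; the dict-based contexts and all function names are assumptions for illustration.

```python
def logits_rule(kernel_ctx):
    # Mirrors the condition above: logits may be enabled ("t") only
    # when bias is disabled; otherwise logits must be off ("f").
    return (kernel_ctx["F_logits"] == "t" and kernel_ctx["F_bias"] == "no") \
        or kernel_ctx["F_logits"] == "f"

# Rules in this list never receive a problem_ctx.
kernel_only_rules = [logits_rule]

def kernel_ctx_is_valid(kernel_ctx):
    return all(rule(kernel_ctx) for rule in kernel_only_rules)

candidates = [
    {"F_logits": "t", "F_bias": "no"},     # valid: logits without bias
    {"F_logits": "t", "F_bias": "alibi"},  # invalid: logits with bias
    {"F_logits": "f", "F_bias": "alibi"},  # valid: logits disabled
]
valid = [c for c in candidates if kernel_ctx_is_valid(c)]
```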
    template<>
    float fmha_fwd_<trait_{F_idx}, {F_arch.tag}>(const ck_tile::stream_config& s, fmha_fwd_args a)
    float fmha_fwd_<trait, {F_arch.tag}>(const ck_tile::stream_config& s, fmha_fwd_args a)
Do we really need trait as a template parameter of fmha_fwd_?
Yes. In the current design, we use the trait as an instance key to differentiate the template instantiations.
Need to resolve the conflicts.
Proposed changes
This PR merges the two APIs—fmha_fwd() and fmha_fwd_v3()—into a single unified interface. The same script, fmha_fwd.py, is now used to generate two underlying implementation functions: fmha_fwd_v2() and fmha_fwd_v3(). The public API fmha_fwd() conditionally dispatches to fmha_fwd_v3(), although the fmha_fwd_v3() path is temporarily disabled (the full implementation is not ready to merge due to compiler issues).
In addition, I redesigned the code-generation logic to allow users to generate multiple dispatcher functions and organize pipelines using appropriate filters.
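The dispatch described above can be pictured with a rough Python analogy (the real API is C++; the gating predicate, argument shape, and ENABLE_V3 flag here are assumptions for illustration, not the PR's actual code): a single fmha_fwd() entry point routes to the v2 or v3 implementation, with the v3 path switched off for now.

```python
ENABLE_V3 = False  # the v3 path is temporarily disabled in this PR

def fmha_fwd_v2(args):
    return "v2"  # placeholder for the generated FAv2 implementation

def fmha_fwd_v3(args):
    return "v3"  # placeholder for the generated FAv3 implementation

def fmha_fwd(args):
    # Dispatch to v3 only when it is enabled and the problem matches a
    # v3-supported configuration (placeholder predicate).
    if ENABLE_V3 and args.get("hdim") == 128:
        return fmha_fwd_v3(args)
    return fmha_fwd_v2(args)

assert fmha_fwd({"hdim": 128}) == "v2"  # v3 disabled, so v2 handles everything
```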
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [ ] I have run clang-format on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered