Contributor

poyenc commented Nov 4, 2025

Proposed changes

This PR merges the two APIs, fmha_fwd() and fmha_fwd_v3(), into a single unified interface. The same script, fmha_fwd.py, now generates two underlying implementation functions: fmha_fwd_v2() and fmha_fwd_v3(). The public API fmha_fwd() conditionally dispatches to fmha_fwd_v3(), although that path is currently disabled (the full implementation is not ready to merge due to compiler issues).

In addition, I redesigned the code-generation logic to allow users to generate multiple dispatcher functions and organize pipelines using appropriate filters.
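A minimal sketch of how filter-based grouping into several dispatcher functions could look. This is not the actual fmha_fwd.py code: the `Pipeline` class, the `group_by_filters` helper, and the two filters are hypothetical, and only the pipeline tags and dispatcher names are taken from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Pipeline:
    # Hypothetical stand-in for a generated pipeline description.
    tag: str


def group_by_filters(
    pipelines: List[Pipeline],
    filters: Dict[str, Callable[[Pipeline], bool]],
) -> Dict[str, List[Pipeline]]:
    """Assign each pipeline to every dispatcher whose filter accepts it."""
    return {
        dispatcher: [p for p in pipelines if accept(p)]
        for dispatcher, accept in filters.items()
    }


pipelines = [Pipeline("qr_async_trload"), Pipeline("qr_async_trload_v3")]

# Assumed filters: one dispatcher per implementation function.
filters = {
    "fmha_fwd_v2": lambda p: not p.tag.endswith("_v3"),
    "fmha_fwd_v3": lambda p: p.tag.endswith("_v3"),
}

groups = group_by_filters(pipelines, filters)
for dispatcher, members in groups.items():
    print(dispatcher, [p.tag for p in members])
```

Keeping the filters as plain predicates means a new dispatcher only needs a new dictionary entry, without touching the grouping logic.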

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

poyenc self-assigned this Nov 4, 2025
poyenc marked this pull request as draft November 4, 2025 10:15
poyenc changed the title from "[CK_TILE][FMHA] Integrate FAv2 & FAv3 in fmha_fwd() API" to "[CK_TILE][FMHA] Integrate FAv2 & FAv3 in single fmha_fwd() API" Nov 4, 2025
poyenc changed the title from "[CK_TILE][FMHA] Integrate FAv2 & FAv3 in single fmha_fwd() API" to "[CK_TILE][FMHA] Integrate FAv2 & FAv3 in the single fmha_fwd() API" Nov 4, 2025
poyenc marked this pull request as ready for review November 16, 2025 09:47
asleepzzz previously approved these changes Nov 20, 2025
poyenc changed the title from "[CK_TILE][FMHA] Integrate FAv2 & FAv3 in the single fmha_fwd() API" to "[CK_TILE][FMHA] Integrate FAv2 & FAv3 (WIP) in the single fmha_fwd() API" Nov 24, 2025
Comment on lines 745 to 750
if short_circuit:
    for rule in rules:
        if not rule(problem_ctx, kernel_ctx):
            return False
    return True
return all(rule(problem_ctx, kernel_ctx) for rule in rules)
Contributor

Is there any real difference between the short_circuit path and the all(rule(...)) path?

Contributor Author

poyenc Dec 1, 2025

There should be no difference, because I pass a generator expression to all() instead of building a new list as the argument, so all() already stops at the first failing rule. I'll remove the short_circuit path.
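To illustrate the point in this thread, here is a small self-contained check that the explicit loop and `all()` over a generator expression evaluate the same rules and stop at the same place. The `make_rule` helper and the recorded rule names are made up for the demonstration; the contexts are passed as `None` because the toy rules ignore them.

```python
def make_rule(name, result, calls):
    """Return a rule that records its name in `calls` whenever it is evaluated."""
    def rule(problem_ctx, kernel_ctx):
        calls.append(name)
        return result
    return rule


def check_explicit(rules, problem_ctx, kernel_ctx):
    # Hand-written short circuit, as in the path under review.
    for rule in rules:
        if not rule(problem_ctx, kernel_ctx):
            return False
    return True


def check_all(rules, problem_ctx, kernel_ctx):
    # Generator expression: rules are evaluated lazily, one at a time.
    return all(rule(problem_ctx, kernel_ctx) for rule in rules)


explicit_calls, all_calls = [], []

explicit_result = check_explicit(
    [make_rule(n, r, explicit_calls) for n, r in (("a", True), ("b", False), ("c", True))],
    None,
    None,
)
all_result = check_all(
    [make_rule(n, r, all_calls) for n, r in (("a", True), ("b", False), ("c", True))],
    None,
    None,
)

print(explicit_result, explicit_calls)
print(all_result, all_calls)
```

The two paths would only differ if the argument were a list comprehension, `all([...])`, because the list is fully built before `all()` sees it.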

Comment on lines +862 to +868
is_v3_dedicated_tile = (
    kernel_ctx.tile.F_bm0 == 256
    and (kernel_ctx.tile.F_rm0 * kernel_ctx.tile.F_rn0 * kernel_ctx.tile.F_rk0) == 8
    and (kernel_ctx.tile.F_rm1 * kernel_ctx.tile.F_rn1 * kernel_ctx.tile.F_rk1) == 8
)  # fmt: skip
is_v3_pipeline = kernel_ctx.pipeline.tag == "qr_async_trload_v3"
return is_v3_dedicated_tile == is_v3_pipeline
Contributor

This is not a rule that restricts the pairing of problem_ctx and kernel_ctx. Can it be handled by adding restrictions when constructing the kernel_ctx space?
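One way the suggestion could look, as a sketch only: the tile/pipeline pairing is enforced while the kernel space is enumerated, so no per-problem rule is needed. The `Tile` class is simplified (one warp-count product per GEMM instead of the separate F_rm*/F_rn*/F_rk* fields), and `build_kernel_space` and the non-v3 tile values are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

V3_PIPELINE_TAG = "qr_async_trload_v3"  # tag quoted in the diff above


@dataclass(frozen=True)
class Tile:
    # Simplified, hypothetical tile description.
    F_bm0: int
    warps0: int  # stands in for F_rm0 * F_rn0 * F_rk0
    warps1: int  # stands in for F_rm1 * F_rn1 * F_rk1


def is_v3_dedicated_tile(tile):
    return tile.F_bm0 == 256 and tile.warps0 == 8 and tile.warps1 == 8


def build_kernel_space(tiles, pipeline_tags):
    """Yield only (tile, tag) pairs where v3 tiles are paired with the v3 pipeline."""
    for tile, tag in product(tiles, pipeline_tags):
        if is_v3_dedicated_tile(tile) == (tag == V3_PIPELINE_TAG):
            yield tile, tag


tiles = [Tile(128, 4, 4), Tile(256, 8, 8)]  # the first tile is an arbitrary example
tags = ["qr_async_trload", V3_PIPELINE_TAG]

space = list(build_kernel_space(tiles, tags))
for tile, tag in space:
    print(tile, tag)
```

With this arrangement the mismatched combinations never enter the space, so later rules only see valid kernels.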

Comment on lines +824 to +825
(problem_ctx.hdim, problem_ctx.hdim_v) != (128, 128)
and kernel_ctx.tile.F_bm0 != 128
Contributor

This restriction makes no sense! (bm0=64 should be usable with hdim values other than 128.)

Contributor Author

This is pre-existing check logic. If you'd like to remove it, we can create another PR for that purpose.

Comment on lines +781 to +787
if (problem_ctx.hdim, problem_ctx.hdim_v) == (192, 128):
    if (
        kernel_ctx.pipeline.F_bias != "no"
        or kernel_ctx.pipeline.F_dropout == "t"
    ):
        return False
return True
Contributor

This rule makes no sense! Whether a pipeline can use bias or dropout should have nothing to do with hdim sizes

Contributor Author

This is pre-existing check logic for the qr_async_trload pipeline. If you'd like to remove it, we can create another PR for that purpose.

Comment on lines +793 to +799
if not (
    (
        kernel_ctx.pipeline.F_logits == "t"
        and kernel_ctx.pipeline.F_bias == "no"
    )
    or kernel_ctx.pipeline.F_logits == "f"
):
Contributor

Can this rule be enforced when constructing the kernel_ctx space, since it does not involve problem_ctx?

Contributor Author

That would be another type of check, one that only considers the kernel_ctx attributes. We can separate it out if we later encounter more checks like this.
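If such kernel-only checks were separated later, a sketch of the split might look like the following. The dict-based pipeline records, `KERNEL_ONLY_RULES`, and `prefilter_pipelines` are hypothetical, and `"bias"` is only a placeholder for any F_bias value other than `"no"`.

```python
from itertools import product


def logits_requires_no_bias(pipeline):
    """Kernel-only rule from the diff: F_logits == "t" is allowed only with F_bias == "no"."""
    return (
        pipeline["F_logits"] == "t" and pipeline["F_bias"] == "no"
    ) or pipeline["F_logits"] == "f"


# Rules that take only kernel-side attributes and never look at problem_ctx.
KERNEL_ONLY_RULES = [logits_requires_no_bias]


def prefilter_pipelines(pipelines):
    """Drop pipelines that fail any kernel-only rule before problems are considered."""
    return [p for p in pipelines if all(rule(p) for rule in KERNEL_ONLY_RULES)]


candidates = [
    {"F_logits": logits, "F_bias": bias}
    for logits, bias in product(("t", "f"), ("no", "bias"))
]

kept = prefilter_pipelines(candidates)
for pipeline in kept:
    print(pipeline)
```

Rules that need both contexts would stay in the existing per-problem check, so the two kinds of rule remain easy to tell apart.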

template<>
float fmha_fwd_<trait_{F_idx}, {F_arch.tag}>(const ck_tile::stream_config& s, fmha_fwd_args a)
float fmha_fwd_<trait, {F_arch.tag}>(const ck_tile::stream_config& s, fmha_fwd_args a)
Contributor

Do we really need trait as a template parameter of fmha_fwd_?

Contributor Author

Yes. In the current design, we use the trait as an instance key to differentiate the template instantiations.

Contributor Author

poyenc commented Dec 1, 2025

Need to resolve the conflicts.
