Wmma support for gemm_bias_add_reduce #3316

EnricoDeg · 2025-11-27T09:13:31Z

Proposed changes

Summary:

Change EpilogueReduceCShuffle to support bias + add operations before reduction (multiple Ds)
Add wmma device struct for gemm_bias_add_reduce
Add instances (xdl parity)
Add tests
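
For readers unfamiliar with the operation, here is a minimal host-side sketch of what a gemm_bias_add_reduce with mean and squaremean outputs computes. It is an illustration only, not the PR's code. The function and struct names are hypothetical. The row-major layouts, identity element-wise operations, and a bias of length N broadcast over rows are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result holder for this sketch; not a CK type.
struct GemmBiasAddReduceResult
{
    std::vector<float> c;          // M x N, row-major
    std::vector<float> mean;       // length M, mean of each row of c
    std::vector<float> squaremean; // length M, mean of squares of each row of c
};

// Assumed semantics: c = a * b + bias + d0, then reduce every row of c.
GemmBiasAddReduceResult reference_gemm_bias_add_reduce(
    const std::vector<float>& a,    // M x K, row-major
    const std::vector<float>& b,    // K x N, row-major
    const std::vector<float>& bias, // length N (assumption: broadcast over rows)
    const std::vector<float>& d0,   // M x N, row-major
    std::size_t M, std::size_t N, std::size_t K)
{
    GemmBiasAddReduceResult r{std::vector<float>(M * N),
                              std::vector<float>(M),
                              std::vector<float>(M)};
    for(std::size_t m = 0; m < M; ++m)
    {
        float sum = 0.0f, sum_sq = 0.0f;
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += a[m * K + k] * b[k * N + n];
            acc += bias[n];       // bias step
            acc += d0[m * N + n]; // add step
            r.c[m * N + n] = acc;
            sum += acc;           // the reductions see the post-bias, post-add value
            sum_sq += acc * acc;
        }
        r.mean[m]       = sum / static_cast<float>(N);
        r.squaremean[m] = sum_sq / static_cast<float>(N);
    }
    return r;
}
```

The point of the epilogue change is visible in the inner loop: the bias and add steps happen before the value is fed into the reductions.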

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation that helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

krithalith · 2025-11-28T15:28:40Z

profiler/include/profiler/profile_gemm_bias_add_reduce_impl.hpp

+
+void add_device_gemm_bias_add_mean_squaremean_wmma_cshuffle_f16_f16_f16_f16_f16_f32_f32_km_nk_mn_instances(
+    std::vector<DeviceGemmBiasAddReduceNoOpPtr>&);
+#endif


Can we not just use the instance factory get_device_gemm_add_add_mean_squaremean_instances() here instead of manually using the add_device_xxx_instances() functions?
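
To illustrate the pattern being suggested, here is a generic sketch of a factory that hides the individual registration functions from the caller. All types and function names below are hypothetical stand-ins, not CK's actual API. The real get_device_gemm_add_add_mean_squaremean_instances() may look different.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a device operation base class; not a CK type.
struct DeviceOpBase
{
    virtual ~DeviceOpBase() = default;
    virtual std::string GetTypeString() const = 0;
};
using DeviceOpPtr = std::unique_ptr<DeviceOpBase>;

struct XdlOp : DeviceOpBase
{
    std::string GetTypeString() const override { return "xdl"; }
};
struct WmmaOp : DeviceOpBase
{
    std::string GetTypeString() const override { return "wmma"; }
};

// Per-backend registration functions, analogous in spirit to add_device_xxx_instances().
void add_xdl_instances(std::vector<DeviceOpPtr>& v) { v.push_back(std::make_unique<XdlOp>()); }
void add_wmma_instances(std::vector<DeviceOpPtr>& v) { v.push_back(std::make_unique<WmmaOp>()); }

// Factory: the only place that needs to know which registration functions exist.
std::vector<DeviceOpPtr> get_instances(bool has_xdl, bool has_wmma)
{
    std::vector<DeviceOpPtr> v;
    if(has_xdl)
        add_xdl_instances(v);
    if(has_wmma)
        add_wmma_instances(v);
    return v;
}
```

With this shape, a caller such as the profiler would only call get_instances() and would not need its own forward declarations for each registration function.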

krithalith · 2025-11-28T15:40:45Z

In the profiler impl we have:

std::size_t num_byte = sizeof(ADataType) * M * K + sizeof(BDataType) * K * N +
                       sizeof(CDataType) * M * N + sizeof(BiasDataType) * M * N +
                       sizeof(D0DataType) * M * N + sizeof(ReduceDataType) * M +
                       sizeof(ReduceDataType) * M;

But I thought the Bias was a simple 1D vector of size N?
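
If the bias is in fact a 1D vector of length N, as the question assumes, the byte count might look like the following sketch. The helper name is hypothetical and whether the bias is really 1D in this PR is not confirmed here. Only the bias term differs from the expression quoted above.

```cpp
#include <cstddef>

// Hypothetical helper: bytes moved when the bias is a 1D vector of length N.
template <typename ADataType,
          typename BDataType,
          typename CDataType,
          typename BiasDataType,
          typename D0DataType,
          typename ReduceDataType>
constexpr std::size_t num_byte_with_1d_bias(std::size_t M, std::size_t N, std::size_t K)
{
    return sizeof(ADataType) * M * K + sizeof(BDataType) * K * N +
           sizeof(CDataType) * M * N + sizeof(BiasDataType) * N + // N instead of M * N
           sizeof(D0DataType) * M * N + sizeof(ReduceDataType) * M +
           sizeof(ReduceDataType) * M;
}
```

Compared with the quoted expression, this lowers the total by sizeof(BiasDataType) * (M - 1) * N bytes, which would change the reported bandwidth.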

wj-laskowski · 2025-12-03T08:20:57Z

include/ck/tensor_operation/gpu/grid/epilogue_cshuffle_v3_reduce_wmma.hpp

+                        make_tuple(I0, I0, I0, I0),
+                        c01_thread_buf);
+
+                    // c = c + c1_functior(c1)


wj-laskowski · 2025-12-03T08:34:04Z

test/gemm_bias_add_reduce/test_gemm_common.hpp

+
+    public:
+    static constexpr bool verify_     = true;
+    static constexpr int init_method_ = 1; // decimal value initialization


nit: this is int value initialization

EnricoDeg requested review from a team, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners November 27, 2025 09:13

EnricoDeg marked this pull request as draft November 27, 2025 09:13

krithalith assigned krithalith and wj-laskowski Nov 27, 2025

EnricoDeg added the organization: streamhpc label Nov 27, 2025

krithalith requested review from krithalith and wj-laskowski November 27, 2025 13:27

EnricoDeg assigned EnricoDeg and unassigned krithalith and wj-laskowski Nov 27, 2025

EnricoDeg marked this pull request as ready for review November 27, 2025 13:53

krithalith reviewed Nov 28, 2025

wj-laskowski reviewed Dec 3, 2025

EnricoDeg added 7 commits December 3, 2025 10:39

Add tests for gemm_bias_add_reduce (22706ea)
Initial working implementation (92a42df)
Generalize implementation of reduce epilogue (e5d3cf0)
Add tests for all layouts (cdf40e8)
Add instances (0436ae6)
Fix test archs (60884a3)
Fix xdl bug (7cd0696)

EnricoDeg force-pushed the streamhpc/gemm_bias_add_reduce_wmma branch from 90fa9e7 to 7cd0696 Compare December 3, 2025 10:40


4 participants
