feat: Add BF16 float-accumulator TensorOp epilogue specialization by lyuxy-infra · Pull Request #3287 · NVIDIA/cutlass

lyuxy-infra · 2026-06-01T08:07:39Z

Summary

This PR adds a DefaultIteratorsTensorOp<bfloat16_t, float, 8, ...> specialization for TensorOp epilogues.

CUTLASS already has a half_t, float, 8 specialization that uses TileIteratorTensorOpMixed and SharedLoadIteratorMixed to optimize mixed-precision epilogues with FP32 accumulators and 16-bit outputs. BF16 output with FP32 accumulators has the same structure (32-bit accumulators, 16-bit outputs, 8 elements per access) but currently falls back to the generic iterator path.

This patch mirrors the existing half_t, float, 8 specialization for bfloat16_t, float, 8.

Motivation

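For mixed-precision TensorOp epilogues with FP32 accumulators and 16-bit outputs, the mixed iterator path stages accumulators through shared memory in a layout designed to avoid bank conflicts. The mixed iterators take the accumulator and output widths as bit counts (32 and 16), and bfloat16_t is 16 bits wide like half_t, so BF16 output can use the same iterator structure as FP16 output.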

Changes

- Add DefaultIteratorsTensorOp<bfloat16_t, float, 8, ...>
- Use TileIteratorTensorOpMixed<..., float, 32, 16, 8, 8>
- Use SharedLoadIteratorMixed<..., float, 32, 16, 8, 8>
- Set kFragmentsPerIteration = 2, matching the existing half_t, float, 8 specialization

Notes

This does not change the output operator, the numeric conversion, or the GEMM mainloop. It only changes which shared-memory staging iterators the epilogue selects for this BF16 mixed-precision case.

Add BF16 float-accumulator TensorOp epilogue specialization

e7e198d

lyuxy-infra changed the title ~~Add BF16 float-accumulator TensorOp epilogue specialization~~ feat: Add BF16 float-accumulator TensorOp epilogue specialization Jun 1, 2026


lyuxy-infra wants to merge 1 commit into NVIDIA:main from lyuxy-infra:bf16-float-epilogue-specialization
