
TransformerEngine attention #1715

Open

@janEbert

🚀 Feature Request

TransformerEngine provides advanced attention kernels, including support for FlashAttention-3 and low-precision (FP8) kernels.
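
For illustration, here is a minimal sketch of what TE offers today: a TE transformer layer run under FP8 autocast on Hopper. All sizes are made up; only the `transformer_engine.pytorch` calls are meant to reflect the real TE API.

```python
# Illustrative only: a TE layer under FP8 autocast. Sizes are arbitrary.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

layer = te.TransformerLayer(
    hidden_size=4096,
    ffn_hidden_size=16384,
    num_attention_heads=32,
    params_dtype=torch.bfloat16,
).cuda()

# TE's default input layout is (seq, batch, hidden).
x = torch.randn(2048, 2, 4096, device="cuda", dtype=torch.bfloat16)

# HYBRID = E4M3 for forward tensors, E5M2 for gradients.
recipe = DelayedScaling(fp8_format=Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
```

Which operations actually run in FP8 (the linear projections vs. the attention kernel itself) depends on the TE version and recipe; the attention backend (FlashAttention vs. cuDNN fused attention) is selected automatically at runtime.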

Motivation

Having TransformerEngine's attention available as an attn_impl option would be super nice, since H100 users would gain access to these additional features.

[Optional] Implementation

This would require some changes to the MPT configuration and the addition of the new attention layer; a rough sketch of the layer side follows below.
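
As a starting point, something like the adapter below could wrap TE's fused core attention behind a plain (q, k, v) -> output interface. The class name and interface are hypothetical and would need to be adapted to the actual attention code paths and to attn_config (e.g. a new `attn_impl: te` value); only the `te.DotProductAttention` usage is meant to reflect the real TE API.

```python
# Hypothetical adapter around TE's fused core attention. Class name and
# interface are illustrative; only te.DotProductAttention is real TE API.
import torch
import transformer_engine.pytorch as te

class TECoreAttention(torch.nn.Module):
    def __init__(self, d_model: int, n_heads: int, dropout: float = 0.0):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        # TE picks the best available backend at runtime (e.g. FlashAttention
        # or cuDNN fused attention, overridable via the NVTE_FLASH_ATTN /
        # NVTE_FUSED_ATTN environment variables).
        self.core_attn = te.DotProductAttention(
            num_attention_heads=n_heads,
            kv_channels=self.head_dim,
            attention_dropout=dropout,
            attn_mask_type="causal",
            qkv_format="bshd",  # (batch, seq, heads, head_dim) inputs
        )

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # q, k, v: (batch, seq, d_model) -> (batch, seq, n_heads, head_dim)
        b, s, _ = q.shape
        q, k, v = (t.reshape(b, s, self.n_heads, self.head_dim) for t in (q, k, v))
        # Output comes back flattened as (batch, seq, d_model).
        return self.core_attn(q, k, v)
```

Wiring this in would then mean registering it under the new attn_impl value and routing Q/K/V from the existing projection layers through it.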

Additional context

I am not yet sure whether I will be available to work on the implementation, but I wanted to get the request and discussion out there for now. :)

There was a previous PR with a similar proposal here: #803
