🚀 Feature Request
TransformerEngine provides advanced attention kernels, including support for FlashAttention-3 and low-precision (FP8) attention.
Motivation
Having TransformerEngine's attention available as an attn_impl option would be a great addition for H100 users, who could take advantage of those extra features.
[Optional] Implementation
This would require some changes to the MPT configuration and adding the new attention layer implementation; a rough sketch follows below.
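As a very rough illustration (not a proposal for the final API), the new layer could be a thin wrapper around TransformerEngine's `DotProductAttention`. All names here (`TEDotProductAttention`, an `attn_impl: te` config value) are hypothetical, and the exact TE constructor/forward arguments vary by TransformerEngine version, so this is just a sketch of the shape of the change:

```python
# Hypothetical sketch of a TransformerEngine-backed attention layer.
# Assumes NVIDIA TransformerEngine is installed; argument names may differ
# between TE versions, so treat this as illustrative only.
import torch

try:
    import transformer_engine.pytorch as te
except ImportError:
    te = None


class TEDotProductAttention(torch.nn.Module):
    """Thin wrapper around TE's fused attention (hypothetical attn_impl='te')."""

    def __init__(self, d_model: int, n_heads: int, attn_dropout: float = 0.0):
        super().__init__()
        if te is None:
            raise ImportError("transformer_engine is required for attn_impl='te'")
        self.inner = te.DotProductAttention(
            num_attention_heads=n_heads,
            kv_channels=d_model // n_heads,  # per-head dimension
            attention_dropout=attn_dropout,
            qkv_format='bshd',               # batch, seq, heads, head_dim
            attn_mask_type='causal',
        )

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # q/k/v: (batch, seq, n_heads, head_dim). TE dispatches to its fused
        # kernels (FlashAttention-3 where available) internally.
        return self.inner(q, k, v)
```

On the config side this could then be selected via something like `attn_config: {attn_impl: te}` in the MPT model YAML (again, the value name is purely illustrative), falling back to the existing implementations when TransformerEngine is not installed.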
Additional context
I'm not yet sure whether I'll have time to work on the implementation myself, but I wanted to get the request and discussion out there for now. :)
There was a previous PR with a similar proposal here: #803