Fix BERT and ViT perf #822

Open
vkovacevicTT wants to merge 3 commits into main from vkovacevic/attn_implementation

vkovacevicTT (Contributor) commented Jan 19, 2026

Closes #812
Closes #813
Closes #814

Long-term solution #821

Description

After the transformers uplift from 4.52.4 to 4.57.1, we saw a significant perf drop in ViT, BGE-M3-Encode, and BERT for sentence embedding.

What's changed

Uplifted third_party/tt_forge_models to include changes that allow passing **kwargs when loading a model.
Set attn_implementation="eager" for ViT and BERT.

BGE-M3-Encode is a special case that requires monkey patching, so it is skipped for now.
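The `**kwargs` forwarding described above can be sketched roughly as follows. This is a minimal illustration, not the actual tt_forge_models code: `ModelLoader`, `load_model`, and `fake_from_pretrained` are hypothetical names, and the checkpoint name is only an example. In real use, the injected callable would be something like `transformers.AutoModel.from_pretrained`, which accepts `attn_implementation` as a keyword argument.

```python
from typing import Any, Callable


class ModelLoader:
    """Hypothetical loader; not the actual tt_forge_models class."""

    def __init__(self, model_name: str, from_pretrained: Callable[..., Any]):
        self.model_name = model_name
        # In real use this would be e.g. transformers.AutoModel.from_pretrained.
        self._from_pretrained = from_pretrained

    def load_model(self, **kwargs: Any) -> Any:
        # Forward caller-supplied options (such as attn_implementation)
        # unchanged to the underlying from_pretrained call.
        return self._from_pretrained(self.model_name, **kwargs)


def fake_from_pretrained(name: str, **kwargs: Any) -> dict:
    """Stand-in that records its arguments so the sketch runs offline."""
    return {"name": name, **kwargs}


loader = ModelLoader("bert-base-uncased", fake_from_pretrained)
model = loader.load_model(attn_implementation="eager")
```

Keeping the option at the call site means each benchmark can choose its attention implementation without the loader hard-coding it.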

odjuricicTT (Collaborator) left a comment

Let's explore if there is a cleaner way to do this first.



# TODO(vkovacevic): Issue #804
def patch_transformers_for_eager_attn(cls):
Collaborator:

Is this the only way? Is it not possible to pass this as a param somewhere when loading the model?

Contributor Author:

For ViT and BERT we could pass attn_implementation="eager" in tt_forge_models here.

For BGE-M3, I think we need to monkey patch, since the model is loaded internally by the FlagEmbedding library.
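For context, a monkey patch of this kind might look roughly like the sketch below. It is not the code under review: `default_to_eager_attn` and `FakeModel` are hypothetical, and the stand-in's `"sdpa"` default is only illustrative. The idea is to wrap a class's `from_pretrained` classmethod so that `attn_implementation="eager"` is applied whenever the caller (here, a third-party library) does not pass one.

```python
import functools


def default_to_eager_attn(cls):
    """Wrap cls.from_pretrained so attn_implementation defaults to "eager"."""
    # __func__ is the plain function underneath the bound classmethod.
    original = cls.from_pretrained.__func__

    @functools.wraps(original)
    def wrapper(klass, *args, **kwargs):
        # Only fill in the value when the caller did not specify one.
        kwargs.setdefault("attn_implementation", "eager")
        return original(klass, *args, **kwargs)

    cls.from_pretrained = classmethod(wrapper)
    return cls


class FakeModel:
    """Stand-in for a model class whose loading we cannot control directly."""

    def __init__(self, name, attn_implementation="sdpa"):
        self.name = name
        self.attn_implementation = attn_implementation

    @classmethod
    def from_pretrained(cls, name, **kwargs):
        return cls(name, **kwargs)


default_to_eager_attn(FakeModel)
patched = FakeModel.from_pretrained("example-model")
explicit = FakeModel.from_pretrained("example-model", attn_implementation="sdpa")
```

Using `setdefault` rather than overwriting keeps an explicit caller choice intact, which limits the blast radius of the patch.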

vkovacevicTT force-pushed the vkovacevic/attn_implementation branch from 3e99519 to 568f090 on February 3, 2026 15:05
vkovacevicTT changed the title from "Set attn_implementation="eager" for ViT, BERT and BGE-M3" to "Fix BERT and ViT perf" on Feb 3, 2026
vkovacevicTT (Contributor Author) commented Feb 3, 2026

Updated, as discussed offline with @odjuricicTT.

vkovacevicTT force-pushed the vkovacevic/attn_implementation branch from 383079f to b4cf87f on February 4, 2026 18:00
Development

Successfully merging this pull request may close these issues:

bge_m3_encode perf regression
ViT perf regression
BERT perf regression

5 participants