
Kimi K2 attention and MLP tests#3240

Open
ssaliceTT wants to merge 10 commits into main from ssalice/kimi-attn-mlp

Conversation

Contributor

@ssaliceTT ssaliceTT commented Feb 10, 2026

Ticket

tt-xla #2954

Problem description

As part of the ongoing effort to run Kimi K2, attention and MLP tests needed to be added.

What's changed

A test running the attention module of Kimi K2 has been added to the nightly runs.

Tests running the MLP of Kimi K2 have been added to the nightly runs.
The MLP tests are parametrized on MLP type, since the model has both a regular MLP and MoE.
The MoE variant is currently xfailed due to a Dynamo compilation error.
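The parametrization described above could be sketched roughly as follows; the test name, variant labels, and xfail reason string are illustrative assumptions, not the actual test code from this PR.

```python
import pytest

# Hypothetical parametrization of the Kimi K2 MLP tests over the two MLP
# types: the dense MLP runs normally, while the MoE variant is marked xfail
# because of the Dynamo compilation error noted in the PR description.
MLP_VARIANTS = [
    "dense",
    pytest.param(
        "moe",
        marks=pytest.mark.xfail(reason="Dynamo compilation error"),
    ),
]

@pytest.mark.parametrize("mlp_type", MLP_VARIANTS)
def test_kimi_k2_mlp(mlp_type):
    # Placeholder body; the real test would build the MLP module from the
    # model config and compare device output against a reference.
    assert mlp_type in ("dense", "moe")
```

With `pytest.param(..., marks=...)`, the xfail applies only to the MoE case, so the dense case still reports genuine failures.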

Checklist

  • New/Existing tests provide coverage for changes

@codecov-commenter

codecov-commenter commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.33%. Comparing base (aecd257) to head (fcde0ee).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3240   +/-   ##
=======================================
  Coverage   28.33%   28.33%           
=======================================
  Files          33       33           
  Lines        4094     4094           
=======================================
  Hits         1160     1160           
  Misses       2934     2934           


… single device and TP. Need to test MoE. Also, cleaned up attention test a bit.
… Added seq_len = 32 hardcoding for testing MoE for now. Also, kimi k2 MoE needs to be set to .eval() manually due to some training thing in the model.
… L1 cache error. Xfailed deepseek MoE due to ttnn.sort op not supporting float32. Xfailed kimi k2 MoE for dynamo error.
…h shard for Kimi MLP. Fails due to collect_permute op. Not sure why. Attention tests work fine with Aleks's shard specs
@ssaliceTT ssaliceTT force-pushed the ssalice/kimi-attn-mlp branch from 2e3b912 to fcde0ee on February 12, 2026, 14:02

# Override for single layer testing
config.num_hidden_layers = 1
config.use_cache = False
Contributor

don't we want to use cache?
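For illustration, a minimal sketch of the override pattern quoted above; the `SimpleNamespace` config and its default values are hypothetical stand-ins for the real Hugging Face config object, and the rationale comments are an assumption, not the PR author's stated reasoning.

```python
from types import SimpleNamespace

def apply_single_layer_overrides(cfg):
    # Run only one decoder layer to keep the module test small and fast.
    cfg.num_hidden_layers = 1
    # A single forward pass through one layer does not reuse past key/values,
    # so the KV cache is disabled here; a test exercising incremental
    # decoding would instead leave use_cache=True.
    cfg.use_cache = False
    return cfg

# Stand-in config; a real test would load the Kimi K2 config instead.
config = apply_single_layer_overrides(
    SimpleNamespace(num_hidden_layers=61, use_cache=True)
)
```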

3 participants