
feat: support Kimi Linear model #1047

Merged

aolemila merged 5 commits into main from feat/support_kimi_linear_model on May 14, 2026

Conversation

@aolemila (Collaborator) commented on May 9, 2026

Summary

  • Ports the Kimi Linear model/config target files from PR [WIP] Epic/support kimi linear #985 into feat/support_kimi_linear_model.
  • Adds KimiLinearConfig and KimiLinearForCausalLM model wiring.
  • Updates model return payloads for the current memory pool interface.

Relates to #1046

Test plan

  • python3 -m py_compile python/sgl_jax/srt/configs/kimi_linear.py python/sgl_jax/srt/models/*.py
  • Targeted Kimi config/model tests checked; no source test file is present in this branch.

🤖 Generated with Claude Code

Accuracy Alignment

The following results are copied from #1072; see that PR for reproduction details.

E2E test results

UTC 2026-05-13 12:04:28 → 2026-05-13 20:10:03, wall **8h02m45s** (eval first prefill at 12:07:18; pod TZ = UTC).

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6854 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7809 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.7323 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.5387 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.4124 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.795  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6064 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7236 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.7148 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5832 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.718  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6039 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7043 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5669 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6602 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

aolemila force-pushed the feat/support_kimi_linear_model branch from 8a30a3d to fde29f4 on May 9, 2026 at 12:04
aolemila changed the title from "feat: support Kimi Linear model" to "[WIP] feat: support Kimi Linear model" on May 9, 2026
@MokusMokun (Contributor) commented on May 13, 2026

E2E test result with MMLU-pro

TP=4 DP=4 -> 0.6574

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.7049 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7898 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.7341 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.5108 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3951 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7852 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6308 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7121 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.7262 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5752 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7145 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6147 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.698  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5276 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6574 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

TP=4 DP=2 -> 0.6622

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6878 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7868 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.727  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.547  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3996 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7866 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6112 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7344 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.725  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5731 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7192 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6169 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7055 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5617 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6622 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

TP=4 DP=1 -> 0.6622

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6878 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7868 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.727  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.547  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3996 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7866 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6112 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7344 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.725  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5731 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7192 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6169 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7055 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5617 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6622 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

aolemila force-pushed the feat/support_kimi_linear_model branch from 3fa1179 to a5e498d on May 13, 2026 at 09:54
aolemila and others added 3 commits May 14, 2026 11:55
…1072)

Squash of the local kda-e2e branch onto origin/feat/support_kimi_linear_model:
- KDA dummy-slot pollution guard (set_ssm_state, set_conv_state) — without
  this, DP runs (tp4dp4) collapse from ~0.66 to ~0.27 OVERALL on mmlu_pro.
- HybridLinearAttnBackend.attn_backend_wrapper builds real KDA sub-backend
  (upstream stub returned full_attn_backend unchanged → server crash).
- ModelRunner.linear_recurrent_config detects KimiLinearConfig by hf_config's
  linear_attn_config attribute (upstream property was a stub returning None).
- compilation_manager dummy batch fills recurrent_indices/has_initial_state
  only when has_recurrent_state is set, so non-recurrent backends are
  unaffected (CompilationManager grows a has_recurrent_state flag, plumbed
  from tp_worker via model_runner.linear_recurrent_config).
- gated_rmsnorm helper (used by KimiLinear).

HybridLinearAttnBackend.__call__ kept upstream-clean (no pool kwarg aliasing).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
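The gated_rmsnorm helper mentioned in the commit above is not shown in this thread; as context, a minimal numpy sketch of what a gated RMSNorm typically computes follows. This is a hypothetical reconstruction: the gate activation (SiLU here) and the exact placement of the learned weight may differ from the actual implementation in the PR.

```python
import numpy as np

def gated_rmsnorm(x, gate, weight, eps=1e-6):
    """Sketch of a gated RMSNorm (assumed SiLU gating; the PR's helper may differ).

    RMS-normalizes x over the last axis, scales by a learned weight,
    then gates the result elementwise with SiLU(gate) -- the gating
    style common in gated-linear-attention blocks.
    """
    rms = np.sqrt(np.mean(x.astype(np.float64) ** 2, axis=-1, keepdims=True) + eps)
    normed = (x / rms) * weight
    silu_gate = gate / (1.0 + np.exp(-gate))  # SiLU(g) = g * sigmoid(g)
    return normed * silu_gate
```

A zero gate fully suppresses the normalized activations (SiLU(0) = 0), which is the property such gates rely on.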
aolemila changed the title from "[WIP] feat: support Kimi Linear model" to "feat: support Kimi Linear model" on May 14, 2026
aolemila force-pushed the feat/support_kimi_linear_model branch from 13e6792 to e374cde on May 14, 2026 at 03:56
aolemila and others added 2 commits May 14, 2026 14:02
KDAAttnBackend.__call__ takes `recurrent_state_pool=`, while
RadixLinearAttention.__call__ passes `pool=` (HybridLinearAttnBackend's
calling convention). Production routes through that wrapper which
translates pool→recurrent_state_pool; the unit tests bypass it by
assigning a raw KDAAttnBackend as `forward_batch.attn_backend`, so the
kwarg falls into **kwargs and `recurrent_state_pool` is unbound →
TypeError. Replicate the translation in a test-only shim.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both KDA test files were carrying an identical copy of the `pool=` →
`recurrent_state_pool=` translation shim added in 466afff. Move it to
test_utils.py and import from both, dropping the local underscore prefix
since it's now a shared helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
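The kwarg translation these two commits describe can be sketched as a small wrapper. This is a hypothetical reconstruction for illustration only: the actual helper moved into test_utils.py may differ in name and signature.

```python
def make_kda_test_backend(kda_backend):
    """Test-only shim: HybridLinearAttnBackend calls sub-backends with
    `pool=`, while a raw KDA backend expects `recurrent_state_pool=`.
    Tests that assign a raw KDA backend directly must replicate the
    pool -> recurrent_state_pool rename the wrapper normally performs."""
    def call(*args, pool=None, **kwargs):
        if pool is not None:
            kwargs["recurrent_state_pool"] = pool
        return kda_backend(*args, **kwargs)
    return call
```

Without such a shim, `pool=` falls into the raw backend's `**kwargs` and `recurrent_state_pool` stays unbound, producing the TypeError the first commit describes.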
aolemila requested a review from JamesBrianD on May 14, 2026 at 07:22
aolemila merged commit 2c5d860 into main on May 14, 2026
21 checks passed
aolemila deleted the feat/support_kimi_linear_model branch on May 14, 2026 at 08:40