feat: support Kimi Linear model by aolemila · Pull Request #1047 · sgl-project/sglang-jax

aolemila · 2026-05-09T11:03:30Z

Summary

Ports the Kimi Linear model/config target files from PR #985 ([WIP] Epic/support kimi linear) into feat/support_kimi_linear_model.
Adds KimiLinearConfig and KimiLinearForCausalLM model wiring.
Updates model return payloads for the current memory pool interface.

Relates to #1046

Test plan

python3 -m py_compile python/sgl_jax/srt/configs/kimi_linear.py python/sgl_jax/srt/models/*.py
Checked for targeted Kimi config/model tests: no test source file is present in this branch.

🤖 Generated with Claude Code

Accuracy Alignment

The following results are copied from #1072; see #1072 for more reproduction details.

E2E test results

UTC 2026-05-13 12:04:28 → 2026-05-13 20:10:03 (pod TZ = UTC); eval wall time **8h02m45s**, measured from the first eval prefill at 12:07:18.
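The reported wall time can be checked against the timestamps with the standard library. The 8h02m45s figure matches the span from the first eval prefill to the end of the run; the full window starting at the first timestamp is roughly three minutes longer. This is only an arithmetic check on the numbers printed above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamps as printed above, all in UTC
run_start = datetime.strptime("2026-05-13 12:04:28", FMT)
first_prefill = datetime.strptime("2026-05-13 12:07:18", FMT)
run_end = datetime.strptime("2026-05-13 20:10:03", FMT)

print(run_end - run_start)      # 8:05:35  (full window)
print(run_end - first_prefill)  # 8:02:45  (matches the reported wall time)
```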

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6854 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7809 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.7323 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.5387 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.4124 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.795  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6064 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7236 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.7148 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5832 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.718  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6039 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7043 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5669 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6602 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+
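The OVERALL row is consistent with a question-count-weighted mean of the per-subset scores (each subset weighted by its `Num`), not a plain average of the 14 subset scores. The sketch below recomputes both from the table; it is an arithmetic check on the reported numbers, not the evaluation harness's own code.

```python
from typing import List, Tuple

# (subset, num_questions, accuracy), copied from the table above
ROWS: List[Tuple[str, int, float]] = [
    ("computer science", 410, 0.6854),
    ("math", 1351, 0.7809),
    ("chemistry", 1132, 0.7323),
    ("engineering", 969, 0.5387),
    ("law", 1101, 0.4124),
    ("biology", 717, 0.795),
    ("health", 818, 0.6064),
    ("physics", 1299, 0.7236),
    ("business", 789, 0.7148),
    ("philosophy", 499, 0.5832),
    ("economics", 844, 0.718),
    ("other", 924, 0.6039),
    ("psychology", 798, 0.7043),
    ("history", 381, 0.5669),
]


def weighted_overall(rows):
    """Question-count-weighted mean accuracy (micro-average over all questions)."""
    total = sum(n for _, n, _ in rows)
    return total, sum(n * acc for _, n, acc in rows) / total


def unweighted_mean(rows):
    """Plain mean of the per-subset accuracies (macro-average), for contrast."""
    return sum(acc for _, _, acc in rows) / len(rows)


total, overall = weighted_overall(ROWS)
print(total, round(overall, 4))         # 12032 0.6602
print(round(unweighted_mean(ROWS), 4))  # 0.6547
```

Only the weighted form reproduces the reported 0.6602, so large subsets such as math (1351 questions) and physics (1299) move the OVERALL score more than small ones such as history (381).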

gemini-code-assist · 2026-05-09T11:03:33Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

MokusMokun · 2026-05-13T04:08:11Z

E2E test results on MMLU-Pro

Using the KDA components from feat(kda): add KDA attention components #1051

`TP=4 DP=4` -> `0.6574`

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.7049 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7898 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.7341 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.5108 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3951 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7852 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6308 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7121 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.7262 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5752 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7145 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6147 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.698  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5276 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6574 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

`TP=4 DP=2` -> `0.6622`

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6878 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7868 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.727  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.547  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3996 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7866 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6112 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7344 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.725  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5731 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7192 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6169 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7055 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5617 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6622 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

`TP=4 DP=1` -> `0.6622`

+---------+-----------+-----------------+------------------+-------+---------+---------+
| Model   | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+=========+===========+=================+==================+=======+=========+=========+
|         | mmlu_pro  | AverageAccuracy | computer science |   410 |  0.6878 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | math             |  1351 |  0.7868 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | chemistry        |  1132 |  0.727  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | engineering      |   969 |  0.547  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | law              |  1101 |  0.3996 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | biology          |   717 |  0.7866 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | health           |   818 |  0.6112 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | physics          |  1299 |  0.7344 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | business         |   789 |  0.725  | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | philosophy       |   499 |  0.5731 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | economics        |   844 |  0.7192 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | other            |   924 |  0.6169 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | psychology       |   798 |  0.7055 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | history          |   381 |  0.5617 | default |
+---------+-----------+-----------------+------------------+-------+---------+---------+
|         | mmlu_pro  | AverageAccuracy | OVERALL          | 12032 |  0.6622 | -       |
+---------+-----------+-----------------+------------------+-------+---------+---------+

…1072) Squash of the local kda-e2e branch onto origin/feat/support_kimi_linear_model:

- KDA dummy-slot pollution guard (`set_ssm_state`, `set_conv_state`). Without this, DP runs (tp4dp4) collapse from ~0.66 to ~0.27 OVERALL on mmlu_pro.
- `HybridLinearAttnBackend.attn_backend_wrapper` builds a real KDA sub-backend (the upstream stub returned `full_attn_backend` unchanged, which crashed the server).
- `ModelRunner.linear_recurrent_config` detects `KimiLinearConfig` by the `linear_attn_config` attribute on `hf_config` (the upstream property was a stub returning `None`).
- The `compilation_manager` dummy batch fills `recurrent_indices`/`has_initial_state` only when `has_recurrent_state` is set, so non-recurrent backends are unaffected. `CompilationManager` gains a `has_recurrent_state` flag, plumbed from `tp_worker` via `model_runner.linear_recurrent_config`.
- `gated_rmsnorm` helper (used by KimiLinear).

`HybridLinearAttnBackend.__call__` is kept upstream-clean (no `pool` kwarg aliasing).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
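The KDA dummy-slot pollution guard described above can be illustrated with a small pure-Python sketch. It assumes, hypothetically, that padded rows in a DP batch point at a reserved dummy slot (slot 0 here) and carry an `is_real=False` flag; `set_state_unguarded`, `guarded_set_state`, `DUMMY_SLOT`, and the flat list standing in for the state pool are illustrative names only, not the actual `set_ssm_state`/`set_conv_state` implementation.

```python
from typing import List, Sequence

DUMMY_SLOT = 0  # hypothetical: pool slot reserved for padded rows


def set_state_unguarded(pool: List[float], slots: Sequence[int], states: Sequence[float]) -> List[float]:
    """Write every row's new state, padding rows included (the failure mode)."""
    out = list(pool)
    for slot, state in zip(slots, states):
        out[slot] = state
    return out


def guarded_set_state(
    pool: List[float],
    slots: Sequence[int],
    states: Sequence[float],
    is_real: Sequence[bool],
) -> List[float]:
    """Write new states only for real rows; padded rows leave the pool untouched."""
    out = list(pool)
    for slot, state, real in zip(slots, states, is_real):
        if real:
            out[slot] = state
    return out


pool = [0.0, 1.0, 2.0, 3.0]
slots = [2, DUMMY_SLOT, DUMMY_SLOT]  # one real request, two padding rows
states = [20.0, 99.0, 98.0]          # padding rows carry garbage state
is_real = [True, False, False]

print(set_state_unguarded(pool, slots, states))         # [98.0, 1.0, 20.0, 3.0]
print(guarded_set_state(pool, slots, states, is_real))  # [0.0, 1.0, 20.0, 3.0]
```

In a JAX backend the same idea would more likely be a masked scatter over device arrays than a Python loop; the loop is used here only to keep the sketch dependency-free.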

`KDAAttnBackend.__call__` takes `recurrent_state_pool=`, while `RadixLinearAttention.__call__` passes `pool=` (`HybridLinearAttnBackend`'s calling convention). Production routes through that wrapper, which translates `pool` → `recurrent_state_pool`; the unit tests bypass it by assigning a raw `KDAAttnBackend` as `forward_batch.attn_backend`, so the kwarg falls into `**kwargs`, `recurrent_state_pool` is never bound, and the call raises `TypeError`. Replicate the translation in a test-only shim.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
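The test-only shim described above can be sketched as a thin wrapper that renames the keyword before delegating. `FakeKDABackend` and `PoolKwargShim` are hypothetical stand-ins written for illustration; they are not the classes in sgl_jax or in test_utils.py, and the real backend's positional arguments differ.

```python
class FakeKDABackend:
    """Hypothetical stand-in for KDAAttnBackend: requires `recurrent_state_pool=`."""

    name = "kda"

    def __call__(self, q, k, v, *, recurrent_state_pool, **kwargs):
        # A `pool=` kwarg would land in **kwargs and leave recurrent_state_pool unbound.
        return (self.name, q, recurrent_state_pool)


class PoolKwargShim:
    """Hypothetical test-only shim: translate `pool=` into `recurrent_state_pool=`."""

    def __init__(self, backend):
        self._backend = backend

    def __getattr__(self, attr):
        # Only called for attributes not found on the shim: delegate to the backend.
        return getattr(self._backend, attr)

    def __call__(self, *args, pool=None, **kwargs):
        if pool is not None:
            kwargs.setdefault("recurrent_state_pool", pool)
        return self._backend(*args, **kwargs)


backend = PoolKwargShim(FakeKDABackend())
print(backend("q", "k", "v", pool="POOL"))  # ('kda', 'q', 'POOL')
```

Calling `FakeKDABackend()("q", "k", "v", pool="POOL")` directly raises `TypeError` for the missing keyword-only argument, which mirrors the failure the commit describes. Delegating attribute access keeps the shim usable wherever the raw backend object was expected.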

Both KDA test files were carrying an identical copy of the `pool=` → `recurrent_state_pool=` translation shim added in 466afff. Move it to test_utils.py and import from both, dropping the local underscore prefix since it's now a shared helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

aolemila force-pushed the feat/support_kimi_linear_model branch from 8a30a3d to fde29f4 Compare May 9, 2026 12:04

aolemila changed the title ~~feat: support Kimi Linear model~~ [WIP] feat: support Kimi Linear model May 9, 2026

MokusMokun mentioned this pull request May 13, 2026, from feat(kda): add KDA attention components #1051 (Merged)

aolemila force-pushed the feat/support_kimi_linear_model branch from 3fa1179 to a5e498d Compare May 13, 2026 09:54

aolemila and others added 3 commits May 14, 2026 11:55

- 30bc309 add KimiLinearForCausalLM. Co-authored-by: zhengke.zhou.dev@gmail.com
- 7de169b fix lint

aolemila changed the title ~~[WIP] feat: support Kimi Linear model~~ feat: support Kimi Linear model May 14, 2026

aolemila force-pushed the feat/support_kimi_linear_model branch from 13e6792 to e374cde Compare May 14, 2026 03:56

aolemila and others added 2 commits May 14, 2026 14:02

aolemila requested a review from JamesBrianD May 14, 2026 07:22

JamesBrianD approved these changes May 14, 2026


aolemila merged commit 2c5d860 into main May 14, 2026
21 checks passed

aolemila deleted the feat/support_kimi_linear_model branch May 14, 2026 08:40

MokusMokun mentioned this pull request May 14, 2026, from [Feature] Support KDA (Kimi Delta Attention) for Kimi-Linear #948 (Closed)


aolemila merged 5 commits into main from feat/support_kimi_linear_model


