feat: support Kimi Linear model #1047
Merged
8a30a3d to fde29f4
E2E test result with MMLU-pro
3fa1179 to a5e498d
…1072)

Squash of the local kda-e2e branch onto origin/feat/support_kimi_linear_model:

- KDA dummy-slot pollution guard (set_ssm_state, set_conv_state). Without this, DP runs (tp4dp4) collapse from ~0.66 to ~0.27 OVERALL on mmlu_pro.
- HybridLinearAttnBackend.attn_backend_wrapper builds a real KDA sub-backend (the upstream stub returned full_attn_backend unchanged → server crash).
- ModelRunner.linear_recurrent_config detects KimiLinearConfig by hf_config's linear_attn_config attribute (the upstream property was a stub returning None).
- compilation_manager dummy batch fills recurrent_indices/has_initial_state only when has_recurrent_state is set, so non-recurrent backends are unaffected (CompilationManager grows a has_recurrent_state flag, plumbed from tp_worker via model_runner.linear_recurrent_config).
- gated_rmsnorm helper (used by KimiLinear).

HybridLinearAttnBackend.__call__ kept upstream-clean (no pool kwarg aliasing).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
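As a rough illustration of the config detection and the has_recurrent_state plumbing described in this commit message, the following is a minimal sketch, not the code from this PR. The function names mirror the commit message, but their signatures, the return value of `linear_recurrent_config`, the dict-based batch layout, and the placeholder values are all assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Any, Optional


def linear_recurrent_config(hf_config: Any) -> Optional[Any]:
    """Return hf_config if it exposes a linear_attn_config attribute, else None.

    Assumption: detection is duck-typed on the attribute, so the concrete
    config class does not need to be imported here.
    """
    if getattr(hf_config, "linear_attn_config", None) is not None:
        return hf_config
    return None


def build_dummy_batch(batch_size: int, has_recurrent_state: bool) -> dict:
    """Build a dummy warm-up batch (hypothetical dict layout)."""
    batch = {"input_ids": [0] * batch_size}
    if has_recurrent_state:
        # Only recurrent backends receive these fields, so non-recurrent
        # backends see exactly the batch they saw before.
        batch["recurrent_indices"] = [0] * batch_size
        batch["has_initial_state"] = [False] * batch_size
    return batch


# Stand-ins for real HF configs; the attribute payload is a placeholder.
kimi_like = SimpleNamespace(linear_attn_config={"placeholder": True})
dense_like = SimpleNamespace()

# The flag is derived once from the config and passed down to the batch builder.
has_recurrent_state = linear_recurrent_config(kimi_like) is not None
recurrent_batch = build_dummy_batch(2, has_recurrent_state)
dense_batch = build_dummy_batch(2, linear_recurrent_config(dense_like) is not None)
```

Deriving a single boolean from the config and passing it down keeps the batch builder free of any model-specific imports, which is one plausible reason for plumbing a flag rather than the config itself.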
13e6792 to e374cde
KDAAttnBackend.__call__ takes `recurrent_state_pool=`, while RadixLinearAttention.__call__ passes `pool=` (HybridLinearAttnBackend's calling convention). Production routes through that wrapper, which translates `pool` → `recurrent_state_pool`; the unit tests bypass it by assigning a raw KDAAttnBackend as `forward_batch.attn_backend`, so the kwarg falls into `**kwargs` and `recurrent_state_pool` is unbound → TypeError. Replicate the translation in a test-only shim.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both KDA test files were carrying an identical copy of the `pool=` → `recurrent_state_pool=` translation shim added in 466afff. Move it to test_utils.py and import from both, dropping the local underscore prefix since it's now a shared helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
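The kwarg mismatch and the test-only translation can be sketched as follows. This is an illustrative stand-in rather than the shim from test_utils.py: `FakeKDABackend` and `PoolKwargShim` are hypothetical names, and the real backend's `__call__` takes more arguments than shown here.

```python
class FakeKDABackend:
    """Stand-in for a backend whose __call__ names the pool `recurrent_state_pool`."""

    def __call__(self, q, *, recurrent_state_pool, **kwargs):
        # A stray `pool=` would land in **kwargs and leave
        # recurrent_state_pool missing, which raises TypeError.
        return ("ok", recurrent_state_pool, kwargs)


class PoolKwargShim:
    """Test-only wrapper that renames `pool=` to `recurrent_state_pool=`."""

    def __init__(self, backend):
        self._backend = backend

    def __call__(self, *args, pool=None, **kwargs):
        # Translate only when the caller did not already use the target name.
        if "recurrent_state_pool" not in kwargs:
            kwargs["recurrent_state_pool"] = pool
        return self._backend(*args, **kwargs)


backend = FakeKDABackend()

# Calling the raw backend with the wrapper's convention fails.
try:
    backend("q", pool="POOL")
    raw_call_failed = False
except TypeError:
    raw_call_failed = True

# Going through the shim succeeds and forwards the pool under the expected name.
shimmed_result = PoolKwargShim(backend)("q", pool="POOL")
```

In a test, the shim would be assigned where the raw backend was previously assigned, so the code under test keeps passing `pool=` exactly as it does in production.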
JamesBrianD approved these changes May 14, 2026
Summary
- `feat/support_kimi_linear_model`.
- `KimiLinearConfig` and `KimiLinearForCausalLM` model wiring.

Relates to #1046
Test plan
python3 -m py_compile python/sgl_jax/srt/configs/kimi_linear.py python/sgl_jax/srt/models/*.py

🤖 Generated with Claude Code
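For reference, the same syntax-only check can be run from Python with the standard-library `py_compile` module. The helper name `compile_check` is hypothetical and not part of this PR. Note that byte-compilation only catches syntax errors; it does not import the modules, so missing dependencies or runtime failures would not show up.

```python
import glob
import py_compile


def compile_check(patterns):
    """Byte-compile every file matching the glob patterns.

    Returns a list of (path, error message) pairs for files that fail.
    """
    failures = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                # doraise=True turns compile errors into PyCompileError
                # instead of printing them to stderr.
                py_compile.compile(path, doraise=True)
            except py_compile.PyCompileError as exc:
                failures.append((path, exc.msg))
    return failures
```

Called with the two patterns from the test plan, an empty return value corresponds to the shell command exiting successfully.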
Accuracy Alignment
The following result is copied from #1072. You can see more reproduction details in #1072.