
support kimi 2.5/6 full param and lora #1057

Open
nanjiangwill wants to merge 1 commit into radixark:main from nanjiangwill:kimi25

Conversation

nanjiangwill commented Apr 30, 2026

megatron-bridge patch from commit 3fd3768045422d0aa5c97e90a4e6c659aea9acb9
sglang patch from commit bb9223d7c51fa66092b3bcae566ef4ecff309dc6

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Kimi VL and Kimi K2.5 models, including specialized weight conversion logic and multimodal token expansion for training. It significantly enhances the LoRA weight synchronization mechanism with IPC staging buffers for faster transfers and a chunked streaming approach for syncing large adapters to SGLang. It also adds support for shared-outer grouped-expert LoRA and updates various utility functions and quantization scripts to accommodate the vision-tower and projector components. Feedback was provided to improve the robustness of rollout batch processing by using strict zipping.
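
To make the chunked-streaming idea concrete, here is a minimal sketch, not code from this PR: stream_lora_adapter, stage_buffer, send_chunk, and the bookkeeping format are all assumptions. It only illustrates staging LoRA tensors into a fixed-size buffer and flushing it chunk by chunk.

import torch

def stream_lora_adapter(named_tensors, stage_buffer, send_chunk):
    # Hypothetical sketch: copy LoRA tensors into a fixed-size staging buffer
    # (e.g. a pinned or CUDA-IPC tensor shared with the rollout engine) and
    # flush it whenever it fills up, so a large adapter is never sent in one piece.
    offset = 0
    pending = []  # (name, shape, dtype, start, end) of tensors currently staged
    for name, tensor in named_tensors:
        flat = tensor.detach().to(stage_buffer.dtype).contiguous().view(-1)
        assert flat.numel() <= stage_buffer.numel(), "tensor larger than staging buffer"
        if offset + flat.numel() > stage_buffer.numel():
            send_chunk(stage_buffer[:offset], pending)  # flush the full buffer
            offset, pending = 0, []
        stage_buffer[offset:offset + flat.numel()].copy_(flat)
        pending.append((name, tuple(tensor.shape), tensor.dtype, offset, offset + flat.numel()))
        offset += flat.numel()
    if pending:
        send_chunk(stage_buffer[:offset], pending)  # flush the final partial chunk

The actual patch also covers IPC handle exchange and the shared-outer grouped-expert LoRA layout; this sketch only shows the chunking bookkeeping.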

expanded_total_lengths = []
expanded_response_lengths = []

for i, (token_tensor, loss_mask_tensor) in enumerate(zip(tokens, loss_masks, strict=False)):
gemini-code-assist bot (Contributor) commented

Severity: medium

Using strict=True in zip is safer here to ensure that the number of token tensors and loss mask tensors match exactly, which is expected for a valid rollout batch.

Suggested change
for i, (token_tensor, loss_mask_tensor) in enumerate(zip(tokens, loss_masks, strict=False)):
for i, (token_tensor, loss_mask_tensor) in enumerate(zip(tokens, loss_masks, strict=True)):
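
For illustration only (not part of the diff): with strict=False, a length mismatch between the two lists is silently truncated, while strict=True (Python 3.10+) raises immediately.

tokens = [[1, 2, 3], [4, 5]]
loss_masks = [[1, 1, 1]]  # one entry short, e.g. a dropped sample

list(zip(tokens, loss_masks))               # [([1, 2, 3], [1, 1, 1])] -- second sample silently dropped
list(zip(tokens, loss_masks, strict=True))  # ValueError: zip() argument 2 is shorter than argument 1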

@yushengsu-thu yushengsu-thu self-assigned this Apr 30, 2026
