
Conversation

@a120092009 (Contributor)

No description provided.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for prefix caching and chunked prefill for the deepseek_v32 model on MLU hardware. The changes are well-implemented and include several key improvements. A new reshape_from_cache kernel is added to support gathering KV cache data for chunked prefill. The IndexerImpl is refactored for better clarity and to accommodate the new chunked prefill logic. The data parallelism handling in LLMEngine is made more robust to correctly manage mixed forward types across different ranks. Additionally, safeguards are added to prevent enabling these new features on unsupported model variants, and comprehensive unit tests are included to validate the new functionality. The code quality is high, and the changes appear correct and well-tested.
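To illustrate what "gathering KV cache data for chunked prefill" involves, here is a minimal sketch in plain Python. It assumes a paged KV cache where each sequence owns a block table mapping logical block indices to physical cache blocks. The function name `gather_kv_from_cache` and its parameters are hypothetical; this is not the `reshape_from_cache` MLU kernel added in the PR, only the indexing idea such a gather typically relies on. The gathered prefix can then be combined with the current chunk's keys/values when computing attention for that chunk.

```python
def gather_kv_from_cache(kv_cache, block_table, seq_len, block_size):
    """Hypothetical sketch: copy the first `seq_len` token entries of one
    sequence out of a paged KV cache into a contiguous list.

    kv_cache[block_id][offset] holds the KV entry for one token slot.
    block_table[i] is the physical block holding the sequence's i-th logical block.
    """
    gathered = []
    for pos in range(seq_len):
        # Logical block index and the slot within that block for this token.
        block_id = block_table[pos // block_size]
        offset = pos % block_size
        gathered.append(kv_cache[block_id][offset])
    return gathered


# Usage: 4 physical blocks of size 2; the sequence's prefix lives in blocks 3 then 1.
cache = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"]]
prefix_kv = gather_kv_from_cache(cache, block_table=[3, 1], seq_len=3, block_size=2)
print(prefix_kv)  # ['d0', 'd1', 'b0']
```

A real kernel would operate on device tensors and gather many sequences in one launch; the per-token loop here only shows the address arithmetic.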

@a120092009 force-pushed the fix/dsv32-chunked-prefill branch from d41314f to 38d3e28 on January 8, 2026 10:40
XuZhang99 previously approved these changes Jan 8, 2026
RobbieLeung previously approved these changes Jan 9, 2026
@a120092009 dismissed stale reviews from RobbieLeung and XuZhang99 via 633671d on January 9, 2026 07:01
@a120092009 force-pushed the fix/dsv32-chunked-prefill branch from 38d3e28 to 633671d on January 9, 2026 07:01
@XuZhang99 merged commit 07a67ab into jd-opensource:main on January 9, 2026
12 of 15 checks passed


5 participants