feat: support prefix caching and chunked prefill for deepseek v32 on mlu. #660

a120092009 · 2026-01-07T06:27:09Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces support for prefix caching and chunked prefill for the deepseek_v32 model on MLU hardware. The changes are well-implemented and include several key improvements. A new reshape_from_cache kernel is added to support gathering KV cache data for chunked prefill. The IndexerImpl is refactored for better clarity and to accommodate the new chunked prefill logic. The data parallelism handling in LLMEngine is made more robust to correctly manage mixed forward types across different ranks. Additionally, safeguards are added to prevent enabling these new features on unsupported model variants, and comprehensive unit tests are included to validate the new functionality. The code quality is high, and the changes appear correct and well-tested.
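As a rough illustration of what "gathering KV cache data for chunked prefill" can mean, here is a minimal CPU sketch of a block-table gather from a paged cache. It is not the MLU `reshape_from_cache` kernel from this PR. The function name `gather_from_paged_cache`, the `[num_blocks, block_size, head_dim]` layout, and every parameter are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; NOT the reshape_from_cache kernel in this PR.
// Assumed layout: cache is [num_blocks, block_size, head_dim], row-major, flattened.
// block_table maps a sequence's logical block index to a physical block id.
// Returns the first context_len token rows as a contiguous
// [context_len, head_dim] buffer.
std::vector<float> gather_from_paged_cache(const std::vector<float>& cache,
                                           const std::vector<int32_t>& block_table,
                                           std::size_t block_size,
                                           std::size_t head_dim,
                                           std::size_t context_len) {
  std::vector<float> out(context_len * head_dim);
  for (std::size_t t = 0; t < context_len; ++t) {
    // Locate the physical block and the slot inside it for logical token t.
    const std::size_t physical_block =
        static_cast<std::size_t>(block_table[t / block_size]);
    const std::size_t slot = t % block_size;
    const float* src =
        cache.data() + (physical_block * block_size + slot) * head_dim;
    // Copy one token row into its contiguous position in the output.
    std::copy(src, src + head_dim, out.begin() + t * head_dim);
  }
  return out;
}
```

In a chunked prefill step, a gather of this kind would let the current chunk attend over keys and values written by earlier chunks or by a cached prefix, even though those rows sit in non-contiguous physical blocks.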

xllm/models/llm/deepseek_v2.h

xllm/core/distributed_runtime/llm_engine.cpp

xllm/core/layers/common/tests/indexer_tests.cpp

xllm/core/distributed_runtime/llm_engine.cpp

a120092009 requested review from DongheJin, JimHsiung, RobbieLeung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners January 7, 2026 06:27

gemini-code-assist bot reviewed Jan 7, 2026

XuZhang99 reviewed Jan 7, 2026

xllm/models/llm/deepseek_v2.h (outdated, resolved)

RobbieLeung reviewed Jan 7, 2026

xllm/core/distributed_runtime/llm_engine.cpp (outdated, resolved)

XuZhang99 reviewed Jan 7, 2026

xllm/core/distributed_runtime/llm_engine.cpp (outdated, resolved)

xllm/core/layers/common/tests/indexer_tests.cpp (outdated, resolved)

xllm/core/layers/common/tests/indexer_tests.cpp (outdated, resolved)

a120092009 force-pushed the fix/dsv32-chunked-prefill branch from d41314f to 38d3e28 Compare January 8, 2026 10:40

XuZhang99 previously approved these changes Jan 8, 2026

RobbieLeung previously approved these changes Jan 9, 2026

DongheJin reviewed Jan 9, 2026

xllm/core/distributed_runtime/llm_engine.cpp (outdated, resolved)

phantomlei3 added 5 commits January 9, 2026 14:59

feat: support prefix caching and chunked prefill for deepseek v32 on mlu.

81dddf7

bugfix: fill in the correct mixed batch forward type under chunked prefill.

28c4da5

bugfix: correct the fused moe wrong unit expected value.

3793461

refactor: refactor codes based on the review comments.

4fa7ecc

refactor: use less ricky design for chunked prefill bugs.

633671d

a120092009 dismissed stale reviews from RobbieLeung and XuZhang99 via 633671d January 9, 2026 07:01

a120092009 force-pushed the fix/dsv32-chunked-prefill branch from 38d3e28 to 633671d Compare January 9, 2026 07:01

DongheJin approved these changes Jan 9, 2026

XuZhang99 approved these changes Jan 9, 2026

XuZhang99 merged commit 07a67ab into jd-opensource:main Jan 9, 2026
12 of 15 checks passed

5 participants