Conversation

@Potabk Potabk commented Dec 14, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

  1. fix [v1] Add PrefixLM support to FlexAttention backend vllm#27938
  2. fix [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling vllm#27145
    Pooling models now support chunked prefill and prefix caching.
  3. fix [Model] Move multimodal_cpu_fields definition to field config vllm#30181
    Define the CPU fields in the field config where they really belong.
  4. fix [Core][MM] Add mechanism to configure multimodal fields which should stay on CPU vllm#28168
    Define the CPU fields in the field config where they really belong.
  5. fix kv_transfer: Rename the shared storage connectors vllm#30201
    Some modules were renamed.
  6. fix [MoE][Refactor] Make select_experts a non-static method vllm#29067
    FusedMoE module refactor.
  7. fix [MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply vllm#29066
    FusedMoE module refactor.
  8. fix [Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode vllm#29624
    Callers must now guard against seq_lens_cpu being None (see the sketch after this list).
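
A minimal sketch of the None guard item 8 implies, assuming a CommonAttentionMetadata-like dataclass; the field names and helper are illustrative, not the actual vLLM or vLLM Ascend code:

```python
# Hypothetical sketch: once seq_lens_cpu is Optional, consumers must
# tolerate None and fall back to a device-to-host copy.
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class CommonAttentionMetadata:
    seq_lens: torch.Tensor                        # device tensor, always present
    seq_lens_cpu: Optional[torch.Tensor] = None   # may be None under async spec-decode


def max_seq_len(meta: CommonAttentionMetadata) -> int:
    # Prefer the CPU copy to avoid a device sync; copy from device only
    # when the CPU tensor was not populated.
    seq_lens_cpu = meta.seq_lens_cpu
    if seq_lens_cpu is None:
        seq_lens_cpu = meta.seq_lens.cpu()
    return int(seq_lens_cpu.max().item())
```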

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot left a comment

Code Review

This pull request primarily focuses on upgrading the vLLM hash and ensuring compatibility with the new version. The changes are well-structured, involving API adaptations, refactoring for backward compatibility (e.g., expert_map handling), and introducing version-conditional logic, particularly for vLLM v0.12.0. Overall, the changes appear correct and necessary for the upgrade. I've identified one minor issue in a test file where a model is duplicated, which should be addressed to avoid redundant test runs.
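
As a minimal sketch of the version-conditional pattern described above, assuming only that vLLM exposes `vllm.__version__`; the `vllm_is_at_least` helper is illustrative, not the project's actual utility:

```python
# Minimal sketch of version-gated compatibility code; the helper name
# is illustrative and the threshold is an example.
from packaging.version import Version

import vllm


def vllm_is_at_least(target: str) -> bool:
    """True if the installed vLLM is at least `target`, e.g. "0.12.0"."""
    return Version(vllm.__version__) >= Version(target)


if vllm_is_at_least("0.12.0"):
    # Take the new upstream API path after the refactor.
    pass
else:
    # Keep the backward-compatible path for older releases.
    pass
```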

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Potabk Potabk added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 15, 2025
@Potabk Potabk requested a review from wangxiyuan December 15, 2025 07:46

Potabk commented Dec 15, 2025

Follow-ups: 1. qwen3-next refactor; 2. remove get_attn_backend from npu_model_runner.


Potabk commented Dec 15, 2025

Signed-off-by: wangli <[email protected]>
@wangxiyuan wangxiyuan merged commit 8d2998d into vllm-project:main Dec 15, 2025
20 of 23 checks passed

Labels

ci/build, documentation (Improvements or additions to documentation), module:core, module:ops, module:tests, ready (read for review), ready-for-test (start test by label for PR)
