
[Bugfix] Fixed glm-5 sparse-attn bugs #357

Merged
liwei109 merged 1 commit into baidu:releases/v0.11.0 from KinChow:glm5-fix-bug-2
May 19, 2026

Conversation

KinChow commented on May 14, 2026

  • Chunked prefill: use kv_lod for sparse attention causal masking on multi-turn conversations where kv_len != q_len
  • Contiguous: fix non-contiguous kv_cache view in int8_paged_mqa_logits causing wrong block-address calculation when block_id > 0

(cherry picked from commit cdade49)
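
To illustrate the first fix, here is a minimal sketch of causal masking when a sequence already has cached KV entries, so that kv_len != q_len. It is not the code from this PR. It assumes that `kv_lod` and a matching `q_lod` are cumulative offset arrays (one entry per sequence boundary); the function name `visible_kv_counts` and the sample values are hypothetical.

```python
def visible_kv_counts(q_lod, kv_lod):
    """Return, per query token, how many KV positions it may attend to.

    q_lod and kv_lod are assumed to be cumulative offsets, so sequence s
    owns query tokens q_lod[s]:q_lod[s+1] and KV entries kv_lod[s]:kv_lod[s+1].
    """
    if len(q_lod) != len(kv_lod):
        raise ValueError("q_lod and kv_lod must describe the same sequences")
    counts = []
    for s in range(len(q_lod) - 1):
        q_len = q_lod[s + 1] - q_lod[s]
        kv_len = kv_lod[s + 1] - kv_lod[s]
        # KV entries already cached by earlier turns or chunks.
        cached = kv_len - q_len
        for i in range(q_len):
            # Query token i sits at absolute position cached + i and may
            # attend to every KV position up to and including itself.
            counts.append(cached + i + 1)
    return counts


q_lod = [0, 2, 5]    # two sequences with 2 and 3 new query tokens
kv_lod = [0, 6, 9]   # the first sequence has 4 cached tokens, the second none

with_kv_lod = visible_kv_counts(q_lod, kv_lod)
q_only = visible_kv_counts(q_lod, q_lod)  # mask built as if kv_len == q_len
print(with_kv_lod)
print(q_only)
```

In this sketch the two results differ only for the sequence that has cached tokens, which matches the multi-turn case named in the description: a mask derived from query lengths alone hides the cached prefix from the new query tokens.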

PR Description

FIX #xxxx


Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

  • All code changes pass the pre-commit checks.
  • Commits are signed off using git commit -s.
  • The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

  • [Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
  • [Bugfix] – Bug fixes
  • [CI/Build] – CI, build system, or infrastructure improvements
  • [Doc] – Documentation updates or fixes
  • [Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.


Detailed Checklist

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

  • All linting and formatting checks pass (pre-commit).
  • The code is well-structured and sufficiently documented.
  • The change is designed with maintainability and readability in mind.

2. Testing

  • Relevant unit tests are added or updated.
  • Integration tests are included when applicable.
  • Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

  • All commits include a Signed-off-by: line.
  • Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

  • Request code refactoring or additional tests.
  • Ask for clarifications on design decisions.
  • Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

KinChow changed the title from "Glm5 fix bug 2" to "Glm5 fix bug" on May 14, 2026
KinChow changed the title from "Glm5 fix bug" to "[Bugfix] Fixed glm-5 sparse-attn bugs" on May 14, 2026
- Chunked prefill: use kv_lod for sparse attention causal masking on multi-turn
  conversations where kv_len != q_len
- Contiguous: fix non-contiguous kv_cache view in int8_paged_mqa_logits causing
  wrong block-address calculation when block_id > 0

(cherry picked from commit cdade49)
Signed-off-by: zhouzijian01 <zhouzijian01@baidu.com>
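
The second item in the commit message can be illustrated with a small stride calculation. This is a sketch under assumptions, not the kernel code: the cache layout (4 blocks, 2 tokens per block, 6 values per token), the sliced view, and the helper names `contiguous_strides` and `block_start` are all hypothetical. It shows why an address computed from a view's shape, rather than its real strides, is only correct for block 0.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a densely packed array."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def block_start(block_id, strides):
    """Element offset of the first entry of a block."""
    return block_id * strides[0]


# Hypothetical cache layout: 4 blocks, 2 tokens per block, 6 values per token.
parent_shape = (4, 2, 6)
parent_strides = contiguous_strides(parent_shape)  # (12, 6, 1)
storage = list(range(4 * 2 * 6))                   # flat backing buffer

# A view keeping only the first 4 values per token shares the parent's
# storage and strides, so it is not contiguous.
view_shape = (4, 2, 4)
view_strides = parent_strides

# A kernel that assumes a packed layout derives strides from the view shape.
assumed_strides = contiguous_strides(view_shape)   # (8, 4, 1)

for block_id in range(2):
    right = block_start(block_id, view_strides)
    wrong = block_start(block_id, assumed_strides)
    print(block_id, right, wrong)

# Copying the view into packed storage makes the assumed strides valid.
packed = [
    storage[b * view_strides[0] + t * view_strides[1] + d * view_strides[2]]
    for b in range(view_shape[0])
    for t in range(view_shape[1])
    for d in range(view_shape[2])
]
```

Both offsets are 0 for block 0 and diverge from block 1 onward, which is consistent with the "block_id > 0" condition in the commit message. Making the data contiguous before handing it to a kernel that assumes packed strides is one way to remove the mismatch.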
liwei109 merged commit 9316e11 into baidu:releases/v0.11.0 on May 19, 2026
2 of 3 checks passed

2 participants