[Feature] improve random_sample #359

Merged
liwei109 merged 2 commits into baidu:main from yuejun-baidu:fix-sampler on May 19, 2026
@yuejun-baidu (Contributor) commented May 18, 2026

Signed-off-by: yuejun <yuejun@baidu.com>

PR Description

  1. Improve random_sample by using xspeedgate.ops.
  2. Remove unused code in vllm_kunlun/v1/attention/backends/kunlun_attn.py.
  3. Fix the kunlun/intel platform conflict in vllm_kunlun/ops/fla/utils.py.

Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

  • All code changes pass the pre-commit checks.
  • Commits are signed off using git commit -s.
  • The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

  • [Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
  • [Bugfix] – Bug fixes
  • [CI/Build] – CI, build system, or infrastructure improvements
  • [Doc] – Documentation updates or fixes
  • [Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.


Detailed Checklist

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

  • All linting and formatting checks pass (pre-commit).
  • The code is well-structured and sufficiently documented.
  • The change is designed with maintainability and readability in mind.

2. Testing

  • Relevant unit tests are added or updated.
  • Integration tests are included when applicable.
  • Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

  • All commits include a Signed-off-by: line.
  • Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

  • Request code refactoring or additional tests.
  • Ask for clarifications on design decisions.
  • Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

Signed-off-by: yuejun <yuejun@baidu.com>

Copilot AI (Contributor) left a comment

Pull request overview

Small performance/cleanup PR that swaps the in-place exponential sampling in random_sample to use the xspeedgate_ops.inplace_exponential custom op, removes a dead seq_start_loc computation in the Kunlun attention backend, and corrects the FLA platform mapping so xpu resolves to kunlun instead of intel.

Changes:

  • Replace q[i].exponential_(generator=...) loop with torch.ops.xspeedgate_ops.inplace_exponential to speed up random sampling.
  • Drop unused seq_start_loc/seq_start_loc_tensor construction in KunlunAttentionMetadataBuilder.build.
  • Update _check_platform mapping/return type so xpu maps to "kunlun" (replacing prior "intel" and dropping "musa").
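
The first change above concerns how the exponential noise is drawn, not the sampling rule itself. As a rough illustration, and assuming random_sample follows the usual exponential-race approach (divide each probability by an independent Exp(1) draw and take the argmax), the per-generator path can be sketched in plain Python. The names `sample_batch`, `default_rng`, and `generators` are hypothetical, and `random.Random` stands in for the tensor generators; this is not the Kunlun code or the xspeedgate op.

```python
import random

def random_sample(probs, rng):
    """Pick index i with probability probs[i] via the exponential-race trick."""
    # One Exp(1) noise value per candidate (the role q.exponential_() plays).
    # The max() guard avoids a division by zero in the very rare 0.0 draw.
    q = [max(rng.expovariate(1.0), 1e-300) for _ in probs]
    # argmax of probs[i] / q[i] is distributed according to probs.
    return max(range(len(probs)), key=lambda i: probs[i] / q[i])

def sample_batch(batch_probs, default_rng, generators):
    """Sample one index per row; rows listed in `generators` use their own RNG.

    `generators` (hypothetical) maps a row index to a per-request RNG,
    mirroring the per-generator loop that the PR replaces with a custom op.
    """
    out = []
    for row, probs in enumerate(batch_probs):
        rng = generators.get(row, default_rng)
        out.append(random_sample(probs, rng))
    return out
```

Rows with their own seeded generator stay reproducible regardless of what the rest of the batch draws, which is why that path is handled per row in the first place.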

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| vllm_kunlun/v1/sample/ops/topk_topp_sampler.py | Use xspeedgate inplace exponential op in the per-generator path. |
| vllm_kunlun/v1/attention/backends/kunlun_attn.py | Remove dead seq_start_loc tensor construction. |
| vllm_kunlun/ops/fla/utils.py | Map xpu → kunlun (drop intel/musa) in _check_platform. |
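
The platform mapping change in vllm_kunlun/ops/fla/utils.py can be pictured as a small lookup table. The sketch below is illustrative only: the function name `check_platform`, its `device` argument, and the `cuda` and `hip` entries are assumptions, and only the `xpu` to `kunlun` mapping comes from this PR.

```python
# Hypothetical device-string -> platform-name table.
# Only the "xpu" -> "kunlun" entry is taken from the PR description;
# the other entries are assumed for illustration.
_DEVICE_TO_PLATFORM = {
    "cuda": "nvidia",
    "hip": "amd",
    "xpu": "kunlun",  # previously resolved to "intel"
}

def check_platform(device: str) -> str:
    """Resolve a device string to a platform name.

    Unknown devices fall back to the device string itself.
    """
    return _DEVICE_TO_PLATFORM.get(device, device)
```

Keeping a single owner for the `xpu` key avoids the situation where Kunlun hardware is routed down Intel-specific code paths.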

Comment thread vllm_kunlun/v1/attention/backends/kunlun_attn.py
Signed-off-by: yuejun <yuejun@baidu.com>
@liwei109 liwei109 merged commit 4cc631d into baidu:main May 19, 2026
3 checks passed
4 participants