Skip to content

Conversation

@Toneymiller
Copy link

@Toneymiller Toneymiller commented Dec 15, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: zxwang <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Triton kernel for RMS normalization to support bias addition. The implementation contains a critical bug in the kernel launch configuration that would lead to incorrect computations for inputs with more than 48 rows. I've also pointed out another potential issue with passing None to the kernel which could lead to a runtime error. I've provided suggestions to fix these issues.

Comment on lines 113 to 115
# _, num_vectorcore = get_device_properties()
num_vectorcore = 48
grid = (M if M < num_vectorcore else num_vectorcore,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The grid for launching the Triton kernel is incorrectly calculated. It's capped at a hardcoded num_vectorcore value of 48. The rms_norm_fwd_kernel processes one row per program instance. With the current grid calculation, if the input tensor has more than 48 rows, only the first 48 rows will be processed, leading to incorrect output for the remaining rows. This is a critical correctness bug.

To process all M rows, the grid size must be (M,). The hardcoded num_vectorcore and the logic that uses it should be removed.

Suggested change
# _, num_vectorcore = get_device_properties()
num_vectorcore = 48
grid = (M if M < num_vectorcore else num_vectorcore,)
grid = (M,)

Comment on lines 126 to 127
residual.stride(0) if residual is not None else None,
residual_out.stride(0) if residual is not None else None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When residual is None, None is passed to the stride_z_row and stride_z_out_row arguments of the Triton kernel. Triton's JIT compiler expects numerical values for non-pointer arguments and will likely raise a TypeError when it receives None. You should pass a dummy integer value like 0 instead. These arguments are not used when residual is None because of the HAS_Z conditional, so 0 is a safe value.

Suggested change
residual.stride(0) if residual is not None else None,
residual_out.stride(0) if residual is not None else None,
residual.stride(0) if residual is not None else 0,
residual_out.stride(0) if residual is not None else 0,

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant