rmsaddbias #5012
base: main
Conversation
Signed-off-by: zxwang <[email protected]>
Code Review
This pull request introduces a Triton kernel for RMS normalization with bias addition support. The implementation contains a critical bug in the kernel launch configuration that leads to incorrect computation for inputs with more than 48 rows. I've also pointed out a second issue: passing `None` to the kernel for the residual strides, which can raise a runtime error. I've provided suggestions to fix both issues.
vllm_ascend/ops/layernorm.py
Outdated
```python
# _, num_vectorcore = get_device_properties()
num_vectorcore = 48
grid = (M if M < num_vectorcore else num_vectorcore,)
```
The grid for launching the Triton kernel is incorrectly calculated: it is capped at a hardcoded `num_vectorcore` value of 48. The `rms_norm_fwd_kernel` processes one row per program instance, so with the current grid calculation, an input tensor with more than 48 rows has only its first 48 rows processed, leaving incorrect output for the remaining rows. This is a critical correctness bug.
To process all `M` rows, the grid size must be `(M,)`. The hardcoded `num_vectorcore` and the logic that uses it should be removed.
```diff
-# _, num_vectorcore = get_device_properties()
-num_vectorcore = 48
-grid = (M if M < num_vectorcore else num_vectorcore,)
+grid = (M,)
```
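The effect of the capped grid can be illustrated with a plain-Python simulation of the launch (a hypothetical sketch, not the actual Triton kernel): each program instance handles exactly one row, so any grid smaller than `M` leaves rows untouched.

```python
def simulate_launch(grid_size: int, M: int) -> list:
    """Simulate a launch where program instance `pid` processes row `pid`.

    With a grid smaller than M, rows with index >= grid_size are never
    visited, mirroring the bug when the grid is capped at 48.
    """
    processed = [False] * M
    for pid in range(grid_size):  # one program instance per grid slot
        processed[pid] = True     # the real kernel would normalize row pid
    return processed

M = 64
num_vectorcore = 48  # the hardcoded cap from the original code
capped = simulate_launch(min(M, num_vectorcore), M)
full = simulate_launch(M, M)
print(sum(capped), sum(full))  # 48 64 -> 16 rows silently skipped
```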
vllm_ascend/ops/layernorm.py
Outdated
```python
residual.stride(0) if residual is not None else None,
residual_out.stride(0) if residual is not None else None,
```
When `residual` is `None`, `None` is passed to the `stride_z_row` and `stride_z_out_row` arguments of the Triton kernel. Triton's JIT compiler expects numerical values for non-pointer arguments and will likely raise a `TypeError` when it receives `None`. You should pass a dummy integer value like `0` instead. These arguments are not used when `residual` is `None` because of the `HAS_Z` conditional, so `0` is a safe value.
```diff
-residual.stride(0) if residual is not None else None,
-residual_out.stride(0) if residual is not None else None,
+residual.stride(0) if residual is not None else 0,
+residual_out.stride(0) if residual is not None else 0,
```
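The safety of the dummy `0` can be shown with a toy stand-in for the kernel's argument handling (hypothetical names, not the real kernel signature): when `HAS_Z` is false the stride is never used, while `None` breaks as soon as arithmetic touches it.

```python
def kernel_stub(stride_z_row, HAS_Z: bool) -> int:
    """Toy stand-in for the kernel's stride handling (hypothetical sketch).

    When HAS_Z is False the stride is never dereferenced, so a dummy 0
    is harmless. Triton's JIT additionally type-specializes scalar
    arguments, which is why the real kernel needs an int, not None.
    """
    if HAS_Z:
        return 2 * stride_z_row  # offset arithmetic requires an int
    return -1  # residual path skipped; stride argument is ignored

print(kernel_stub(0, HAS_Z=False))   # dummy 0 is never touched
print(kernel_stub(128, HAS_Z=True))  # real stride is used
try:
    kernel_stub(None, HAS_Z=True)    # None fails once it is used
except TypeError as exc:
    print("TypeError:", exc)
```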
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Signed-off-by: zxwang <[email protected]>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?