
[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace#36823

Draft
ProExpertProg wants to merge 4 commits into luka/vllm-ir/rms-norm-batch-invariant from luka/vllm-ir/rms-norm-inplace



@ProExpertProg (Collaborator) commented Mar 11, 2026

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the nvidia label Mar 11, 2026

mergify bot commented Mar 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ProExpertProg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Luka Govedič <lgovedic@redhat.com>
@ProExpertProg ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 4b47060 to 837d6f3 Compare March 11, 2026 21:38
@mergify mergify bot removed the needs-rebase label Mar 11, 2026
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a maybe_inplace mechanism to the vLLM IR, allowing for performance optimizations through in-place operations while maintaining functional semantics for default op calls. It also adds a new fused_add_rms_norm op that leverages this new capability. The changes are extensive, touching the IR definition, compiler passes, and kernel implementations. My review has identified a critical safety issue where a potential misuse of an in-place operation only triggers a warning instead of an error, and a high-severity issue related to an incomplete compiler pass (CloneCleanupPass) that is being added.
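
To make the functional versus in-place distinction concrete, here is a minimal pure-Python sketch. It is not the PR's implementation: the function names, the list-based "tensors", and the convention that the in-place form writes the normalized output into `x` and the sum into `residual` are all assumptions made for illustration.

```python
import math


def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    """Functional form (illustrative): returns new lists, inputs untouched."""
    # Step 1: residual add.
    added = [a + r for a, r in zip(x, residual)]
    # Step 2: RMS normalization of the sum, scaled by the weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in added) / len(added) + eps)
    out = [v * inv_rms * w for v, w in zip(added, weight)]
    return out, added


def fused_add_rms_norm_inplace(x, residual, weight, eps=1e-6):
    """Hypothetical in-place form: reuses the input buffers for the outputs."""
    out, added = fused_add_rms_norm(x, residual, weight, eps)
    x[:] = out           # x now holds the normalized output
    residual[:] = added  # residual now holds x + residual
    return x, residual
```

Both forms compute the same values. The only difference is whether the caller's `x` and `residual` still hold their original contents afterwards, which is why an in-place call is only safe when nothing else reads those inputs later.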

Comment on lines +69 to +73
logger.warning(
    "Node %s (input to %s) has another use", arg, node
)
# TODO raise error, this is undefined behavior, which should not be allowed.
# Users can just use the default overload if they want to keep activation inputs untouched.

critical

The check for other users of an activation input that is about to be modified in-place currently only logs a warning. This can lead to silent correctness issues if the input tensor is used elsewhere after being modified. As the TODO comment suggests, this should raise an error to prevent such undefined behavior. Allowing compilation to proceed with a warning could introduce hard-to-debug errors downstream.

Suggested change
- logger.warning(
-     "Node %s (input to %s) has another use", arg, node
- )
- # TODO raise error, this is undefined behavior, which should not be allowed.
- # Users can just use the default overload if they want to keep activation inputs untouched.
+ raise ValueError(
+     f"Node {arg} (input to {node}) has another use in {user}. "
+     f"Using maybe_inplace on an input with multiple users is not allowed. "
+     f"Use the default overload if you want to keep activation inputs untouched."
+ )
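
The kind of check under discussion can be sketched on a toy graph. Everything here is hypothetical: the `Node` class, `check_inplace_inputs`, and the `activation_indices` parameter are stand-ins invented for illustration, not vLLM or `torch.fx` APIs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, as for real graph nodes
class Node:
    """Toy graph node (hypothetical): tracks its inputs and its users."""
    name: str
    args: tuple = ()
    users: list = field(default_factory=list)

    def __post_init__(self):
        # Register this node as a user of each of its inputs.
        for arg in self.args:
            arg.users.append(self)


def check_inplace_inputs(node, activation_indices):
    """Raise if an input that would be overwritten has any other user."""
    for i in activation_indices:
        arg = node.args[i]
        for user in arg.users:
            if user is not node:
                raise ValueError(
                    f"Node {arg.name} (input to {node.name}) "
                    f"has another use in {user.name}."
                )
```

Raising here turns a potential silent miscompilation into an immediate, attributable failure at compile time. A real pass might also need to account for ordering (uses before versus after the in-place op), which this sketch does not attempt.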

Comment on lines +169 to +172
continue # TODO
node.replace_all_uses_with(node.args[0])
graph.erase_node(node)
count += 1

high

The CloneCleanupPass is added to the pass manager, but its implementation is currently a no-op: the continue # TODO statement makes the clone-removal logic that follows it unreachable. Merging incomplete or placeholder code can cause confusion and makes it unclear whether the feature is intended to be active. The pass should either be fully implemented or removed from the pass manager until it is ready.

Suggested change
- continue # TODO
- node.replace_all_uses_with(node.args[0])
- graph.erase_node(node)
- count += 1
+ # A clone is safe to remove if its input has no other users.
+ # This is a conservative check. A more sophisticated analysis
+ # could trace back to the `maybe_inplace` call and its metadata.
+ if len(node.args[0].users) == 1:
+     node.replace_all_uses_with(node.args[0])
+     graph.erase_node(node)
+     count += 1
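
As a rough illustration of the suggested heuristic, the sketch below removes single-use clones from a toy graph encoded as `(name, op, args)` tuples in topological order. The encoding and the function name `remove_redundant_clones` are assumptions for this example only; whether the single-user condition is sufficient for the real pass is not established here.

```python
def remove_redundant_clones(nodes):
    """Drop clone nodes whose input has no user other than the clone.

    nodes: list of (name, op, args) tuples in topological order.
    Returns (new_nodes, number_of_clones_removed).
    """
    # Count how many times each value is consumed in the original graph.
    use_count = {}
    for _, _, args in nodes:
        for arg in args:
            use_count[arg] = use_count.get(arg, 0) + 1

    alias = {}  # removed clone name -> the value that replaces it
    kept = []
    count = 0
    for name, op, args in nodes:
        if op == "clone" and use_count[args[0]] == 1:
            # Equivalent of replace_all_uses_with + erase_node.
            alias[name] = alias.get(args[0], args[0])
            count += 1
            continue
        kept.append((name, op, tuple(alias.get(a, a) for a in args)))
    return kept, count
```

For example, `x -> clone -> norm` collapses to `x -> norm`, while a graph in which `x` also feeds another op keeps its clone because overwriting `x` would be observable.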

Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
@ProExpertProg ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 837d6f3 to d5e968e Compare March 11, 2026 21:44
@ProExpertProg ProExpertProg added torch.compile vllm-ir vLLM IR: intermediate representation and kernel registration labels Mar 11, 2026
@ProExpertProg ProExpertProg changed the title from "Draft [vLLM IR] 3/N fused_add_rms_norm and maybe_inplace" to "[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace" Mar 11, 2026

Labels

nvidia torch.compile vllm-ir vLLM IR: intermediate representation and kernel registration
