[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace by ProExpertProg · Pull Request #36823 · vllm-project/vllm

ProExpertProg · 2026-03-11T21:35:45Z

Purpose

Test Plan

Test Result


mergify · 2026-03-11T21:36:36Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ProExpertProg.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

gemini-code-assist

Code Review

This pull request introduces a maybe_inplace mechanism to the vLLM IR, allowing for performance optimizations through in-place operations while maintaining functional semantics for default op calls. It also adds a new fused_add_rms_norm op that leverages this new capability. The changes are extensive, touching the IR definition, compiler passes, and kernel implementations. My review has identified a critical safety issue where a potential misuse of an in-place operation only triggers a warning instead of an error, and a high-severity issue related to an incomplete compiler pass (CloneCleanupPass) that is being added.
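To make the functional versus in-place distinction in this summary concrete, here is a minimal pure-Python sketch. It assumes the common definition of a fused add + RMSNorm (add the input to the residual, then RMS-normalize the sum). The function names, argument order, and eps default are illustrative assumptions and are not taken from this PR's code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over a 1-D list of floats."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    """Functional form (assumed semantics): returns new lists, inputs untouched."""
    new_residual = [a + b for a, b in zip(x, residual)]
    return rms_norm(new_residual, weight, eps), new_residual


def fused_add_rms_norm_inplace(x, residual, weight, eps=1e-6):
    """In-place form (assumed semantics): reuses the input buffers for the outputs."""
    for i in range(len(x)):
        residual[i] = x[i] + residual[i]
    x[:] = rms_norm(residual, weight, eps)
    return x, residual


x = [1.0, 2.0, 3.0, 4.0]
res = [0.5, 0.5, 0.5, 0.5]
w = [1.0, 1.0, 1.0, 1.0]

# Functional call: x and res keep their original values.
out_f, res_f = fused_add_rms_norm(x, res, w)
inputs_untouched = x == [1.0, 2.0, 3.0, 4.0] and res == [0.5] * 4

# In-place call on copies: same numbers, but the input buffers are overwritten.
x2, res2 = list(x), list(res)
out_i, res_i = fused_add_rms_norm_inplace(x2, res2, w)
same_result = out_f == out_i and res_f == res_i
inputs_overwritten = x2 is out_i and res2 is res_i
```

Both forms compute the same values; the in-place form only saves allocations, which is why it is safe only when nothing else reads the original inputs afterwards.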

gemini-code-assist · 2026-03-11T21:39:21Z

vllm/compilation/passes/ir/inplace_raising.py

+                        logger.warning(
+                            "Node %s (input to %s) has another use", arg, node
+                        )
+                        # TODO raise error, this is undefined behavior, which should not be allowed.
+                        #  Users can just use the default overload if they want to keep activation inputs untouched.


The check for other users of an activation input that is about to be modified in-place currently only logs a warning. This can lead to silent correctness issues if the input tensor is used elsewhere after being modified. As the TODO comment suggests, this should raise an error to prevent such undefined behavior. Allowing compilation to proceed with a warning could introduce hard-to-debug errors downstream.

Suggested change

-                        logger.warning(
-                            "Node %s (input to %s) has another use", arg, node
-                        )
-                        # TODO raise error, this is undefined behavior, which should not be allowed.
-                        #  Users can just use the default overload if they want to keep activation inputs untouched.
+                        raise ValueError(
+                            f"Node {arg} (input to {node}) has another use in {user}. "
+                            f"Using maybe_inplace on an input with multiple users is not allowed. "
+                            f"Use the default overload if you want to keep activation inputs untouched."
+                        )
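The check discussed in this comment can be illustrated with a toy graph, as a hedged sketch rather than the pass itself. The Node class below is a hypothetical stand-in that mimics only the args and users attributes of torch.fx.Node; check_inplace_inputs, the mutated parameter, and the node labels are likewise invented for illustration.

```python
class Node:
    """Toy graph node (hypothetical): tracks its inputs and its consumers."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args
        self.users = []
        for arg in args:
            arg.users.append(self)


def check_inplace_inputs(node, mutated):
    """Raise if any input that `node` would overwrite has another consumer."""
    for idx in mutated:
        arg = node.args[idx]
        others = [u.name for u in arg.users if u is not node]
        if others:
            raise ValueError(
                f"Node {arg.name} (input to {node.name}) has another use in {others}"
            )


# Safe case: x and residual feed only the in-place op.
x, r, w = Node("x"), Node("residual"), Node("weight")
norm_ok = Node("fused_add_rms_norm.maybe_inplace", x, r, w)
check_inplace_inputs(norm_ok, mutated=(0, 1))  # does not raise

# Unsafe case: x_b is also read by a second node.
x_b, r_b, w_b = Node("x"), Node("residual"), Node("weight")
norm_bad = Node("fused_add_rms_norm.maybe_inplace", x_b, r_b, w_b)
extra = Node("add", x_b, norm_bad)

try:
    check_inplace_inputs(norm_bad, mutated=(0, 1))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Like the snippet under review, this toy check flags any additional user regardless of ordering; a real pass could be more precise by only rejecting uses that execute after the in-place op.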

gemini-code-assist · 2026-03-11T21:39:21Z

vllm/compilation/passes/ir/lowering_pass.py

+            continue  # TODO
+            node.replace_all_uses_with(node.args[0])
+            graph.erase_node(node)
+            count += 1


The CloneCleanupPass is added to the pass manager, but its implementation is currently a no-op: the continue # TODO statement skips every node, so the logic that removes clone nodes is never reached. Merging incomplete or placeholder code can lead to confusion and makes it unclear whether the feature is intended to be active. The pass should either be fully implemented or removed from the pass manager until it is ready.

Suggested change

-            continue  # TODO
-            node.replace_all_uses_with(node.args[0])
-            graph.erase_node(node)
-            count += 1
+            # A clone is safe to remove if its input has no other users.
+            # This is a conservative check. A more sophisticated analysis
+            # could trace back to the `maybe_inplace` call and its metadata.
+            if len(node.args[0].users) == 1:
+                node.replace_all_uses_with(node.args[0])
+                graph.erase_node(node)
+                count += 1
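The single-user rule in this suggestion can be tried out on a toy graph. The sketch below is an assumption-laden illustration, not the PR's CloneCleanupPass: Node, erase_node, and clone_cleanup are hypothetical helpers that loosely mirror the torch.fx calls named in the diff (replace_all_uses_with, erase_node), and the graph is just a Python list.

```python
class Node:
    """Toy graph node (hypothetical) with just enough API for the cleanup."""

    def __init__(self, op, *args):
        self.op = op
        self.args = list(args)
        self.users = []
        for arg in args:
            arg.users.append(self)

    def replace_all_uses_with(self, other):
        # Point every consumer of this node at `other` instead.
        for user in list(self.users):
            user.args = [other if a is self else a for a in user.args]
            other.users.append(user)
        self.users.clear()


def erase_node(graph, node):
    # Detach the node from its inputs, then drop it from the graph.
    for arg in node.args:
        arg.users.remove(node)
    graph.remove(node)


def clone_cleanup(graph):
    """Remove clone nodes whose source has the clone as its only user."""
    count = 0
    for node in list(graph):
        if node.op != "clone":
            continue
        src = node.args[0]
        if len(src.users) != 1:
            continue  # source is read elsewhere, so the copy must stay
        node.replace_all_uses_with(src)
        erase_node(graph, node)
        count += 1
    return count


# Clone c1 is redundant: its source x has no other consumer.
x = Node("linear")
c1 = Node("clone", x)
norm1 = Node("norm_inplace", c1)

# Clone c2 must stay: its source y is also read by `keep`.
y = Node("linear")
c2 = Node("clone", y)
norm2 = Node("norm_inplace", c2)
keep = Node("add", y, norm2)

graph = [x, c1, norm1, y, c2, norm2, keep]
removed = clone_cleanup(graph)
```

After the call, norm1 reads x directly while norm2 still reads the protective copy c2. The toy deliberately uses intermediate nodes as clone sources; whether a clone of a graph input can be dropped depends on who else holds that tensor, which this sketch does not model.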

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

mergify bot added the nvidia label Mar 11, 2026

mergify bot added the needs-rebase label Mar 11, 2026

github-project-automation bot added this to NVIDIA Mar 11, 2026

Add maybe_inplace overload and tests

e5bac45

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 4b47060 to 837d6f3 Compare March 11, 2026 21:38

mergify bot removed the needs-rebase label Mar 11, 2026

gemini-code-assist bot reviewed Mar 11, 2026


ProExpertProg added 3 commits March 11, 2026 17:44

fused_add_rms_norm, raising pass, fix lowering for inplace impls

26ba4ad

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

remove aiter rmsnorm registrations

7cf5683

Signed-off-by: Luka Govedič <lgovedic@redhat.com>

tmp maybe_inplace

d5e968e

ProExpertProg force-pushed the luka/vllm-ir/rms-norm-inplace branch from 837d6f3 to d5e968e Compare March 11, 2026 21:44

ProExpertProg mentioned this pull request Mar 11, 2026

[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace #34068

Closed

5 tasks

ProExpertProg added torch.compile vllm-ir vLLM IR: intermediate representation and kernel registration labels Mar 11, 2026

github-project-automation bot added this to torch.compile integration Mar 11, 2026

github-project-automation bot moved this to To triage in torch.compile integration Mar 11, 2026

ProExpertProg changed the title ~~Draft [vLLM IR] 3/N fused_add_rms_norm and maybe_inplace~~ [vLLM IR] 3/N fused_add_rms_norm and maybe_inplace Mar 11, 2026

ProExpertProg wants to merge 4 commits into luka/vllm-ir/rms-norm-batch-invariant from luka/vllm-ir/rms-norm-inplace
