[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace#36823
[vLLM IR] 3/N fused_add_rms_norm and maybe_inplace#36823ProExpertProg wants to merge 4 commits intoluka/vllm-ir/rms-norm-batch-invariantfrom
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
4b47060 to
837d6f3
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a maybe_inplace mechanism to the vLLM IR, allowing for performance optimizations through in-place operations while maintaining functional semantics for default op calls. It also adds a new fused_add_rms_norm op that leverages this new capability. The changes are extensive, touching the IR definition, compiler passes, and kernel implementations. My review has identified a critical safety issue where a potential misuse of an in-place operation only triggers a warning instead of an error, and a high-severity issue related to an incomplete compiler pass (CloneCleanupPass) that is being added.
| logger.warning( | ||
| "Node %s (input to %s) has another use", arg, node | ||
| ) | ||
| # TODO raise error, this is undefined behavior, which should not be allowed. | ||
| # Users can just use the default overload if they want to keep activation inputs untouched. |
There was a problem hiding this comment.
The check for other users of an activation input that is about to be modified in-place currently only logs a warning. This can lead to silent correctness issues if the input tensor is used elsewhere after being modified. As the TODO comment suggests, this should raise an error to prevent such undefined behavior. Allowing compilation to proceed with a warning could introduce hard-to-debug errors downstream.
| logger.warning( | |
| "Node %s (input to %s) has another use", arg, node | |
| ) | |
| # TODO raise error, this is undefined behavior, which should not be allowed. | |
| # Users can just use the default overload if they want to keep activation inputs untouched. | |
| raise ValueError( | |
| f"Node {arg} (input to {node}) has another use in {user}. " | |
| f"Using maybe_inplace on an input with multiple users is not allowed. " | |
| f"Use the default overload if you want to keep activation inputs untouched." | |
| ) |
| continue # TODO | ||
| node.replace_all_uses_with(node.args[0]) | ||
| graph.erase_node(node) | ||
| count += 1 |
There was a problem hiding this comment.
The CloneCleanupPass is added to the pass manager but its implementation is currently a no-op due to the continue # TODO statement. The logic to remove clone nodes is commented out. Merging incomplete or placeholder code can lead to confusion and makes it unclear if the feature is intended to be active. The pass should either be fully implemented or removed from the pass manager until it's ready.
| continue # TODO | |
| node.replace_all_uses_with(node.args[0]) | |
| graph.erase_node(node) | |
| count += 1 | |
| # A clone is safe to remove if its input has no other users. | |
| # This is a conservative check. A more sophisticated analysis | |
| # could trace back to the `maybe_inplace` call and its metadata. | |
| if len(node.args[0].users) == 1: | |
| node.replace_all_uses_with(node.args[0]) | |
| graph.erase_node(node) | |
| count += 1 |
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
837d6f3 to
d5e968e
Compare
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.