Skip to content

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Nov 4, 2025

Summary

  • store HostIrContainer top-level expressions in std::list<Expr*> for cheaper insertions
  • adapt Host IR passes/tests to iterate over std::list instead of assuming std::vector
  • switch host IR tests to gmock matcher style so they no longer rely on random access helpers
  • simplify lowering.cpp and insert_deallocations.cpp with the new API

@wujingyue
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Nov 4, 2025

Review updated until commit 1b73a94

Description

  • Switch top-level expressions to std::list for efficient insertions

  • Replace insertExprAfter with iterator-based insertExprBefore

  • Update host IR passes to use list iterators instead of indices

  • Modernize tests with gmock matchers for better readability

Changes walkthrough

Relevant files
Enhancement
5 files
container.cpp
Switch to std::list and update insertion API                         
+11/-5   
lowering.cpp
Adapt to list iterator-based insertion                                     
+63/-58 
convert_op_to_communication.cpp
Update pass to use std::list                                                         
+6/-3     
stream_parallel_type.cpp
Migrate to std::list in stream pass                                           
+16/-13 
container.h
Update container interface for std::list                                 
+17/-19 
Bug fix
2 files
insert_deallocations.cpp
Use list iterators for deallocation insertion                       
+37/-25 
test_multidevice_stream_parallel_type.cpp
Fix iterator usage in multidevice test                                     
+3/-1     
Tests
2 files
test_host_ir_integration.cpp
Update integration tests for list API                                       
+3/-5     
test_host_ir_stream_lowering.cpp
Modernize tests with gmock matchers                                           
+142/-84

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
API Consistency

The method pushBackTopLevelExprs returns an iterator, which is consistent with list::insert, but the naming might be confusing as it does not clearly indicate it returns an iterator. Consider aligning the method name with its behavior or documenting the return value clearly.

std::list<Expr*>::const_iterator HostIrContainer::pushBackTopLevelExprs(
    Expr* expr) {
  assertInContainer(expr, "Cannot add expr, ");
  return top_level_exprs_.insert(top_level_exprs_.end(), expr);
Iterator Invalidation

The use of std::prev on an iterator obtained from a list that may be modified during iteration should be reviewed to ensure iterator validity, especially since insertions are happening in the same container.

auto prev = std::prev(insertion_point);
Expr* e = *prev;
Namespace Declaration

The namespace nvfuser::hir is opened in the header but not closed, which may lead to declaration issues in other translation units. Ensure proper namespace scoping.

namespace nvfuser::hir {

// HostIrContainer is used to represent a host program.
// 1) It inherits from Fusion, so that (Host) IRs can be registered to it.
// 2) It holds a list of Host Expressions `top_level_exprs_` that represent
// the host program.
class HostIrContainer final : public Fusion {
 public:
  HostIrContainer() = default;
  HostIrContainer(const HostIrContainer&) = delete;
  HostIrContainer& operator=(const HostIrContainer&) = delete;

  // Print to an output stream
  std::ostream& print(std::ostream& os) const;

  const std::list<Expr*>& topLevelExprs() const;
  // Appends `expr` and returns the iterator pointing to `expr`.
  std::list<Expr*>::const_iterator pushBackTopLevelExprs(Expr* expr);
  void insertExprBefore(std::list<Expr*>::const_iterator position, Expr* expr);
  // Only used for MultiDeviceExecutor. While convenient, it should generally
  // be avoided because it implicitly modifies `top_level_exprs_`, making the
  // code harder to reason about.
  void resetTopLevelExprs(std::list<Expr*> exprs) {
    top_level_exprs_ = std::move(exprs);
  }

  void addKernelExecutor(std::unique_ptr<KernelExecutor> ke);
  bool hasKernelExecutor(int64_t group_id) const;
  KernelExecutor& getKernelExecutor(int64_t group_id) const;

  Stream* getDefaultStream();

 private:
  std::list<Expr*> top_level_exprs_;

  // Indexed by group ID. This way, parallel compilation can write to disjoint
  // locations without having to precompute a global index.
  std::vector<std::unique_ptr<KernelExecutor>> kernel_executors_;

  Stream* default_stream_ = nullptr;
};

} // namespace nvfuser::hir

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, achieving O(1) insertions compared to O(n) with vectors. The change systematically updates all host IR passes and tests to work with list iterators.

Key changes:

  • Replaced std::vector with std::list in container storage
  • Updated insertExprAfter to use std::next for iterator arithmetic with bounds checking
  • Adapted all passes to iterate over lists instead of using random access
  • Converted tests to gmock matchers (ElementsAre) eliminating index-based assertions

Issues identified:

  • Performance regression in insert_deallocations.cpp: The deallocation pass calls insertExprAfter with integer indices in a loop, resulting in O(n²) complexity since each call now requires O(n) list traversal with std::next

Confidence Score: 3/5

  • Generally safe but contains a performance regression that should be addressed
  • The refactoring is well-executed with comprehensive test updates, but the O(n²) performance issue in insert_deallocations.cpp is a real concern that degrades from the original O(n) complexity
  • csrc/host_ir/pass/insert_deallocations.cpp needs optimization to avoid O(n²) complexity with list-based implementation

Important Files Changed

File Analysis

Filename Score Overview
csrc/host_ir/container.cpp 3/5 Changed insertExprAfter to use std::next for list iterator arithmetic, added bounds checking. Potential O(n²) performance regression in insert_deallocations.cpp
csrc/host_ir/container.h 4/5 Changed storage from std::vector<Expr*> to std::list<Expr*>, removed obsolete comment about considering linkedlist
csrc/host_ir/pass/insert_deallocations.cpp 2/5 Updated type to std::list. Contains O(n²) performance issue: insertExprAfter is called in loop with indexed access on list

Sequence Diagram

sequenceDiagram
    participant Client as Client Code
    participant Container as HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Insertion Flow (New Implementation)
    
    Client->>Container: insertExprAfter(index, expr)
    Container->>Container: Check bounds: index >= -1 && index < size
    alt index == -1
        Container->>List: insert at begin()
    else index >= 0
        Container->>List: std::next(begin(), index + 1)
        Note over Container,List: O(n) linear traversal
        Container->>List: insert(insert_pos, expr)
    end
    
    Note over Client,List: Deallocation Pass Pattern (O(n²) issue)
    
    Client->>Container: topLevelExprs()
    Container-->>Client: const std::list<Expr*>&
    Client->>Client: Build last_use_by_index (O(n))
    Client->>Client: Sort by index
    loop For each index in reverse
        Client->>Container: insertExprAfter(i, deallocate)
        Note over Container,List: Each call: O(n) traversal
    end
    Note over Client: Total: O(n²) complexity
Loading

Additional Comments (1)

  1. csrc/host_ir/pass/insert_deallocations.cpp, line 54-56 (link)

    logic: This loop has O(n²) complexity with the list implementation. insertExprAfter(i, ...) calls std::next(top_level_exprs_.begin(), index + 1) which is O(n) for lists. Since this loops through indices in reverse, each call walks the list from the beginning.

    Consider refactoring to use iterators or convert last_use_by_index to store iterators instead of indices to maintain O(n) complexity.

8 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

We can now efficiently insert an allocation before the for loop.
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, enabling O(1) insertion performance instead of O(n).

Key changes:

  • Replaced insertExprAfter(index, expr) with insertExprBefore(iterator, expr) for safer, more idiomatic list manipulation
  • Modified pushBackTopLevelExprs to return an iterator to the inserted element, enabling efficient insertion point tracking
  • Updated insert_deallocations.cpp to traverse in reverse using iterators instead of maintaining index-based positions
  • Converted tests from index-based access (at()) to gmock's ElementsAre matchers for better compatibility with list iteration
  • All Host IR passes and lowering logic now work with iterators instead of indices

Confidence Score: 5/5

  • This PR is safe to merge with no issues identified
  • The refactoring is well-designed and consistently applied across all files. The API change from index-based to iterator-based operations is a clear improvement that eliminates O(n) insertion costs. All call sites were properly updated, tests were migrated to gmock matchers, and the semantics remain correct. No logic bugs or edge cases were introduced.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
csrc/host_ir/container.h 5/5 Changed top_level_exprs_ from std::vector to std::list, updated API to use iterators instead of indices. API changes are clean and consistent.
csrc/host_ir/container.cpp 5/5 Implemented list-based insertion methods. pushBackTopLevelExprs now returns iterator to inserted element, insertExprBefore replaces insertExprAfter with proper iterator-based insertion.
csrc/host_ir/lowering.cpp 5/5 Updated to use iterator-based insertion API. Captures insertion point from pushBackTopLevelExprs and uses insertExprBefore to add allocations before for-loops. Logic is correct.
csrc/host_ir/pass/insert_deallocations.cpp 5/5 Refactored from index-based to reverse-iterator-based traversal. Inserts deallocations using insertExprBefore. Algorithm maintains correct semantics with better performance.
tests/cpp/test_host_ir_stream_lowering.cpp 5/5 Migrated from index-based assertions (at()) to gmock ElementsAre matchers. Tests now work with list interface and are more idiomatic.

Sequence Diagram

sequenceDiagram
    participant Client
    participant HostIrContainer
    participant TopLevelExprs as std::list<Expr*>
    
    Note over HostIrContainer,TopLevelExprs: New API Design
    
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>TopLevelExprs: insert(end(), for_loop)
    TopLevelExprs-->>HostIrContainer: iterator to for_loop
    HostIrContainer-->>Client: return iterator (insertion_point)
    
    Note over Client: Later: need to insert before for_loop
    
    Client->>HostIrContainer: insertExprBefore(insertion_point, allocate)
    HostIrContainer->>TopLevelExprs: insert(insertion_point, allocate)
    TopLevelExprs-->>HostIrContainer: iterator to allocate
    
    Note over TopLevelExprs: Result: [allocate, for_loop, ...]
    Note over HostIrContainer: O(1) insertion vs O(n) with vector
Loading

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@wujingyue
Copy link
Collaborator Author

!test

@wujingyue wujingyue requested a review from mdavis36 November 4, 2025 07:57
@wujingyue
Copy link
Collaborator Author

!test

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Replaces std::vector with std::list for storing top-level expressions in HostIrContainer to optimize insertion performance.

Key Changes:

  • API modernization: Replaced index-based insertExprAfter(int64_t) with iterator-based insertExprBefore(iterator)
  • Return value enhancement: pushBackTopLevelExprs now returns an iterator to enable efficient insertions
  • Algorithm refactoring: insert_deallocations.cpp switched from map-based tracking with sorting to backward iteration with set-based last-use detection
  • Improved code clarity: Updated comments to accurately reflect list-based storage, removed TODO about considering linked list

Benefits:

  • O(1) insertion vs O(n) with vectors
  • More idiomatic C++ with iterator-based APIs
  • Simpler deallocation pass without index management
  • Better encapsulation with const iterator returns

Confidence Score: 5/5

  • This PR is safe to merge with high confidence - it's a well-executed refactoring with clear performance benefits
  • The refactoring is clean, well-tested through the existing test suite, and maintains semantic equivalence while improving performance. The switch from vector to list is appropriate for the use case where insertions are frequent. Iterator safety is properly handled, and all API changes are internal to the affected files.
  • No files require special attention - all changes are straightforward refactorings

Important Files Changed

File Analysis

Filename Score Overview
csrc/host_ir/container.h 5/5 Refactored to use std::list instead of std::vector for better insertion performance. Changed API from insertExprAfter(index) to insertExprBefore(iterator), pushBackTopLevelExprs now returns iterator. Clean implementation with appropriate const-correctness.
csrc/host_ir/lowering.cpp 5/5 Adapted to use new list-based API. Key change: captures iterator from pushBackTopLevelExprs to enable inserting allocations before the for loop. Logic correctly maintains allocation ordering before loop execution.
csrc/host_ir/pass/insert_deallocations.cpp 5/5 Refactored from index-based to iterator-based traversal. Iterates backwards using iterators, inserts deallocations after last use. Uses unordered_set instead of map for cleaner last-use tracking. Semantically equivalent to original.

Sequence Diagram

sequenceDiagram
    participant Client
    participant HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Initialization Phase
    Client->>HostIrContainer: create()
    HostIrContainer->>List: initialize empty list
    
    Note over Client,List: Building Expressions (lowering.cpp)
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>List: insert(end(), for_loop)
    List-->>HostIrContainer: return iterator
    HostIrContainer-->>Client: return iterator to for_loop
    
    loop For each output needing allocation
        Client->>HostIrContainer: insertExprBefore(iterator, allocate)
        HostIrContainer->>List: insert(iterator, allocate)
        Note over List: O(1) insertion before for_loop
    end
    
    Note over Client,List: Deallocation Pass (insert_deallocations.cpp)
    Client->>HostIrContainer: topLevelExprs()
    HostIrContainer-->>Client: const std::list<Expr*>&
    
    Client->>Client: insertion_point = end()
    loop Backwards iteration
        Client->>Client: prev = std::prev(insertion_point)
        Client->>Client: check if inputs need deallocation
        opt Input is last use
            Client->>HostIrContainer: insertExprBefore(insertion_point, deallocate)
            HostIrContainer->>List: insert(insertion_point, deallocate)
            Note over List: Inserts after last use (before next expr)
        end
        Client->>Client: insertion_point = prev
    end
Loading

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

hic.pushBackTopLevelExprs(wait);
}
} break;
break;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NFC. I was told

case ...: {
  ...;
  break;
}

is more conventional than

case ...: {
  ...;
} break;

Comment on lines -107 to -109
// for each non-input TensorView:
// if it needs an out-of-loop allocation:
// create an Allocate and append it to the top level
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove this extra loop because Allocates can be inserted later right before the hir::ForLoop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants