Use std::list for HostIrContainer top-level exprs #5476

wujingyue · 2025-11-04T01:07:19Z

Summary

store HostIrContainer top-level expressions in std::list<Expr*> for cheaper insertions
adapt Host IR passes/tests to iterate over std::list instead of assuming std::vector
switch host IR tests to gmock matcher style so they no longer rely on random access helpers
simplify lowering.cpp and insert_deallocations.cpp with the new API

wujingyue · 2025-11-04T01:07:59Z

!test

github-actions · 2025-11-04T01:08:52Z

Review updated until commit 1b73a94

Description

Switch top-level expressions to std::list for efficient insertions
Replace insertExprAfter with iterator-based insertExprBefore
Update host IR passes to use list iterators instead of indices
Modernize tests with gmock matchers for better readability

Changes walkthrough

Relevant files

Enhancement

5 files

container.cpp `Switch to std::list and update insertion API`	+11/-5
lowering.cpp `Adapt to list iterator-based insertion`	+63/-58
convert_op_to_communication.cpp `Update pass to use std::list`	+6/-3
stream_parallel_type.cpp `Migrate to std::list in stream pass`	+16/-13
container.h `Update container interface for std::list`	+17/-19

Bug fix

2 files

insert_deallocations.cpp `Use list iterators for deallocation insertion`	+37/-25
test_multidevice_stream_parallel_type.cpp `Fix iterator usage in multidevice test`	+3/-1

Tests

2 files

test_host_ir_integration.cpp `Update integration tests for list API`	+3/-5
test_host_ir_stream_lowering.cpp `Modernize tests with gmock matchers`	+142/-84

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

API Consistency

The method pushBackTopLevelExprs returns an iterator, which is consistent with list::insert, but the naming might be confusing as it does not clearly indicate it returns an iterator. Consider aligning the method name with its behavior or documenting the return value clearly.

std::list<Expr*>::const_iterator HostIrContainer::pushBackTopLevelExprs(
    Expr* expr) {
  assertInContainer(expr, "Cannot add expr, ");
  return top_level_exprs_.insert(top_level_exprs_.end(), expr);

Iterator Invalidation

The use of std::prev on an iterator obtained from a list that may be modified during iteration should be reviewed to ensure iterator validity, especially since insertions are happening in the same container.

auto prev = std::prev(insertion_point);
Expr* e = *prev;

Namespace Declaration

The namespace nvfuser::hir is opened in the header but not closed, which may lead to declaration issues in other translation units. Ensure proper namespace scoping.

namespace nvfuser::hir {

// HostIrContainer is used to represent a host program.
// 1) It inherits from Fusion, so that (Host) IRs can be registered to it.
// 2) It holds a list of Host Expressions `top_level_exprs_` that represent
// the host program.
class HostIrContainer final : public Fusion {
 public:
  HostIrContainer() = default;
  HostIrContainer(const HostIrContainer&) = delete;
  HostIrContainer& operator=(const HostIrContainer&) = delete;

  // Print to an output stream
  std::ostream& print(std::ostream& os) const;

  const std::list<Expr*>& topLevelExprs() const;
  // Appends `expr` and returns the iterator pointing to `expr`.
  std::list<Expr*>::const_iterator pushBackTopLevelExprs(Expr* expr);
  void insertExprBefore(std::list<Expr*>::const_iterator position, Expr* expr);
  // Only used for MultiDeviceExecutor. While convenient, it should generally
  // be avoided because it implicitly modifies `top_level_exprs_`, making the
  // code harder to reason about.
  void resetTopLevelExprs(std::list<Expr*> exprs) {
    top_level_exprs_ = std::move(exprs);
  }

  void addKernelExecutor(std::unique_ptr<KernelExecutor> ke);
  bool hasKernelExecutor(int64_t group_id) const;
  KernelExecutor& getKernelExecutor(int64_t group_id) const;

  Stream* getDefaultStream();

 private:
  std::list<Expr*> top_level_exprs_;

  // Indexed by group ID. This way, parallel compilation can write to disjoint
  // locations without having to precompute a global index.
  std::vector<std::unique_ptr<KernelExecutor>> kernel_executors_;

  Stream* default_stream_ = nullptr;
};

} // namespace nvfuser::hir

greptile-apps

Greptile Overview

Greptile Summary

This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, achieving O(1) insertions compared to O(n) with vectors. The change systematically updates all host IR passes and tests to work with list iterators.

Key changes:

Replaced std::vector with std::list in container storage
Updated insertExprAfter to use std::next for iterator arithmetic with bounds checking
Adapted all passes to iterate over lists instead of using random access
Converted tests to gmock matchers (ElementsAre) eliminating index-based assertions

Issues identified:

Performance regression in insert_deallocations.cpp: The deallocation pass calls insertExprAfter with integer indices in a loop, resulting in O(n²) complexity since each call now requires O(n) list traversal with std::next

Confidence Score: 3/5

Generally safe but contains a performance regression that should be addressed
The refactoring is well-executed with comprehensive test updates, but the O(n²) performance issue in insert_deallocations.cpp is a real concern that degrades from the original O(n) complexity
csrc/host_ir/pass/insert_deallocations.cpp needs optimization to avoid O(n²) complexity with list-based implementation

Important Files Changed

File Analysis

Filename	Score	Overview
csrc/host_ir/container.cpp	3/5	Changed `insertExprAfter` to use `std::next` for list iterator arithmetic, added bounds checking. Potential O(n²) performance regression in `insert_deallocations.cpp`
csrc/host_ir/container.h	4/5	Changed storage from `std::vector<Expr>` to `std::list<Expr>`, removed obsolete comment about considering linkedlist
csrc/host_ir/pass/insert_deallocations.cpp	2/5	Updated type to `std::list`. Contains O(n²) performance issue: `insertExprAfter` is called in loop with indexed access on list

Sequence Diagram

sequenceDiagram
    participant Client as Client Code
    participant Container as HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Insertion Flow (New Implementation)
    
    Client->>Container: insertExprAfter(index, expr)
    Container->>Container: Check bounds: index >= -1 && index < size
    alt index == -1
        Container->>List: insert at begin()
    else index >= 0
        Container->>List: std::next(begin(), index + 1)
        Note over Container,List: O(n) linear traversal
        Container->>List: insert(insert_pos, expr)
    end
    
    Note over Client,List: Deallocation Pass Pattern (O(n²) issue)
    
    Client->>Container: topLevelExprs()
    Container-->>Client: const std::list<Expr*>&
    Client->>Client: Build last_use_by_index (O(n))
    Client->>Client: Sort by index
    loop For each index in reverse
        Client->>Container: insertExprAfter(i, deallocate)
        Note over Container,List: Each call: O(n) traversal
    end
    Note over Client: Total: O(n²) complexity

Additional Comments (1)

csrc/host_ir/pass/insert_deallocations.cpp, line 54-56 (link)

logic: This loop has O(n²) complexity with the list implementation. insertExprAfter(i, ...) calls std::next(top_level_exprs_.begin(), index + 1) which is O(n) for lists. Since this loops through indices in reverse, each call walks the list from the beginning.

Consider refactoring to use iterators or convert last_use_by_index to store iterators instead of indices to maintain O(n) complexity.

_{8 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

We can now efficiently insert an allocation before the for loop.

greptile-apps

Greptile Overview

Greptile Summary

This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, enabling O(1) insertion performance instead of O(n).

Key changes:

Replaced insertExprAfter(index, expr) with insertExprBefore(iterator, expr) for safer, more idiomatic list manipulation
Modified pushBackTopLevelExprs to return an iterator to the inserted element, enabling efficient insertion point tracking
Updated insert_deallocations.cpp to traverse in reverse using iterators instead of maintaining index-based positions
Converted tests from index-based access (at()) to gmock's ElementsAre matchers for better compatibility with list iteration
All Host IR passes and lowering logic now work with iterators instead of indices

Confidence Score: 5/5

This PR is safe to merge with no issues identified
The refactoring is well-designed and consistently applied across all files. The API change from index-based to iterator-based operations is a clear improvement that eliminates O(n) insertion costs. All call sites were properly updated, tests were migrated to gmock matchers, and the semantics remain correct. No logic bugs or edge cases were introduced.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
csrc/host_ir/container.h	5/5	Changed top_level_exprs_ from std::vector to std::list, updated API to use iterators instead of indices. API changes are clean and consistent.
csrc/host_ir/container.cpp	5/5	Implemented list-based insertion methods. pushBackTopLevelExprs now returns iterator to inserted element, insertExprBefore replaces insertExprAfter with proper iterator-based insertion.
csrc/host_ir/lowering.cpp	5/5	Updated to use iterator-based insertion API. Captures insertion point from pushBackTopLevelExprs and uses insertExprBefore to add allocations before for-loops. Logic is correct.
csrc/host_ir/pass/insert_deallocations.cpp	5/5	Refactored from index-based to reverse-iterator-based traversal. Inserts deallocations using insertExprBefore. Algorithm maintains correct semantics with better performance.
tests/cpp/test_host_ir_stream_lowering.cpp	5/5	Migrated from index-based assertions (at()) to gmock ElementsAre matchers. Tests now work with list interface and are more idiomatic.

Sequence Diagram

sequenceDiagram
    participant Client
    participant HostIrContainer
    participant TopLevelExprs as std::list<Expr*>
    
    Note over HostIrContainer,TopLevelExprs: New API Design
    
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>TopLevelExprs: insert(end(), for_loop)
    TopLevelExprs-->>HostIrContainer: iterator to for_loop
    HostIrContainer-->>Client: return iterator (insertion_point)
    
    Note over Client: Later: need to insert before for_loop
    
    Client->>HostIrContainer: insertExprBefore(insertion_point, allocate)
    HostIrContainer->>TopLevelExprs: insert(insertion_point, allocate)
    TopLevelExprs-->>HostIrContainer: iterator to allocate
    
    Note over TopLevelExprs: Result: [allocate, for_loop, ...]
    Note over HostIrContainer: O(1) insertion vs O(n) with vector

_{5 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

wujingyue · 2025-11-04T07:57:43Z

!test

wujingyue · 2025-11-04T08:18:52Z

!test

greptile-apps

Greptile Overview

Greptile Summary

Replaces std::vector with std::list for storing top-level expressions in HostIrContainer to optimize insertion performance.

Key Changes:

API modernization: Replaced index-based insertExprAfter(int64_t) with iterator-based insertExprBefore(iterator)
Return value enhancement: pushBackTopLevelExprs now returns an iterator to enable efficient insertions
Algorithm refactoring: insert_deallocations.cpp switched from map-based tracking with sorting to backward iteration with set-based last-use detection
Improved code clarity: Updated comments to accurately reflect list-based storage, removed TODO about considering linked list

Benefits:

O(1) insertion vs O(n) with vectors
More idiomatic C++ with iterator-based APIs
Simpler deallocation pass without index management
Better encapsulation with const iterator returns

Confidence Score: 5/5

This PR is safe to merge with high confidence - it's a well-executed refactoring with clear performance benefits
The refactoring is clean, well-tested through the existing test suite, and maintains semantic equivalence while improving performance. The switch from vector to list is appropriate for the use case where insertions are frequent. Iterator safety is properly handled, and all API changes are internal to the affected files.
No files require special attention - all changes are straightforward refactorings

Important Files Changed

File Analysis

Filename	Score	Overview
csrc/host_ir/container.h	5/5	Refactored to use `std::list` instead of `std::vector` for better insertion performance. Changed API from `insertExprAfter(index)` to `insertExprBefore(iterator)`, `pushBackTopLevelExprs` now returns iterator. Clean implementation with appropriate const-correctness.
csrc/host_ir/lowering.cpp	5/5	Adapted to use new list-based API. Key change: captures iterator from `pushBackTopLevelExprs` to enable inserting allocations before the for loop. Logic correctly maintains allocation ordering before loop execution.
csrc/host_ir/pass/insert_deallocations.cpp	5/5	Refactored from index-based to iterator-based traversal. Iterates backwards using iterators, inserts deallocations after last use. Uses `unordered_set` instead of map for cleaner last-use tracking. Semantically equivalent to original.

Sequence Diagram

sequenceDiagram
    participant Client
    participant HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Initialization Phase
    Client->>HostIrContainer: create()
    HostIrContainer->>List: initialize empty list
    
    Note over Client,List: Building Expressions (lowering.cpp)
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>List: insert(end(), for_loop)
    List-->>HostIrContainer: return iterator
    HostIrContainer-->>Client: return iterator to for_loop
    
    loop For each output needing allocation
        Client->>HostIrContainer: insertExprBefore(iterator, allocate)
        HostIrContainer->>List: insert(iterator, allocate)
        Note over List: O(1) insertion before for_loop
    end
    
    Note over Client,List: Deallocation Pass (insert_deallocations.cpp)
    Client->>HostIrContainer: topLevelExprs()
    HostIrContainer-->>Client: const std::list<Expr*>&
    
    Client->>Client: insertion_point = end()
    loop Backwards iteration
        Client->>Client: prev = std::prev(insertion_point)
        Client->>Client: check if inputs need deallocation
        opt Input is last use
            Client->>HostIrContainer: insertExprBefore(insertion_point, deallocate)
            HostIrContainer->>List: insert(insertion_point, deallocate)
            Note over List: Inserts after last use (before next expr)
        end
        Client->>Client: insertion_point = prev
    end

_{3 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

wujingyue · 2025-11-04T08:00:39Z

csrc/host_ir/lowering.cpp

        hic.pushBackTopLevelExprs(wait);
      }
-    } break;
+      break;


NFC. I was told

case ...: { ...; break; }

is more conventional than

case ...: { ...; } break;

wujingyue · 2025-11-04T08:20:42Z

csrc/host_ir/lowering.cpp

-      // for each non-input TensorView:
-      //   if it needs an out-of-loop allocation:
-      //     create an Allocate and append it to the top level


I was able to remove this extra loop because Allocates can be inserted later right before the hir::ForLoop.

Use std::list for HostIrContainer top-level exprs

086233a

greptile-apps bot reviewed Nov 4, 2025

View reviewed changes

wujingyue added 2 commits November 3, 2025 20:58

Replace insertExprAfter with insertExprBefore

b9e15d7

Simplify lowerSegment

b7dee6c

We can now efficiently insert an allocation before the for loop.

greptile-apps bot reviewed Nov 4, 2025

View reviewed changes

wujingyue requested a review from mdavis36 November 4, 2025 07:57

Minor

1b73a94

greptile-apps bot reviewed Nov 4, 2025

View reviewed changes

wujingyue commented Nov 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use std::list for HostIrContainer top-level exprs #5476

Use std::list for HostIrContainer top-level exprs #5476

wujingyue commented Nov 4, 2025 •

edited

Loading

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

github-actions bot commented Nov 4, 2025 •

edited

Loading

Changes walkthrough

PR Reviewer Guide

Uh oh!

greptile-apps bot left a comment •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

wujingyue Nov 4, 2025

Uh oh!

wujingyue Nov 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Use std::list for HostIrContainer top-level exprs #5476

Are you sure you want to change the base?

Use std::list for HostIrContainer top-level exprs #5476

Conversation

wujingyue commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

github-actions bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Uh oh!

greptile-apps bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Additional Comments (1)

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

wujingyue commented Nov 4, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

wujingyue Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

wujingyue Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wujingyue commented Nov 4, 2025 •

edited

Loading

github-actions bot commented Nov 4, 2025 •

edited

Loading

greptile-apps bot left a comment •

edited

Loading