- 
                Notifications
    
You must be signed in to change notification settings  - Fork 68
 
Use std::list for HostIrContainer top-level exprs #5476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| 
           !test  | 
    
| 
          
 Review updated until commit 1b73a94 Description
 
 | 
    
| Relevant files | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Enhancement | 5 files
  | ||||||||||
| Bug fix | |||||||||||
| Tests | 
 PR Reviewer Guide 
Here are some key observations to aid the review process:
| 🧪 PR contains tests | 
| ⚡ Recommended focus areas for review | 
API Consistency
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, achieving O(1) insertions compared to O(n) with vectors. The change systematically updates all host IR passes and tests to work with list iterators.
Key changes:
- Replaced 
std::vectorwithstd::listin container storage - Updated 
insertExprAfterto usestd::nextfor iterator arithmetic with bounds checking - Adapted all passes to iterate over lists instead of using random access
 - Converted tests to gmock matchers (
ElementsAre) eliminating index-based assertions 
Issues identified:
- Performance regression in 
insert_deallocations.cpp: The deallocation pass callsinsertExprAfterwith integer indices in a loop, resulting in O(n²) complexity since each call now requires O(n) list traversal withstd::next 
Confidence Score: 3/5
- Generally safe but contains a performance regression that should be addressed
 - The refactoring is well-executed with comprehensive test updates, but the O(n²) performance issue in insert_deallocations.cpp is a real concern that degrades from the original O(n) complexity
 - csrc/host_ir/pass/insert_deallocations.cpp needs optimization to avoid O(n²) complexity with list-based implementation
 
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| csrc/host_ir/container.cpp | 3/5 | Changed insertExprAfter to use std::next for list iterator arithmetic, added bounds checking. Potential O(n²) performance regression in insert_deallocations.cpp | 
| csrc/host_ir/container.h | 4/5 | Changed storage from std::vector<Expr*> to std::list<Expr*>, removed obsolete comment about considering linkedlist | 
| csrc/host_ir/pass/insert_deallocations.cpp | 2/5 | Updated type to std::list. Contains O(n²) performance issue: insertExprAfter is called in loop with indexed access on list | 
Sequence Diagram
sequenceDiagram
    participant Client as Client Code
    participant Container as HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Insertion Flow (New Implementation)
    
    Client->>Container: insertExprAfter(index, expr)
    Container->>Container: Check bounds: index >= -1 && index < size
    alt index == -1
        Container->>List: insert at begin()
    else index >= 0
        Container->>List: std::next(begin(), index + 1)
        Note over Container,List: O(n) linear traversal
        Container->>List: insert(insert_pos, expr)
    end
    
    Note over Client,List: Deallocation Pass Pattern (O(n²) issue)
    
    Client->>Container: topLevelExprs()
    Container-->>Client: const std::list<Expr*>&
    Client->>Client: Build last_use_by_index (O(n))
    Client->>Client: Sort by index
    loop For each index in reverse
        Client->>Container: insertExprAfter(i, deallocate)
        Note over Container,List: Each call: O(n) traversal
    end
    Note over Client: Total: O(n²) complexity
    Additional Comments (1)
- 
csrc/host_ir/pass/insert_deallocations.cpp, line 54-56 (link)logic: This loop has O(n²) complexity with the list implementation.
insertExprAfter(i, ...)callsstd::next(top_level_exprs_.begin(), index + 1)which is O(n) for lists. Since this loops through indices in reverse, each call walks the list from the beginning.Consider refactoring to use iterators or convert
last_use_by_indexto store iterators instead of indices to maintain O(n) complexity. 
8 files reviewed, 1 comment
We can now efficiently insert an allocation before the for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors HostIrContainer to use std::list<Expr*> instead of std::vector<Expr*> for storing top-level expressions, enabling O(1) insertion performance instead of O(n).
Key changes:
- Replaced 
insertExprAfter(index, expr)withinsertExprBefore(iterator, expr)for safer, more idiomatic list manipulation - Modified 
pushBackTopLevelExprsto return an iterator to the inserted element, enabling efficient insertion point tracking - Updated 
insert_deallocations.cppto traverse in reverse using iterators instead of maintaining index-based positions - Converted tests from index-based access (
at()) to gmock'sElementsArematchers for better compatibility with list iteration - All Host IR passes and lowering logic now work with iterators instead of indices
 
Confidence Score: 5/5
- This PR is safe to merge with no issues identified
 - The refactoring is well-designed and consistently applied across all files. The API change from index-based to iterator-based operations is a clear improvement that eliminates O(n) insertion costs. All call sites were properly updated, tests were migrated to gmock matchers, and the semantics remain correct. No logic bugs or edge cases were introduced.
 - No files require special attention
 
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| csrc/host_ir/container.h | 5/5 | Changed top_level_exprs_ from std::vector to std::list, updated API to use iterators instead of indices. API changes are clean and consistent. | 
| csrc/host_ir/container.cpp | 5/5 | Implemented list-based insertion methods. pushBackTopLevelExprs now returns iterator to inserted element, insertExprBefore replaces insertExprAfter with proper iterator-based insertion. | 
| csrc/host_ir/lowering.cpp | 5/5 | Updated to use iterator-based insertion API. Captures insertion point from pushBackTopLevelExprs and uses insertExprBefore to add allocations before for-loops. Logic is correct. | 
| csrc/host_ir/pass/insert_deallocations.cpp | 5/5 | Refactored from index-based to reverse-iterator-based traversal. Inserts deallocations using insertExprBefore. Algorithm maintains correct semantics with better performance. | 
| tests/cpp/test_host_ir_stream_lowering.cpp | 5/5 | Migrated from index-based assertions (at()) to gmock ElementsAre matchers. Tests now work with list interface and are more idiomatic. | 
Sequence Diagram
sequenceDiagram
    participant Client
    participant HostIrContainer
    participant TopLevelExprs as std::list<Expr*>
    
    Note over HostIrContainer,TopLevelExprs: New API Design
    
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>TopLevelExprs: insert(end(), for_loop)
    TopLevelExprs-->>HostIrContainer: iterator to for_loop
    HostIrContainer-->>Client: return iterator (insertion_point)
    
    Note over Client: Later: need to insert before for_loop
    
    Client->>HostIrContainer: insertExprBefore(insertion_point, allocate)
    HostIrContainer->>TopLevelExprs: insert(insertion_point, allocate)
    TopLevelExprs-->>HostIrContainer: iterator to allocate
    
    Note over TopLevelExprs: Result: [allocate, for_loop, ...]
    Note over HostIrContainer: O(1) insertion vs O(n) with vector
    5 files reviewed, no comments
| 
           !test  | 
    
| 
           !test  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
Replaces std::vector with std::list for storing top-level expressions in HostIrContainer to optimize insertion performance.
Key Changes:
- API modernization: Replaced index-based 
insertExprAfter(int64_t)with iterator-basedinsertExprBefore(iterator) - Return value enhancement: 
pushBackTopLevelExprsnow returns an iterator to enable efficient insertions - Algorithm refactoring: 
insert_deallocations.cppswitched from map-based tracking with sorting to backward iteration with set-based last-use detection - Improved code clarity: Updated comments to accurately reflect list-based storage, removed TODO about considering linked list
 
Benefits:
- O(1) insertion vs O(n) with vectors
 - More idiomatic C++ with iterator-based APIs
 - Simpler deallocation pass without index management
 - Better encapsulation with const iterator returns
 
Confidence Score: 5/5
- This PR is safe to merge with high confidence - it's a well-executed refactoring with clear performance benefits
 - The refactoring is clean, well-tested through the existing test suite, and maintains semantic equivalence while improving performance. The switch from vector to list is appropriate for the use case where insertions are frequent. Iterator safety is properly handled, and all API changes are internal to the affected files.
 - No files require special attention - all changes are straightforward refactorings
 
Important Files Changed
File Analysis
| Filename | Score | Overview | 
|---|---|---|
| csrc/host_ir/container.h | 5/5 | Refactored to use std::list instead of std::vector for better insertion performance. Changed API from insertExprAfter(index) to insertExprBefore(iterator), pushBackTopLevelExprs now returns iterator. Clean implementation with appropriate const-correctness. | 
| csrc/host_ir/lowering.cpp | 5/5 | Adapted to use new list-based API. Key change: captures iterator from pushBackTopLevelExprs to enable inserting allocations before the for loop. Logic correctly maintains allocation ordering before loop execution. | 
| csrc/host_ir/pass/insert_deallocations.cpp | 5/5 | Refactored from index-based to iterator-based traversal. Iterates backwards using iterators, inserts deallocations after last use. Uses unordered_set instead of map for cleaner last-use tracking. Semantically equivalent to original. | 
Sequence Diagram
sequenceDiagram
    participant Client
    participant HostIrContainer
    participant List as std::list<Expr*>
    
    Note over Client,List: Initialization Phase
    Client->>HostIrContainer: create()
    HostIrContainer->>List: initialize empty list
    
    Note over Client,List: Building Expressions (lowering.cpp)
    Client->>HostIrContainer: pushBackTopLevelExprs(for_loop)
    HostIrContainer->>List: insert(end(), for_loop)
    List-->>HostIrContainer: return iterator
    HostIrContainer-->>Client: return iterator to for_loop
    
    loop For each output needing allocation
        Client->>HostIrContainer: insertExprBefore(iterator, allocate)
        HostIrContainer->>List: insert(iterator, allocate)
        Note over List: O(1) insertion before for_loop
    end
    
    Note over Client,List: Deallocation Pass (insert_deallocations.cpp)
    Client->>HostIrContainer: topLevelExprs()
    HostIrContainer-->>Client: const std::list<Expr*>&
    
    Client->>Client: insertion_point = end()
    loop Backwards iteration
        Client->>Client: prev = std::prev(insertion_point)
        Client->>Client: check if inputs need deallocation
        opt Input is last use
            Client->>HostIrContainer: insertExprBefore(insertion_point, deallocate)
            HostIrContainer->>List: insert(insertion_point, deallocate)
            Note over List: Inserts after last use (before next expr)
        end
        Client->>Client: insertion_point = prev
    end
    3 files reviewed, no comments
| hic.pushBackTopLevelExprs(wait); | ||
| } | ||
| } break; | ||
| break; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NFC. I was told
case ...: {
  ...;
  break;
}
is more conventional than
case ...: {
  ...;
} break;
| // for each non-input TensorView: | ||
| // if it needs an out-of-loop allocation: | ||
| // create an Allocate and append it to the top level | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to remove this extra loop because Allocates can be inserted later right before the hir::ForLoop.
Summary