@jjsjann123 jjsjann123 commented Oct 23, 2025

Fixes: #5391

The root cause is unconnected IDs in the allocation domain, which trigger the loop promotion assert on an ID not covered by the loop domain. Since the loop domain only needs to cover logical sizes, the allocation domain shouldn't have been included in the loop graph in the first place.

Changes in this PR:

  1. Exclude IDs that appear only on the path to the allocation domain from the LOOP and IEL graphs. This is done by modifying the TensorDomain::allIds method to exclude the allocation domain from the pairwise path traversal.
  2. Minor changes to add inline support for the layout op.
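
To make item 1 concrete, here is a toy sketch of the idea, not nvFuser code. IDs are plain strings, `ToyExpr`, `idsOnPaths`, and `exampleIds` are hypothetical names, and the traversal is a simplified stand-in for the pairwise path traversal: it collects every ID that lies on some path between IDs of the domains it is given. Leaving the allocation domain out of that list drops the IDs that exist only to reach it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IterDomain transform (split, merge, ...).
struct ToyExpr {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

using ToyDomain = std::vector<std::string>;

// Returns every ID on some path between IDs of the given domains: the
// intersection of what is forward-reachable and what is backward-reachable
// from the domain IDs.
std::set<std::string> idsOnPaths(
    const std::vector<ToyExpr>& exprs,
    const std::vector<ToyDomain>& domains) {
  assert(!domains.empty() && "need at least one domain to traverse from");
  std::set<std::string> seeds;
  for (const auto& domain : domains) {
    seeds.insert(domain.begin(), domain.end());
  }

  // Fixed-point closure over the expressions in one direction.
  auto closure = [&](bool forward) {
    std::set<std::string> reached = seeds;
    bool changed = true;
    while (changed) {
      changed = false;
      for (const auto& expr : exprs) {
        const auto& from = forward ? expr.inputs : expr.outputs;
        const auto& to = forward ? expr.outputs : expr.inputs;
        const bool touches = std::any_of(
            from.begin(), from.end(), [&](const std::string& id) {
              return reached.count(id) > 0;
            });
        if (!touches) {
          continue;
        }
        for (const auto& id : to) {
          changed = reached.insert(id).second || changed;
        }
      }
    }
    return reached;
  };

  const auto fwd = closure(true);
  const auto bwd = closure(false);
  std::set<std::string> result;
  std::set_intersection(
      fwd.begin(), fwd.end(), bwd.begin(), bwd.end(),
      std::inserter(result, result.begin()));
  return result;
}

// Hypothetical tensor: logical {i0, i1}, loop {i2 = merge(i0, i1)},
// allocation {a0, t1} via split(i1) -> {t0, t1} and merge(t0, i0) -> a0.
std::set<std::string> exampleIds(bool include_allocation) {
  const std::vector<ToyExpr> exprs = {
      {{"i0", "i1"}, {"i2"}},
      {{"i1"}, {"t0", "t1"}},
      {{"t0", "i0"}, {"a0"}}};
  std::vector<ToyDomain> domains = {{"i0", "i1"}, {"i2"}};
  if (include_allocation) {
    domains.push_back({"a0", "t1"});
  }
  return idsOnPaths(exprs, domains);
}
```

In this toy, `exampleIds(true)` returns all six IDs, while `exampleIds(false)` returns only `i0`, `i1`, and `i2`, so `t0`, `t1`, and `a0` never enter a graph built from the result.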

@jjsjann123 jjsjann123 requested a review from naoyam October 23, 2025 18:34
@jjsjann123
Collaborator Author

!test

@github-actions

github-actions bot commented Oct 23, 2025

Review updated until commit 808a19b

Description

  • Exclude allocation domain IDs from loop graph to fix loop promotion assert

  • Add permissive mapping support when building intersection of ID graphs

  • Add test case for loop promotion issue in inference benchmark

  • Fix dtype mismatch in layout op test comparison
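
For readers unfamiliar with graph intersection, the following is a toy sketch only. It assumes an ID graph can be modeled as a partition (ID name to group label) and that "permissive" means a graph with coarser groups is used as one operand; `ToyPartition`, `intersectPartitions`, and `mapped` are hypothetical names, and the PR's actual buildIntersection signature and semantics may differ.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy ID graph: each ID name maps to the label of its equivalence group.
using ToyPartition = std::map<std::string, int>;

// Two IDs share a group in the result only if they share a group in both
// inputs. IDs missing from either input are dropped.
ToyPartition intersectPartitions(const ToyPartition& a, const ToyPartition& b) {
  // (label in a, label in b) -> label in the result
  std::map<std::pair<int, int>, int> label_of;
  ToyPartition result;
  for (const auto& entry : a) {
    const auto it = b.find(entry.first);
    if (it == b.end()) {
      continue;
    }
    const auto key = std::make_pair(entry.second, it->second);
    const int fresh = static_cast<int>(label_of.size());
    result[entry.first] = label_of.emplace(key, fresh).first->second;
  }
  return result;
}

bool mapped(const ToyPartition& g, const std::string& x, const std::string& y) {
  const auto ix = g.find(x);
  const auto iy = g.find(y);
  return ix != g.end() && iy != g.end() && ix->second == iy->second;
}

void checkIntersection() {
  const ToyPartition exact = {{"x", 0}, {"y", 0}, {"z", 1}};
  const ToyPartition permissive = {{"x", 0}, {"y", 0}, {"z", 0}};
  const ToyPartition loop = {{"x", 0}, {"y", 1}, {"z", 0}};
  // The exact partition separates x and z, so the intersection does too.
  assert(!mapped(intersectPartitions(exact, loop), "x", "z"));
  // The coarser partition keeps the loop mapping between x and z.
  assert(mapped(intersectPartitions(permissive, loop), "x", "z"));
  // x and y are split by the loop partition in both cases.
  assert(!mapped(intersectPartitions(permissive, loop), "x", "y"));
}
```

The point of the toy is that the choice of operand decides which loop mappings survive the intersection.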


Changes walkthrough 📝

Relevant files

Bug fix

id_model.cpp (csrc/id_model/id_model.cpp, +130/-9)
Add ID graph initialization excluding allocation domains

  • Added discoverIdsExceptAllocation to traverse IDs excluding allocation domain
  • Introduced findAllIdsExceptAllocation to get IDs without allocation paths
  • Implemented initializeIdGraphExcludeAllocation for loop graph initialization
  • Modified buildIntersection to support permissive mapping and allocation exclusion

loop_promotion.cpp (csrc/id_model/loop_promotion.cpp, +13/-2)
Fix loop promotion with permissive ID mapping

  • Updated buildIntersection call to use permissive mapping
  • Added check for group existence in loop graph before comparison
  • Modified loop promotion to handle incomplete ID coverage

Tests

test_layout_op.cpp (tests/cpp/test_layout_op.cpp, +65/-1)
Add loop promotion test and fix dtype comparison

  • Fixed dtype mismatch in layout op test comparison
  • Added new test for inference benchmark loop promotion issue
  • Created test case with grouped matmul input preprocessing

Enhancement

id_model.h (csrc/id_model/id_model.h, +9/-1)
Declare new ID graph initialization methods

  • Added declaration for initializeIdGraphExcludeAllocation
  • Extended buildIntersection with permissive and exclusion parameters
  • Updated documentation for new ID graph initialization methods

logical_domain_map.h (csrc/logical_domain_map.h, +4/-0)
Add layout op handling in logical domain map

  • Added handler for PreprocessGroupedMatmulInputSf op
  • Implemented pointwise mapping for new layout op

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The function discoverIdsExceptAllocation builds a list of domains to traverse but includes additionalIDs() unconditionally, without checking if it's related to allocation. This may inadvertently include allocation-related IDs if additionalIDs() contains them, undermining the purpose of excluding allocation paths.

    std::vector<const std::vector<IterDomain*>*> all_domains = {
        &tv->getLoopDomain(),
        &tv->getLogicalDomain(),
        &tv->getInitialLoopDomain(),
        &tv->domain()->additionalIDs()};
    if (tv->hasRoot()) {
      all_domains.push_back(&tv->getRootDomain());
    }
    if (tv->getAlternateLoopDomain().has_value()) {
      all_domains.push_back(&tv->getAlternateLoopDomain().value());
    }

    Logic Gap

    The method initializeIdGraphExcludeAllocation filters active uses based on whether an output ID is in loop_ids, but it does not validate if input IDs of expressions are also included in the graph. This could lead to inconsistencies if an expression uses an input ID that is excluded due to allocation path filtering.

    for (const auto& use : uses_it->second) {
      if (std::any_of(
              use->outputs().begin(), use->outputs().end(), [&](Val* output) {
                return output->isA<IterDomain>() &&
                    loop_ids.has(output->as<IterDomain>());
              })) {
        active_uses.pushBack(use);
      }
    }
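
    The concern can be illustrated with a toy sketch (not nvFuser code; `ToyUse`, `filterByOutputs`, and `filterByInputsAndOutputs` are hypothetical names, and whether such a use can actually occur in the PR's traversal is not established here). The first filter mirrors the quoted output-only check; the second additionally requires every input to be an allowed ID.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an expression that consumes and produces IDs.
struct ToyUse {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

using IdSet = std::set<std::string>;

// Mirrors the quoted filter: keep a use when any output is an allowed ID.
std::vector<ToyUse> filterByOutputs(
    const std::vector<ToyUse>& uses,
    const IdSet& allowed) {
  std::vector<ToyUse> kept;
  for (const auto& use : uses) {
    if (std::any_of(
            use.outputs.begin(), use.outputs.end(),
            [&](const std::string& id) { return allowed.count(id) > 0; })) {
      kept.push_back(use);
    }
  }
  return kept;
}

// Stricter variant: additionally require every input to be an allowed ID.
std::vector<ToyUse> filterByInputsAndOutputs(
    const std::vector<ToyUse>& uses,
    const IdSet& allowed) {
  std::vector<ToyUse> kept;
  for (const auto& use : filterByOutputs(uses, allowed)) {
    if (std::all_of(
            use.inputs.begin(), use.inputs.end(),
            [&](const std::string& id) { return allowed.count(id) > 0; })) {
      kept.push_back(use);
    }
  }
  return kept;
}

void checkFilters() {
  const IdSet allowed = {"i0", "i1", "i2"};
  const std::vector<ToyUse> uses = {
      {{"i0", "i1"}, {"i2"}},  // every ID is allowed
      {{"a0"}, {"i2"}}};       // allowed output, excluded input (hypothetical)
  assert(filterByOutputs(uses, allowed).size() == 2);
  assert(filterByInputsAndOutputs(uses, allowed).size() == 1);
}
```
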

    Redundant Check

    The check idGraph(IdMappingMode::LOOP).hasGroup(out) in LoopPromotionMapBuilder may be redundant because toGroup(out) already implies the group exists. If not, it could lead to undefined behavior, suggesting a need for either assertion or removal of the redundant check.

    return idGraph(IdMappingMode::LOOP).hasGroup(out) &&
        group != idGraph(IdMappingMode::LOOP).toGroup(out);
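
    Note that `&&` in C++ short-circuits, so in the quoted expression toGroup(out) is only evaluated when hasGroup(out) returns true. The toy sketch below shows that pattern; `ToyIdGraph` and `inDifferentGroup` are hypothetical, and the throwing `toGroup` is an assumption for illustration (how nvFuser's toGroup behaves for a missing ID is not stated in this thread).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy stand-in for an ID graph: ID name -> group label.
class ToyIdGraph {
 public:
  void add(const std::string& id, int group) {
    groups_[id] = group;
  }

  bool hasGroup(const std::string& id) const {
    return groups_.count(id) > 0;
  }

  // Assumption for illustration: looking up a missing ID is an error.
  int toGroup(const std::string& id) const {
    const auto it = groups_.find(id);
    if (it == groups_.end()) {
      throw std::out_of_range("no group for " + id);
    }
    return it->second;
  }

 private:
  std::map<std::string, int> groups_;
};

// Same shape as the quoted check: the existence test guards the lookup.
bool inDifferentGroup(
    const ToyIdGraph& graph,
    int group,
    const std::string& out) {
  return graph.hasGroup(out) && group != graph.toGroup(out);
}

void checkShortCircuit() {
  ToyIdGraph graph;
  graph.add("i0", 0);
  graph.add("i1", 1);
  assert(inDifferentGroup(graph, 0, "i1"));
  assert(!inDifferentGroup(graph, 0, "i0"));
  // "a0" is not in the graph: the guard returns false and toGroup never runs.
  assert(!inDifferentGroup(graph, 0, "a0"));
}
```

    Under that assumption the guard is what keeps the lookup safe for IDs that were excluded from the graph, rather than being redundant with it.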

    @jjsjann123
    Collaborator Author

    jjsjann123 commented Oct 27, 2025

    To run @crcrpar's example with nvfp4, I needed a few small fixes.

    So we want to use the Thunder branch in Lightning-AI/lightning-thunder#2691.

    I used the nvFuser code from this PR plus the small nvfp4 fix in #5428 (already approved; I will update this comment after it merges).

    To run the benchmark, use the command below from Thunder's directory. The program ran to completion, but I haven't gotten around to verifying the model:

    NVFUSER_DISABLE=parallel_compile NVFUSER_ENABLE="id_model(all)" python thunder/benchmarks/benchmark_inference.py --mode thunder --enable-nv-linear --enable-nvfp4 --output-length 2
    

    Note: we need to add --output-length 2; otherwise the run takes forever.

    tagging @naoyam @protonu

    @jjsjann123
    Collaborator Author

    !test

    Development

    Successfully merging this pull request may close these issues.

    IdModel assert during buildLoopGraph `nvfuser::LoopPromotionMapBuilder::findPromotionOfLoopGroup
