Skip to content

Loop-invariant computations and inner-loop branching in BlockMonolithicSolve CSR assembly #785

@rzhang20

Description

@rzhang20

Description

While inspecting the block matrix solvers implementation, specifically the BlockMonolithicSolve subroutine in BlockSolve.F90, I noticed that loop-invariant expressions and static branch conditions are continuously evaluated inside the innermost CSR matrix assembly loop.

Problematic code location: fem/src/BlockSolve.F90 (inside SUBROUTINE BlockMonolithicSolve, specifically the complex/damped eigen system assembly section).

In the current implementation, the code iterates over the non-zero entries of the sparse submatrices. Inside the innermost DO j loop, it recalculates an offset expression and evaluates a condition based on the outer loop variable c:

DO NoCol = 1,NoVar
A => TotMatrix % SubMatrix( NoRow, NoCol ) % Mat
IF( .NOT. ASSOCIATED( A ) ) CYCLE
IF( .NOT. ASSOCIATED( A % Rows ) ) CYCLE
DO j=A % Rows(i),A % Rows(i+1)-1
! If we have the imaginary row add the multiplier of imaginary value first
IF( c == 2 ) THEN
k = k + 1
CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*A % Cols(j)-1
IF( ASSOCIATED( A % DampValues) ) THEN
CollMat % Values(k) = A % DampValues(j)
END IF
END IF
k = k + 1
CollMat % Values(k) = A % Values(j)
CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*(A % Cols(j)-1)+c
IF( ASSOCIATED( A % MassValues ) ) THEN
CollMat % MassValues(k) = A % MassValues(j)
END IF
! If we have the real row add the multiplier of imaginary value last
IF( c == 1 ) THEN
k = k + 1
CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*A % Cols(j)
IF( ASSOCIATED( A % DampValues) ) THEN
CollMat % Values(k) = -A % DampValues(j)
END IF
END IF
END DO
END DO

For any given execution of the DO j loop:

  1. The variables NoCol and c remain completely unchanged.
  2. The expression 2 * TotMatrix % Offset(NoCol) is mathematically invariant.

Despite this, the current code repeatedly executes the integer multiplication, the memory lookup for the Offset array, and the IF(c == ...) evaluations for every single non-zero entry of the sparse matrix.

Impact

From a performance and maintainability perspective, this has several drawbacks:

  • Unnecessary repeated work: The same invariant integer arithmetic and memory access are rebuilt on every call, accumulating overhead linearly with the number of non-zero entries ($N_{nz}$).
  • Branch prediction penalties: Placing static conditions (IF (c == 2) and IF (c == 1)) inside the innermost loop forces the CPU's branch predictor to evaluate them millions of times unnecessarily, potentially disrupting pipeline efficiency.
  • Missed optimization opportunities: While modern Fortran compilers (via -O2/-O3) might attempt Loop-Invariant Code Motion (LICM), complex pointer aliasing and derived type accesses in Fortran often prevent the compiler from safely unswitching loops or hoisting array lookups automatically.

Conceptually, the current logic is equivalent to asking the CPU to check "which part of the complex number am I building?" and calculating the "base offset" individually for millions of matrix entries, instead of deciding once and applying it to the whole block.

Suggested fix

DO NoCol = 1,NoVar
              A => TotMatrix % SubMatrix( NoRow, NoCol ) % Mat
              IF( .NOT. ASSOCIATED( A ) ) CYCLE
              IF( .NOT. ASSOCIATED( A % Rows ) ) CYCLE
              
              BLOCK
                INTEGER :: BaseColOffset
                ! 1. Hoist loop-invariant computation
                !    'NoCol' is invariant within this block, so the offset evaluation is constant.
                BaseColOffset = 2 * TotMatrix % Offset(NoCol)
                
                ! 2. Loop unswitching
                !    Extract the static branch condition 'c' outside the innermost loop 
                !    to avoid branch prediction penalties.
                IF ( c == 2 ) THEN
                  DO j = A % Rows(i), A % Rows(i+1)-1
                    
                    ! If we have the imaginary row, add the multiplier of imaginary value first
                    k = k + 1
                    CollMat % Cols(k) = BaseColOffset + 2*A % Cols(j) - 1
                    IF( ASSOCIATED( A % DampValues) ) THEN
                      CollMat % Values(k) = A % DampValues(j)
                    END IF
                    
                    ! Add the primary matrix value
                    k = k + 1
                    CollMat % Values(k) = A % Values(j)
                    ! Note: Since c == 2 in this branch, the constant 2 is explicitly embedded.
                    CollMat % Cols(k) = BaseColOffset + 2*(A % Cols(j)-1) + 2
                    IF( ASSOCIATED( A % MassValues ) ) THEN
                      CollMat % MassValues(k) = A % MassValues(j)
                    END IF
                    
                  END DO
                  
                ELSE IF ( c == 1 ) THEN
                  DO j = A % Rows(i), A % Rows(i+1)-1
                    
                    ! Add the primary matrix value
                    k = k + 1
                    CollMat % Values(k) = A % Values(j)
                    ! Note: Since c == 1 in this branch, the constant 1 is explicitly embedded.
                    CollMat % Cols(k) = BaseColOffset + 2*(A % Cols(j)-1) + 1
                    IF( ASSOCIATED( A % MassValues ) ) THEN
                      CollMat % MassValues(k) = A % MassValues(j)
                    END IF
                    
                    ! If we have the real row, add the multiplier of imaginary value last
                    k = k + 1
                    CollMat % Cols(k) = BaseColOffset + 2*A % Cols(j)
                    IF( ASSOCIATED( A % DampValues) ) THEN
                      CollMat % Values(k) = -A % DampValues(j)
                    END IF
                    
                  END DO
                END IF
              END BLOCK
          END DO

This refactoring explicitly reflects the mathematical invariance in the code, guaranteeing fewer CPU cycles per non-zero entry during monolithic assembly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions