-
Notifications
You must be signed in to change notification settings - Fork 370
Description
Description
While inspecting the block matrix solvers implementation, specifically the BlockMonolithicSolve subroutine in BlockSolve.F90, I noticed that loop-invariant expressions and static branch conditions are continuously evaluated inside the innermost CSR matrix assembly loop.
Problematic code location: fem/src/BlockSolve.F90 (inside SUBROUTINE BlockMonolithicSolve, specifically the complex/damped eigen system assembly section).
In the current implementation, the code iterates over the non-zero entries of the sparse submatrices. Inside the innermost DO j loop, it recalculates an offset expression and evaluates a condition based on the outer loop variable c:
elmerfem/fem/src/BlockSolve.F90
Lines 4793 to 4824 in 3592118
| DO NoCol = 1,NoVar | |
| A => TotMatrix % SubMatrix( NoRow, NoCol ) % Mat | |
| IF( .NOT. ASSOCIATED( A ) ) CYCLE | |
| IF( .NOT. ASSOCIATED( A % Rows ) ) CYCLE | |
| DO j=A % Rows(i),A % Rows(i+1)-1 | |
| ! If we have the imaginary row add the multiplier of imaginary value first | |
| IF( c == 2 ) THEN | |
| k = k + 1 | |
| CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*A % Cols(j)-1 | |
| IF( ASSOCIATED( A % DampValues) ) THEN | |
| CollMat % Values(k) = A % DampValues(j) | |
| END IF | |
| END IF | |
| k = k + 1 | |
| CollMat % Values(k) = A % Values(j) | |
| CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*(A % Cols(j)-1)+c | |
| IF( ASSOCIATED( A % MassValues ) ) THEN | |
| CollMat % MassValues(k) = A % MassValues(j) | |
| END IF | |
| ! If we have the real row add the multiplier of imaginary value last | |
| IF( c == 1 ) THEN | |
| k = k + 1 | |
| CollMat % Cols(k) = 2*TotMatrix % Offset(NoCol) + 2*A % Cols(j) | |
| IF( ASSOCIATED( A % DampValues) ) THEN | |
| CollMat % Values(k) = -A % DampValues(j) | |
| END IF | |
| END IF | |
| END DO | |
| END DO |
For any given execution of the DO j loop:
- The variables
NoColandcremain completely unchanged. - The expression
2 * TotMatrix % Offset(NoCol)is mathematically invariant.
Despite this, the current code repeatedly executes the integer multiplication, the memory lookup for the Offset array, and the IF(c == ...) evaluations for every single non-zero entry of the sparse matrix.
Impact
From a performance and maintainability perspective, this has several drawbacks:
-
Unnecessary repeated work: The same invariant integer arithmetic and memory access are rebuilt on every call, accumulating overhead linearly with the number of non-zero entries (
$N_{nz}$ ). -
Branch prediction penalties: Placing static conditions (
IF (c == 2)andIF (c == 1)) inside the innermost loop forces the CPU's branch predictor to evaluate them millions of times unnecessarily, potentially disrupting pipeline efficiency. -
Missed optimization opportunities: While modern Fortran compilers (via
-O2/-O3) might attempt Loop-Invariant Code Motion (LICM), complex pointer aliasing and derived type accesses in Fortran often prevent the compiler from safely unswitching loops or hoisting array lookups automatically.
Conceptually, the current logic is equivalent to asking the CPU to check "which part of the complex number am I building?" and calculating the "base offset" individually for millions of matrix entries, instead of deciding once and applying it to the whole block.
Suggested fix
DO NoCol = 1,NoVar
A => TotMatrix % SubMatrix( NoRow, NoCol ) % Mat
IF( .NOT. ASSOCIATED( A ) ) CYCLE
IF( .NOT. ASSOCIATED( A % Rows ) ) CYCLE
BLOCK
INTEGER :: BaseColOffset
! 1. Hoist loop-invariant computation
! 'NoCol' is invariant within this block, so the offset evaluation is constant.
BaseColOffset = 2 * TotMatrix % Offset(NoCol)
! 2. Loop unswitching
! Extract the static branch condition 'c' outside the innermost loop
! to avoid branch prediction penalties.
IF ( c == 2 ) THEN
DO j = A % Rows(i), A % Rows(i+1)-1
! If we have the imaginary row, add the multiplier of imaginary value first
k = k + 1
CollMat % Cols(k) = BaseColOffset + 2*A % Cols(j) - 1
IF( ASSOCIATED( A % DampValues) ) THEN
CollMat % Values(k) = A % DampValues(j)
END IF
! Add the primary matrix value
k = k + 1
CollMat % Values(k) = A % Values(j)
! Note: Since c == 2 in this branch, the constant 2 is explicitly embedded.
CollMat % Cols(k) = BaseColOffset + 2*(A % Cols(j)-1) + 2
IF( ASSOCIATED( A % MassValues ) ) THEN
CollMat % MassValues(k) = A % MassValues(j)
END IF
END DO
ELSE IF ( c == 1 ) THEN
DO j = A % Rows(i), A % Rows(i+1)-1
! Add the primary matrix value
k = k + 1
CollMat % Values(k) = A % Values(j)
! Note: Since c == 1 in this branch, the constant 1 is explicitly embedded.
CollMat % Cols(k) = BaseColOffset + 2*(A % Cols(j)-1) + 1
IF( ASSOCIATED( A % MassValues ) ) THEN
CollMat % MassValues(k) = A % MassValues(j)
END IF
! If we have the real row, add the multiplier of imaginary value last
k = k + 1
CollMat % Cols(k) = BaseColOffset + 2*A % Cols(j)
IF( ASSOCIATED( A % DampValues) ) THEN
CollMat % Values(k) = -A % DampValues(j)
END IF
END DO
END IF
END BLOCK
END DOThis refactoring explicitly reflects the mathematical invariance in the code, guaranteeing fewer CPU cycles per non-zero entry during monolithic assembly.