@kaanolgu (Collaborator) commented Nov 26, 2025

I profiled a 1000-iteration run of the TGV case with AMD uProf and looked at the 10 hottest functions. reorder_omp turns out to be the second most time-consuming function, and it repeatedly calls into the module m_ordering. Before diving deep into algorithmic optimisations I wanted to try a small modification first. I noticed that this code uses select case for its decisions instead of if/else, which may not be ideal for the compiler, so I changed them to if/else statements.

Disclaimer: this change is not necessary for the other modules, because they do not contain the first or second hottest functions.

The 1000-iteration TGV run:

Before:

 solver instantiated
 initial conditions
 time =   0.0000000000000000      iteration =           0
 enstrophy:  0.37500056314210506     
 div u max mean:   7.4241182236900240E-014   4.6425442329974923E-015
 start run
 time =  0.10000000000000001      iteration =         100
 enstrophy:  0.37525051849886970     
 div u max mean:   1.2185232149686254E-007   3.5138289281208653E-009
 time =  0.20000000000000001      iteration =         200
 enstrophy:  0.37628353027785910     
 div u max mean:   1.2342342865789835E-007   3.5031468385433071E-009
 time =  0.29999999999999999      iteration =         300
 enstrophy:  0.37810777470606399     
 div u max mean:   1.2608932925539662E-007   3.4874846289820097E-009
 time =  0.40000000000000002      iteration =         400
 enstrophy:  0.38073816178863462     
 div u max mean:   1.2987100272976448E-007   3.4683619522790331E-009
 time =  0.50000000000000000      iteration =         500
 enstrophy:  0.38419607045527204     
 div u max mean:   1.3479555077688943E-007   3.4475698888332965E-009
 time =  0.59999999999999998      iteration =         600
 enstrophy:  0.38850900827352475     
 div u max mean:   1.4089259348093464E-007   3.4270327631455799E-009
 time =  0.70000000000000007      iteration =         700
 enstrophy:  0.39371024738962596     
 div u max mean:   1.4818927601689680E-007   3.4086645130049231E-009
 time =  0.80000000000000004      iteration =         800
 enstrophy:  0.39983846518902949     
 div u max mean:   1.5670384551080829E-007   3.3943682946129801E-009
 time =  0.90000000000000002      iteration =         900
 enstrophy:  0.40693741438884812     
 div u max mean:   1.6643768474544629E-007   3.3860065608543613E-009
 time =   1.0000000000000000      iteration =        1000
 enstrophy:  0.41505564012994600     
 div u max mean:   1.7736599844386802E-007   3.3851674985669417E-009
 run end
 Time:    575.58670499999994 

After:

solver instantiated
initial conditions
time =   0.0000000000000000      iteration =           0
enstrophy:  0.37500056314210506     
div u max mean:   7.4241182236900240E-014   4.6425442329974923E-015
start run
time =  0.10000000000000001      iteration =         100
enstrophy:  0.37525051849886970     
div u max mean:   1.2185232149686254E-007   3.5138289281208653E-009
time =  0.20000000000000001      iteration =         200
enstrophy:  0.37628353027785910     
div u max mean:   1.2342342865789835E-007   3.5031468385433071E-009
time =  0.29999999999999999      iteration =         300
enstrophy:  0.37810777470606399     
div u max mean:   1.2608932925539662E-007   3.4874846289820097E-009
time =  0.40000000000000002      iteration =         400
enstrophy:  0.38073816178863462     
div u max mean:   1.2987100272976448E-007   3.4683619522790331E-009
time =  0.50000000000000000      iteration =         500
enstrophy:  0.38419607045527204     
div u max mean:   1.3479555077688943E-007   3.4475698888332965E-009
time =  0.59999999999999998      iteration =         600
enstrophy:  0.38850900827352475     
div u max mean:   1.4089259348093464E-007   3.4270327631455799E-009
time =  0.70000000000000007      iteration =         700
enstrophy:  0.39371024738962596     
div u max mean:   1.4818927601689680E-007   3.4086645130049231E-009
time =  0.80000000000000004      iteration =         800
enstrophy:  0.39983846518902949     
div u max mean:   1.5670384551080829E-007   3.3943682946129801E-009
time =  0.90000000000000002      iteration =         900
enstrophy:  0.40693741438884812     
div u max mean:   1.6643768474544629E-007   3.3860065608543613E-009
time =   1.0000000000000000      iteration =        1000
enstrophy:  0.41505564012994600     
div u max mean:   1.7736599844386802E-007   3.3851674985669417E-009
run end
Time:    564.87784799999997 

Overall, for a 20000-iteration case I would expect an improvement of at least ~3 minutes in execution time, which is not massive but is a good small step forward.
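
For reference, the ~3 minute estimate can be reproduced from the two "Time:" values above. This is only a back-of-the-envelope sketch: it assumes the saving scales linearly with the number of iterations, and it is based on a single run of each version, so run-to-run noise is not accounted for. The variable names are illustrative and not taken from the code base.

```python
# Wall-clock times in seconds, copied from the "Time:" lines of the two runs above.
time_before_s = 575.58670499999994
time_after_s = 564.87784799999997

iterations_measured = 1000   # length of the profiled runs
iterations_target = 20000    # the longer case mentioned above

# Absolute and relative saving for the measured 1000-iteration run.
saved_s = time_before_s - time_after_s
relative_saving = saved_s / time_before_s

# Assumption: the saving per iteration stays constant, so it scales linearly.
scale = iterations_target / iterations_measured
projected_saving_min = saved_s * scale / 60.0

print(f"saved per {iterations_measured} iterations: {saved_s:.2f} s")
print(f"relative saving: {relative_saving:.2%}")
print(f"projected saving at {iterations_target} iterations: {projected_saving_min:.1f} min")
```

With these inputs the saving is about 10.7 s per 1000 iterations (a little under 2%), which projects to roughly 3.6 minutes over 20000 iterations.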

Compiled with GNU 13.3.0 and OpenMPI 5.0.5. Run with 24 MPI ranks and OMP_NUM_THREADS=1 on an AMD EPYC 7443 CPU, which is a full-socket run.

@kaanolgu added the omp (Related to openMP backend) and performance labels Nov 26, 2025
@pbartholomew08 (Member)

Can you remove, rather than comment out, the old code?
