Commit 0fc6e8a

Merge remote-tracking branch 'ESCOMP/cam_development' into remove-messages
2 parents 57ca820 + 11d0035 commit 0fc6e8a

4 files changed: +391 -349 lines changed

cime_config/config_pes.xml

Lines changed: 0 additions & 37 deletions
@@ -572,43 +572,6 @@
 </pes>
 </mach>
 </grid>
-<grid name="a%T31">
-<mach name="hera|sierra" >
-<pes pesize="any" compset="any">
-<comment>none</comment>
-<ntasks>
-<ntasks_atm>64</ntasks_atm>
-<ntasks_lnd>64</ntasks_lnd>
-<ntasks_rof>64</ntasks_rof>
-<ntasks_ice>64</ntasks_ice>
-<ntasks_ocn>64</ntasks_ocn>
-<ntasks_glc>64</ntasks_glc>
-<ntasks_wav>64</ntasks_wav>
-<ntasks_cpl>64</ntasks_cpl>
-</ntasks>
-<nthrds>
-<nthrds_atm>1</nthrds_atm>
-<nthrds_lnd>1</nthrds_lnd>
-<nthrds_rof>1</nthrds_rof>
-<nthrds_ice>1</nthrds_ice>
-<nthrds_ocn>1</nthrds_ocn>
-<nthrds_glc>1</nthrds_glc>
-<nthrds_wav>1</nthrds_wav>
-<nthrds_cpl>1</nthrds_cpl>
-</nthrds>
-<rootpe>
-<rootpe_atm>0</rootpe_atm>
-<rootpe_lnd>0</rootpe_lnd>
-<rootpe_rof>0</rootpe_rof>
-<rootpe_ice>0</rootpe_ice>
-<rootpe_ocn>0</rootpe_ocn>
-<rootpe_glc>0</rootpe_glc>
-<rootpe_wav>0</rootpe_wav>
-<rootpe_cpl>0</rootpe_cpl>
-</rootpe>
-</pes>
-</mach>
-</grid>
 <grid name="a%1.9x2.5">
 <mach name="derecho">
 <pes pesize='any' compset='any'>

doc/ChangeLog

Lines changed: 83 additions & 0 deletions
@@ -1,3 +1,86 @@
+===============================================================
+
+Tag name: cam6_4_131
+Originator(s): johnmauff, pel, cacraig, nusbaume
+Date: Nov 26, 2025
+One-line Summary: Performance improvements for CSLAM
+Github PR URL: https://github.com/ESCOMP/CAM/pull/1365
+
+Purpose of changes (include the issue number and title text for each relevant GitHub issue):
+Excessive data movement in extend_panel_interpolate (CSLAM): https://github.com/ESCOMP/CAM/issues/1360
+
+The subroutine extend_panel_interpolate is written such that the compiler will generate more data movement than is necessary.
+This excessive data movement intensifies a computational load imbalance in the CSLAM advection. While it is impossible to eliminate
+the load imbalance that is caused by the special treatment of panels at the corners of the cubed sphere, it is possible to reduce
+the cost of this subroutine by changing the way that the subroutine is written.
+
+Describe any changes made to build system: N/A
+
+Describe any changes made to the namelist: N/A
+
+List any changes to the defaults for the boundary datasets: N/A
+
+Describe any substantial timing or memory changes: N/A
+
+Code reviewed by: nusbaume
+
+List all files eliminated: N/A
+
+List all files added and what they do: N/A
+
+List all existing files that have been modified, and describe the changes:
+
+M cime_config/config_pes.xml
+- remove dead config code originally used by Eulerian dycore
+
+M src/dynamics/se/dycore/fvm_consistent_se_cslam.F90
+M src/dynamics/se/dycore/fvm_reconstruction_mod.F90
+- mods for SE dycore as described above
+
+If there were any failures reported from running test_driver.sh on any test
+platform, and checkin with these failures has been OK'd by the gatekeeper,
+then copy the lines from the td.*.status files for the failed tests to the
+appropriate machine below. All failed tests must be justified.
+
+derecho/intel/aux_cam:
+SMS_D_Ln9_P1536x1.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCHIST.derecho_intel.cam-outfrq9s (Overall: FAIL) details:
+- intermittent failure in CTSM code (lnd_set_decomp_and_domain.F90)
+
+ERC_D_Ln9.ne30pg2_ne30pg2_mt232.QPC7.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+ERC_D_Ln9.ne30pg3_ne30pg3_mt232.F1850C_LTso.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+ERI_D_Ln18.ne16pg3_ne16pg3_mt232.FHIST_C4.derecho_intel.cam-outfrq3s_eri (Overall: DIFF) details:
+ERI_D_Ln18.ne30pg3_ne30pg3_mt232.FHISTC_LTso.derecho_intel.cam-outfrq3s_eri (Overall: DIFF) details:
+ERP_D_Ln9.ne30pg3_ne30pg3_mt232.F1850C_MTso.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+ERP_Ld3.ne16pg3_ne16pg3_mg17.FHISTC_WAt1ma.derecho_intel.cam-reduced_hist1d (Overall: DIFF) details:
+ERP_Ld3.ne30pg3_ne30pg3_mt232.FHISTC_MTt4s.derecho_intel.cam-outfrq1d_aoa (Overall: DIFF) details:
+ERP_Ln9.ne30pg3_ne30pg3_mg17.FCnudged.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+ERP_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_WAma.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+ERR_Ln9.ne16pg3_ne16pg3_mt232.FHISTC_LTso.derecho_intel.cam-outfrq9s_bwic (Overall: DIFF) details:
+ERS_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_WXma.derecho_intel.cam-outfrq9s_ctem (Overall: DIFF) details:
+SMS_C2_D_Ln9.ne16pg3_ne16pg3_mg17.FHISTC_WXma.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+SMS_D_Ln9.ne30pg3_ne30pg3_mt232.FHISTC_MTso.derecho_intel.cam-outfrq9s (Overall: DIFF) details:
+SMS_D_Ln9_P1280x1.ne30pg3_ne30pg3_mt232.FHISTC_MTt1s.derecho_intel.cam-outfrq9s_Leung_dust (Overall: DIFF) details:
+SMS_Ld1.ne30pg3_ne30pg3_mg17.FC2010climo.derecho_intel.cam-outfrq1d (Overall: DIFF) details:
+SMS_Ln9.ne30pg3_ne30pg3_mg17.FW2000climo.derecho_intel.cam-outfrq9s_rrtmgp (Overall: DIFF) details:
+- answer differences for CSLAM runs
+
+derecho/nvhpc/aux_cam:
+ERS_Ln9.ne30pg3_ne30pg3_mt232.FHISTC_LTso.derecho_nvhpc.cam-outfrq9s_gpu_default (Overall: FAIL) details:
+- timing issue - Jian has determined this is due to changes on derecho with the last upgrade. He has
+reported the issue to CISL. Note cam6_4_128 is the last CAM tag with baselines to use for comparison
+but answer changes are expected starting with cam6_4_130
+
+izumi/nag/aux_cam:
+ERC_D_Ln27.ne3pg3_ne3pg3_mt232.FKESSLER.izumi_nag.cam-outfrq9s (Overall: DIFF) details:
+ERC_D_Ln9.ne3pg3_ne3pg3_mt232.FHISTC_LTso.izumi_nag.cam-cosp_rad_diags (Overall: DIFF) details:
+SMS_D_Ln3.ne5pg3_ne5pg3_mg37.QPX2000.izumi_nag.cam-outfrq3s (Overall: DIFF) details:
+- answer differences for CSLAM runs
+
+
+izumi/gnu/aux_cam: all BFB
+
+Summarize any changes to answers:
+Answer changing, but not climate changing as reported by pel
 
 ===============================================================
 

src/dynamics/se/dycore/fvm_consistent_se_cslam.F90

Lines changed: 29 additions & 28 deletions
@@ -1,3 +1,4 @@
+#define FVM_TIMERS .FALSE.
 module fvm_consistent_se_cslam
 use shr_kind_mod, only: r8=>shr_kind_r8
 use dimensions_mod, only: nc, nhe, nlev, ntrac, np, nhr, nhc, ngpc, ns, nht
@@ -107,7 +108,7 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 endif
 
 kblk = kmax-kmin+1
-!call t_startf('fvm:before_Qnhc')
+if(FVM_TIMERS) call t_startf('fvm:before_Qnhc')
 do ie=nets,nete
 do k=kmin,kmax
 elem(ie)%sub_elem_mass_flux(:,:,:,k) = dt_fvm*elem(ie)%sub_elem_mass_flux(:,:,:,k)*fvm(ie)%dp_ref_inverse(k)
@@ -120,11 +121,11 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 call ghostpack(ghostbufQnhc,fvm(ie)%c(1-nhc:nc+nhc,1-nhc:nc+nhc,kmin:kmax,q),kblk,kptr,ie)
 enddo
 end do
-!call t_stopf('fvm:before_Qnhc')
-!call t_startf('fvm:ghost_exchange:Qnhc')
+if(FVM_TIMERS) call t_stopf('fvm:before_Qnhc')
+if(FVM_TIMERS) call t_startf('fvm:ghost_exchange:Qnhc')
 call ghost_exchange(hybridnew,ghostbufQnhc,location='ghostbufQnhc')
-!call t_stopf('fvm:ghost_exchange:Qnhc')
-!call t_startf('fvm:orthogonal_swept_areas')
+if(FVM_TIMERS) call t_stopf('fvm:ghost_exchange:Qnhc')
+if(FVM_TIMERS) call t_startf('fvm:orthogonal_swept_areas')
 do ie=nets,nete
 do k=kmin,kmax
 fvm(ie)%se_flux (1:nc,1:nc,:,k) = elem(ie)%sub_elem_mass_flux(:,:,:,k)
@@ -152,14 +153,14 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 end do
 enddo
 
-!call t_stopf('fvm:orthogonal_swept_areas')
+if(FVM_TIMERS) call t_stopf('fvm:orthogonal_swept_areas')
 do ie=nets,nete
 ! Intel compiler version 2023.0.0 on derecho had significant slowdown on subroutine interface without
 ! these pointers.
 fcube => fvm(ie)%c(:,:,:,:)
 spherecentroid => fvm(ie)%spherecentroid(:,1-nhe:nc+nhe,1-nhe:nc+nhe)
 do k=kmin,kmax
-!call t_startf('FVM:tracers_reconstruct')
+if(FVM_TIMERS) call t_startf('FVM:tracers_reconstruct')
 call reconstruction(fcube,nlev,k,&
 ctracer(:,:,:,:),irecons_tracer,llimiter,ntrac,&
 nc,nhe,nhr,nhc,nht,ns,nhr+(nhe-1),&
@@ -170,10 +171,10 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 fvm(ie)%rot_matrix,fvm(ie)%centroid_stretch,&
 fvm(ie)%vertex_recons_weights,fvm(ie)%vtx_cart,&
 irecons_tracer_lev(k))
-!call t_stopf('FVM:tracers_reconstruct')
-!call t_startf('fvm:swept_flux')
+if(FVM_TIMERS) call t_stopf('FVM:tracers_reconstruct')
+if(FVM_TIMERS) call t_startf('fvm:swept_flux')
 call swept_flux(elem(ie),fvm(ie),k,ctracer,irecons_tracer_lev(k),gsweights,gspts)
-!call t_stopf('fvm:swept_flux')
+if(FVM_TIMERS) call t_stopf('fvm:swept_flux')
 end do
 end do
 !
@@ -193,7 +194,7 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 !
 !
 if (large_Courant_incr) then
-!call t_startf('fvm:fill_halo_fvm:large_Courant')
+if(FVM_TIMERS) call t_startf('fvm:fill_halo_fvm:large_Courant')
 !if (kmin_jet<kmin.or.kmax_jet>kmax) then
 ! call endrun('ERROR: kmax_jet must be .le. kmax passed to run_consistent_se_cslam')
 !end if
@@ -203,19 +204,19 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 kmax_jet_local = min(kmax_jet,kmax)
 klev = kmax_jet-kmin_jet+1
 call fill_halo_fvm(ghostbufQ1,elem,fvm,hybridnew,nets,nete,1,kmin_jet_local,kmax_jet_local,klev,active=ActiveJetThread)
-!call t_stopf('fvm:fill_halo_fvm:large_Courant')
-!call t_startf('fvm:large_Courant_number_increment')
+if(FVM_TIMERS) call t_stopf('fvm:fill_halo_fvm:large_Courant')
+if(FVM_TIMERS) call t_startf('fvm:large_Courant_number_increment')
 if(ActiveJetThread) then
 do k=kmin_jet_local,kmax_jet_local !1,nlev
 do ie=nets,nete
 call large_courant_number_increment(fvm(ie),k)
 end do
 end do
 endif
-!call t_stopf('fvm:large_Courant_number_increment')
+if(FVM_TIMERS) call t_stopf('fvm:large_Courant_number_increment')
 end if
 
-!call t_startf('fvm:end_of_reconstruct_subroutine')
+if(FVM_TIMERS) call t_startf('fvm:end_of_reconstruct_subroutine')
 do k=kmin,kmax
 !
 ! convert to mixing ratio
@@ -251,7 +252,7 @@ subroutine run_consistent_se_cslam(elem,fvm,hybrid,dt_fvm,tl,nets,nete,hvcoord,&
 elem(ie)%sub_elem_mass_flux(:,:,:,k)=0
 end do
 end do
-!call t_stopf('fvm:end_of_reconstruct_subroutine')
+if(FVM_TIMERS) call t_stopf('fvm:end_of_reconstruct_subroutine')
 !$OMP END PARALLEL
 call omp_set_nested(.false.)
 end subroutine run_consistent_se_cslam
@@ -281,7 +282,7 @@ subroutine swept_flux(elem,fvm,ilev,ctracer,irecons_tracer_actual,gsweights,gspt
 REAL(KIND=r8), dimension(2,8) :: x_start, dgam_vec
 REAL(KIND=r8) :: gamma_max, displ_first_guess
 
-REAL(KIND=r8) :: flux,flux_tracer(ntrac)
+REAL(KIND=r8) :: flux,flux_tracer(ntrac),w
 
 REAL(KIND=r8), dimension(num_area) :: dp_area
 
@@ -306,7 +307,6 @@ subroutine swept_flux(elem,fvm,ilev,ctracer,irecons_tracer_actual,gsweights,gspt
 !
 ! prepare for air/tracer update
 !
-! dp = fvm%dp_fvm(1-nhe:nc+nhe,1-nhe:nc+nhe,ilev)
 dp = fvm%dp_fvm(1-nhc:nc+nhc,1-nhc:nc+nhc,ilev)
 fvm%dp_fvm(1:nc,1:nc,ilev) = fvm%dp_fvm(1:nc,1:nc,ilev)*fvm%area_sphere
 do itr=1,ntrac
@@ -538,14 +538,14 @@ subroutine swept_flux(elem,fvm,ilev,ctracer,irecons_tracer_actual,gsweights,gspt
 !
 ! iterate to get flux area
 !
-!call t_startf('fvm:swept_area:get_gamma')
+if(FVM_TIMERS) call t_startf('fvm:swept_area:get_gamma')
 do iarea=1,num_area
 dp_area(iarea) = dp(idx(1,iarea,i,j,iside),idx(2,iarea,i,j,iside))
 end do
 call get_flux_segments_area_iterate(x,x_static,dx_static,dx,x_start,dgam_vec,num_seg,num_seg_static,&
 num_seg_max,num_area,dp_area,flowcase,gamma,mass_flux_se(i,j,iside),0.0_r8,gamma_max, &
 gsweights,gspts,ilev)
-!call t_stopf('fvm:swept_area:get_gamma')
+if(FVM_TIMERS) call t_stopf('fvm:swept_area:get_gamma')
 !
 ! pack segments for high-order weights computation
 !
@@ -560,27 +560,28 @@ subroutine swept_flux(elem,fvm,ilev,ctracer,irecons_tracer_actual,gsweights,gspt
 !
 ! compute higher-order weights
 !
-!call t_startf('fvm:swept_area:get_high_order_w')
+if(FVM_TIMERS) call t_startf('fvm:swept_area:get_high_order_w')
 call get_high_order_weights_over_areas(x,dx,num_seg,num_seg_max,num_area,weights,ngpc,&
 gsweights, gspts,irecons_tracer)
-!call t_stopf('fvm:swept_area:get_high_order_w')
+if(FVM_TIMERS) call t_stopf('fvm:swept_area:get_high_order_w')
 !
 !**************************************************
 !
 ! remap air and tracers
 !
 !**************************************************
 !
-!call t_startf('fvm:swept_area:remap')
+if(FVM_TIMERS) call t_startf('fvm:swept_area:remap')
 flux=0.0_r8; flux_tracer=0.0_r8
 do iarea=1,num_area
 if (num_seg(iarea)>0) then
 ii=idx(1,iarea,i,j,iside); jj=idx(2,iarea,i,j,iside)
 flux=flux+weights(1,iarea)*dp(ii,jj)
-do itr=1,ntrac
-do iw=1,irecons_tracer_actual
-flux_tracer(itr) = flux_tracer(itr)+weights(iw,iarea)*ctracer(iw,ii,jj,itr)
-end do
+do iw=1,irecons_tracer_actual
+w = weights(iw,iarea)
+do itr=1,ntrac
+flux_tracer(itr) = flux_tracer(itr)+w*ctracer(iw,ii,jj,itr)
+end do
 end do
 end if
 end do
@@ -614,7 +615,7 @@ subroutine swept_flux(elem,fvm,ilev,ctracer,irecons_tracer_actual,gsweights,gspt
 fvm%dp_fvm(i-1,j,ilev ) = fvm%dp_fvm(i-1,j,ilev )+flux
 fvm% c(i-1,j,ilev,1:ntrac) = fvm% c(i-1,j,ilev,1:ntrac)+flux_tracer(1:ntrac)
 end if
-!call t_stopf('fvm:swept_area:remap')
+if(FVM_TIMERS) call t_stopf('fvm:swept_area:remap')
 end if
 end do
 end do
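
Note on the remap hunk in swept_flux: apart from the timer calls, the change there is a loop interchange. The loop over reconstruction weights (iw) moves outside the loop over tracers (itr), and weights(iw,iarea) is copied into the new scalar w before the inner loop. The standalone sketch below reproduces only that pattern. The program name, the array sizes, the two-dimensional ctracer, and the random inputs are assumptions made for illustration and are not taken from CAM.

```fortran
program loop_interchange_sketch
  ! Hypothetical standalone sketch of the loop interchange in swept_flux.
  ! Sizes and array shapes are illustrative, not the values used by CAM.
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: ntrac = 4   ! assumed number of tracers
  integer, parameter :: nw = 6      ! assumed number of reconstruction weights
  real(r8) :: weights(nw), ctracer(nw, ntrac)
  real(r8) :: flux_old(ntrac), flux_new(ntrac), w
  integer  :: iw, itr

  call random_number(weights)
  call random_number(ctracer)

  ! Ordering before the commit: tracer loop outside, weight loop inside.
  flux_old = 0.0_r8
  do itr = 1, ntrac
     do iw = 1, nw
        flux_old(itr) = flux_old(itr) + weights(iw)*ctracer(iw, itr)
     end do
  end do

  ! Ordering after the commit: weight loop outside, with the weight
  ! copied into the scalar w and reused for every tracer.
  flux_new = 0.0_r8
  do iw = 1, nw
     w = weights(iw)
     do itr = 1, ntrac
        flux_new(itr) = flux_new(itr) + w*ctracer(iw, itr)
     end do
  end do

  print *, 'max abs difference:', maxval(abs(flux_new - flux_old))
end program loop_interchange_sketch
```

In both orderings each flux entry accumulates its iw terms in the same sequence as written in the source, so the sketch only illustrates the restructuring. It makes no claim about the timing or answer changes reported in the ChangeLog.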
