Conversation


@edoyango edoyango commented Jul 9, 2025

Hi @claireyung,

This pull request tracks some config improvements that you could leverage (on top of the improvements to the exe done here). I'll populate this description as results come in.

| Change | Result (SUs) | Comment |
| --- | --- | --- |
| n/a | 2937.88 (best of 1) | Base case; the only change is using the fastest binary from my PR. |
| Choose core count for MOM6 for a better IO_LAYOUT | 2807.54 (best of 1) | Minghang pointed out that IO was being serialized onto a single core. This required guessing an appropriate core count. NB: this could be faster with PARALLEL_RESTART = True, but that doesn't work with payu; Minghang's looking at it. This change is answer-changing. |
| Use Minghang's updated nuopc.runseq | 2576.43 (best of 1) | Huge improvement! |
| Optimise CICE block sizes | not yet | Doubling the block size ([x, y] = [60, 54]) seemed to increase SUs a tiny bit, but the CICE walltime reported in ice.log went down by 5% to 478.34 s. For scenarios with more ice, larger block sizes seemed to improve performance significantly; see the Zulip comment for more details. |
| Try CICE sectrobin distribution | 2603.24 (best of 1) | The CICE docs say distribution_type = sectrobin is a bit better for the PE layout in terms of neighbours/communication, but a bit worse for load balancing. This reduced CICE walltime (per ice.log) to 449.34 s. |
| Optimise CICE/nuopc cores | TODO | Waiting on Minghang's tool to see how much time each OM3 component spends waiting for the others. |
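For orientation, the settings discussed in the table live in the MOM6 parameter file (MOM_input) and the CICE namelist (ice_in). A minimal sketch is below; the block sizes and distribution type are the values tried above, while the IO_LAYOUT value is purely illustrative (the actual layout/core counts used here are in the config diff, not this snippet):

```
! MOM_input: spread restart/diagnostic IO across a small grid of cores
! instead of serializing it onto one. The 2,2 layout is illustrative only.
IO_LAYOUT = 2, 2

&domain_nml                          ! ice_in: CICE decomposition settings
    block_size_x = 60                ! doubled block size tried in the table
    block_size_y = 54
    distribution_type = 'sectrobin'  ! better neighbour/communication pattern,
                                     ! slightly worse load balancing
/
```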

@edoyango edoyango force-pushed the cpu-ratio-1428ocn branch from 9a0f58d to a023cb1 Compare July 9, 2025 07:44
@claireyung
Owner

Thanks so much @edoyango this is awesome!

Since I needed to modify my run a little after 8 years due to a salt-restoring file bug I found, I decided to swap to Sapphire Rapids (SR) with your current optimisations for the last 2 years of my spinup. I merged the commits from this PR into my spinup config (which I'd previously run on Cascade Lake) in this branch: https://github.com/claireyung/access-om3-configs/tree/8km_jra_ryf_obc2-sapphirerapid-Charrassin-newparams-rerun-Wright-spinup-accessom2IC-yr9

The cost before (Cascade Lake, Helen's executable, DT = 600) was 10600 SU/month, and the new simulation (Sapphire Rapids, pr113-27, DT = 600, Ed's improvements) is 8700 SU/month^, which is a great improvement! Thank you so much!!

Is this the kind of speed-up that was expected? Note: 2603 SU / 10 days × 31 days ≈ 8100 SU.*
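Spelling out that back-of-envelope scaling (pure arithmetic, using the best-of-1 number from the table above):

```python
# Linear extrapolation of the 10-day benchmark cost to a 31-day January.
su_10_days = 2603.24        # best-of-1 cost of the optimised 10-day run
projected_january = su_10_days / 10 * 31
print(f"{projected_january:.0f} SU")   # ~8070 SU, quoted above as ~8100

actual_january = 8700                  # observed SU/month after the swap
print(f"overhead vs raw scaling: {actual_january - projected_january:.0f} SU")
```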

Naively, I'd guess the netCDF files are bigger at the end of a one-month run than a 10-day run, which maybe slows down the final steps of the model and makes the cost slightly higher than raw scaling would predict?

*This is not really a fair comparison, because the config I gave you uses DT = 450, while I bumped mine up to DT = 600 and added 3 CICE dynamic timesteps per MOM timestep (ndtd = 3). So I'd actually expect my config, with fewer ocean steps, to be faster, unless CICE is now the one being waited on... (I did get a significant speed-up on Cascade Lake from DT 600 to DT 450 + ndtd = 3, but those runs had a different PE_LAYOUT.) I guess Minghang's tool will reveal all?
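The ndtd setting mentioned in the footnote is CICE's dynamics-subcycling control; as a sketch, it sits in the ice_in namelist, while the DT values are set on the ocean/coupler side (e.g. MOM_input). The values below are the ones quoted in the footnote, not a recommendation:

```
&setup_nml     ! ice_in: CICE subcycling discussed in the footnote
    ndtd = 3   ! 3 dynamics/advection subcycles per thermodynamic (coupling) step
/

! MOM_input (ocean side): the timestep being traded off against ndtd
DT = 600.0     ! vs DT = 450.0 with ndtd = 1 in the original config
```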

^All quoted numbers compare Januarys, but I haven't looked at the range.

@edoyango
Author

> Is this the kind of speed-up that was expected? Note: 2603 SU / 10 days × 31 days ≈ 8100 SU.*
>
> Naively, I'd guess the netCDF files are bigger at the end of a one-month run than a 10-day run, which maybe slows down the final steps of the model and makes the cost slightly higher than raw scaling would predict?

Yes, I think this is roughly what I would expect. Some of the changes, especially the IO_LAYOUT improvements, primarily speed up the final dumping of restart files, which takes a lot of time. Since your runs are much longer, the restart write is a smaller fraction of the total runtime, so you won't see as much benefit from those changes.

> *This is not really a fair comparison, because the config I gave you uses DT = 450, while I bumped mine up to DT = 600 and added 3 CICE dynamic timesteps per MOM timestep (ndtd = 3). So I'd actually expect my config, with fewer ocean steps, to be faster, unless CICE is now the one being waited on... (I did get a significant speed-up on Cascade Lake from DT 600 to DT 450 + ndtd = 3, but those runs had a different PE_LAYOUT.) I guess Minghang's tool will reveal all?

This is an interesting change to the config! It will probably affect who's waiting for whom. I'll do more testing: the previous PE_LAYOUT assumed MOM was the bottleneck, but now that CICE takes more dynamic steps, giving more cores to CICE might be beneficial?

@edoyango
Author

edoyango commented Jul 23, 2025

Hi @claireyung, just following up on this:

> This is an interesting change to the config! It will probably affect who's waiting for whom. I'll do more testing: the previous PE_LAYOUT assumed MOM was the bottleneck, but now that CICE takes more dynamic steps, giving more cores to CICE might be beneficial?

Just to say that I couldn't get a conclusive answer here: increasing or decreasing the cores assigned to MOM didn't seem to change much.

@claireyung
Owner

Hey @edoyango, thanks for looking into it and for the update! I guess this means we are approaching an optimal model cost...
