
@minghangli-uni
Collaborator

@minghangli-uni minghangli-uni commented Sep 8, 2025

Same flags as in #82 (comment)


🚀 The latest prerelease access-om3/pr148-18 at 927e963 is here: #148 (comment) 🚀

@github-actions

github-actions bot commented Sep 8, 2025

🚀 Attempted to deploy access-om3 Prerelease pr148-1 with commit 241b5bd

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.001 as a Release (when merged).
  • pr148-1 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-1

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-1 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.

@minghangli-uni
Collaborator Author

It takes ~2 hours to build the code... I don't think this is what we expect? Or is there something I'm missing?

@anton-seaice
Collaborator

It's possibly just a transient Gadi thing? That is a bit long, but I don't think it's related to changing the compiler flags.

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-2 with commit c8adb30

🖥️ Gadi Deployment ❌

@minghangli-uni
Collaborator Author

!redeploy

@minghangli-uni
Collaborator Author

I thought the long esmf build issue had already been resolved - is it still a problem, or is there a workaround to speed it up?

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-3 with commit c8adb30

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.001 as a Release (when merged).
  • pr148-3 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-3

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-3 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-4 with commit 2c12228

🖥️ Gadi Deployment ❌

@minghangli-uni
Collaborator Author

!redeploy

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-5 with commit 2c12228

🖥️ Gadi Deployment ❌

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-6 with commit 07df87a

🖥️ Gadi Deployment ❌

@minghangli-uni
Collaborator Author

!redeploy

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-7 with commit 07df87a

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.001 as a Release (when merged).
  • pr148-7 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-7

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-7 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.

@minghangli-uni
Collaborator Author

minghangli-uni commented Oct 8, 2025

With the help of https://github.com/ACCESS-NRI/esmf-trace, I was able to extract timestep-level performance data rather than relying on coarse wall-clock runtime. This allows each test case to be much shorter (only 2 days of simulation for the tests below) while still producing reliable performance statistics.

I tested three compiler configurations, each repeated five times (repeat in config.yaml).
Each 2-day run produces 192 timesteps, giving 192 × 5 = 960 samples per setup.

  1. compiler_flag_1 - existing
  2. compiler_flag_2 - pr148-1
  3. compiler_flag_3 - pr148-7
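For reference, the sample counts work out as follows. The 900 s step is inferred from 192 steps over 2 days rather than read from the configuration, and the last line applies the trimming of three steps per run (steps 1-2 and the final step) described further down.

```latex
\begin{aligned}
\Delta t &= \frac{2 \times 86400\ \mathrm{s}}{192} = 900\ \mathrm{s} \\
N_{\mathrm{samples}} &= 192 \times 5 = 960 \\
N_{\mathrm{kept}} &= (192 - 3) \times 5 = 189 \times 5 = 945
\end{aligned}
```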

In each run, the first timestep is very fast (not shown in the plot below). I'm not certain of the cause; it may reflect NUOPC bookkeeping, but I'm not sure (happy to take comments from others on this).

The second timestep is then the first one in which the full dynamics is active, and it is consistently the slowest "real" step. This is supported by the FMS timing in access-om3.out, which reports only 191 timesteps, implying that the model does not count the first timestep as part of the main timing window.

Similarly, the final timestep is noticeably slower due to end-of-run I/O.

[Figure: runtime per timestep for each run]

So to avoid biasing the statistics with these startup/shutdown artifacts, I exclude steps 1–2 and the final timestep, leaving 189 consistent timesteps per run for the statistical comparison.

[Figure: box plot of per-timestep runtime for the three compiler configurations]

From the box plot, compiler_flag_2 consistently achieves the lowest mean and median timestep runtime while also showing a tighter variance and no strong outliers, in contrast to the other two compiler setups.
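A minimal sketch of the filtering and summary statistics described above, assuming the per-timestep wall times of each repeat have already been extracted from the esmf-trace output as lists of seconds. The extraction step and the helper names trim_steps and summarise are hypothetical and not part of esmf-trace.

```python
import statistics
from typing import Dict, List, Sequence


def trim_steps(step_times: Sequence[float]) -> List[float]:
    """Drop timesteps 1-2 (start-up) and the final timestep (end-of-run I/O)."""
    if len(step_times) < 4:
        raise ValueError("need at least 4 timesteps to trim")
    return list(step_times[2:-1])


def summarise(runs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Pool the trimmed timesteps of all repeats and compute summary statistics."""
    pooled = [t for run in runs for t in trim_steps(run)]
    q1, _, q3 = statistics.quantiles(pooled, n=4)  # quartiles, as in a box plot
    return {
        "n": float(len(pooled)),
        "mean": statistics.fmean(pooled),
        "median": statistics.median(pooled),
        "stdev": statistics.stdev(pooled),
        "iqr": q3 - q1,
    }
```

With five repeats of 192 steps, summarise pools 5 × 189 = 945 samples per configuration. The median and IQR are less sensitive to any remaining outliers than the mean and standard deviation, which is why a box plot is a reasonable way to compare the setups.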

@minghangli-uni
Collaborator Author

minghangli-uni commented Oct 8, 2025

Below is the setup for the three configurations, generated with experiment-generator:

model_type: access-om3
repository_url: git@github.com:ACCESS-NRI/access-om3-configs.git
start_point: "7851c1e" # Control commit hash for new branches
test_path: "." # All control and perturbation experiment repositories will be created here; can be relative, absolute or ~ (user-defined)
repository_directory: mom6-cice6-compiler_flags_25km_ryf # Local directory name for the central repository (user-defined)
control_branch_name: ctrl
Control_Experiment:

Perturbation_Experiment:
  Parameter_block_compiler_flags_2days_from_scratch:
    branches:
      - "compiler_flag_1" # default
      - "compiler_flag_2" # pr148-1
      - "compiler_flag_3" # pr148-7

    config.yaml:
      manifest:
        reproduce:
          exe: False

      modules:
        use:
          - - PRESERVE
          - - /g/data/vk83/prerelease/modules
            - /g/data/vk83/modules 
          - - /g/data/vk83/prerelease/modules
            - /g/data/vk83/modules
        load:
          - - PRESERVE
          - - access-om3/pr148-1 # not adding model-tools/mppnccombine-fast because of the positional indexing
          - - access-om3/pr148-7 # not adding model-tools/mppnccombine-fast because of the positional indexing

      env:
        ESMF_RUNTIME_PROFILE: "on"
        ESMF_RUNTIME_TRACE: "on"
        ESMF_RUNTIME_TRACE_PETLIST: "0 1 291-292 1975"
        ESMF_RUNTIME_PROFILE_OUTPUT: "SUMMARY"

      repeat: True

    nuopc.runconfig:
      CLOCK_attributes:
        restart_n: 2
        restart_option: ndays
        stop_n: 2
        stop_option: ndays

And this is the YAML plan for experiment-runner:

test_path: /g/data/tm70/ml0072/COMMON/git_repos/access-experiment-generator/performance_runnings_ncmas/om3 # All control and perturbation experiment repositories.
repository_directory: mom6-cice6-compiler_flags_25km_ryf # Local directory name for the central repository, where the running_branches are forked from.
keep_uuid: True

running_branches: # List of experiment branches to run.
  - compiler_flag_1 # default
  - compiler_flag_2 # pr148-1
  - compiler_flag_3 # pr148-7

nruns: # Number of runs for each branch; must match the order of running_branches.
  - 5
  - 5
  - 5

startfrom_restart:
  - cold
  - cold
  - cold
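The per-branch lists (nruns, startfrom_restart) are matched to running_branches by position. A hedged sketch of a sanity check for that pairing follows; check_plan is a hypothetical helper, the plan is copied in as a dict rather than loaded from YAML, and experiment-runner may already perform an equivalent check itself.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory copy of the plan above; in practice it would be
# loaded from the YAML file (e.g. with PyYAML's yaml.safe_load).
PLAN: Dict[str, Any] = {
    "running_branches": ["compiler_flag_1", "compiler_flag_2", "compiler_flag_3"],
    "nruns": [5, 5, 5],
    "startfrom_restart": ["cold", "cold", "cold"],
}


def check_plan(plan: Dict[str, Any]) -> Dict[str, Tuple[int, str]]:
    """Check that per-branch lists line up with running_branches, then pair them."""
    branches = plan["running_branches"]
    for key in ("nruns", "startfrom_restart"):
        if len(plan[key]) != len(branches):
            raise ValueError(
                f"{key} has {len(plan[key])} entries but there are "
                f"{len(branches)} running_branches"
            )
    return {
        branch: (nruns, start)
        for branch, nruns, start in zip(branches, plan["nruns"], plan["startfrom_restart"])
    }
```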

@github-actions

github-actions bot commented Oct 8, 2025

🚀 Attempted to deploy access-om3 Prerelease pr148-8 with commit 97ae9f0

🖥️ Gadi Deployment ❌

@minghangli-uni
Collaborator Author

!redeploy

@github-actions

github-actions bot commented Oct 8, 2025

🚀 Attempted to deploy access-om3 Prerelease pr148-9 with commit 97ae9f0

🖥️ Gadi Deployment ❌

@minghangli-uni
Collaborator Author

minghangli-uni commented Oct 9, 2025

This only ran the historical repro test; is there another one for restart repro?

repro: All available reproducibility tests (all repro_ test markers but repro_determinism_restart).

I expected !test repro to run all available repro tests, but it seems it doesn't.

@anton-seaice
Collaborator

Ah right - it runs the tests in the ci.json file - see https://github.com/ACCESS-NRI/access-om3-configs/blob/69b60c0c5719270d83a27a8d70b6d67b47e0f62b/config/ci.json#L26

That is some confusing re-use of terminology, though!

@minghangli-uni
Collaborator Author

So can I run !test repro_restart explicitly?

@CodeGat
Member

CodeGat commented Oct 9, 2025

@minghangli-uni - @anton-seaice is right regarding not being able to apply flags in the spack.packages.all.require section. I don't think there is a syntactically neater way (at least, not in this version of spack)

@dougiesquire
Collaborator

so can i do !test repro_restart explicitly?

Nope, unfortunately you can't choose which tests are run from within the PR

@anton-seaice
Collaborator

In the most recent failed build, from /scratch/tm70/***/tmp/spack-stage/spack-stage-access3-2025.08.000-nii42jey5otzjpajtdcmbkcjqe3uyha4/spack-build-out.txt

It says:

[ 66%] Linking Fortran static library libOM3_cesm_driver_MOM6-CICE6.a
/apps/cmake/3.24.2/bin/cmake -P CMakeFiles/OM3_cesm_driver_MOM6-CICE6.dir/cmake_clean_target.cmake
/apps/cmake/3.24.2/bin/cmake -E cmake_link_script CMakeFiles/OM3_cesm_driver_MOM6-CICE6.dir/link.txt --verbose=1
/bin/ar qc libOM3_cesm_driver_MOM6-CICE6.a "CMakeFiles/OM3_cesm_driver_MOM6-CICE6.dir/CMEPS/CMEPS/cesm/driver/esm.F90.o" "CMakeFiles/OM3_cesm_driver_MOM6-CICE6.dir/CMEPS/CMEPS/cesm/driver/ensemble_driver.F90.o" "CMakeFiles/OM3_cesm_driver_MOM6-CICE6.dir/CMEPS/CMEPS/cesm/driver/esm_time_mod.F90.o"
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
/bin/ranlib libOM3_cesm_driver_MOM6-CICE6.a
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
bfd plugin: LLVM gold plugin has failed to create LTO module: Unknown attribute kind (102) (Producer: 'Intel.oneAPI.DPCPP.Compiler_2025.2.0' Reader: 'LLVM 19.1.7')
make[2]: Leaving directory '/scratch/tm70/tm70_ci/tmp/spack-stage/spack-stage-access3-2025.08.000-nii42jey5otzjpajtdcmbkcjqe3uyha4/spack-build-nii42je'

@manodeep says:

That means -flto was not applied. You likely need to use the -fuse-ld=lld flag at the link-line

Even though 'ldflags="-march=sapphirerapids -mtune=sapphirerapids -unroll -O3 -flto -fuse-ld=lld"' is in spack.yaml, maybe it's not getting through to the linker?

@manodeep

manodeep commented Oct 9, 2025


Looks like that line is calling the archiver. Instead of ar, you need to invoke llvm-ar; you can get its full path with ifx --print-prog-name=llvm-ar (or mpifort instead of ifx).
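The bfd plugin messages come from GNU ar/ranlib trying to read the objects through the LLVM gold plugin. A quick, hypothetical check (not part of the build scripts) is to inspect the first bytes of the .o files passed to ar qc: raw LLVM bitcode starts with "BC" 0xC0DE, bitcode-wrapped files start with the 0x0B17C0DE wrapper magic, and ordinary objects start with the ELF magic. This assumes ifx writes -flto objects as LLVM bitcode, as clang does; if ifx instead embeds bitcode inside an ELF object, this sketch would report it as "elf".

```python
ELF_MAGIC = b"\x7fELF"
LLVM_BC_MAGIC = b"BC\xc0\xde"           # raw LLVM bitcode
LLVM_BC_WRAPPER = b"\xde\xc0\x17\x0b"   # bitcode wrapper magic 0x0B17C0DE, little-endian


def classify_object(path: str) -> str:
    """Return "llvm-bitcode", "elf" or "unknown" based on the file's magic bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head in (LLVM_BC_MAGIC, LLVM_BC_WRAPPER):
        return "llvm-bitcode"
    if head == ELF_MAGIC:
        return "elf"
    return "unknown"
```

Running it over the three esm*.F90.o files from the log would show whether -flto actually produced bitcode objects at compile time.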

@anton-seaice
Collaborator

Looks like the line is calling the archiver. Instead of ar, you need to invoke the llvm-ar - the full path to which you can get by using ifx --print-prog-name=llvm-ar (or mpifort instead of ifx)

Can you test this, @minghangli-uni?

It looks like it should be

/apps/intel-tools/.packages/2025.2.0.575/compiler/2025.2/bin/compiler/llvm-ar ?

(and is maybe missing from https://github.com/ACCESS-NRI/spack-config/blob/main/common/gadi/linux/compilers.yaml ?)

But the best way to test it is not clear to me.

There is a CMAKE_AR variable, but I don't know how to set that from a spack.yaml file.
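A hedged sketch of the lookup suggested above: ask the compiler driver for its llvm-ar path and format it as a CMake cache definition. The helper names are hypothetical. CMAKE_AR is the standard CMake variable for the archiver (CMAKE_RANLIB is its counterpart for ranlib), but how to pass such a definition through spack.yaml remains the open question here.

```python
import shutil
import subprocess
from typing import Optional


def find_llvm_ar(compiler: str = "ifx") -> Optional[str]:
    """Ask the compiler driver where its llvm-ar lives; None if unavailable."""
    if shutil.which(compiler) is None:
        return None  # compiler (e.g. ifx or mpifort) not on PATH
    result = subprocess.run(
        [compiler, "--print-prog-name=llvm-ar"],
        capture_output=True, text=True, check=False,
    )
    path = result.stdout.strip()
    # If the driver cannot find the tool it may just echo the bare name back,
    # so callers may want to check that the returned path actually exists.
    if result.returncode != 0 or not path:
        return None
    return path


def cmake_ar_define(path: str) -> str:
    """Format the archiver path as a CMake command-line cache definition."""
    return f"-DCMAKE_AR={path}"
```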

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-15 with commit df8640f

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.002/x4lh6v3ng as a Release (when merged).
  • pr148-15 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-15

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-15 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-16 with commit 679c5dc

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.002/x4lh6v3ng as a Release (when merged).
  • pr148-16 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-16

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-16 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.

@minghangli-uni
Collaborator Author

Both pr148-15 and pr148-16 build successfully, using either the released version of access3 (default) or forcing llvm-ar (if my setting is correct).

So it seems likely that the previous failure was still caused by the filesystem rather than by the build setup itself?

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-17 with commit 2c87c1f

🖥️ Gadi Deployment ❌

@github-actions

🚀 Attempted to deploy access-om3 Prerelease pr148-18 with commit 927e963

🖥️ Gadi Deployment ✔️

Usage Instructions

access-om3, defined in ./spack.yaml, will be deployed to Gadi as:

  • 2025.08.002/x4lh6v3ng as a Release (when merged).
  • pr148-18 as a Prerelease (during this PR).

This Prerelease is accessible on Gadi using:

module use /g/data/vk83/prerelease/modules
module load access-om3/pr148-18

When using the above modules, the binaries shall be on your $PATH.

For advanced users, this Prerelease is also accessible on Gadi via /g/data/vk83/prerelease/apps/spack/0.22/spack in the access-om3-pr148-18 environment.
Due to inode-saving measures, one will have to manually untar the environment metadata before environment activation with tar -xf .spack-env .spack-env.tar. It will require one to have write privileges.

Configuration Information

This Prerelease is deployed using:

If the above was not what was expected, commit changes to config/versions.json in this PR.


Projects

Status: Blocked


8 participants