
CMake 3.27.9 causes SCORPIO configuration errors on Frontier with crayclanggpu when OMP_NUM_THREADS > 1 #6750

@dqwu

Description

PR #6689 explicitly loads the Core/24.07 module on Frontier, and the only CMake module available with Core/24.07 is cmake/3.27.9. With this version, the SCORPIO configuration step fails in crayclanggpu builds when OMP_NUM_THREADS > 1, particularly now that PR #6747 has re-enabled PIO_ENABLE_TOOLS for SCORPIO (the error below is raised from tools/spio_finfo/CMakeLists.txt).

Steps to Reproduce on Frontier

git clone https://github.com/E3SM-Project/E3SM.git
cd E3SM

git submodule update --init --recursive

cd cime/scripts

./create_newcase --machine=frontier --compiler=crayclanggpu --case X_f19_g16 --compset X --res f19_g16
cd X_f19_g16

./xmlchange LND_NTHRDS=2

./case.setup

./case.build

CMake Error Message

CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
Call Stack (most recent call first):
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
  tools/spio_finfo/CMakeLists.txt:21 (find_package)
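
When triaging a long configure log, it can help to pull out just the components that FindMPI reports as missing. The following is a minimal sketch, assuming the log contains a line shaped like the "Could NOT find MPI (missing: ...)" message above; the function name and the log file name in the usage comment are hypothetical and not part of E3SM or SCORPIO.

```shell
#!/bin/sh
# Hypothetical helper: read CMake output on stdin and print the list of
# components that FindMPI reported as missing (text inside "missing: ...").
# Prints nothing if no matching line is present.
missing_mpi_components() {
    sed -n 's/.*Could NOT find MPI (missing: \([^)]*\)).*/\1/p'
}

# Example usage (the log file name is an assumption):
#   missing_mpi_components < configure.log
```

For the error shown above, this prints MPI_Fortran_FOUND, which points at the Fortran MPI detection rather than the C or C++ detection.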

This issue is also reproducible with standalone SCORPIO builds. It appears to affect CMake 3.22 and higher: in the tests below, 3.22.2 and 3.27.9 fail, while 3.20.4 and 3.21.3 work. This matches E3SM-Project/scorpio#517, which describes a similar issue occurring when CMAKE_SYSTEM_NAME is set to Catamount.

Tests with Different CMake Versions

[Failing with CMake/3.27.9]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load cmake/3.27.9
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build1
cd build1

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Failing with CMake/3.22.2]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.22.2
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build2
cd build2

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Working with CMake/3.21.3]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.21.3
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build3
cd build3

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

[Working with /usr/bin/cmake (3.20.4)]

. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load craype-accel-amd-gfx90a rocm/5.4.0

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build4
cd build4

FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
/usr/bin/cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..

Possible Fixes

  1. Switch to the older Core/24.00 module to use cmake/3.21.3 with the crayclanggpu compiler.
  2. Continue using the latest Core/24.07, but use the default system CMake (version 3.20.4, located at /usr/bin/cmake).
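
The second option could be scripted roughly as follows. This is only a sketch, assuming the 3.22 threshold observed in the tests above holds in general; the function names are hypothetical, and treating any major version above 3 as affected is an assumption that was not tested in this issue.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) when the given CMake version is
# 3.22 or higher, the range in which the failures above were observed.
cmake_version_affected() {
    ver="$1"
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # second version component
    # Assumption: versions with a major number above 3 are also affected.
    if [ "$major" -gt 3 ]; then return 0; fi
    if [ "$major" -eq 3 ] && [ "$minor" -ge 22 ]; then return 0; fi
    return 1
}

# Hypothetical helper: print which cmake executable to invoke for a given
# module-provided version, falling back to the system CMake when affected.
select_cmake() {
    if cmake_version_affected "$1"; then
        echo /usr/bin/cmake
    else
        echo cmake
    fi
}

# Example usage ("cmake --version" prints "cmake version X.Y.Z" first):
#   CMAKE_BIN=$(select_cmake "$(cmake --version | head -n1 | awk '{print $3}')")
```

With cmake/3.27.9 loaded, select_cmake returns /usr/bin/cmake, which corresponds to the working 3.20.4 configuration above.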
