Closed
Labels: CMake build system; Frontier (OLCF machine Frontier); SCORPIO (The E3SM I/O library, derived from PIO)
Description
PR #6689 explicitly loads the Core/24.07 module on Frontier, and the only CMake module available under Core/24.07 is cmake/3.27.9. With this version, the crayclanggpu build fails when OMP_NUM_THREADS > 1. The failure has become hard to avoid since PR #6747 re-enabled PIO_ENABLE_TOOLS for SCORPIO.
Steps to Reproduce on Frontier
git clone https://github.com/E3SM-Project/E3SM.git
cd E3SM
git submodule update --init --recursive
cd cime/scripts
./create_newcase --machine=frontier --compiler=crayclanggpu --case X_f19_g16 --compset X --res f19_g16
cd X_f19_g16
./xmlchange LND_NTHRDS=2
./case.setup
./case.build
CMake Error Message
CMake Error at /autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find MPI (missing: MPI_Fortran_FOUND) (found version "3.1")
Call Stack (most recent call first):
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
tools/spio_finfo/CMakeLists.txt:21 (find_package)
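To check quickly whether a given build log shows this exact failure, a small grep is enough. The sketch below is a hypothetical helper (`has_findmpi_fortran_failure` is not part of E3SM or CIME), and the log path is left to the caller because the issue does not say which log file receives the message.

```shell
# Hypothetical helper: succeed (exit 0) if the log file passed as $1 contains
# the FindMPI failure quoted in this issue, fail (exit 1) otherwise.
has_findmpi_fortran_failure() {
    # -F: match the message as a fixed string, -q: print nothing
    grep -qF 'Could NOT find MPI (missing: MPI_Fortran_FOUND)' "$1"
}

# Example usage (placeholder path):
# if has_findmpi_fortran_failure path/to/build.log; then
#     echo "Hit the FindMPI Fortran failure"
# fi
```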
The issue also reproduces with standalone SCORPIO builds. It appears to affect CMake 3.22 and later (see the tests below). E3SM-Project/scorpio#517 describes a similar failure that occurs when CMAKE_SYSTEM_NAME is set to Catamount.
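The suspected cutoff can be checked before a build starts. The sketch below assumes the cutoff is exactly 3.22, which is only this issue's working hypothesis (3.20.4 and 3.21.3 work, 3.22.2 and 3.27.9 fail). Both function names are hypothetical, and `cmake_version_of` relies on `cmake --version` printing "cmake version X.Y.Z" on its first line.

```shell
# Hypothetical helper: print the version reported by the cmake binary in $1.
cmake_version_of() {
    # First line of `cmake --version` is "cmake version X.Y.Z"; take field 3.
    "$1" --version 2>/dev/null | awk 'NR == 1 { print $3 }'
}

# Hypothetical helper: succeed if version $1 is 3.22 or newer, i.e. inside the
# range this issue suspects; fail otherwise.
cmake_version_affected() {
    _major=${1%%.*}     # "3.27.9" -> "3"
    _rest=${1#*.}       # "3.27.9" -> "27.9"
    _minor=${_rest%%.*} # "27.9"   -> "27"
    [ "$_major" -gt 3 ] && return 0
    [ "$_major" -eq 3 ] && [ "$_minor" -ge 22 ] && return 0
    return 1
}

# Example usage:
# v=$(cmake_version_of "$(command -v cmake)")
# if cmake_version_affected "$v"; then echo "cmake $v may hit this issue"; fi
```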
Tests with Different CMake Versions
[Failing with CMake/3.27.9]
. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load cmake/3.27.9
module load craype-accel-amd-gfx90a rocm/5.4.0
git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
mkdir build1
cd build1
FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..
[Failing with CMake/3.22.2]
. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.22.2
module load craype-accel-amd-gfx90a rocm/5.4.0
git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
mkdir build2
cd build2
FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..
[Working with CMake/3.21.3]
. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.00
module load cmake/3.21.3
module load craype-accel-amd-gfx90a rocm/5.4.0
git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
mkdir build3
cd build3
FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..
[Working with /usr/bin/cmake (3.20.4)]
. /usr/share/lmod/lmod/init/sh
module reset
module switch Core Core/24.07
module load craype-accel-amd-gfx90a rocm/5.4.0
git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio
mkdir build4
cd build4
FC=ftn CC=cc CXX=mpicxx \
LDFLAGS="-fopenmp" \
/usr/bin/cmake -Wno-dev \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
..
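The four tests above run the same configure step and differ only in the Core module and the cmake binary. A minimal sketch that factors out that step is shown below. `configure_scorpio` is a hypothetical name. The compiler wrappers, `LDFLAGS`, and the PnetCDF path are copied from the tests, and module loading (Core/24.xx, craype-accel-amd-gfx90a, rocm/5.4.0) is assumed to have been done by the caller.

```shell
# Hypothetical helper: configure SCORPIO with a caller-chosen cmake binary.
#   $1  cmake binary (e.g. the module's cmake, or /usr/bin/cmake)
#   $2  path to the SCORPIO source checkout (absolute path recommended)
#   $3  build directory (use a fresh one per CMake version)
configure_scorpio() {
    _cmake_bin=$1
    _src_dir=$2
    _build_dir=$3
    mkdir -p "$_build_dir" || return 1
    # Subshell so the cd does not leak into the caller's shell.
    (
        cd "$_build_dir" || exit 1
        FC=ftn CC=cc CXX=mpicxx LDFLAGS="-fopenmp" \
            "$_cmake_bin" -Wno-dev \
            -DWITH_NETCDF=OFF \
            -DPnetCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.1/crayclang/14.0 \
            "$_src_dir"
    )
}

# Example usage (after loading the desired modules and cloning scorpio):
# configure_scorpio "$(command -v cmake)" "$PWD/scorpio" "$PWD/scorpio/build1"
# configure_scorpio /usr/bin/cmake        "$PWD/scorpio" "$PWD/scorpio/build4"
```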
Possible Fixes
- Switch to the older Core/24.00 module to use cmake/3.21.3 with the crayclanggpu compiler.
- Continue using the latest Core/24.07, but use the default system CMake (version 3.20.4, located at /usr/bin/cmake).
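For the second workaround, a standalone build script could pick the first available cmake older than the suspected 3.22 cutoff. This is a hypothetical sketch (`pick_unaffected_cmake` is not an existing tool). It uses `sort -V` from GNU coreutils for the version comparison and assumes the 3.22 cutoff suggested by the tests above. It does not change which cmake the E3SM case build picks up.

```shell
# Hypothetical helper: print the first candidate cmake binary whose reported
# version is older than 3.22; fail if none qualifies.
pick_unaffected_cmake() {
    for _cand in "$@"; do
        [ -x "$_cand" ] || continue
        _ver=$("$_cand" --version 2>/dev/null | awk 'NR == 1 { print $3 }')
        [ -n "$_ver" ] || continue
        # _ver < 3.22 iff it sorts first under version ordering and differs.
        if [ "$(printf '%s\n' "$_ver" 3.22 | sort -V | head -n 1)" = "$_ver" ] \
            && [ "$_ver" != 3.22 ]; then
            printf '%s\n' "$_cand"
            return 0
        fi
    done
    return 1
}

# Example usage: prefer the system CMake over the one on PATH.
# CMAKE_BIN=$(pick_unaffected_cmake /usr/bin/cmake "$(command -v cmake)") \
#     || echo "no cmake older than 3.22 found" >&2
```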