Draft
Changes from 28 of 32 commits
739d2a6
Update to v1.12.0rc1
andrewdnolan Jul 17, 2025
e398e69
Update deployment for mache 1.31.0
andrewdnolan Jul 21, 2025
5c017d4
Update to v1.12.0rc2
andrewdnolan Jul 23, 2025
1c6a9f3
Clean up `meta.yaml`
andrewdnolan Jul 23, 2025
c5deaec
Update TempestExtremes to v2.3.1
andrewdnolan Aug 1, 2025
64eb787
Move conda installation to versioned directory
andrewdnolan Sep 18, 2025
72dce2d
Put compiler and mpi in spack directory name
andrewdnolan Sep 30, 2025
188973e
Apply suggestions from code review
andrewdnolan Oct 6, 2025
4b5dc34
Manually apply more suggestions from code review
andrewdnolan Oct 6, 2025
a4303a1
Explicitly add notebook as dependency
andrewdnolan Oct 13, 2025
6c4ef8d
Add zppy-interfaces dev label
andrewdnolan Oct 10, 2025
e8696ac
Remove testing command for nb_conda
andrewdnolan Oct 14, 2025
c31915c
Sync meta.yaml with confluence page
andrewdnolan Oct 14, 2025
7113180
Add "Verify compute-node activation" step to deployment docs
xylar Oct 16, 2025
047c730
If deploying an RC, add the dev label to channels
andrewdnolan Oct 14, 2025
c0f810e
Fix return variable name
andrewdnolan Oct 15, 2025
4f314d9
Bump mpi4py version
andrewdnolan Oct 15, 2025
2cc66fa
Uncomment file permission updates
andrewdnolan Oct 20, 2025
1f21059
Update to v1.12.0rc2
andrewdnolan Oct 20, 2025
f5ffbad
Shorten e3sm-unified deployment path
xylar Oct 23, 2025
d6fdb3a
Clean up get_conda_base()
xylar Oct 23, 2025
5f9abfe
Update paths that need permission changes
andrewdnolan Oct 23, 2025
14d1df4
Start user restricted perms at machine dir level
andrewdnolan Oct 23, 2025
ced1fb5
Add `pre_conda_script` to load scripts
xylar Oct 24, 2025
24ebabe
Update symlinks created for NCO
andrewdnolan Oct 27, 2025
c0ae19f
Update to v1.12.0rc3
andrewdnolan Oct 29, 2025
fef77bf
Update version number in meta.yaml
andrewdnolan Oct 29, 2025
cfc804c
Add check for ${PBS_JOBID} to templates
andrewdnolan Oct 30, 2025
40b30ac
Update recipes/e3sm-unified/meta.yaml
andrewdnolan Nov 5, 2025
fec404f
Update to 1.12.0rc4
andrewdnolan Nov 5, 2025
851c1d9
Ensure prebuilt wheel for mpi4py is not installed
andrewdnolan Nov 3, 2025
577f929
Update base path permission non recursively
andrewdnolan Nov 3, 2025
2 changes: 1 addition & 1 deletion .github/workflows/build_workflow.yml
@@ -23,7 +23,7 @@ jobs:
shell: bash -l {0}
strategy:
matrix:
python-version: ["3.9", "3.10"]
python-version: ["3.11", "3.12", "3.13"]
mpi: ["hpc", "nompi", "mpich", "openmpi"]
fail-fast: false
steps:
71 changes: 69 additions & 2 deletions docs/releasing/testing/deploying-on-hpcs.md
@@ -142,12 +142,79 @@ used during deployment.
* Permissions have been updated successfully (read only for everyone
except the E3SM-Unified maintainer)

4. **Manually test** tools in the installed environment
4. **Verify compute-node activation**

The activation scripts are designed to load a no-MPI environment on login
nodes and the MPI-enabled environment on compute nodes (detected via
scheduler variables like `$SLURM_JOB_ID` or `$COBALT_JOBID`). Before manual
testing, confirm that sourcing the script on a compute node loads the MPI
environment as expected.
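
The detection these scripts perform can be sketched roughly as follows (a hedged
illustration; `detect_node_type` is not a real function in the templates, and
the exact variables checked vary by machine):

```bash
# Sketch of the scheduler detection in the activation scripts.
# SLURM_JOB_ID, COBALT_JOBID and PBS_JOBID are only set inside jobs,
# so their presence distinguishes compute nodes from login nodes.
detect_node_type() {
    if [ -n "${SLURM_JOB_ID:-}" ] || [ -n "${COBALT_JOBID:-}" ] \
        || [ -n "${PBS_JOBID:-}" ]; then
        echo "compute"  # the MPI-enabled environment should be loaded
    else
        echo "login"    # the no-MPI environment should be loaded
    fi
}
```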

Steps:

* Start an interactive job on a compute node, for example:

- Slurm:

```bash
salloc -N 1 -t 10:00
```

- Cobalt:

```bash
qsub -I -n 1 -t 10
```

- PBS (example):

```bash
qsub -I -l select=1:ncpus=1:mpiprocs=1,walltime=00:10:00
```

* On the compute node, source the activation script:

- Bash/zsh:

```bash
source test_e3sm_unified_<version>_<machine>.sh
```

- csh/tcsh:

```csh
source test_e3sm_unified_<version>_<machine>.csh
```

For release builds, use the corresponding `load_e3sm_unified_<version>_<machine>.*`
or `load_latest_e3sm_unified_<machine>.*` script names.

* Verify that the MPI environment is active (not the no-MPI one):

```bash
echo "$E3SMU_MPI" # should NOT be "NOMPI" on a compute node
which python # should point to the E3SM-Unified conda env
python -c "import mpi4py, xarray; print('mpi4py:', mpi4py.__version__)"
```

Optional quick MPI sanity check (if mpirun/srun is available on the node):

```bash
mpirun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_size())"
# or, for Slurm
srun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_size())"
```

If the script loads the no-MPI environment (`E3SMU_MPI=NOMPI`) on a compute
node, check that the scheduler environment variables are present on compute
nodes for this machine and update the activation templates if needed.
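
To see which (if any) of the common job-ID variables the scheduler actually
exports, run on the compute node (some machines may use variable names beyond
these three):

```bash
# Print any of the common scheduler job-ID variables that are set;
# an empty result explains a fallback to the no-MPI environment.
env | grep -E '^(SLURM_JOB_ID|COBALT_JOBID|PBS_JOBID)=' \
    || echo "no scheduler job variables found"
```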

5. **Manually test** tools in the installed environment

* Load via: `source test_e3sm_unified_<version>_<machine>.sh`
* Run tools like `zppy`, `e3sm_diags`, `mpas_analysis`

5. **Deploy more broadly** once core systems pass testing
6. **Deploy more broadly** once core systems pass testing

---

148 changes: 95 additions & 53 deletions e3sm_supported_machines/bootstrap.py
@@ -3,17 +3,22 @@
import os
import subprocess
import shutil
from pathlib import Path
from jinja2 import Template
from importlib import resources
from configparser import ConfigParser

from mache import discover_machine
from mache.spack import make_spack_env, get_spack_script, \
get_modules_env_vars_and_mpi_compilers
from mache.machines.pre_conda import load_pre_conda_script
from mache.spack import (
make_spack_env,
get_spack_script,
get_modules_env_vars_and_mpi_compilers,
)
from mache.permissions import update_permissions
from shared import (
check_call,
get_conda_base,
get_base,
get_rc_dev_labels,
install_miniforge3,
parse_args,
@@ -75,7 +80,7 @@ def get_env_setup(args, config, machine):

if machine is not None and compiler is not None:
conda_mpi = 'hpc'
env_suffix = f'_{machine}'
env_suffix = '_compute'
else:
conda_mpi = mpi
env_suffix = '_login'
@@ -110,12 +115,6 @@ def build_env(is_test, recreate, compiler, mpi, conda_mpi, version,
os.chdir(build_dir)

env_name = f'e3sm_unified_{version}{env_suffix}'

# add the compiler and MPI library to the spack env name
spack_env = f'{env_name}_{compiler}_{mpi}'
# spack doesn't like dots
spack_env = spack_env.replace('.', '_')

env_path = os.path.join(conda_base, 'envs', env_name)

if conda_mpi in ['nompi', 'hpc']:
@@ -137,6 +136,9 @@
if local_conda_build is not None:
channels = f'{channels} -c {local_conda_build}'

if 'rc' in version:
channels = f'{channels} -c conda-forge/label/e3sm_unified_dev'

meta_yaml_path = os.path.join(
os.path.dirname(__file__),
"..",
@@ -183,7 +185,7 @@ def build_env(is_test, recreate, compiler, mpi, conda_mpi, version,
else:
print(f'{env_name} already exists')

return env_path, env_name, activate_env, channels, spack_env
return env_path, env_name, activate_env, channels


def install_mache_from_branch(activate_env, fork, branch):
@@ -196,7 +198,7 @@ def install_mache_from_branch(activate_env, fork, branch):


def build_sys_ilamb_esmpy(config, machine, compiler, mpi, template_path,
activate_env, channels, spack_base, spack_env):
activate_env, channels, spack_base):

mpi4py_version = config.get('e3sm_unified', 'mpi4py')
ilamb_version = config.get('e3sm_unified', 'ilamb')
@@ -224,7 +226,7 @@ def build_sys_ilamb_esmpy(config, machine, compiler, mpi, template_path,
modules = f'{activate_env_lines}\n{modules}'

spack_view = f'{spack_base}/var/spack/environments/' \
f'{spack_env}/.spack-env/view'
f'e3sm_spack_env/.spack-env/view'
script = template.render(
mpicc=mpicc, modules=modules, template_path=template_path,
mpi4py_version=mpi4py_version, build_mpi4py=str(build_mpi4py),
@@ -248,10 +250,10 @@ def build_sys_ilamb_esmpy(config, machine, compiler, mpi, template_path,
return esmf_mk


def build_spack_env(config, machine, compiler, mpi, spack_env, tmpdir):
def build_spack_env(config, machine, compiler, mpi, version, tmpdir):

base_path = config.get('e3sm_unified', 'base_path')
spack_base = f'{base_path}/spack/{spack_env}'
base_path = get_base(config, version)
spack_base_path = f'{base_path}/{machine}/spack/{compiler}_{mpi}'

if config.has_option('e3sm_unified', 'use_e3sm_hdf5_netcdf'):
use_e3sm_hdf5_netcdf = config.getboolean('e3sm_unified',
@@ -274,20 +276,37 @@ def build_spack_env(config, machine, compiler, mpi, spack_env, tmpdir):
continue
value = section[option]
if value != '':
specs.append(f'"{value}"')
specs.append(f'{value}')

make_spack_env(spack_path=spack_base, env_name=spack_env,
make_spack_env(spack_path=spack_base_path, env_name='e3sm_spack_env',
spack_specs=specs, compiler=compiler, mpi=mpi,
machine=machine, tmpdir=tmpdir, include_e3sm_lapack=True,
include_e3sm_hdf5_netcdf=use_e3sm_hdf5_netcdf,
spack_mirror=spack_mirror)

return spack_base
return spack_base_path


def write_load_e3sm_unified(
template_path,
activ_path,
conda_base,
is_test,
version,
activ_suffix,
env_name,
env_nompi,
sys_info,
ext,
machine,
spack_script,
):

pre_conda_script = load_pre_conda_script(machine=machine, ext=ext)

def write_load_e3sm_unified(template_path, activ_path, conda_base, is_test,
version, activ_suffix, env_name, env_nompi,
sys_info, ext, machine, spack_script):
print(f'Pre-conda script for {machine} ({ext}):')
print(pre_conda_script)
print('---')

try:
os.makedirs(activ_path)
@@ -321,14 +340,17 @@ def write_load_e3sm_unified(template_path, activ_path, conda_base, is_test,
else:
env_type = 'SYSTEM'

script = template.render(conda_base=conda_base, env_name=env_name,
env_type=env_type,
script_filename=script_filename,
env_nompi=env_nompi,
spack='\n '.join(spack_script.split('\n')),
modules='\n '.join(sys_info['modules']),
env_vars=env_vars,
machine=machine)
script = template.render(
pre_conda_script=pre_conda_script,
conda_base=conda_base,
env_name=env_name,
env_type=env_type,
script_filename=script_filename,
env_nompi=env_nompi,
spack='\n '.join(spack_script.split('\n')),
modules='\n '.join(sys_info['modules']),
env_vars=env_vars,
machine=machine)

# strip out redundant blank lines
lines = list()
@@ -375,10 +397,6 @@ def check_env(script_filename, env_name, conda_mpi, machine):
command = f'{activate} && python -c "import {import_name}"'
test_command(command, os.environ, import_name)

# an extra check because the lack of ESMFRegrid is a problem for e3sm_diags
command = f'{activate} && python -c "from regrid2 import ESMFRegrid"'
test_command(command, os.environ, 'cdms2')

for command in commands:
package = command[0]
command_str = ' '.join(command)
@@ -418,7 +436,8 @@ def main():
else:
is_test = not config.getboolean('e3sm_unified', 'release')

conda_base = get_conda_base(args.conda_base, config, shared=True)
base_path = get_base(config, version)
conda_base = os.path.join(base_path, machine, 'conda')
conda_base = os.path.abspath(conda_base)

source_activation_scripts = \
@@ -439,7 +458,7 @@ def main():
nompi_suffix = '_login'
# first, make environment for login nodes. We're using no-MPI from
# conda-forge for now
env_path, env_nompi, activate_env, _, _ = build_env(
conda_env_path, env_nompi, activate_env, _ = build_env(
is_test, recreate, nompi_compiler, mpi, 'nompi', version,
python, conda_base, nompi_suffix, nompi_suffix, activate_base,
args.local_conda_build, config)
@@ -450,11 +469,22 @@ def main():
branch=args.mache_branch)

if not is_test:
# make a symlink to the environment
link = os.path.join(conda_base, 'envs', 'e3sm_unified_latest')
check_call(f'ln -sfn {env_path} {link}')

env_path, env_name, activate_env, channels, spack_env = build_env(
top_dir = Path(config.get('e3sm_unified', 'base_path'))
# Path.mkdir() returns None, so create the directory first and keep the Path
nco_dir = top_dir / "e3smu_latest_for_nco"
nco_dir.mkdir(exist_ok=True)

# copy readme into directory for nco symlinks
readme = Path(template_path) / "e3sm_unified_nco.readme"
shutil.copy(readme, nco_dir / "README")

link = nco_dir / machine
check_call(f'ln -sfn {conda_env_path} {link}')

(
conda_env_path,
conda_env_name,
activate_env,
channels
) = build_env(
is_test, recreate, compiler, mpi, conda_mpi, version,
python, conda_base, activ_suffix, env_suffix, activate_base,
args.local_conda_build, config)
@@ -463,27 +493,39 @@ def main():
env_vars=['export HDF5_USE_FILE_LOCKING=FALSE'])

if compiler is not None:
spack_base = build_spack_env(config, machine, compiler, mpi, spack_env,
args.tmpdir)
spack_base = build_spack_env(
config, machine, compiler, mpi, version, args.tmpdir
)
esmf_mk = build_sys_ilamb_esmpy(config, machine, compiler, mpi,
template_path, activate_env, channels,
spack_base, spack_env)
spack_base)
sys_info['env_vars'].append(esmf_mk)
else:
spack_base = None

# start restricted permissions at machine level
paths_to_update = [os.path.join(base_path, machine)]
test_script_filename = None
for ext in ['sh', 'csh']:
if compiler is not None:
spack_script = get_spack_script(
spack_path=spack_base, env_name=spack_env, compiler=compiler,
mpi=mpi, shell=ext, machine=machine)
spack_path=spack_base, env_name="e3sm_spack_env",
compiler=compiler, mpi=mpi, shell=ext, machine=machine)
else:
spack_script = ''

script_filename = write_load_e3sm_unified(
template_path, activ_path, conda_base, is_test, version,
activ_suffix, env_name, env_nompi, sys_info, ext, machine,
template_path,
activ_path,
conda_base,
is_test,
version,
activ_suffix,
conda_env_name,
env_nompi,
sys_info,
ext,
machine,
spack_script)
if ext == 'sh':
test_script_filename = script_filename
@@ -493,16 +535,16 @@ def main():
link = os.path.join(activ_path, link)
check_call(f'ln -sfn {script_filename} {link}')

check_env(test_script_filename, env_name, conda_mpi, machine)
# update files before directories, since they are quicker to do
paths_to_update.insert(0, script_filename)

check_env(test_script_filename, conda_env_name, conda_mpi, machine)

commands = f'{activate_base} && conda clean -y -p -t'
check_call(commands)

paths = [activ_path, conda_base]
if spack_base is not None:
paths.append(spack_base)
group = config.get('e3sm_unified', 'group')
update_permissions(paths, group, show_progress=True,
update_permissions(paths_to_update, group, show_progress=True,
group_writable=False, other_readable=True)

