Commit e143753

Merge pull request #318 from xylar/add-aurora
Add support for Aurora
2 parents: 876b4c3 + e81b637, commit e143753

23 files changed: +621, -267 lines


docs/developers_guide/api.md

Lines changed: 1 addition & 1 deletion

@@ -224,7 +224,7 @@ seaice/api
 
 write_job_script
 get_slurm_options
-clean_up_whitespace
+get_pbs_options
 ```
 
 ### logging
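The new `get_pbs_options` entry sits alongside `get_slurm_options` and `write_job_script` in the job API docs. As a rough, hypothetical sketch of what such a helper might do (the signature, option names and fallback below are assumptions for illustration, not Polaris's actual implementation), it would translate machine config options into `#PBS` directives:

```python
# Hypothetical sketch only; the real polaris get_pbs_options may differ.
from configparser import ConfigParser


def get_pbs_options(config: ConfigParser, queue: str, nnodes: int) -> str:
    """Build a block of #PBS directives from [parallel]/[job] config options."""
    account = config.get('parallel', 'account')
    filesystems = config.get('job', 'filesystems')
    # the 'walltime' option name and default are assumptions for this sketch
    walltime = config.get('job', 'walltime', fallback='01:00:00')
    return '\n'.join([
        f'#PBS -A {account}',
        f'#PBS -q {queue}',
        f'#PBS -l select={nnodes}',
        f'#PBS -l walltime={walltime}',
        f'#PBS -l filesystems={filesystems}',
    ])
```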

docs/developers_guide/machines/index.md

Lines changed: 9 additions & 3 deletions

@@ -79,6 +79,10 @@ These are the machines supported by MPAS-Ocean and -Seaice, including the
 +--------------+------------+-----------+-------------------+
 ```
 
+:::{note}
+MPAS components currently do not support Aurora in standalone builds.
+:::
+
 (dev-omega-supported-machines)=
 
 ### Omega Supported Machines

@@ -90,6 +94,8 @@ E3SM default for the given machine and compiler.
 +--------------+------------------+-----------+
 | Machine      | Compiler         | MPI lib.  |
 +==============+==================+===========+
+| aurora       | oneapi-ifx       | mpich     |
++--------------+------------------+-----------+
 | chicoma-cpu  | gnu              | mpich     |
 +--------------+------------------+-----------+
 | chrysalis    | intel            | openmpi   |

@@ -239,10 +245,10 @@ hostname_contains = morpheus
 ```
 
 The `[parallel]` section should describe the type of parallel queuing
-system (currently only `slurm` or `single_node` are supported), the number
+system (currently `slurm`, `pbs` or `single_node` are supported), the number
 of cores per node and the command for running an MPI executable (typically
-`srun` for Slurm and `mpirun` for a "single node" machine like a laptop or
-workstation.
+`srun` for Slurm and `mpirun` for a PBS or "single node" machine like a laptop
+or workstation).
 
 The `[spack]` section has some config options to do with loading system
 modules before or after loading a Spack environment. On a "single node"
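To make the `[parallel]` description in the diff above concrete, here is a small, self-contained sketch (not Polaris's actual machinery) of reading such a section with Python's `configparser` and picking the launcher; the option names follow the examples shown elsewhere in these docs:

```python
# Illustrative only: reading a [parallel] section like the one described above.
from configparser import ConfigParser

config = ConfigParser()
config.read_string("""
[parallel]
# parallel system of execution: slurm, pbs or single_node
system = pbs
# whether to use mpirun or srun to run a task
parallel_executable = mpirun
# cores per node on the machine
cores_per_node = 208
""")

system = config.get('parallel', 'system')
if system not in ('slurm', 'pbs', 'single_node'):
    raise ValueError(f'Unsupported parallel system: {system}')

# srun is typical for Slurm; mpirun for PBS or single-node machines
launcher = config.get('parallel', 'parallel_executable')
cores_per_node = config.getint('parallel', 'cores_per_node')
print(f'{launcher} on a {system} system with {cores_per_node} cores per node')
```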

docs/tutorials/dev_add_test_group.md

Lines changed: 1 addition & 1 deletion

@@ -669,7 +669,7 @@ and run it:
 ```bash
 $ cd ${PATH_TO_WORKING_DIR}/ocean/yet_another_channel/10km/default
 $ sbatch job_script.sh
-$ cat polaris.o${SLURM_JOBID}
+$ cat polaris.o${JOBID}
 
 Loading conda environment
 Done.

docs/users_guide/config_files.md

Lines changed: 8 additions & 8 deletions

@@ -35,7 +35,7 @@ database_root = /home/xylar/data/polaris/data
 # The parallel section describes options related to running tests in parallel
 [parallel]
 
-# parallel system of execution: slurm or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = single_node
 
 # whether to use mpirun or srun to run the model

@@ -47,7 +47,7 @@ cores_per_node = 8
 
 ```
 
-The comments in this example are hopefully pretty self-explanatory. 
+The comments in this example are hopefully pretty self-explanatory.
 You provide the config file to `polaris setup` and `polaris suite` with
 the `-f` flag:
 

@@ -81,12 +81,12 @@ sources:
   [extended interpolation](https://docs.python.org/3/library/configparser.html#configparser.ExtendedInterpolation)
   in the config file to use config options within other config
   options, e.g. `component = ${paths:component_path}/ocean_model`.
-- a config file shared with other similar tasks if one is defined. For 
-  idealized tests, these often include the size and resolution of the mesh as 
+- a config file shared with other similar tasks if one is defined. For
+  idealized tests, these often include the size and resolution of the mesh as
   well as (for ocean initial conditions) the number of vertical levels.
 - any number of config files from the task. There might be different
   config options depending on how the task is configured (e.g. only if a
-  certain feature is enabled). For example, {ref}`ocean-global-ocean` loads 
+  certain feature is enabled). For example, {ref}`ocean-global-ocean` loads
   different sets of config options for different meshes.
 - a user's config file described above.
 

@@ -153,7 +153,7 @@ core_path = mpas-ocean
 # source: /home/xylar/code/polaris/customize_config_parser/polaris/default.cfg
 partition_executable = gpmetis
 
-# parallel system of execution: slurm or single_node
+# parallel system of execution: slurm, pbs or single_node
 # source: /home/xylar/code/polaris/customize_config_parser/inej.cfg
 system = single_node
 

@@ -467,6 +467,6 @@ min_layer_thickness = 3.0
 max_layer_thickness = 500.0
 ```
 
-The comments are retained and the config file or python module where they were 
-defined is also included as a comment for provenance and to make it easier 
+The comments are retained and the config file or python module where they were
+defined is also included as a comment for provenance and to make it easier
 for users and developers to understand how the config file is built up.
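Since the docs above point to `configparser`'s extended interpolation, here is a minimal standalone demonstration of the `${section:option}` syntax; the section contents and paths are made up for the example:

```python
# Demonstrates the ${section:option} interpolation mentioned above.
from configparser import ConfigParser, ExtendedInterpolation

config = ConfigParser(interpolation=ExtendedInterpolation())
config.read_string("""
[paths]
# made-up path for the example
component_path = /home/user/e3sm/components/mpas-ocean

[executables]
# the value refers to an option in another section
component = ${paths:component_path}/ocean_model
""")

print(config.get('executables', 'component'))
# -> /home/user/e3sm/components/mpas-ocean/ocean_model
```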

docs/users_guide/invalid_quick_start.md

Lines changed: 1 addition & 1 deletion

@@ -276,7 +276,7 @@ database_root = </path/to/root>/polaris/data
 # The parallel section describes options related to running tasks in parallel
 [parallel]
 
-# parallel system of execution: slurm or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = single_node
 
 # whether to use mpirun or srun to run the model

docs/users_guide/machines/anvil.md

Lines changed: 4 additions & 1 deletion

@@ -30,6 +30,9 @@ polaris_envs = /lcrc/soft/climate/polaris/anvil/base
 # the compiler set to use for system libraries and MPAS builds
 compiler = intel
 
+# the compiler to use to build software (e.g. ESMF and MOAB) with spack
+software_compiler = intel
+
 # the system MPI library to use for intel compiler
 mpi_intel = impi
 

@@ -51,7 +54,7 @@ Additionally, some relevant config options come from the
 # The parallel section describes options related to running jobs in parallel
 [parallel]
 
-# parallel system of execution: slurm, cobalt or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = slurm
 
 # whether to use mpirun or srun to run a task
docs/users_guide/machines/aurora.md (new file)

Lines changed: 97 additions & 0 deletions

@@ -0,0 +1,97 @@
+# Aurora
+
+login: `ssh <username>@aurora.alcf.anl.gov`
+
+interactive login:
+
+```bash
+qsub -I -A E3SM_Dec -q debug -l select=1 -l walltime=00:30:00 -l filesystems=home:flare
+```
+
+Here is a link to the
+[Aurora User Guide](https://docs.alcf.anl.gov/aurora/)
+
+## config options
+
+Here are the default config options added when you have configured Polaris on
+an Aurora login node (or specified `./configure_polaris_envs.py -m aurora`):
+
+```cfg
+# The paths section describes paths for data and environments
+[paths]
+
+# A shared root directory where polaris data can be found
+database_root = /lus/flare/projects/E3SM_Dec/polaris
+
+# the path to the base conda environment where polaris environments have
+# been created
+polaris_envs = /lus/flare/projects/E3SM_Dec/soft/polaris/aurora/base
+
+
+# Options related to deploying a polaris conda and spack environments
+[deploy]
+
+# the compiler set to use for system libraries and MPAS builds
+compiler = oneapi-ifx
+
+# the compiler to use to build software (e.g. ESMF and MOAB) with spack
+software_compiler = oneapi-ifx
+
+# the system MPI library to use for oneapi-ifx compiler
+mpi_oneapi_ifx = mpich
+
+# the base path for spack environments used by polaris
+spack = /lus/flare/projects/E3SM_Dec/soft/polaris/aurora/spack
+
+# whether to use the same modules for hdf5, netcdf-c, netcdf-fortran and
+# pnetcdf as E3SM (spack modules are used otherwise)
+use_e3sm_hdf5_netcdf = False
+
+
+# Config options related to creating a job script
+[job]
+
+# the filesystems used for the job
+filesystems = home:flare
+```
+
+Additionally, some relevant config options come from the
+[mache](https://github.com/E3SM-Project/mache/) package:
+
+```cfg
+# The parallel section describes options related to running jobs in parallel
+[parallel]
+
+# parallel system of execution: slurm, pbs or single_node
+system = pbs
+
+# whether to use mpirun or srun to run a task
+parallel_executable = mpirun
+
+# cores per node on the machine (with hyperthreading)
+cores_per_node = 208
+
+# account for running diagnostics jobs
+account = E3SM_Dec
+
+# queues (default is the first)
+queues = prod, debug
+
+# Config options related to spack environments
+[spack]
+
+# whether to load modules from the spack yaml file before loading the spack
+# environment
+modules_before = False
+
+# whether to load modules from the spack yaml file after loading the spack
+# environment
+modules_after = False
+```
+
+## Loading and running Polaris on Aurora
+
+Follow the developer's guide at {ref}`dev-machines` to get set up. There are
+currently no plans to support a different deployment strategy (e.g. a shared
+environment) for users.
+
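For orientation, here is a minimal sketch of the kind of PBS batch script the config options in the new Aurora page imply. It is not Polaris's actual `write_job_script` output; the environment-activation script name and the `polaris serial` step at the end are assumptions for illustration:

```python
# Sketch only: assembles a PBS job script from the Aurora config options shown
# above.  The activation script name and the `polaris serial` command are
# assumptions for illustration, not guaranteed Polaris behavior.
from configparser import ConfigParser

config = ConfigParser()
config.read_string("""
[parallel]
system = pbs
parallel_executable = mpirun
cores_per_node = 208
account = E3SM_Dec
queues = prod, debug

[job]
filesystems = home:flare
""")

# the default queue is the first entry in `queues`
queue = config.get('parallel', 'queues').split(',')[0].strip()

job_script = f"""#!/bin/bash
#PBS -N polaris
#PBS -A {config.get('parallel', 'account')}
#PBS -q {queue}
#PBS -l select=1
#PBS -l walltime=01:00:00
#PBS -l filesystems={config.get('job', 'filesystems')}

cd $PBS_O_WORKDIR

source load_polaris_env.sh  # assumed name of the generated activation script
polaris serial              # assumed command for running the suite/task
"""
print(job_script)
```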

docs/users_guide/machines/chicoma.md

Lines changed: 15 additions & 1 deletion

@@ -83,16 +83,25 @@ polaris_envs = /usr/projects/e3sm/polaris/chicoma-cpu/conda/base
 # the compiler set to use for system libraries and MPAS builds
 compiler = gnu
 
+# the compiler to use to build software (e.g. ESMF and MOAB) with spack
+software_compiler = gnu
+
 # the system MPI library to use for gnu compiler
 mpi_gnu = mpich
 
+# the system MPI library to use for nvidia compiler
+mpi_nvidia = mpich
+
 # the base path for spack environments used by polaris
 spack = /usr/projects/e3sm/polaris/chicoma-cpu/spack
 
 # whether to use the same modules for hdf5, netcdf-c, netcdf-fortran and
 # pnetcdf as E3SM (spack modules are used otherwise)
 use_e3sm_hdf5_netcdf = True
 
+# location of a spack mirror for polaris to use
+spack_mirror = /usr/projects/e3sm/polaris/chicoma-cpu/spack/spack_mirror
+
 
 # The parallel section describes options related to running jobs in parallel
 [parallel]

@@ -107,6 +116,11 @@ cores_per_node = 128
 # hanging on perlmutter)
 threads_per_core = 1
 
+# quality of service
+# overriding mache because the debug qos also requires --reservation debug,
+# which polaris doesn't currently support
+qos = standard
+
 
 # Config options related to creating a job script
 [job]

@@ -125,7 +139,7 @@ Additionally, some relevant config options come from the
 # The parallel section describes options related to running jobs in parallel
 [parallel]
 
-# parallel system of execution: slurm, cobalt or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = slurm
 
 # whether to use mpirun or srun to run a task

docs/users_guide/machines/chrysalis.md

Lines changed: 4 additions & 1 deletion

@@ -26,6 +26,9 @@ polaris_envs = /lcrc/soft/climate/polaris/chrysalis/base
 # the compiler set to use for system libraries and MPAS builds
 compiler = intel
 
+# the compiler to use to build software (e.g. ESMF and MOAB) with spack
+software_compiler = intel
+
 # the system MPI library to use for intel compiler
 mpi_intel = openmpi
 

@@ -47,7 +50,7 @@ Additionally, some relevant config options come from the
 # The parallel section describes options related to running jobs in parallel
 [parallel]
 
-# parallel system of execution: slurm, cobalt or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = slurm
 
 # whether to use mpirun or srun to run a task

docs/users_guide/machines/compy.md

Lines changed: 6 additions & 3 deletions

@@ -17,7 +17,7 @@ database_root = /compyfs/polaris
 
 # the path to the base conda environment where polaris environments have
 # been created
-polaris_envs = /share/apps/E3SM/polaris/conda/base
+polaris_envs = /share/apps/E3SM/conda_envs/polaris/conda/base
 
 
 # Options related to deploying a polaris conda and spack environments

@@ -26,14 +26,17 @@ polaris_envs = /share/apps/E3SM/polaris/conda/base
 # the compiler set to use for system libraries and MPAS builds
 compiler = intel
 
+# the compiler to use to build software (e.g. ESMF and MOAB) with spack
+software_compiler = intel
+
 # the system MPI library to use for intel compiler
 mpi_intel = impi
 
 # the system MPI library to use for gnu compiler
 mpi_gnu = openmpi
 
 # the base path for spack environments used by polaris
-spack = /share/apps/E3SM/polaris/spack
+spack = /share/apps/E3SM/conda_envs/polaris/spack
 
 # whether to use the same modules for hdf5, netcdf-c, netcdf-fortran and
 # pnetcdf as E3SM (spack modules are used otherwise)

@@ -49,7 +52,7 @@ Additionally, some relevant config options come from the
 # The parallel section describes options related to running jobs in parallel
 [parallel]
 
-# parallel system of execution: slurm, cobalt or single_node
+# parallel system of execution: slurm, pbs or single_node
 system = slurm
 
 # whether to use mpirun or srun to run a task
