Commit 44814e4

Finish fleshing out the releasing docs
1 parent 1329562 commit 44814e4

8 files changed

+775
-11
lines changed

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,91 @@
1+
# Adding a New Machine
2+
3+
Support for a new HPC machine in E3SM-Unified requires coordinated updates
4+
across multiple tools — primarily in
5+
[`mache`](https://github.com/E3SM-Project/mache), but also in the E3SM Spack
6+
fork and deployment scripts.
7+
8+
This page provides guidance for E3SM-Unified maintainers and infrastructure
9+
developers integrating new machines into the release and deployment workflow.
10+
11+
---
12+
13+
## 🔗 Main Mache Documentation
14+
15+
Most of the process is already documented in the official `mache` developer
16+
guide:
17+
18+
* [Adding a New Machine](https://docs.e3sm.org/mache/main/developers_guide/adding_new_machine.html)
19+
* [Adding Spack Support](https://docs.e3sm.org/mache/main/developers_guide/spack.html)
20+
21+
Start in `mache` to:
22+
23+
* Add a machine-specific config file (e.g., `pm-cpu.cfg`)
24+
* Add hostname detection logic in `discover.py`
25+
* Create Spack templates for supported compiler/MPI stacks
26+
* Optionally add shell script templates for environment setup
27+
28+
> ⚠️ Machines not listed in the E3SM
29+
[`config_machines.xml`](https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml) must first be added upstream before `mache`
30+
can support them.
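
Before starting work in `mache`, it can help to confirm that the machine already has an entry upstream. The following is a minimal sketch, assuming you have a local clone of E3SM and that machines are declared as `<machine MACH="...">` elements; the helper name `machine_in_cime_config` and the clone location are illustrative assumptions, not part of `mache` or E3SM.

```bash
# Succeeds (exit status 0) if the machine has a <machine MACH="..."> entry
# in the given config_machines.xml. The helper name is hypothetical.
machine_in_cime_config() {
    local machine="$1" xml="$2"
    grep -qF "<machine MACH=\"${machine}\"" "$xml"
}

# Example usage (the location of your E3SM clone is an assumption):
# machine_in_cime_config pm-cpu ~/E3SM/cime_config/machines/config_machines.xml \
#     && echo "listed" || echo "not listed: add it upstream first"
```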
31+
32+
---
33+
34+
## 🧩 Integration with E3SM-Unified Deployment
35+
36+
After updating `mache`, you'll need to:
37+
38+
1. **Reference your `mache` branch in E3SM-Unified Deployment**
39+
40+
* Use the `--mache_fork` and `--mache_branch` flags to deploy using the
41+
updated branch
42+
* Confirm the new machine is recognized and templates are applied correctly
43+
44+
2. **Update Spack if needed**
45+
46+
* If new versions of external tools are required, update the
47+
[`spack_for_mache_<version>`](spack-updates.md) branch of the
48+
[E3SM Spack fork](https://github.com/E3SM-Project/spack)
49+
50+
---
51+
52+
## ✅ Testing Your Changes
53+
54+
Use the standard test deployment approach from
55+
[Deploying on HPCs](deploying-on-hpcs.md):
56+
57+
```bash
58+
cd e3sm_supported_machines
59+
./deploy_e3sm_unified.py --conda ~/miniforge3 \
60+
--mache_fork <your_fork> \
61+
--mache_branch <your_branch>
62+
```
63+
You can also supply these flags:
64+
```
65+
--machine <new_machine> \
66+
--compiler <compiler> \
67+
--mpi <mpi> \
68+
```
69+
but they should not be needed if you have set things up in `mache` correctly.
70+
71+
During testing, focus on:
72+
73+
* Spack external package detection and successful builds
74+
* Shell script generation and activation behavior
75+
* Module compatibility and performance of tools like `zppy` and `e3sm_diags`
76+
77+
---
78+
79+
## 💡 Tips and Best Practices
80+
81+
* Reuse YAML templates from similar machines to minimize effort
82+
* Add common system tools as `buildable: false` in the Spack environment
83+
* Avoid identifying machines using environment variables unless absolutely
84+
necessary. Instead, use the hostnames for login and compute nodes if
85+
possible
86+
* Use `utils/update_cime_machine_config.py` to verify `mache` remains in sync
87+
with E3SM
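
The hostname-based identification recommended in the tips above can be illustrated with a small shell sketch. The real detection logic lives in `mache`'s `discover.py` and is written in Python; the function name, hostname patterns, and machine name below are purely hypothetical and only show the idea of matching both login-node and compute-node names.

```bash
# Hypothetical sketch: map a hostname to a machine name by pattern.
# The patterns and the machine name "example-hpc" are illustrative only.
detect_machine() {
    local host="$1"
    case "$host" in
        login[0-9]*.example-hpc.org) echo "example-hpc" ;;  # login nodes
        nid[0-9]*)                   echo "example-hpc" ;;  # compute nodes
        *)                           echo "unknown" ;;
    esac
}

# Report the machine for the node this is run on.
detect_machine "$(uname -n)"
```

Matching on both node types matters because activation scripts are typically sourced on login nodes and again inside batch jobs.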
88+
89+
---
90+
91+
➡ Next: [Publishing the Final Release](publishing-final-release.md)

docs/releasing/creating-rcs/rc-e3sm-unified.md

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,34 @@ resolution issues.
100100

101101
---
102102

103-
## 6. Tag and Publish the RC
103+
## 6. Make a draft PR
104+
105+
Push the branch to your fork of `e3sm-unified` and make a draft PR to the
106+
main `e3sm-unified` repo. Use that PR to document progress and highlight
107+
important version updates in this release for the public (those without
108+
access to E3SM's Confluence pages). See
109+
[this example](https://github.com/E3SM-Project/e3sm-unified/pull/125).
110+
111+
---
112+
113+
## 7. Keeping updated on Confluence
114+
115+
As deployment and testing progress, you need to make sure that the packages
116+
in your `update-to-<version>` branch match the
117+
[agreed-upon versions on Confluence](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/129732419/Packages+in+the+E3SM+Unified+conda+environment#Next-versions).
118+
Maintainers of dependencies will need to inform you as new release candidates
119+
or final releases become available, preferably by updating Confluence and also
120+
sending a Slack message or email.
121+
122+
As testing nears completion, it is also time to draft a release note, similar
123+
to [this example](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/4908515329/E3SM-Unified+1.11.0+release+notes).
124+
Ask maintainers of any of the main E3SM-Unified packages that have been
125+
updated since the last release to describe (**briefly and with minimal
126+
jargon**) what is new in their package that would be of interest to users.
127+
128+
---
129+
130+
## 8. Tag and Publish the RC
104131

105132
After test builds are successful:
106133

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
# Publishing the Final Release
2+
3+
Once all dependencies have been tested and validated, and the E3SM-Unified
4+
release candidate (RC) has passed testing across the relevant HPC systems, the
5+
final release can be published. This page outlines the process of finalizing
6+
and distributing an official E3SM-Unified release.
7+
8+
---
9+
10+
## ✅ Pre-Release Checklist
11+
12+
Before publishing:
13+
14+
* [ ] All RC versions of dependencies (e.g., `e3sm_diags`, `zppy`, `mache`)
15+
have been released with final version tags and conda-forge packages
16+
* [ ] Final version of `e3sm-unified` has been created and built on conda-forge
17+
* [ ] Final deployments have been completed on all target HPC machines
18+
* [ ] Smoke testing and key workflows (e.g., `zppy`, `mpas_analysis`) have
19+
been validated
20+
21+
---
22+
23+
## Step-by-Step Finalization
24+
25+
### 1. Remove RC Labels
26+
27+
Edit `recipes/e3sm-unified/meta.yaml` and the related files noted below:
28+
29+
* Replace RC versions of dependencies (e.g., `3.0.0rc2`) with final versions
30+
(e.g., `3.0.0`) in both `meta.yaml` and `default.cfg`
31+
* Bump the `e3sm-unified` version accordingly (e.g., from `1.12.0rc3` to
32+
`1.12.0`) in `meta.yaml` and `e3sm_supported_machines/shared.py`
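
A quick scan for leftover RC version strings can catch a missed pin before you commit. This is a minimal sketch, assuming versions follow the `X.Y.ZrcN` pattern used in the examples above; the helper names `strip_rc` and `find_rc_pins` are hypothetical and not part of the repository.

```bash
# Strip a trailing release-candidate suffix: 1.12.0rc3 -> 1.12.0
strip_rc() {
    printf '%s\n' "$1" | sed -E 's/rc[0-9]+$//'
}

# Print (with line numbers) any X.Y.ZrcN version strings left in the given
# files. Exit status is 0 if something was found, 1 if the files are clean.
find_rc_pins() {
    grep -nE '[0-9]+\.[0-9]+\.[0-9]+rc[0-9]+' "$@"
}

# Example usage from the root of the e3sm-unified repo:
# strip_rc 1.12.0rc3
# find_rc_pins recipes/e3sm-unified/meta.yaml e3sm_supported_machines/shared.py
```

Add the path to `default.cfg` to the `find_rc_pins` call as well, since dependency versions are pinned there too.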
33+
34+
Commit the changes to your `update-to-<version>` branch.
35+
36+
### 2. Tag Final Release in Source Repo
37+
38+
If you followed the suggested workflow under
39+
[Creating an RC for E3SM-Unified](creating-rcs/rc-e3sm-unified.md), you should
40+
have a draft PR from your `update-to-<version>` branch that documents the
41+
changes. Merge this PR into `main` so the release history and testing context
42+
are preserved.
43+
44+
Then, go to `Releases` on the right side of the
45+
[main page](https://github.com/E3SM-Project/e3sm-unified) of the repo and
46+
click `Draft a new release` at the top.
47+
48+
Document the changes in this version (hopefully just copy-paste from the
49+
description of your recently merged PR), similar to
50+
[this example](https://github.com/E3SM-Project/e3sm-unified/releases/tag/1.11.0).
51+
52+
### 3. Submit Final Feedstock PR
53+
54+
Go to the [e3sm-unified-feedstock](https://github.com/conda-forge/e3sm-unified-feedstock):
55+
56+
* Open a pull request from your fork
57+
* Update the version number and `sha256` hash
58+
* Target the `main` branch (not `dev`)
59+
* Ensure final versions of all dependencies are listed
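
The `sha256` value can be computed locally from the release tarball that the recipe's source entry points to. The sketch below assumes you have already downloaded that tarball; the helper name `sha256_of` and the tarball file name are hypothetical.

```bash
# Print the SHA-256 digest of a file, using whichever tool is available
# (sha256sum on most Linux systems, shasum on macOS).
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

# Example usage (the file name is a placeholder for the downloaded tarball):
# sha256_of e3sm-unified-<version>.tar.gz
```

Paste the resulting digest into the feedstock recipe alongside the new version number.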
60+
61+
Once CI passes, merge the PR.
62+
63+
This will trigger CI to publish the new release to the standard conda-forge
64+
channel. You typically need to wait as long as an hour after packages have
65+
built for them to become available for installation. You can watch
66+
[this page](https://anaconda.org/conda-forge/e3sm-unified/files)
67+
to see when files appear and how many downloads they have. Once all files have
68+
been built and show 2 or more downloads, you should be good to proceed with
69+
final deployment.
70+
71+
### 4. Deploy Final Release on HPC Systems
72+
73+
Use the same process as during RC testing, but now with the `--release` flag:
74+
75+
```bash
76+
./deploy_e3sm_unified.py --conda ~/miniforge3 --release
77+
```
78+
79+
This creates new activation scripts like:
80+
81+
* `load_e3sm_unified_<version>_<machine>.sh`
82+
83+
It also generates symlinks like:
84+
85+
* `load_latest_e3sm_unified_<machine>.sh`
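
After deployment, you may want to confirm that the "latest" symlink resolves to the script for the version you just released. This is a minimal sketch based only on the file-name patterns above; the helper name `latest_points_to` and the directory argument are assumptions, since the location of the activation scripts differs by machine.

```bash
# Succeeds if load_latest_e3sm_unified_<machine>.sh in the given directory
# is a symlink whose target is the script for the expected version.
latest_points_to() {
    local dir="$1" machine="$2" version="$3"
    local link="${dir}/load_latest_e3sm_unified_${machine}.sh"
    local expected="load_e3sm_unified_${version}_${machine}.sh"
    [ -L "$link" ] && [ "$(basename "$(readlink "$link")")" = "$expected" ]
}

# Example usage (directory, machine, and version are placeholders):
# latest_points_to /path/to/activation/scripts <machine> <version> \
#     && echo "symlink is up to date"
```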
86+
87+
### 5. Announce the Release
88+
89+
Share the release:
90+
91+
* 📝 **Confluence** [Like this example](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/4908515329/E3SM-Unified+1.11.0+release+notes)
92+
* **Email** to [E3SM All-hands](https://e3sm.atlassian.net/wiki/spaces/ED/pages/818381294/Email+Lists) list (same contents as Confluence page)
93+
* 📣 **Slack** (`#e3sm-help-postproc`) with release highlights
94+
95+
Be sure to include:
96+
97+
* Final versions of core E3SM-developed packages (e.g., `mpas_analysis`,
98+
`zppy`)
99+
* List of supported HPC machines and activation instructions
100+
* Summary of major changes, fixes, and new features
101+
102+
---
103+
104+
## 🔁 Post-Release Maintenance
105+
106+
On each supported machine:
107+
108+
* Clean up outdated `test_...` activation scripts
109+
* Remove conda and spack environments for E3SM-Unified RCs
110+
* Delete the `update-to-<version>` branch
111+
* Move the contents on Confluence describing the
112+
[current version](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/129732419/Packages+in+the+E3SM+Unified+conda+environment#Current-Version)
113+
to the top of the
114+
[previous versions](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/3236233332/Packages+in+previous+versions+E3SM+Unified+conda+environment) page
115+
* Copy the contents of the next version to be the new current version
116+
* Update the version under "next version" and remove all bold (to indicate
117+
that, as a starting point, no updates have been made to any packages in the
118+
next version)
119+
* Move any release notes for older E3SM-Unified versions into the Confluence
120+
subdirectory for [previous versions](https://e3sm.atlassian.net/wiki/spaces/DOC/pages/3236233332/Packages+in+previous+versions+E3SM+Unified+conda+environment).
121+
122+
---
123+
124+
➡ Next: [Maintaining Past Versions](maintaining-past-versions.md)
Lines changed: 98 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
# Maintaining Past Versions
2+
3+
After a new version of E3SM-Unified is released, older versions may still be
4+
in use for months or years by analysis workflows, diagnostic pipelines, or
5+
collaborators working on older datasets. This page outlines best practices for
6+
keeping past versions available and usable.
7+
8+
---
9+
10+
## 🎯 Goals
11+
12+
* Ensure long-term reproducibility
13+
* Avoid breaking existing workflows
14+
* Minimize overhead for maintainers
15+
* Free up limited disk space when required
16+
17+
---
18+
19+
## 🔒 Avoid Breaking Changes
20+
21+
### Don’t Delete Spack or Conda Environments
22+
23+
E3SM-Unified installs are isolated by version. Do not delete directories like:
24+
25+
```bash
26+
/lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.11.0/
27+
/lcrc/soft/climate/e3sm-unified/spack/e3sm_unified_1.11.0_chrysalis_gnu_mpich/
28+
```
29+
30+
These environments may be used by others via scripts, batch jobs, or notebooks.
31+
32+
**Exception**: If the environment is broken beyond repair and cannot be
33+
recreated, it should be removed. If there is no more disk space for software,
34+
the oldest environments must be deleted to make room for new ones. Use your
35+
best judgment and document removals on Confluence.
36+
37+
### Don’t Remove Activation Scripts
38+
39+
Keep activation scripts for previous versions (e.g.,
40+
`load_e3sm_unified_1.11.0_chrysalis.sh`) in place.
41+
42+
**Exception**: If the environment has been removed, it is safe to remove the
43+
associated activation scripts.
44+
45+
---
46+
47+
## 🧹 What Can Be Removed
48+
49+
### Test Environments
50+
51+
You can safely delete environments or activation scripts for
52+
**release candidates**:
53+
54+
* `test_e3sm_unified_1.11.0rc3_*.sh`
55+
* Conda environments like `test_e3sm_unified_install`
56+
57+
These were used only during internal testing and should be removed when they
58+
are no longer needed to free up disk space.
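
Listing the candidates first, as a dry run, makes it harder to remove a release script by mistake. The sketch below assumes RC activation scripts follow the `test_e3sm_unified_<version>rc<N>_*.sh` naming shown above; the helper name `list_rc_scripts` and the directory argument are hypothetical.

```bash
# List release-candidate activation scripts directly inside a directory,
# without deleting anything.
list_rc_scripts() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f -name 'test_e3sm_unified_*rc*_*.sh'
}

# Example usage (the directory is a placeholder):
# list_rc_scripts /path/to/activation/scripts
# After reviewing the list, remove the files you no longer need with rm.
```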
59+
60+
### Intermediate Build Artifacts
61+
62+
Temporary logs or caches (e.g., from failed deployments) can be removed to
63+
save space.
64+
65+
---
66+
67+
## 🔁 Rebuilding Past Versions
68+
69+
If a past version breaks due to:
70+
71+
* OS upgrades
72+
* Module stack changes
73+
* File system reorganizations
74+
75+
...you may need to rebuild that version. Follow these steps:
76+
77+
1. Check out the appropriate tag in the `e3sm-unified` repo (e.g., `1.11.0`)
78+
2. Use `deploy_e3sm_unified.py` with the `--version` flag (as a precaution):
79+
80+
```bash
81+
./deploy_e3sm_unified.py --conda ~/miniforge3 --version 1.11.0 --release --recreate
82+
```
83+
84+
You may run into difficulty solving for older conda environments, e.g., because
85+
of packages that have been marked as broken in the interim. At some point, it
86+
may simply not be possible to recreate older E3SM-Unified conda environments
87+
because of this.
88+
89+
---
90+
91+
## 💬 Communication
92+
93+
* Coordinate cleanup of old versions via Slack (`#e3sm-help-postproc`)
94+
* Use Confluence notes to document version removals or rebuilds
95+
96+
---
97+
98+
Back to: [Publishing the Final Release](publishing-final-release.md)
