Carbon settings for GOCART
Set up a GitHub account. Must you also be part of the GEOS-ESM group? Is there an existing document for GEOS that we can link to?
Much of this is already documented elsewhere; only the essential steps are documented below.
First, you need to have the correct modules to check out the code. If you are an expert user, make sure you have git and mepo in your path. If you're not, execute the following to get them (as well as anything else you might need):
module purge
module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES15
module load GEOSenv
Note the above will only work on SLES15 nodes. For SLES12 nodes, replace SLES15 above with SLES12. Currently all login nodes are SLES12, all Milan compute nodes are SLES15, and Cascade Lake compute nodes are available on both SLES12 and SLES15.
Other notes: There is currently an issue with cmake identifying the correct Python 3 install. If a mepo or parallel_build.csh run dies, especially near f2py, this may be the problem. Working on a fix ...
Then, decide where you want to set up the model code. While you can check it out in your home directory, we recommend checking it out on a scratch space because it can get pretty big, especially when you are testing multiple different model versions. On NCCS, you typically want to use $NOBACKUP. Somewhere in that folder, check out the model with
git clone -b v11.5.2 git@github.com:GEOS-ESM/GEOSgcm.git GEOSgcm-v11.5.2
Note that 11.5.2 simply happens to be the latest released tag at the time this is being written. There is nothing sacred about that tag. If you want a later tag, you can find all release versions here.
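If you prefer the command line to the releases page, git ls-remote can list a remote's tags. A self-contained sketch (a throwaway local repository stands in for the GEOSgcm remote so the commands run anywhere; in practice, point ls-remote at git@github.com:GEOS-ESM/GEOSgcm.git):

```shell
# Stand-in repo with a single tag, so this sketch runs without network access
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m "init"
git -C "$repo" tag v11.5.2
# List the tags of a remote (here the local stand-in)
git ls-remote --tags "$repo"
```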
The model consists of code in several sub-repositories, by default none of which are checked out. There is a file called components.yaml in the source tree you just checked out, which contains the tags for each repository that will be checked out. Save a copy of this file somewhere, say as components.yaml.orig. Then add the following block to check out RRG:
RRG:
local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/@GEOSchem_GridComp/@RRG
remote: ../RRG.git
branch: main
develop: develop
In addition, both GEOSchem_GridComp and GOCART will need to be changed to branches that contain the latest GOCART code. In components.yaml, change the tag line following GOCART to branch: feature/sbasu1/gocart+11.5.2, and the tag line following GEOSchem_GridComp to branch: feature/sbasu1/gocart+11.5.2. Note that these are different repositories, although the branch names are identical.
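After those edits, the relevant components.yaml entries might look roughly like this (a sketch; the local/remote/develop keys stay whatever your components.yaml already has, and only the branch lines matter):

```yaml
GEOSchem_GridComp:
  # local/remote/develop keys unchanged from your components.yaml
  branch: feature/sbasu1/gocart+11.5.2   # replaces the tag: line

GOCART:
  # local/remote/develop keys unchanged from your components.yaml
  branch: feature/sbasu1/gocart+11.5.2   # replaces the tag: line
```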
Next, run mepo clone at the command line to check out all the repositories at the branches/tags specified in components.yaml.
You must build the code on a compute node of the same architecture as the nodes you will be running the model on. For this example we will be building and running on AMD Milan nodes. (One option is to submit a batch build with ./parallel_build.csh -mil, but here we will build interactively.)
Then get a terminal on such a compute node with
salloc -A s1460 --nodes=1 --constraint=mil --qos=debug -t 60
This gets you a terminal on a Milan node under the debug queue, which is pretty fast but has a wall clock limit of 1 hour. You could, alternatively, issue this command first thing in the morning without --qos=debug and with -t 480 and get a node for 8 hours. You will need to wait longer to get a node, but once you do, you're set for a day's worth of building and debugging.
Extra details: The -t 60 flag is optional for --qos=debug, which defaults to and is capped at one hour. The -A s1460 flag is also optional if you've run something under that account before. You must, of course, be a member of the s1460 project to use this charge code. If not, substitute your own account charge code.
Once you get on a compute node, go to the folder where you checked out the source tree, and just to be safe create a clean environment as follows (this assumes you are running a bash/ksh/zsh variant and not C shell):
module purge
cd @env
source g5_modules.sh
cd ..
Now you're ready to build. Since you're already on a compute node, no need to submit a parallel build job. Instead, issue the following commands in order:
mkdir build
cd build
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install
make -j install
This builds the model into ../install, specifically the GEOS GCM executable is ../install/bin/GEOSgcm.x.
Important: When you run gcm_setup to set up a new run, this executable is copied over to the run directory. As a result, if you want to fix something in code and recompile, the changes will not be seen in your run unless you copy over the executable again. Therefore, I often symlink install/bin/GEOSgcm.x from my run directory.
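That symlink step can be sketched as follows (the directories here are throwaway stand-ins so the snippet runs anywhere; substitute your real install and run directories):

```shell
install_bin=$(mktemp -d)      # e.g. .../GEOSgcm-v11.5.2/install/bin
run_dir=$(mktemp -d)          # e.g. your gcm_setup experiment directory
touch "$install_bin/GEOSgcm.x"            # stands in for the real binary
cd "$run_dir"
rm -f GEOSgcm.x                           # remove the copied executable
ln -s "$install_bin/GEOSgcm.x" GEOSgcm.x  # link to the build instead
```

With the link in place, a rebuild in the source tree is picked up by the run directory automatically.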
Use gcm_setup to clone the run in /discover/nobackup/bweir/GEOS/runs/carbon-ng_ana. You can get restarts like this:
tar xf /discover/nobackup/projects/gmao/geos_carb_dev/bweir/runs/carbon-ng_ana/restarts/restarts.e20141023_21z.tar
~bweir/bin/striprst.sh
To run:
sbatch ./gcm_run.j
Although it'd probably be better to get an interactive session and try it, e.g.,
salloc --time=10:00:00 --constraint=mil --ntasks=1200 --ntasks-per-node=120
Go into install/bin and execute ./gcm_setup.
- Experiment ID is any name you want to give the run. It's a good idea to include the model version and something about which tracers you are running in a short name. Mostly something that you will remember. If you call it (say) Apple, it's pretty much guaranteed that you won't remember what it is for two years down the line. I'm calling mine GCM-11.5.2-methane-c180.
- Experiment Description is a short description to help you remember.
- CLONE is the ability to copy over someone else's run folder. This is a very useful ability, but for now let's choose NO.
- Atmospheric Horizontal Resolution depends on what you want to run. I'm choosing c180. It's perfectly fine to choose c90 for model development.
- The default Vertical Resolution of 72 layers is fine.
- The default Microphysics of BACM_1M is fine.
- The default TRUE for Hydrostatic Atmosphere is fine.
- Use the IOSERVER if you're running c180 or higher.
- The default processor type of mil is fine.
- The default NO to COUPLED Ocean/Sea-Ice Model is fine.
- Choose CS (cubed sphere) for Data_Ocean Horizontal Resolution.
- The default choice Icarus-NLv3 for land surface boundary conditions is fine.
- The default choice Catchment for the land surface model is fine.
- Accept the default choice to run GOCART with Actual aerosols.
- Choose to use OPS emission files for GOCART, because the AMIP emission files do not exist for recent years.
- For c180, a HEARTBEAT_DT of 450 is fine.
- Don't worry about the HISTORY template, you are going to change the history file anyway.
- The HOME Directory is where the run folder will be created. Just make sure it's created somewhere inside /discover/nobackup/projects/gmao/geos_carb/${USER}.
- In theory the EXPERIMENT Directory can be different from the HOME Directory, but no one has ever tried it. Either set it to be the same, or try at your own risk and don't expect any sympathy if you break something.
- The Build directory should already be correct.
- Our GROUP ID is s1460.
Every so often gcm_setup will fail with errors like
/tmp/tmp.sVmzWQGKy5: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.VXxsGkzxBY: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.kAg9MZYON8: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Xced0YAEi2: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.igPE8a3dYw: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.OL9WF9FmLj: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.73LABTUPJy: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.rdwOgZdlbP: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.Qak33WKR9E: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
/tmp/tmp.vDLxsZWrwJ: Permission denied.
/bin/mv: cannot stat '/discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl': No such file or directory
cat: /discover/nobackup/projects/gmao/geos_carb/sbasu1/runs/GCM/test_restarts/AGCM.rc.tmpl.tmp: No such file or directory
For some unknown reason, /tmp on discover acts up with denied permissions. Probably because it's mounted with noexec. To solve, do
export TMPDIR=/discover/nobackup/$USER/tmp
mkdir -p $TMPDIR
before executing gcm_setup.
This is a dark art. Remembering Robert the Bruce before embarking on this endeavor would be well advised.
GEOS restart files are called *_rst, even though they're really netcdf files. Ours not to reason why, ours but to do and die. You will see some *_import_rst and some *_internal_rst. Ignore the first kind, you will only need to supply the second kind for a new run. There are two types of *_internal_rst restart files, upper air (3D) restarts and surface (2D) restarts. Upper air restarts are defined on the cube, contain variables with shape levels x N x 6N, and are fairly easily created by the provided scripts for creating/remapping restarts (more below). There are very few ways in which these can "go wrong". Surface restarts can also be created by the provided remapping scripts. However, these will very likely make you weep. Instead of being on grids, surface restarts are provided as a list of tiles (my theory is that whoever made that decision was trying to save disk space and reinvented the wheel instead of relying on compression algorithms). Every single land model has a different ordering of these tiles, and understanding what your land model is requires a fair amount of expert knowledge. Worse, the choice of a land model makes pretty much zero difference in a replay run, yet your model will crash unless you do this correctly. In moments of frustration, remember Robert the Bruce.
The script to create restarts is called install/bin/remap_restarts.py. Do not run this on a compute node because it requires access to some filesystems that are not mounted on compute nodes. On a front-end node, run it as follows:
module purge
source @env/g5_modules.sh
install/bin/remap_restarts.py
This will present you with a series of questions, answer as follows.
- Remap archived MERRA-2 restarts? Yes
- Enter restart date/time: Enter YYYYMMDDHH, where HH is one of 03, 09, 15 or 21 for MERRA-2
- Enter output directory for new restarts: Make sure this is a unique folder which is not your run folder; you can copy them over later
- Remap to a stretched cubed-sphere grid? No
- Enter atmospheric grid for new restarts: Enter the same atmospheric resolution you entered for gcm_setup
- Select ocean model for new restarts: data
- Select data ocean grid/resolution for new restarts: CS
- Enter number of atmospheric levels for new restarts: Choose what you chose for gcm_setup
- Select boundary conditions (BCs) version for new restarts: This depends on what you chose for the land boundary condition in gcm_setup. If you chose Icarus-NLv3 there, choose NL3 here.
- Land BCs for input restarts: You will be presented a folder choice; accept it
- Select BCs base directory for new restarts: Select what you are given
- Land BCs for output restarts: Select what you are given
- Remap upper air restarts? Yes
- Remap agcm_import_rst (a.k.a. IAU) file needed for REPLAY runs? No
- Remap surface restarts? Yes
- Remap bkg files? No
- Write lcv file? No
- Enter value of WEMIN: No idea what this is, just choose what you are given.
- Enter value of zoom parameter for surface restarts [1-8]: No idea what this is, just choose what you are given.
- Enter experiment ID for new restarts: Fine to leave this blank.
- Add labels for BCs version and atm/ocean resolutions to restart file names? No
- SLURM or PBS quality-of-service (qos)? debug
- Select/enter SLURM or PBS account: s1460
- Enter SLURM or PBS partition (if desired; can leave blank): Leave blank.
After answering all the questions, it will submit a job to the queue to regrid the restarts, and make you wait while it does, i.e., the sbatch command won't exit. Don't close the terminal or quit at this point; hopefully the debug queue will be quick enough. Once the job is done, you need to copy the *_rst.nc4 files from the output folder (above) to your run directory and remove the .nc4 extension.
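The copy-and-rename step can be sketched as follows (throwaway directories and file names stand in for the real remap output and run directories):

```shell
remap_out=$(mktemp -d)   # stand-in for the remap output folder
run_dir=$(mktemp -d)     # stand-in for your experiment directory
touch "$remap_out/moist_internal_rst.nc4" "$remap_out/catch_internal_rst.nc4"
# Copy each remapped restart into the run directory, dropping ".nc4"
for f in "$remap_out"/*_rst.nc4; do
    cp "$f" "$run_dir/$(basename "$f" .nc4)"
done
ls "$run_dir"
```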
Before running with GOCART, RRG etc., it's good practice to run a "minimal" replay configuration. Once that works, you can add components and tracers. For this example, we will run with just PCHEM (I believe this stands for Parameterized CHEMistry) for chemistry, and specify that for the source of radiative forcing, aerosols, etc. This obviates the need for tracer restart files and emissions and gets you running the GCM. To do that, make the following modifications.
- Comment out the following lines in AGCM.rc:
# Enable wet scavenging
#MCHEMTRI_increments::
#DU::DU default
#SS::SS default
#SU::SO4 default
#CA.bc::CA.bcphilic default
#CA.br::CA.brphilic default
#CA.oc::CA.ocphilic default
#NI::NO3an1 "NI::NO3an2,NI::NO3an3"
#PCHEM::OX default
#::
- In RC/GEOS_ChemGridComp.rc, set everything to FALSE, except ENABLE_PCHEM. Set that to TRUE.
- In AGCM.rc, set the appropriate RATS and AERO providers:
RATS_PROVIDER: PCHEM # options: PCHEM, GMICHEM, STRATCHEM (Radiatively active tracers)
AERO_PROVIDER: none # options: GOCART2G, MAM, none (Radiatively active aerosols)
ANALYSIS_OX_PROVIDER: PCHEM # options: PCHEM, GMICHEM, STRATCHEM, GOCART
- In AGCM.rc, set USE_AEROSOL_NN: .false.. That key may not exist in recent model tags; if it doesn't, add it.
- In RC/GOCART2G_GridComp.rc, keep all ACTIVE_INSTANCES_* and PASSIVE_INSTANCES_* blank, e.g.,
ACTIVE_INSTANCES_DU:
PASSIVE_INSTANCES_DU:
ACTIVE_INSTANCES_SS:
PASSIVE_INSTANCES_SS:
ACTIVE_INSTANCES_SU:
PASSIVE_INSTANCES_SU:
ACTIVE_INSTANCES_CA:
PASSIVE_INSTANCES_CA:
ACTIVE_INSTANCES_NI:
PASSIVE_INSTANCES_NI:
- In HISTORY.rc, do not ask for any collection to be written, i.e.,
COLLECTIONS:
::
The GEOS GCM is by default "free running", which means that it has no obligation to follow the real atmosphere. It is a dynamical model which will be driven by an initial condition, the Navier-Stokes equations, incoming solar radiation, and a few other boundary conditions. If you want it to have the winds that were actually observed, you will need to replay it to a meteorological reanalysis. The reanalysis knows about what happened in the past by virtue of weather data assimilation.
Enable replay in AGCM.rc by uncommenting one of the REPLAY_MODE keys. The most typical replay configuration you will use is "Regular" replay to the MERRA2 reanalysis. You have the choice of replaying to either 6-hourly snapshots at 3z, 9z, 15z and 21z, or 3 hourly averages spanning 0-3z, 3-6z, etc. To replay to 3-hourly averages, which is recommended, use the following settings in AGCM.rc:
ASSIMILATION_CYCLE: 10800
REPLAY_MODE: Regular
REPLAY_ANA_EXPID: MERRA-2
REPLAY_FILE: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
REPLAY_FILE_FREQUENCY: 10800
REPLAY_FILE_REFERENCE_TIME: 013000
The repository version of gcm_run.j as of October 16, 2024 will not work with this, because it expects two keys, REPLAY_ANA_LOCATION and REPLAY_FILE. The above would correspond to the pair
REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products
REPLAY_FILE: MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
When you run the model, gcm_run.j assumes that the first path component of REPLAY_FILE is a folder, and makes a symlink of that name inside scratch pointing to REPLAY_ANA_LOCATION, i.e., scratch/MERRA2_all points to /discover/nobackup/projects/gmao/merra2/data/products. So when GEOS runs, it is really reading scratch/MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4 after substituting all the date and time tokens. However, this mechanism will clearly not work with the following pair
REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/merra2/data/products/MERRA2_all
REPLAY_FILE: Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4
which, from the perspective of a normal human user used to filesystem logic, is equivalent to the key pair that works. Worse, GEOS doesn't actually need REPLAY_ANA_LOCATION, it only reads the key REPLAY_FILE, and is perfectly capable of handling long paths. Hence, in my gcm_run.j I have removed the entire mechanism of creating the aforementioned symlink (search for the conditional block if( $REPLAY_MODE == 'Exact' | $REPLAY_MODE == 'Regular' ) then and look at the lines commented within), and removed the key REPLAY_ANA_LOCATION in AGCM.rc.
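Incidentally, the %y4/%m2/%d2 tokens in REPLAY_FILE are date templates that GEOS fills in at run time. A quick illustration of the expansion (the sed call here is just a stand-in for what GEOS does internally):

```shell
template='MERRA2_all/Y%y4/M%m2/MERRA2.tavg3_3d_asm_Nv.%y4%m2%d2.nc4'
y=2014 m=10 d=23   # example model date
# Substitute 4-digit year and 2-digit month/day tokens
echo "$template" | sed -e "s/%y4/$y/g" -e "s/%m2/$m/g" -e "s/%d2/$d/g"
# MERRA2_all/Y2014/M10/MERRA2.tavg3_3d_asm_Nv.20141023.nc4
```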
Some model tags, upon checkout, will use openmpi by default. However, on AMD Milan nodes openmpi results in irreproducible crashes and is generally slow. To switch to Intel MPI,
- Identify which version of baselibs you are using. Open @env/g5_modules and you should see an if block keyed off OS_VERSION. Check what OS_VERSION resolves to on your node(s), and locate basedir. Specifically, look for a line like this:
set basedir = /discover/swdev/gmao_SIteam/Baselibs/ESMA-Baselibs-7.24.0/x86_64-pc-linux-gnu/ifort_2021.6.0-openmpi_4.1.6-SLES15
This means you're using baselibs 7.24. Say this is X.XX.
- Navigate to ~mathomp4/GitG5Modules/SLES15/X.XX to see all possible g5_modules for this version of baselibs. Locate one that uses the compiler suite you're using and Intel MPI (it should have impi in the name).
- Copy this g5_modules.blah_blah over @env/g5_modules in your source tree. Do not worry about clobbering the existing one; you can always bring it back with git checkout (or git restore).
- Delete your build directory, rebuild the model from scratch, and copy the new GEOSgcm.x into your model run directory.
- In your gcm_run.j, you'll find a bunch of OMPI_MCA_ environment variables set. Those are now irrelevant. Below that block, add the following for Intel MPI:
setenv I_MPI_FABRICS shm:ofi # Use shared memory and OFI
setenv I_MPI_OFI_PROVIDER psm3 # Specify the PSM3 OFI provider
setenv I_MPI_ADJUST_ALLREDUCE 12 # Prevent MPI hang after ExtData read
setenv I_MPI_ADJUST_GATHERV 3 # Prevent MPI hang after ExtData read
Every so often, when you try to run a fresh model setup, you'll get an error such as
Error! Found 339967 tiles in openwater. Expect to find 359523 tiles.
Your restarts are probably for a different ocean.
This is probably because the water restarts you are using come from a run with a different choice of Land Surface Boundary Conditions. Look in your gcm_run.j, specifically setenv BCSDIR. Set this to whatever is in the run you copied the water restarts from. There is a specific combination of BCSDIR in gcm_run.j and the water restarts that will work. Unfortunately, gcm_setup is not your friend here; it will not tell you which folder to copy the restarts from given your choice of land boundary conditions.
This has to do with the choice of ocean during gcm_setup and when making restart files. The Reynolds ocean ends some time in 2022, so you need to have picked a cubed-sphere ocean boundary condition. Again, gcm_setup is not your friend here, because in most cases the Reynolds ocean is the default choice. So if you have clicked through the default choices, you are toast. Set up two experiments, one with the Reynolds ocean and another with the CS ocean, and check the differences in gcm_run.j and linkbcs. Try to make those same modifications in your actual experiment.
If you're using openmpi on AMD Milan nodes, then once in a while your job will crash with the following strange error
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
The recommendation from NCCS is to use Intel MPI, so switch to Intel MPI using instructions above.
Use this section to propose and discuss changes.
Minor stuff: SLURM settings