@@ -1,4 +1,5 @@
-This directory contains data and code for the Frontier window 10-20 Feb 2023.
+This directory contains data and code for the Frontier window 10-20 Feb 2023,
+plus later additional Frontier runs.
 
 The run*sh scripts are one-off run scripts. They take as an argument the number
 of nodes to run on. run-rocm54.sh and run-rocm54-eul.sh use the tag
@@ -7,7 +8,11 @@ run-rocm51.sh uses the tag
  https://github.com/E3SM-Project/scream/releases/tag/archive%2Fscreamv1-frontier-feb2023-rocm51
 The rocm/5.4, cce/15.0.0 configuration has issues with BLAS in the LND
 component. The rocm/5.1, cce/14.0.2 configuration seems fine. As a result, our
-figures use the rocm51-annotated data.
+figures use the rocm51-annotated data. Later, we were able to redo the
+large-scale simulations. These use the branch
+https://github.com/E3SM-Project/scream/tree/sarats/frontier-gb which is
+essentially the same rocm/5.1 configuration. These are the
+frontier-v1-scream-gb-o3-ne1024 data sets.
 
 jobmonitor.py is a tool to monitor a single job. If e3sm.exe terminates but the
 job hangs, jobmonitor.py will kill it, thus minimizing hanging time.
@@ -28,8 +33,23 @@ The figs/ directory contains hy (version 0.20 running on any python3) code to
 summarize and plot the data.
 
 The figures use the following subset of the data:
-frontier-v1-scaling1-rocm51-nnodes512.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1271946.230210-212030-model_timing_stats
-frontier-v1-scaling1-rocm51-nnodes1024.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1273705.230216-000516-model_timing_stats
-frontier-v1-scaling1-rocm51-nnodes2048.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1273525.230215-230716-model_timing_stats
-frontier-v1-scaling1-rocm54-nnodes4096.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1272650.230213-074922-model_timing_stats
-frontier-v1-scaling1-rocm54-nnodes8192.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1272541.230212-204538-model_timing_stats
+Frontier:
+frontier-v1-scaling1-rocm51-nnodes512.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1271946.230210-212030-model_timing_stats
+frontier-v1-scaling1-rocm51-nnodes1024.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1273705.230216-000516-model_timing_stats
+frontier-v1-scaling1-rocm51-nnodes2048.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.1273525.230215-230716-model_timing_stats
+frontier-v1-scream-gb-o3-ne1024-nnodes4096.ne1024pg2_ne1024pg2.F2010-SCREAMv1-1389464.230729-022636-model_timing_stats
+frontier-v1-scream-gb-o3-ne1024-nnodes8192.ne1024pg2_ne1024pg2.F2010-SCREAMv1-1389460.230730-003528-model_timing_stats
+Summit:
+screamv1-summit-oct2022/data/scream-v1-scaling2-nnodes1024.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.2495303.221008-023937-model_timing_stats
+screamv1-summit-oct2022/data/scream-v1-scaling2-nnodes2048.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.2495304.221009-093803-model_timing_stats
+screamv1-summit-oct2022/data/scream-v1-scaling2-nnodes3072.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.2495590.221008-072336-model_timing_stats
+screamv1-summit-oct2022/data/scream-v1-scaling2-nnodes4096.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.2497059.221010-173053-model_timing_stats
+screamv1-summit-oct2022/data/scream-v1-scaling2-nnodes4608.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.2495935.221009-090433-model_timing_stats
+Perlmutter CPU
+screamv1-pm-cpu-mar2023/data/pm-cpu-v1-scaling1-nnodes1536.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.6746630.230401-112731-model_timing_stats
+screamv1-pm-cpu-mar2023/data/pm-cpu-v1-scaling1-nnodes2048.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.6907074.230404-030231-model_timing_stats
+Perlmutter GPU
+screamv1-pm-gpu-mar2023/data/pm-gpu-v1-scaling1-nnodes384.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.8132452.230425-101318-model_timing_stats
+screamv1-pm-gpu-mar2023/data/pm-gpu-v1-scaling1-nnodes512.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.5993462.230309-205142-model_timing_stats
+screamv1-pm-gpu-mar2023/data/pm-gpu-v1-scaling1-nnodes1024.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.8421021.230505-110231-model_timing_stats
+screamv1-pm-gpu-mar2023/data/pm-gpu-v1-scaling1-nnodes1536.ne1024pg2_ne1024pg2.F2010-SCREAMv1-timing.8168038.230430-014707-model_timing_stats
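
The data set names listed in this diff appear to encode the case name, node count, job ID, and a timestamp. The following is a minimal Python sketch, not part of the repository, of how those fields might be pulled out of a name when cross-checking which runs feed the figures. The `TimingFile` type and `parse_timing_name` function are hypothetical, and the `yymmdd-hhmmss` reading of the timestamp is an assumption inferred from the names (for example, `230210` falling in the 10-20 Feb 2023 window). The pattern accepts both the older `...-timing.<jobid>.` form and the newer `...-SCREAMv1-<jobid>.` form.

```python
import re
from datetime import datetime
from typing import NamedTuple


class TimingFile(NamedTuple):
    """Fields recovered from a model_timing_stats data set name (hypothetical type)."""
    case: str
    nnodes: int
    job_id: int
    started: datetime


# Assumed layout: <case>-nnodes<N>.<grid>.<compset>[.-]<jobid>.<yymmdd-hhmmss>-model_timing_stats
_NAME = re.compile(
    r"(?P<case>.+)-nnodes(?P<nnodes>\d+)"
    r"\.[^.]+"                       # grid, e.g. ne1024pg2_ne1024pg2
    r"\..+?[.-](?P<job_id>\d+)"      # compset text, then the job ID
    r"\.(?P<stamp>\d{6}-\d{6})-model_timing_stats"
)


def parse_timing_name(path: str) -> TimingFile:
    """Parse a data set name, ignoring any leading directory components."""
    name = path.rsplit("/", 1)[-1]
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized data set name: {name}")
    return TimingFile(
        case=m["case"],
        nnodes=int(m["nnodes"]),
        job_id=int(m["job_id"]),
        # Assumption: the stamp is yymmdd-hhmmss.
        started=datetime.strptime(m["stamp"], "%y%m%d-%H%M%S"),
    )


if __name__ == "__main__":
    names = [
        "frontier-v1-scream-gb-o3-ne1024-nnodes8192.ne1024pg2_ne1024pg2."
        "F2010-SCREAMv1-1389460.230730-003528-model_timing_stats",
        "frontier-v1-scaling1-rocm51-nnodes512.ne1024pg2_ne1024pg2."
        "F2010-SCREAMv1-timing.1271946.230210-212030-model_timing_stats",
    ]
    # List the runs in order of increasing node count.
    for t in sorted(map(parse_timing_name, names), key=lambda t: t.nnodes):
        print(t.case, t.nnodes, t.job_id, t.started.isoformat())
```

Sorting by `nnodes` gives the order a scaling plot would use; the actual figure code in figs/ is written in hy and may organize the data differently.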