@AaronDonahue
Collaborator

Add provenance and data location information for the SCREAMv0 DYAMOND2 simulation and the SCREAMv1 4-Seasons simulations.

@AaronDonahue requested a review from @forsyth2 on October 24, 2024 17:44
@forsyth2
Collaborator

@AaronDonahue @crterai FYI I just want to iron out some details for integrating v2.1 data (#50) and then I can make any necessary adjustments in this pull request as a new commit, to keep the code/doc design consistent. (In particular, I'm planning to list the simulations in a csv rather than hard-code them in the rst).

Also, do you have any original scripts beyond run_scripts/SCREAMv0/original/run.production.ne1024pg2_scream.dyamond2.sh? I imagine the simulations in the table were run with different run scripts.

@forsyth2 force-pushed the ASD_CRT/add_screamv0_v1_sim_data branch from 62e529f to 2af396d on November 13, 2024 19:22
@forsyth2 force-pushed the ASD_CRT/add_screamv0_v1_sim_data branch from 2af396d to d38115b on November 13, 2024 19:22
Collaborator

@forsyth2 left a comment

@AaronDonahue @crterai I think given the following:

  1. competing priorities at the moment
  2. the upcoming publication of the newsletter pointing to this data
  3. the csv conversion being a bit more involved than I thought
  4. Perlmutter being down today (which I need for automatic data size calculation)

we'll just leave this hard-coded for now. Perhaps I will clean that up when there is more time available (meaning the backend will change, but nothing will change for people viewing the data on the website).

For now though, I just made a very small commit to get the web pages to appear correctly. My latest build of the docs can be seen at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/data_docs_49/html/.

Action items for you two:

  • Please review the SCREAM pages there and tell me if everything looks as expected. If so, I'll merge this PR.
  • The one original script you have isn't linked anywhere. Do you want to 1) add more original scripts? and/or 2) link the original scripts from a table?
  • The FourSeasons reproduction table is largely empty; do you want to even include this table in the first place?
  • Are there any ESGF links you want to include?

Table screenshots (three images, captured 2024-11-13; not reproduced here)


If you're interested, I've described below the potential refactoring I had in mind, which I may return to when we're under less of a time crunch:

Proposed refactoring

The idea was to generate all the tables using a csv, because the rst files can be tricky to format correctly. The automatic table generation also has the advantage of updating everything all at once. E.g., the code calculates the data size on HPSS for each simulation.

I'm currently trying to do that with v2.1 data in #50.
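
As a rough illustration of the CSV-driven idea described above, here is a minimal sketch that renders CSV text as an rst list-table directive. It is not the actual generate_tables.py: the function name csv_to_rst_list_table, the table title, and the sample row are hypothetical placeholders, and the HPSS data-size calculation is left out entirely.

```python
import csv
import io


def csv_to_rst_list_table(csv_text: str, title: str) -> str:
    """Render CSV text (first row = header) as an rst list-table directive.

    Hypothetical helper for illustration only; not part of generate_tables.py.
    """
    # Skip completely blank lines so stray newlines do not become empty rows.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        raise ValueError("CSV input has no header row")

    width = len(rows[0])
    lines = [f".. list-table:: {title}", "   :header-rows: 1", ""]
    for row in rows:
        if len(row) != width:
            raise ValueError(f"expected {width} columns, got {len(row)}: {row}")
        for i, cell in enumerate(row):
            # The first cell of each row opens a new list item ("* -");
            # the remaining cells continue that row ("  -").
            prefix = "   * - " if i == 0 else "     - "
            lines.append((prefix + cell.strip()).rstrip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; these are not real simulations, sizes, or paths.
    sample = (
        "Simulation,Data Size (TB),HPSS Path\n"
        "example_simulation,0.0,/placeholder/path\n"
    )
    print(csv_to_rst_list_table(sample, "Example simulation table"))
```

With this shape, adding a simulation means adding one CSV row rather than hand-editing rst markup, which is the maintenance benefit described above.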

However, the data provided here, and the way that data is organized, differ somewhat from what the auto-generation code (generate_tables.py) expects. Notice:

| Web page | Row categories (bolded rows) | Simulation table columns | Reproduction table columns |
| --- | --- | --- | --- |
| v2 > E3SMv2 (Water Cycle) | resolution > category | Simulation, Data Size (TB), ESGF Links, HPSS Path | Simulation, Machine, 10 day checksum, Reproduction Script, Original Script (requires significant changes to run!!) |
| SCREAMv0 > SCREAMv0 DYAMOND2 | Simulation Name (num days) | Simulation, Data Size (TB), NERSC HPSS Path (notice missing ESGF links column) | No reproduction table |
| SCREAMv1 > SCREAMv1 Four Seasons | Simulation Name (num days) | Simulation, Data Size (GB), NERSC HPSS Path (notice missing ESGF links column) | Simulation, Machine, 10-day checksum (all empty), Reproduction Script (all empty) (notice missing Original Script column) |

Problems with extending auto-generation to this data (thus making refactoring non-trivial and hard-coding the immediate solution):

  • Different organizational structure. generate_tables.py expects a rigid organizational structure (version > group > resolution > category > simulation), which the SCREAM data doesn't appear to follow, notably the resolution > category part.
  • Different columns in the tables.
  • The simulations are named very differently (e.g., long name with many dots versus a month name)
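
To make the structural mismatch concrete, here is a small sketch of the five-level hierarchy named above, with the two middle levels made optional. The class SimulationRecord, the helpers missing_levels and heading_path, and all field values in angle brackets are hypothetical; this only illustrates one way a shallower SCREAM-style entry could be detected and still rendered, and it says nothing about how generate_tables.py is actually written.

```python
from dataclasses import dataclass
from typing import List, Optional

# Levels the comment above says the generator expects, outermost first.
LEVELS = ("version", "group", "resolution", "category", "simulation")


@dataclass(frozen=True)
class SimulationRecord:
    """Hypothetical record: resolution and category may be absent."""

    version: str
    group: str
    simulation: str
    resolution: Optional[str] = None
    category: Optional[str] = None


def missing_levels(record: SimulationRecord) -> List[str]:
    """Return the hierarchy levels that this record does not provide."""
    return [level for level in LEVELS if getattr(record, level) is None]


def heading_path(record: SimulationRecord) -> str:
    """Join only the levels that are present, keeping the expected order."""
    present = (getattr(record, level) for level in LEVELS)
    return " > ".join(value for value in present if value is not None)


# Placeholder entries; the angle-bracket values stand in for real names.
V2_EXAMPLE = SimulationRecord(
    version="v2",
    group="WaterCycle",
    resolution="<resolution>",
    category="<category>",
    simulation="<simulation name>",
)
SCREAM_EXAMPLE = SimulationRecord(
    version="SCREAMv1",
    group="FourSeasons",
    simulation="<month name>",
)

if __name__ == "__main__":
    for record in (V2_EXAMPLE, SCREAM_EXAMPLE):
        print(heading_path(record), "| missing:", missing_levels(record))
```

The differing table columns and naming schemes listed above would still need separate handling; this sketch covers only the hierarchy depth.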

@crterai
Collaborator

crterai commented Nov 14, 2024

Thanks for reviewing, @forsyth2.
Below are my responses:

> Please review the SCREAM pages there and tell me if everything looks as expected. If so, I'll merge this PR.

The SCREAM pages look as I'd expected.

> The one original script you have isn't linked anywhere.

I've placed the link to the run script on the line "Scripts originally used to run SCREAMv0 simulations are available here." on this page: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/data_docs_49/html/SCREAMv0/DYAMOND2/simulation_data/index.html
The link currently doesn't work because it points to the location on the main branch, but I'm expecting it to work once we have this PR merged. I didn't put it in a reproduction table, because it was run on a machine that doesn't exist and I don't want there to be an expectation that others can just take the run script and have it produce BFB answers or work right out of the box.

> The FourSeasons reproduction table is largely empty; do you want to even include this table in the first place?

Maybe we should remove it? Curious what @AaronDonahue thinks though.

> Are there any ESGF links you want to include?

We haven't published any of the SCREAM data on ESGF, so no ESGF links.

@AaronDonahue
Collaborator Author

AaronDonahue commented Nov 14, 2024

Hi Ryan, thank you for taking a look at this and fixing it up a bit. In response to your questions,

> Please review the SCREAM pages there and tell me if everything looks as expected. If so, I'll merge this PR.

Looks great to me

> The one original script you have isn't linked anywhere.

Do we need something like this for the FourSeasons runs as well?

> The FourSeasons reproduction table is largely empty; do you want to even include this table in the first place?

Let's just remove it. I was using the v2 watercycle pages as a template, but I don't think we need this for FourSeasons. If it is requested, I am happy to add it in a subsequent PR.

@forsyth2
Collaborator

forsyth2 commented Nov 14, 2024

Thanks!

> I've placed the link to the run script on the line "Scripts originally used to run SCREAMv0 simulations are available here."

@crterai Oh I missed that link. Ok, that looks good. So that one script was used to generate everything? I'm wondering if we need to include any further scripts.

For v2, we listed original scripts along with reproduction scripts on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/data_docs_49/html/v2/WaterCycle/reproducing_simulations/reproduction_table.html, but this method is fine.

> Do we need something like this for the FourSeasons runs as well?

@AaronDonahue That's really up to you two / the SCREAM team. I think it's probably good to have original scripts included somewhere so people can see how you generated the data, but I suppose it's not a requirement.

> Let's just remove it.

Ok, that sounds good.

Action items:

  • @AaronDonahue @crterai let me know if you want to add any more original scripts.
  • I'll remove the reproduction table docs
  • I think we can then merge this.

@crterai
Collaborator

crterai commented Nov 14, 2024

Thanks for the follow up, @forsyth2.

> So that one script was used to generate everything?

Yes. We only had one 40-day production simulation with SCREAMv0, and that script was what was used to run it.

@AaronDonahue
Collaborator Author

@forsyth2, I can add the run scripts I used. I am going to push a commit to the branch with them stored in the directory
run_scripts/SCREAMv1/FourSeasons

Can you add the link where appropriate, as you did with the SCREAMv0 links?

@forsyth2
Collaborator

Ok, I think everything is looking good here, so I will merge.

@AaronDonahue @crterai If you do find anything you want to change, it's easy enough to open a new PR and make an adjustment, as long as you don't change the URLs you're pointing people to.

@forsyth2 merged commit b712bc6 into main on Nov 14, 2024
1 check passed
@forsyth2 deleted the ASD_CRT/add_screamv0_v1_sim_data branch on November 14, 2024 22:30

@forsyth2
Collaborator

forsyth2 commented Nov 14, 2024

@AaronDonahue @crterai The docs have now built on the actual website: https://docs.e3sm.org/e3sm_data_docs/_build/html/index.html. Please confirm everything looks good. I did check that the run script links now point to valid URLs.

(This is mainly an issue on my end, but for reference, I created #51 to address the proposed refactoring I mention in #49 (review)).

@crterai
Collaborator

crterai commented Nov 14, 2024

Thanks for pushing this through, @forsyth2. It looks good to me.
