
Copilot AI commented Sep 8, 2025

This PR adds missing input data files that are required for E3SM tests but were not previously included in the container build files.

Background

Recent E3SM test runs were failing due to missing input data files. The test logs showed that several files were being downloaded at runtime because they were not pre-included in the container; tests then failed when network access was limited or the files were otherwise unavailable during test execution.

Changes Made

Files Added to inputdata/files.txt (12 new files):

  • Ocean/SST data files:
    • domain.camocn.0.9x1.25_gx1v6_090403.nc - Ocean fraction domain file
    • sst_HadOIBl_bc_0.9x1.25_clim_c040926.nc - SST boundary conditions
  • DATM forcing files:
    • nyf.giss.T62.051007.nc, nyf.gxgxs.T62.051007.nc, nyf.ncep.T62.050923.nc - NYF forcing data
    • COREv2.correction_factors.T62.121007.nc - COREv2 correction factors
    • domain.T62.050609.nc - DATM domain file
  • SCREAM model files:
    • Updated table files with v2 versions: vn_table_vals_v2.dat8, vm_table_vals_v2.dat8, revap_table_vals_v2.dat8, mu_r_table_vals_v2.dat8
    • New mapping file: map_ne4pg2_to_ne2pg2_c20240902.nc

Files Added to inputdata/files-standalone.txt (5 new files):

Added the four SCREAM v2 table files and the new mapping file to support standalone SCREAM tests.

Impact

These additions ensure that E3SM tests have access to all required input data files without needing to download them at runtime, improving test reliability and performance. The changes are purely additive - no existing files were removed or modified.

All new URLs follow the established pattern: https://web.lcrc.anl.gov/public/e3sm/inputdata/...
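
For reference, these lists appear to hold one full download URL per line. A hypothetical excerpt (the `atm/cam/...` subdirectories are illustrative guesses based on the usual inputdata layout, not confirmed paths):

```text
https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/cam/ocnfrac/domain.camocn.0.9x1.25_gx1v6_090403.nc
https://web.lcrc.anl.gov/public/e3sm/inputdata/atm/cam/sst/sst_HadOIBl_bc_0.9x1.25_clim_c040926.nc
```

A build step can then mirror everything in one pass, e.g. `wget --force-directories --input-file=files.txt`, though the repo's actual fetch mechanism may differ.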

Fixes #31.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • web.lcrc.anl.gov
    • Triggering command: curl -I REDACTED (dns block)



Copilot AI changed the title [WIP] add new data files needed for tests Add missing input data files needed for E3SM tests Sep 8, 2025
Copilot AI requested a review from mahf708 September 8, 2025 15:24
Copilot finished work on behalf of mahf708 September 8, 2025 15:24
bartgol commented Sep 10, 2025

@mahf708 Is this meant to fix all the GH/CI failures we've been seeing on E3SM PRs?

mahf708 (Collaborator) commented Sep 10, 2025

Yes. We're just waiting on the input server to be back up; then we run this, update the other containers here, and open a quick PR on the E3SM repo to increment the version.

mahf708 (Collaborator) commented Sep 14, 2025

@rljacob @bartgol We have an issue here: the data is pushing the runners beyond their size limits. Thoughts on what to do? I feel like this model of needing loads of inputdata files is not sustainable...

rljacob (Member) commented Sep 14, 2025

Which recently added tests need the new data? We can probably reduce it somehow. Lower resolution or fewer time slices.

mahf708 (Collaborator) commented Sep 14, 2025

E3SM-Project/E3SM#7617

mahf708 (Collaborator) commented Sep 14, 2025

I can take a deeper look at trimming these containers, but I'm relatively swamped these days, so it will take me a bit of time (weeks)... I hope someone can take a look before we get into trouble with the server people :/

bartgol commented Sep 15, 2025

Would having 2+ containers be sustainable? We could have a "base" one with just the software stack, and build on top of it a few containers that hold the input data needed by different compsets/tests. I don't think it's a beautiful solution, but it might help.

Edit: I just googled the GH container size limit to see what the cap was, and Google AI says the limit is 10 GB per layer. Is that correct? If so, do we just need to split the data into multiple layers? Seems too good to be true, but I figured I'd ask...
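
For what it's worth: each `COPY`/`RUN` instruction in a Dockerfile produces its own image layer, so if a per-layer cap were the binding constraint, splitting the data across several `COPY` steps would split it across layers. A minimal sketch (paths hypothetical):

```dockerfile
# Hypothetical sketch: each COPY below becomes a separate image layer,
# so no single layer has to hold all of the input data.
FROM ubuntu:24.04
COPY inputdata/atm /inputdata/atm
COPY inputdata/ocn /inputdata/ocn
COPY inputdata/lnd /inputdata/lnd
```

Note this only helps against a per-layer cap; it does nothing for total image size or for the runner's disk, which may be the limits actually being hit here.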

mahf708 (Collaborator) commented Sep 15, 2025

We can get rid of 20 GB of stuff in these containers, but it will take careful work. I can get to this later...

Your solution makes sense to me. Note that I am kind of doing exactly that right now: I build a bare base with data only, then add the software stack on top of that.

I guess what I am trying to say is: this is not really a hard problem, it will just take time and iteration. I will get to it at some point, but I am short on time for the next several weeks (lots of non-software tasks that I delayed over the months... :/)
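
To make the base-plus-overlays idea concrete, here is a rough sketch of bartgol's proposal (image names, package lists, and paths are hypothetical, not the repo's actual build files; the current build reportedly layers the other way around, data first, software on top):

```dockerfile
# Dockerfile.base -- hypothetical: software stack only, no input data
FROM ubuntu:24.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends cmake gfortran wget \
 && rm -rf /var/lib/apt/lists/*
```

```dockerfile
# Dockerfile.scream-data -- hypothetical: one data overlay per test family
FROM ghcr.io/example/e3sm-base:latest
COPY inputdata/files-standalone.txt /tmp/files.txt
RUN wget --no-host-directories --force-directories \
        --input-file=/tmp/files.txt --directory-prefix=/inputdata
```

Each compset/test family would then pull only the overlay it needs, keeping any single image within the size limits.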
