Add data finding script #62
base: main
@chengzhuzhang We had discussed determining how many copies of the simulation data we have, and whether any of them could be removed. The script added here checks a number of paths where we might expect to find data. The results:
Table of data locations
This is a big table; scroll to see further columns.
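For illustration, the kind of check described above (looking under several candidate paths for each simulation and counting the copies found) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script added in this PR: the function names `find_copies` and `duplicated`, the simulation names, and the temporary directories standing in for real data locations are all hypothetical.

```python
import os
import tempfile


def find_copies(simulation_names, candidate_roots):
    """Map each simulation name to the candidate roots that hold a directory of that name.

    Hypothetical helper: assumes each copy of a simulation lives in a
    directory named after the simulation, directly under a candidate root.
    """
    copies = {}
    for name in simulation_names:
        copies[name] = [
            root
            for root in candidate_roots
            if os.path.isdir(os.path.join(root, name))
        ]
    return copies


def duplicated(copies):
    """Keep only the simulations that were found under more than one root."""
    return {name: roots for name, roots in copies.items() if len(roots) > 1}


# Demo with temporary directories standing in for real data locations.
with tempfile.TemporaryDirectory() as root_a, tempfile.TemporaryDirectory() as root_b:
    os.mkdir(os.path.join(root_a, "sim_1"))
    os.mkdir(os.path.join(root_b, "sim_1"))
    os.mkdir(os.path.join(root_a, "sim_2"))

    copies = find_copies(["sim_1", "sim_2", "sim_3"], [root_a, root_b])
    for name, roots in copies.items():
        print(f"{name}: {len(roots)} copies found")
    print("Duplicated:", sorted(duplicated(copies)))
```

A simulation with an empty list was not found under any candidate root, which is one way missing entries would show up in a table like this.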
Things to note:
@forsyth2 It's good to have this information, but at this point I don't think we need to ask users to delete data. Based on the table, it does look like the v1 BGC simulations are not included; for those, you can use Tony's copy under publication-archive.
Well, NERSC is generally averse to having duplicated data, but if we have the space as a project, I suppose it's ok. But again, there's still the problem that I do not have the space as an individual user of NERSC.
Yes, that's correct. They previously weren't on the list of simulations to include and I haven't had a chance to add them yet. On that note,
Let's hear from @ndkeen on how best to resolve this space problem.
Good catch! Yes, let's add both BGC and cryo to make it complete.
This PR is only for the script that counts data duplicates, so we can move discussion of the remaining v1 data to #63.