Context
The publish_dump_zenodo.yml workflow currently checks out the ror-community/ror-data repository and scans it for a zip file matching the release name in order to upload the dump to Zenodo. As part of the migration away from the ror-data repo, this workflow needs to be updated to download the dump zip from a ror-records release artifact instead.
Current behavior
- Checks out the ror-community/ror-data repo (lines 38-40)
- Changes into the ./ror-data directory (line 48)
- Downloads and runs upload_dump_zenodo.py from curation_ops (schema-v2-1 branch) via raw curl
- upload_dump_zenodo.py scans the current directory (DUMP_FILE_DIR = "./") for a zip matching the release name
- The script also calls the GitHub API for ror-community/ror-updates to get release notes data (total orgs, added, updated counts)
- Uses actions/checkout@v2 and Python 3.9 (both outdated)
Proposed changes
- Remove the ror-data checkout step entirely
- Add a step to download the dump zip from the ror-records release artifact using gh release download {release-tag} --pattern '*ror-data*.zip'
- Download to a working directory, then cd into it before running the script so that the script's DUMP_FILE_DIR assumption (current directory) still works
- Upgrade actions/checkout to v4
- Upgrade Python to 3.11
- Check out curation_ops properly instead of curling individual files (current workflow curls from the schema-v2-1 branch; the checkout should target main to match generate_dump.yml, or whichever branch is canonical at the time of implementation)
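Before wiring the download step in, it may be worth sanity-checking which release assets the proposed --pattern glob would pick up. The sketch below uses Python's fnmatch module as an approximation of the glob matching that gh applies to asset names. The asset names are hypothetical placeholders, not real ror-records release assets, and matching_assets is an illustrative helper rather than anything in the workflow or in curation_ops.

```python
from fnmatch import fnmatchcase

# Same glob as the proposed `gh release download --pattern` argument.
PATTERN = "*ror-data*.zip"


def matching_assets(asset_names, pattern=PATTERN):
    """Return the asset names the glob would select, in input order.

    Hypothetical helper for illustration only. fnmatchcase is used so the
    result is case-sensitive on every platform.
    """
    return [name for name in asset_names if fnmatchcase(name, pattern)]


# Hypothetical asset names, made up for this example.
assets = [
    "v9.9-2099-01-01-ror-data.zip",
    "v9.9-2099-01-01-ror-data.json",
    "release-notes.md",
]

selected = matching_assets(assets)
print(selected)
```

If a release ever carries more than one zip whose name contains ror-data, the glob would download all of them, so the script's own filename check would then decide which one is uploaded.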
Open question
upload_dump_zenodo.py looks like it hardcodes a check that the filename contains "ror-data.zip" (line 221) and has an error message "Dump file not found in ror-data" (line 228). These are string conventions and should still work with the new artifact-based flow, but the error message is misleading and could be updated in the curation_ops sub-issue.
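For discussion in the curation_ops sub-issue, here is a minimal sketch of what a location-neutral version of that lookup could look like. It is not the actual code in upload_dump_zenodo.py: the function name find_dump_zip and its signature are assumptions, and only the DUMP_FILE_DIR value and the "ror-data.zip" substring convention come from the description above.

```python
import os

# The script's stated assumption: the dump zip sits in the current directory.
DUMP_FILE_DIR = "./"


def find_dump_zip(release_name, dump_dir=DUMP_FILE_DIR):
    """Return the path of the dump zip for release_name inside dump_dir.

    Hypothetical helper, not the real implementation. It keeps the existing
    string convention (filename contains "ror-data.zip") but reports the
    directory that was actually searched instead of naming the ror-data repo.
    """
    for file_name in sorted(os.listdir(dump_dir)):
        if release_name in file_name and "ror-data.zip" in file_name:
            return os.path.join(dump_dir, file_name)
    raise FileNotFoundError(
        f"Dump file for release {release_name!r} not found in "
        f"{os.path.abspath(dump_dir)}"
    )
```

Because the lookup only depends on the directory contents, it behaves the same whether the zip came from a repository checkout or from a downloaded release artifact.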
Files to modify
.github/workflows/publish_dump_zenodo.yml
Acceptance criteria
- actions/checkout upgraded to v4
- curation_ops is checked out properly rather than curled as raw files