Context
The indexing workflows tell the ROR API to download and index a data dump. They currently pass a data-env parameter that tells the API which GitHub repo to download the dump from (ror-data or ror-data-test). These workflows need to be updated in tandem with the ROR API changes so the API can locate the dump in the new release artifact location on ror-records.
Current behavior
- Both workflows take inputs: release-dump (filename without .zip), schema-version (v2), data-env (test/prod)
- They call index_dump.py, which constructs a URL as {api_url}/{filename}/{dataenv} and makes a GET request
- The data-env parameter tells the ror-api which GitHub repo to download the dump from (ror-data or ror-data-test)
- prod_index_dump.yml calls INDEX_DUMP_PROD_API_URL_V2
- staging_index_dump.yml calls INDEX_DUMP_STAGING_API_URL_V2
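For reference, the current request can be sketched roughly as follows. This is a minimal illustration, not the actual contents of index_dump.py: the function names build_index_url and trigger_index, the use of urllib, and the placeholder base URL are all assumptions. Only the {api_url}/{filename}/{dataenv} shape and the GET request come from this issue.

```python
import urllib.request


def build_index_url(api_url: str, filename: str, dataenv: str) -> str:
    """Build the indexing URL in the current {api_url}/{filename}/{dataenv} shape."""
    # Strip a trailing slash so the joined URL has no double slash.
    return f"{api_url.rstrip('/')}/{filename}/{dataenv}"


def trigger_index(api_url: str, filename: str, dataenv: str) -> int:
    """Send the GET request and return the HTTP status code (hypothetical helper)."""
    url = build_index_url(api_url, filename, dataenv)
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder base URL; the real value comes from INDEX_DUMP_PROD_API_URL_V2
    # or INDEX_DUMP_STAGING_API_URL_V2 in the workflow configuration.
    print(build_index_url("https://api.example.org/indexdump", "example-dump", "test"))
```

Keeping the URL construction in its own function means the format can be changed, and checked, without sending a request.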
Proposed changes
The exact changes depend on how the ror-api sub-issue in the same epic (#355) resolves the dataenv parameter question:
- If ror-api adds a release-tag parameter: update index_dump.py and both workflows to pass the release tag instead of (or in addition to) data-env
- If ror-api keeps dataenv but changes its semantics: update workflow input descriptions to reflect the new meaning
- Update index_dump.py to construct the new URL format expected by the updated API endpoint
- Clean up workflow descriptions that reference "ror-data-test repo" and "ror-data" (lines 7, 17 in both files)
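Until the ror-api sub-issue is resolved, the new URL format is unknown. The sketch below shows one way index_dump.py could cover both options by making each path segment optional. It is hypothetical throughout: the release_tag argument, the segment order, and the placeholder base URL are assumptions, not the API's confirmed contract.

```python
from typing import Optional
from urllib.parse import quote


def build_index_url(
    api_url: str,
    filename: str,
    dataenv: Optional[str] = None,
    release_tag: Optional[str] = None,
) -> str:
    """Build an indexing URL from whichever optional segments are provided.

    HYPOTHETICAL: the segment order (filename, release tag, dataenv) is an
    assumption; adjust it once the ror-api endpoint format is decided.
    """
    segments = [filename]
    if release_tag:
        segments.append(release_tag)
    if dataenv:
        segments.append(dataenv)
    # Percent-encode each segment so unexpected characters cannot alter the path.
    encoded = [quote(segment, safe="") for segment in segments]
    return "/".join([api_url.rstrip("/")] + encoded)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_index_url("https://api.example.org/indexdump", "example-dump",
                          release_tag="v0.0-test"))
```

If ror-api keeps dataenv and only changes its meaning, the call stays the same as today and only the workflow input descriptions need updating.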
Files to modify
- .github/workflows/prod_index_dump.yml
- .github/workflows/staging_index_dump.yml
- .github/workflows/index_dump.py
Acceptance criteria
- index_dump.py constructs the correct URL format for the updated ror-api endpoint