
KMS-663: Export Historical KMS Keyword Versions #103

Merged
htranho merged 14 commits into main from KMS-663 on Apr 23, 2026


@htranho (Contributor) commented Apr 20, 2026

Overview

What is the feature?

Export Historical KMS Keyword Versions

What is the Solution?

Create scripts to:

  1. Download past versions' RDF files from S3.
  2. Process them locally (upload to a local RDF4J instance, then use KMS to download CSV files) to get a CSV file for each scheme.
  3. Upload the CSV files to S3.
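The three scripts can be pictured as one loop over the requested versions. This is a minimal sketch, not the PR's code: `downloadRdf`, `processRdf` and `uploadCsv` are hypothetical stand-ins for the `download-rdf`, `process-rdf` and `upload-csv` scripts, injected so the flow can be exercised without S3 or a local KMS.

```javascript
// Minimal sketch of the three-step archive pipeline.
// ASSUMPTION: downloadRdf, processRdf and uploadCsv are hypothetical async functions
// standing in for the real download-rdf, process-rdf and upload-csv scripts.
async function exportHistoricalVersions(versions, { downloadRdf, processRdf, uploadCsv }) {
  const results = []
  // Versions are handled one at a time, since a single local RDF4J/KMS is shared.
  for (const version of versions) {
    // Step 1: fetch the archived RDF for this version from S3.
    const rdf = await downloadRdf(version)
    // Step 2: load it into local RDF4J and collect one CSV per scheme from local KMS.
    const csvByScheme = await processRdf(version, rdf)
    // Step 3: upload every scheme CSV to S3.
    for (const [scheme, csv] of Object.entries(csvByScheme)) {
      await uploadCsv(version, scheme, csv)
    }
    results.push({ version, schemes: Object.keys(csvByScheme) })
  }
  return results
}
```

Processing sequentially is a deliberate choice in this sketch: loading several versions into one local RDF4J repository at once could mix their graphs.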

When a new keyword version is published, the RDF file and CSV files for that version are uploaded to S3.
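A sketch of the objects a publish step might write for one version. The key layout here (`<version>/<version>.rdf` and `<version>/csv/<scheme>.csv`) is a hypothetical assumption; the PR only says the files land in a per-version directory in S3.

```javascript
// Sketch: list the S3 objects written when one version is published.
// ASSUMPTION: the key layout below is hypothetical, not the publisher's real layout.
function buildPublishObjects(version, rdfBody, csvByScheme) {
  const prefix = `${version}/`
  // One RDF/XML object for the whole version.
  const objects = [{ key: `${prefix}${version}.rdf`, body: rdfBody, contentType: 'application/rdf+xml' }]
  // One CSV object per concept scheme.
  for (const [scheme, csv] of Object.entries(csvByScheme)) {
    objects.push({ key: `${prefix}csv/${scheme}.csv`, body: csv, contentType: 'text/csv' })
  }
  return objects
}
```

Each entry maps naturally onto an AWS SDK v3 `PutObjectCommand` input (`Bucket`, `Key`, `Body`, `ContentType`); keeping key construction in a pure function makes the layout easy to unit-test.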

What areas of the application does this impact?

Processing of past versions.
Publishing a new version.

Testing

  1. Specify the version(s) to download from S3 by editing scripts-config.sh, then run 'npm run download-rdf'. The downloaded files should be in 'archive-processor/downloaded-rdf/'.
  2. Make a test change to some keyword labels.
  3. Start up the local environment (RDF4J, KMS).
  4. Run 'npm run process-rdf' to upload the RDF files from step 1 to local RDF4J and then download the resulting CSV files from local KMS. The downloaded files should be in 'archive-processor/local-kms-csv/[version]/'.
  5. Verify the changes made in step 2.
  6. Run 'npm run upload-csv' to upload the CSV files to S3.
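Step 4 amounts to one CSV request per concept scheme against local KMS, with one output file per scheme. Only the output directory `archive-processor/local-kms-csv/[version]/` comes from the PR; the base URL, the `concepts/concept_scheme/<scheme>?format=csv&version=<version>` route and the `<scheme>.csv` file name are assumptions for illustration.

```javascript
// Sketch: plan the per-scheme CSV downloads for one version (step 4).
// ASSUMPTIONS: kmsBaseUrl, the concept_scheme route and the <scheme>.csv file name
// are illustrative; only the local-kms-csv output directory is from the PR.
function planLocalCsvDownloads(version, schemes, kmsBaseUrl) {
  const outDir = `archive-processor/local-kms-csv/${version}/`
  // Normalize to exactly one trailing slash so relative paths resolve under it.
  const base = `${kmsBaseUrl.replace(/\/+$/, '')}/`
  return schemes.map((scheme) => {
    const url = new URL(`concepts/concept_scheme/${encodeURIComponent(scheme)}`, base)
    url.searchParams.set('format', 'csv')
    url.searchParams.set('version', version)
    return { scheme, url: url.toString(), outPath: `${outDir}${scheme}.csv` }
  })
}
```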

Use MMT to publish a new version, then check AWS S3 to verify that the RDF and CSV files were uploaded to the new version's directory.
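The S3 check can be reduced to comparing expected file names against the keys listed under the new version's directory. A sketch, assuming `listedKeys` comes from an S3 listing (for example `aws s3 ls --recursive` or `ListObjectsV2`) and that `expectedFiles` reflects whatever layout the publisher really uses.

```javascript
// Sketch: report which expected files are missing under `<version>/` in S3.
// ASSUMPTION: listedKeys is a flat array of object keys from an S3 listing.
function verifyVersionUpload(version, expectedFiles, listedKeys) {
  const prefix = `${version}/`
  // Keep only keys inside the version directory, relative to that directory.
  const present = new Set(
    listedKeys.filter((key) => key.startsWith(prefix)).map((key) => key.slice(prefix.length))
  )
  const missing = expectedFiles.filter((file) => !present.has(file))
  return { ok: missing.length === 0, missing }
}
```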

Attachments

N/A

Checklist

  • I have added automated tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

@codecov-commenter commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.71%. Comparing base (d38be2e) to head (45f34f1).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #103      +/-   ##
==========================================
+ Coverage   99.67%   99.71%   +0.03%     
==========================================
  Files         154      156       +2     
  Lines        3108     3163      +55     
  Branches      741      745       +4     
==========================================
+ Hits         3098     3154      +56     
  Misses          9        9              
+ Partials        1        0       -1     
```


Comment thread archive-processor/scripts/scripts-config.sh Outdated
@eudoroolivares2016

  1. For CSV header fields we don't have data for, use the string NA.
  2. Do we have an automated solution to update the consolidated CSVs on publish?
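A sketch of the first suggestion: when a historical version has no value for a header field, write `NA` rather than leaving the cell empty. The `Name: value` cell shape and the field names used in the test are hypothetical, not the confirmed KMS CSV header.

```javascript
// Sketch: build a CSV header row, substituting "NA" for missing values.
// ASSUMPTION: the "Name: value" cell format and the field names are hypothetical.
function toCsvCell(value) {
  // Quote every cell and double any embedded quotes (RFC 4180 style).
  return `"${String(value).replace(/"/g, '""')}"`
}

function buildHeaderRow(fieldNames, values) {
  return fieldNames
    .map((name) => {
      const value = values[name]
      const filled = value === undefined || value === null || value === '' ? 'NA' : value
      return toCsvCell(`${name}: ${filled}`)
    })
    .join(',')
}
```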

Comment thread archive-processor/scripts/process-rdf.js
@htranho (Contributor Author) commented Apr 22, 2026

In reply to the two points above:

  1. Yes.
  2. Yes, the RDF and CSV files are uploaded to S3 at publish time.

Comment thread serverless/src/publisher/handler.js
Comment thread serverless/src/shared/exportPublishSchemeCsvToS3.js
Comment thread bin/localstack/start.sh Outdated
Comment thread archive-processor/scripts/scripts-config.sh
Comment thread archive-processor/scripts/download-rdf-from-S3.js
Comment thread archive-processor/scripts/download-rdf-from-S3.js Outdated
Comment thread bin/localstack/start.sh
@cgokey (Contributor) left a comment


I had one more minor comment, but otherwise I think it looks good.

```javascript
const clearContext = async (version) => {
  console.log(`Clearing context: ${version}`)
  const graphUri = `https://gcmd.earthdata.nasa.gov/kms/version/${version}`
  // ... (remainder of the function is not shown in the review excerpt)
```
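For context on what `clearContext` operates on: each version lives in its own named graph, so clearing a version means deleting only the statements in that graph. A sketch of one way to do that against RDF4J's REST API (`DELETE .../repositories/<id>/statements` with a `context` parameter holding an N-Triples encoded IRI, i.e. wrapped in `<>`). The server URL and repository id are hypothetical, and this is not necessarily how the PR implements it.

```javascript
// Sketch: build an RDF4J REST request that removes one version's named graph.
// ASSUMPTIONS: rdf4jServerUrl and repositoryId are hypothetical; the graph URI
// pattern matches the clearContext excerpt above.
function buildClearContextRequest(rdf4jServerUrl, repositoryId, version) {
  const graphUri = `https://gcmd.earthdata.nasa.gov/kms/version/${version}`
  const url = new URL(
    `repositories/${encodeURIComponent(repositoryId)}/statements`,
    `${rdf4jServerUrl.replace(/\/+$/, '')}/`
  )
  // RDF4J expects context values in N-Triples form, so the IRI is wrapped in <>.
  url.searchParams.set('context', `<${graphUri}>`)
  return { method: 'DELETE', url: url.toString(), graphUri }
}
```

In Node 18+ the request could be sent with `await fetch(req.url, { method: req.method })`; building it separately keeps the URI logic testable without a running server.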
Suggested change:

```diff
- const graphUri = `https://gcmd.earthdata.nasa.gov/kms/version/${version}`
+ const graphUri = `https://cmr.earthdata.nasa.gov/kms/version/${version}`
```


Isn't this the old endpoint from KMS 1.0?

@htranho (Contributor Author)

All URIs and resources in the RDF use gcmd.earthdata..., so we would need to change that in future work if we want to switch. Should we?
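If the switch is deferred to future work, one low-cost preparation is to build the graph URI from a single configurable base, so moving from `gcmd.earthdata.nasa.gov` to `cmr.earthdata.nasa.gov` becomes a config change. A sketch; the `KMS_GRAPH_BASE_URL` environment variable is a hypothetical name, and the RDF resource URIs themselves would still need separate migration.

```javascript
// Sketch: centralize the graph URI base so it can change in one place.
// ASSUMPTION: KMS_GRAPH_BASE_URL is a hypothetical environment variable name.
const DEFAULT_GRAPH_BASE_URL = 'https://gcmd.earthdata.nasa.gov'

function getGraphUri(version, baseUrl = process.env.KMS_GRAPH_BASE_URL || DEFAULT_GRAPH_BASE_URL) {
  // Strip trailing slashes so the result never contains "//kms".
  return `${baseUrl.replace(/\/+$/, '')}/kms/version/${version}`
}
```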

Comment thread serverless/src/shared/awsClients.js
Comment thread archive-processor/scripts/process-rdf.js
Comment thread serverless/src/shared/__tests__/exportRdfToS3.test.js Outdated
Comment thread serverless/src/shared/__tests__/exportPublishSchemeCsvToS3.test.js Outdated
@htranho merged commit 326675f into main on Apr 23, 2026
6 checks passed
@htranho deleted the KMS-663 branch on April 23, 2026 at 21:32

4 participants