2.2.2
This release includes new features to facilitate synchronizing new ProQuest ETDs from AWS S3 and loading these ETDs into GW ScholarSpace while avoiding duplicate loads.
Instructions for importing ETDs from ProQuest have been revised on the Wiki: https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports
New features
ETD pipeline
- New proquest_zipfile metadata field on GwETD type works, which is intended to store the filename of the original ProQuest zip file - e.g. etdadmin_upload_100535.zip (#572). This field is not visible to site users, but users with edit rights can see it when editing a work.
- New rake tasks:
  - gwss:populate_etd_proquest_zipfile - This should only need to be run once, for migration of existing GwETD works. It matches up the filename of the main PDF file on each GwETD (e.g. Anderson_gwu_0075M_16591.pdf) with the main PDF file within each ProQuest zip file in S3 (e.g. etdadmin_upload_1075322.zip).
  - gwss:download_new_pq_zips - Downloads new (and only new) ETDs from S3, by comparing filenames in S3 with proquest_zipfile values on GwETD works.
- Improvements to rake tasks:
  - gwss:ingest_pq_etds
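
The "new (and only new)" selection described for gwss:download_new_pq_zips amounts to a set difference between the filenames in S3 and the proquest_zipfile values already recorded. The following is a minimal sketch of that comparison only, not the task's actual code: the helper name new_zip_filenames and the sample filename etdadmin_upload_2000001.zip are hypothetical, and plain arrays stand in for the S3 listing and the repository query.

```ruby
require 'set'

# Hypothetical helper (not part of the release): given the filenames listed
# in the S3 bucket and the proquest_zipfile values already stored on GwETD
# works, return only the zip files that have not been loaded yet.
def new_zip_filenames(s3_filenames, loaded_zipfiles)
  # Works without a recorded zip file contribute nil; drop those.
  loaded = loaded_zipfiles.compact.to_set
  s3_filenames.select { |name| name.end_with?('.zip') && !loaded.include?(name) }
end

# Illustrative data only; the last filename is made up for this example.
s3_filenames = [
  'etdadmin_upload_100535.zip',
  'etdadmin_upload_1075322.zip',
  'etdadmin_upload_2000001.zip'
]
loaded_zipfiles = ['etdadmin_upload_100535.zip', nil, 'etdadmin_upload_1075322.zip']

puts new_zip_filenames(s3_filenames, loaded_zipfiles)
```

Using a Set for the already-loaded names keeps each lookup constant-time, which matters little here but scales if the bucket grows large.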
Technical debt
- Removes old rake tasks for ingesting Bulkrax content that are no longer needed (#571). Consistent with this, the https://github.com/gwu-libraries/etd-loader and https://github.com/gwu-libraries/batch-loader repositories have been archived.
- Removes remnants of Travis CI (#573)
Upgrade instructions
Prerequisites
Set values in .env for:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
- AWS_PROQUEST_ETD_BUCKET_NAME
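
Before running the new tasks, it can help to confirm that all four settings are actually present. The snippet below is a rough sketch of such a check, assuming the values from .env have already been loaded into the process environment (how that happens depends on the deployment). The constant REQUIRED_S3_VARS and the helper missing_s3_vars are hypothetical names, not part of the release.

```ruby
# Hypothetical pre-flight check (not part of the release): report which of
# the S3-related settings named above are missing or blank.
REQUIRED_S3_VARS = %w[
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_REGION
  AWS_PROQUEST_ETD_BUCKET_NAME
].freeze

# Accepts any hash-like object so it can be exercised without touching ENV.
def missing_s3_vars(env = ENV)
  REQUIRED_S3_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_s3_vars
if missing.empty?
  puts 'All S3 settings are present.'
else
  warn "Missing or blank settings: #{missing.join(', ')}"
end
```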
Install new gem(s)
Run bundle install
Populate proquest_zipfile on existing ETDs:
Run the gwss:populate_etd_proquest_zipfile task. Edit one of the GwETDs and observe that proquest_zipfile is (correctly) populated.
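
For orientation when checking the result, the matching this task is described as performing (main PDF filename on the work against the PDF inside each zip) can be pictured as a lookup from PDF filename to zip filename. This is only an illustrative sketch under stated assumptions, not the task's implementation: the helper names index_pdfs_by_zip and zipfile_for are hypothetical, the zip contents are supplied as a plain hash rather than read from S3, and pairing the two example filenames from these notes is an assumption made for illustration.

```ruby
# Hypothetical helper: invert a { zip filename => [PDF filenames] } mapping
# into a { PDF filename => zip filename } index.
def index_pdfs_by_zip(zip_contents)
  zip_contents.each_with_object({}) do |(zip_name, pdf_names), index|
    pdf_names.each { |pdf_name| index[pdf_name] = zip_name }
  end
end

# Hypothetical helper: find the zip file for a work's main PDF filename.
# Returns nil when no zip contains a PDF with that name.
def zipfile_for(main_pdf_filename, pdf_index)
  pdf_index[main_pdf_filename]
end

# Illustrative data; the pairing of these two filenames is assumed.
zip_contents = {
  'etdadmin_upload_1075322.zip' => ['Anderson_gwu_0075M_16591.pdf']
}
pdf_index = index_pdfs_by_zip(zip_contents)

puts zipfile_for('Anderson_gwu_0075M_16591.pdf', pdf_index).inspect
puts zipfile_for('Unknown_gwu_0000X_00000.pdf', pdf_index).inspect
```

A work whose lookup comes back nil would be one to inspect by hand, since it has no matching zip file in this simplified model.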
(Optionally) Load latest ProQuest ETDs from S3
Follow the instructions at https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports to download new ETDs from S3, create the Bulkrax manifest, import into GW ScholarSpace, and clean up.