Description
The only method we currently have to create JP2 image derivatives is through the assemblyWF. This means that when we need to create new JP2 images for an already-accessioned item, we need to:
- retrieve the original image source files from preservation
- reaccession them via Goobi or Preassembly
It also means that when an item was accessioned without JP2s (such as a dark item), it has to be reaccessioned in order to create JP2s.
Because the source images are already in SDR preservation, it should be possible to have SDR retrieve the source images and then generate the JP2s without having to go through a manual download and reaccession process.
We now have precedents for re-running certain accessioning features without reaccessioning: both the OCR and speech-to-text workflows can be run without a manual reaccession. This means that, directly from Argo, we can regenerate existing OCR and also OCR items that were never OCR'd.
Requirements
To bring this functionality into Argo we will need:
- A mechanism for users to send a single item through JP2 generation
  - this could be a new button, but I worry about the eventual proliferation of buttons on the Argo show page
- A mechanism for users to send a batch of items through JP2 generation
- Logic for when this feature can be run
- Logic for determining which files should be processed
Logic for when this can be run:
- item is an item (not APO/collection/agreement)
- item is a book, image, map, or media content type (these content types all use image viewer capability, which requires JP2)
- rights are not "dark, none"
- item contains at least one of the following resource types: image, page, media
- item contains at least one file that can be converted to JP2: TIFF, JPG, PNG (the types we support for jp2-create)
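As a rough illustration, the eligibility rules above could be expressed as a single predicate. This is only a sketch: the `Item` and `Resource` classes, the `is_dark` flag, and the extension-based file check are all hypothetical stand-ins, not Argo's actual data model. It also treats the resource-type rule and the convertible-file rule as independent checks, as the list does.

```python
import os
from dataclasses import dataclass, field

# Hypothetical constants mirroring the rules listed above.
JP2_CONTENT_TYPES = {"book", "image", "map", "media"}
JP2_RESOURCE_TYPES = {"image", "page", "media"}
# Assumption: convertible files are recognized by extension (TIFF, JPG, PNG).
CONVERTIBLE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


@dataclass
class Resource:
    resource_type: str
    filenames: list[str] = field(default_factory=list)


@dataclass
class Item:
    object_type: str   # e.g. "item", "apo", "collection", "agreement"
    content_type: str  # e.g. "book", "image", "map", "media"
    is_dark: bool      # assumption: True when rights are "dark, none"
    resources: list[Resource] = field(default_factory=list)


def is_convertible(filename: str) -> bool:
    """True if the file extension is one of the assumed convertible types."""
    return os.path.splitext(filename)[1].lower() in CONVERTIBLE_EXTENSIONS


def can_regenerate_jp2(item: Item) -> bool:
    """Apply each rule from the list above; every rule must pass."""
    if item.object_type != "item":
        return False
    if item.content_type not in JP2_CONTENT_TYPES:
        return False
    if item.is_dark:
        return False
    if not any(r.resource_type in JP2_RESOURCE_TYPES for r in item.resources):
        return False
    return any(is_convertible(f) for r in item.resources for f in r.filenames)
```

The same predicate could serve both the single-item and batch cases, for example to decide whether to show the action for one item or to filter a list of druids before queuing them.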
How this feature could generate the new JP2s:
- User requests JP2 regeneration (new version is opened)
- System identifies source files that can be converted
- Retrieve these files from preservation to /dor/assembly/druid/tree
- Run assemblyWF in the same way that Preassembly initiates it for an "update", where the existing files are left alone, except new JP2s are created
- jp2-create should run in the "normal" way
- for each TIFF/JPG/PNG found on the filesystem, create a new JP2
- remainder of assemblyWF should also run as "normal" in order to generate checksums and image dimensions for the new files
- accessionWF should automatically shelve the new JP2s and remove the old ones from Stacks
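The "system identifies source files that can be converted" step might look something like the sketch below. The function name `plan_jp2_outputs` is hypothetical, and both the extension-based matching and the output naming (same path with a `.jp2` extension) are assumptions for illustration, not a description of what jp2-create actually does.

```python
from pathlib import PurePosixPath

# Assumption: TIFF, JPG, and PNG sources are recognized by file extension.
CONVERTIBLE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def plan_jp2_outputs(filenames):
    """Map each convertible source file to an assumed JP2 output name.

    Files that are not TIFF/JPG/PNG (including existing JP2s) are skipped,
    so they would be left alone.
    """
    plan = {}
    for name in filenames:
        path = PurePosixPath(name)
        if path.suffix.lower() in CONVERTIBLE_EXTENSIONS:
            plan[name] = str(path.with_suffix(".jp2"))
    return plan


files = ["page1.tif", "page1.jp2", "maps/sheet2.PNG", "notes.txt"]
for source, target in plan_jp2_outputs(files).items():
    print(f"{source} -> {target}")
```

Note that this sketch does not handle two sources that map to the same JP2 name (for example `page1.tif` and `page1.jpg`); a real implementation would need a rule for that case.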
We should have some of this machinery in place because we had to account for the "update JP2" case when we added incremental file updates to Preassembly. It may be possible to connect Argo up to this existing process. I'm also open to considering other ways to handle JP2 creation writ large.