Running multiple workbenches #1040

Open
DibyaGoswami wants to merge 8 commits into mjordan:main from DibyaGoswami:Running-Multiple-Workbenches

@DibyaGoswami

Link to GitHub issue or other discussion

#1020
#999

What does this PR do?

This PR allows multiple Workbench sessions to run simultaneously without interfering with one another, provided each session uses a different input CSV. This makes it possible to ingest larger amounts of data in parallel.

What changes were made?

The changes include:

  • Added a function, get_config_file_identifier_shortened, to workbench_utils.py. It creates a unique suffix that is added to the names of files created during each ingest session.
  • Each ingest session generates a unique identifier using this function, and the identifier is appended to the end of the filename of every file created during that session. This keeps filenames distinct, prevents collisions when multiple ingests run concurrently, and lets users identify which files belong to a specific ingest session. The suffix is attached to the very end of the filename of the generated Google Sheets or Excel CSV, the rollback file, and the SQLite database file.
  • Changed how recovery mode ingests work. Recovery mode now takes an additional config setting, the session identifier (the unique suffix attached to the end of the filenames created in a session), so that it can determine which ingest to recover.
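
To illustrate the suffixing scheme described in the list above, here is a minimal sketch. It is not the code in this PR: `make_session_suffix` and `add_suffix` are hypothetical names, and deriving the identifier from a hash of the config file path, the 8-character length, and the `_` separator are all assumptions made for the example.

```python
import hashlib
import os


def make_session_suffix(config_path: str, length: int = 8) -> str:
    """Hypothetical helper: derive a short identifier from the config file path.

    Assumption: the identifier is the first `length` hex characters of a
    SHA-256 hash of the absolute config path. The real function in
    workbench_utils.py may compute it differently.
    """
    absolute_path = os.path.abspath(config_path)
    digest = hashlib.sha256(absolute_path.encode("utf-8")).hexdigest()
    return digest[:length]


def add_suffix(filename: str, suffix: str) -> str:
    """Attach the suffix to the very end of the filename.

    Assumption: an underscore separates the filename from the suffix.
    """
    return f"{filename}_{suffix}"


# Two different config files yield two different suffixes, so files
# written by the two sessions do not collide.
suffix_one = make_session_suffix("config.yml")
suffix_two = make_session_suffix("config2.yml")
rollback_one = add_suffix("rollback.csv", suffix_one)
rollback_two = add_suffix("rollback.csv", suffix_two)
```

Under these assumptions the same config file always maps to the same suffix, which is one way a later recovery run could locate the files from an earlier session.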

How to test / verify this PR?

For testing, prepare two separate ingest configuration files (for example, config.yml and config2.yml), each pointing to a different input_csv spreadsheet. Open two terminals in the same Workbench directory, and run the ingest for each configuration file in its own terminal at the same time.
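
As an alternative to two terminals, the concurrent run could be scripted. The sketch below is only a convenience for testing; `run_concurrently` is a hypothetical helper, and the `./workbench --config <file>` invocation in the comment assumes the script is run from the Workbench directory.

```python
import subprocess
from typing import List, Sequence


def run_concurrently(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command before waiting on any of them.

    Returns the exit codes in the same order as `commands`.
    """
    # Launch all processes first so they genuinely overlap in time.
    processes = [subprocess.Popen(list(command)) for command in commands]
    # Then wait for each one and collect its exit code.
    return [process.wait() for process in processes]


# Example call (assumes both config files exist in the current directory):
# exit_codes = run_concurrently([
#     ["./workbench", "--config", "config.yml"],
#     ["./workbench", "--config", "config2.yml"],
# ])
```

A non-zero entry in the returned list indicates that the corresponding ingest exited with an error.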

Verify that Workbench creates separate temporary files and SQLite database files, each with the session suffix, for each session.
Verify that Workbench creates a rollback file with the same suffix as the session suffix.
Test and verify that recovery mode works independently for each session.
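
The file checks above could be partly automated with a small script like the following sketch. `files_by_suffix` is a hypothetical helper, and it assumes only that each session's files end with that session's suffix, as described in this PR; the directory to scan and the suffix values have to be supplied by the tester.

```python
from pathlib import Path
from typing import Dict, List


def files_by_suffix(directory: str, suffixes: List[str]) -> Dict[str, List[str]]:
    """Group the file names in `directory` by the session suffix they end with.

    Files that end with none of the given suffixes are ignored.
    """
    grouped: Dict[str, List[str]] = {suffix: [] for suffix in suffixes}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        for suffix in suffixes:
            if path.name.endswith(suffix):
                grouped[suffix].append(path.name)
                break
    return grouped
```

If both sessions ran correctly, each suffix should map to its own non-empty list of files, and no file should appear under both suffixes.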

Sample CSV files and demo objects are available here: https://github.com/Islandora-Devops/islandora_demo_objects

Interested Parties

@mjordan @digitalutsc @whikloj


Checklist

  • Before opening this PR, have you opened an issue explaining what you want to do?
  • Have you included some configuration and/or CSV files useful for testing this PR?
  • Have you written unit or integration tests if applicable?
  • Does the code added in this PR require a version of Python that is higher than the current minimum version?
  • If the changes in this PR require an additional Python library, have you included it in setup.py?
  • If the changes in this PR add a new configuration option, have you provided a default for when the option is not present in the .yml file?
