Core Functions and CLI signatures
Graham Hukill edited this page Sep 25, 2024 · 6 revisions
init_job
- arguments
  - job_directory: [required]
  - message: [optional] message to include in job.json
- actions
  - creates job directory; throws an exception if it already exists
  - creates job.json with initial job details: job working directory and message
- returns
  - job directory: str
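The init_job behavior described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes a local filesystem path, and the job.json key names ("job_directory", "message") are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def init_job(job_directory: str, message: Optional[str] = None) -> str:
    """Create the job directory and write an initial job.json."""
    job_path = Path(job_directory)
    # exist_ok=False raises FileExistsError if the directory already exists
    job_path.mkdir(parents=True, exist_ok=False)
    # Key names are assumptions made for this sketch
    job_data = {"job_directory": str(job_path), "message": message}
    (job_path / "job.json").write_text(json.dumps(job_data, indent=2))
    return str(job_path)
```

Relying on mkdir to raise keeps the "throw exception if exists" rule in one place rather than adding a separate existence check.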
build_ab_images
- arguments
  - job_directory: [required]
  - commit_sha_a: [required]
  - commit_sha_b: [required]
- actions
  - builds A/B Docker images of Transmogrifier from the A/B git commit SHAs provided
  - updates job.json with the newly created Docker image names
- returns
  - Docker A/B image names: tuple[str, str]
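A rough sketch of how build_ab_images might record its results in job.json. The Docker build itself is left out: build_image is a hypothetical callable (not part of the signature above) that takes a commit SHA and returns an image name, and the "image_name_a" / "image_name_b" keys are assumptions.

```python
import json
from pathlib import Path
from typing import Callable


def build_ab_images(
    job_directory: str,
    commit_sha_a: str,
    commit_sha_b: str,
    build_image: Callable[[str], str],
) -> tuple[str, str]:
    """Build A/B images and record their names in job.json.

    build_image is a hypothetical stand-in for the real Docker build step.
    """
    job_file = Path(job_directory) / "job.json"
    job_data = json.loads(job_file.read_text())
    image_a = build_image(commit_sha_a)
    image_b = build_image(commit_sha_b)
    # Assumed key names; existing job.json content is preserved
    job_data.update({"image_name_a": image_a, "image_name_b": image_b})
    job_file.write_text(json.dumps(job_data, indent=2))
    return image_a, image_b
```

Injecting the build step keeps the job.json bookkeeping testable without a Docker daemon.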
init_run
- arguments
  - job_directory: [required]
  - message: [optional] message to include in run.json
- actions
  - creates run directory as a sub-directory of /runs
    - name is a YYYY-MM-DD_HH-MM-SS timestamp
    - throws an exception if the Job directory doesn't exist
  - clones job.json and creates run.json with run details: run working directory, message, timestamp, etc.
- returns
  - run directory: str
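A minimal sketch of init_run under the same caveats: local paths only, and the run.json keys ("run_directory", "run_message", "run_timestamp") are hypothetical names chosen so they do not overwrite values cloned from job.json.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def init_run(job_directory: str, message: Optional[str] = None) -> str:
    """Create a timestamped run directory and write run.json."""
    job_path = Path(job_directory)
    if not job_path.is_dir():
        raise FileNotFoundError(f"Job directory does not exist: {job_directory}")
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_path = job_path / "runs" / timestamp
    run_path.mkdir(parents=True)
    # run.json starts as a copy of job.json, then gains run-specific details
    run_data = json.loads((job_path / "job.json").read_text())
    run_data.update(
        {
            "run_directory": str(run_path),
            "run_message": message,
            "run_timestamp": timestamp,
        }
    )
    (run_path / "run.json").write_text(json.dumps(run_data, indent=2))
    return str(run_path)
```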
run_ab_transforms
- arguments
  - run_directory: [required]
    - e.g. output/super-job/runs/2024-08-13_13-01-44
    - e.g. s3://my_bucket/transmog-ab/super-job/runs/2024-08-13_13-01-44
    - e.g. /my/weird/local/path/abstuff/...
  - image_name_a: [required] Docker image of Transmogrifier version A
  - image_name_b: [required] Docker image of Transmogrifier version B
  - input_files: [required] list[str], list of S3 (or local?) files to be transformed in A/B runs
- actions
  - reads input files, runs A/B transforms for all files
  - creates transformed/a and transformed/b sub-directories under the <JOB>/runs/<RUN> directory
  - updates run.json with the input files used
- returns
  - tuple of A/B transformed files: tuple[list, list]
    - paths are relative to run directory, e.g.
      (
      ["transformed/a/alma-01.json", "transformed/a/aspace.json", ...],
      ["transformed/b/alma-01.json", "transformed/b/aspace.json", ...]
      )
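To illustrate the return value of run_ab_transforms, the sketch below derives the A/B output paths from a list of input files. The helper name transformed_paths and the naming rule (input file stem plus ".json") are assumptions inferred from the example above, not something the page specifies.

```python
from pathlib import PurePosixPath


def transformed_paths(input_files: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: map input files to A/B output paths.

    Returned paths are relative to the run directory.
    """
    # Take the final path component and drop its extension
    stems = [PurePosixPath(input_file).stem for input_file in input_files]
    a_files = [f"transformed/a/{stem}.json" for stem in stems]
    b_files = [f"transformed/b/{stem}.json" for stem in stems]
    return a_files, b_files


a_files, b_files = transformed_paths(
    ["s3://my_bucket/alma-01.xml", "s3://my_bucket/aspace.xml"]
)
# a_files == ["transformed/a/alma-01.json", "transformed/a/aspace.json"]
# b_files == ["transformed/b/alma-01.json", "transformed/b/aspace.json"]
```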
collate_ab_transforms
- arguments
  - run_directory: [required]
  - transformed_files: [required] tuple of A/B transformed filepaths
- actions
  - utilizes transformed A/B files from the <JOB>/runs/<RUN>/transformed/a|b directories
  - creates dataset of parquet files under <JOB>/runs/<RUN>/collated/*.parquet
- returns
  - directory of collated parquet dataset: str
calc_ab_record_diffs
- arguments
  - run_directory: [required]
  - collated_directory: [required] directory filepath with collated parquet files
- actions
  - creates diff for each record
  - creates dataset of parquet files under <JOB>/runs/<RUN>/diffs/*.parquet
- returns
  - directory holding parquet files with diffs: str
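The page does not say how the per-record diff in calc_ab_record_diffs is computed. One possible approach, sketched here purely as an assumption, is a line-based unified diff of the two records serialized as sorted JSON, using the standard library's difflib. The function name diff_record is hypothetical, and reading or writing parquet is left out.

```python
import difflib
import json


def diff_record(record_a: dict, record_b: dict) -> list[str]:
    """Hypothetical per-record diff: unified diff of sorted JSON text."""
    # sort_keys gives a stable line order, so key ordering alone never shows up as a change
    a_lines = json.dumps(record_a, indent=2, sort_keys=True).splitlines()
    b_lines = json.dumps(record_b, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(a_lines, b_lines, fromfile="a", tofile="b", lineterm="")
    )
```

An empty list means the A and B versions of the record are identical, which makes it easy to flag only the records that changed.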
calc_ab_metrics
- arguments
  - run_directory: [required]
  - diff_file: [required] parquet filepath with collated records and diff
- actions
  - generates metrics
  - writes metrics.json to run directory
    - consider whether they should also be added to run.json?
- returns
  - metrics dictionary: dict
init-job
- arguments
  - job-directory/d: [required] location where Job is to be created, can be anywhere
  - message/m: [optional] message added to job.json
    - both passed to init_job
  - transmogrifier-a-sha/a: [required] git commit SHA of Transmogrifier to build
  - transmogrifier-b-sha/b: [required] git commit SHA of Transmogrifier to build
    - both passed to build_ab_images
- actions
  - runs init_job → build_ab_images
run-diff
- arguments
  - job-directory/d: [required] Job directory
  - message/m: [optional] message added to run.json
    - both passed to init_run
  - input-files/i: [required] list of S3 (or local?) files to be transformed in A/B runs
    - passed to run_ab_transforms
- actions
  - reads job.json
    - the Transmogrifier image names are read from here and passed to run_ab_transforms
  - runs init_run → run_ab_transforms → collate_ab_transforms → calc_ab_record_diffs → calc_ab_metrics
- arguments
  - job-directory/d: [required] provide working directory of Job
- actions
  - runs Flask app with an env var set to Job directory (this is then used throughout Flask app)
  - reads Job directory and crawls Runs directories, provides table of runs (with messages if included)
  - rest of functionality as outlined…
pipenv run abdiff init-job \
--job-directory output/super-job \
--transmogrifier-a-sha abc123 \
--transmogrifier-b-sha def456
# Run 1
pipenv run abdiff run-diff \
--job-directory output/super-job \
--input-files s3://alma.xml,s3://aspace.xml,...
# Run 2
pipenv run abdiff run-diff \
--job-directory output/super-job \
--input-files s3://libguides.xml
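The init-job and run-diff signatures used in the commands above could be parsed roughly as follows. This sketch uses argparse from the standard library only so that it is self-contained; the page does not say which CLI framework the project uses. Reading "job-directory/d" as a long option plus a short flag (-d), and splitting --input-files on commas, are assumptions based on the argument lists and examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of an 'abdiff' parser with init-job and run-diff subcommands."""
    parser = argparse.ArgumentParser(prog="abdiff")
    subparsers = parser.add_subparsers(dest="command", required=True)

    init_job = subparsers.add_parser("init-job")
    init_job.add_argument("-d", "--job-directory", required=True)
    init_job.add_argument("-m", "--message")
    init_job.add_argument("-a", "--transmogrifier-a-sha", required=True)
    init_job.add_argument("-b", "--transmogrifier-b-sha", required=True)

    run_diff = subparsers.add_parser("run-diff")
    run_diff.add_argument("-d", "--job-directory", required=True)
    run_diff.add_argument("-m", "--message")
    # Comma-separated value is split into a list of input files
    run_diff.add_argument(
        "-i", "--input-files", required=True, type=lambda value: value.split(",")
    )
    return parser


args = build_parser().parse_args(
    [
        "run-diff",
        "--job-directory", "output/super-job",
        "--input-files", "s3://alma.xml,s3://aspace.xml",
    ]
)
# args.input_files == ["s3://alma.xml", "s3://aspace.xml"]
```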