Add option to download input files using a local MinIO server
Why these changes are being introduced:
* Downloading extract files improves the performance of the app by
reducing requests sent to AWS S3 and avoiding repeated downloads of
extract files used across multiple container runs. Having extract files
available on local disk also minimizes the occurrence of network issues
or AWS credentials timing out during a transform. These changes introduce
a locally hosted MinIO server to act as a "local S3 bucket" as part of
the A/B diff workflow.
How this addresses that need:
* Add a Docker Compose YAML file to run local MinIO server
* Add Makefile commands for starting and stopping local MinIO server
* Add option '--download-files' to run-diff CLI command
* Implement download_input_files core function
* Update run_ab_transforms to support use of local MinIO server
Side effects of this change:
* None
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-353
README.md: 37 additions, 0 deletions
@@ -15,6 +15,38 @@ Compare transformed TIMDEX records from two versions (A,B) of Transmogrifier.
- To lint the repo: `make lint`
- To run the app: `pipenv run abdiff --help`

### Storing Files in a Local MinIO Server

TIMDEX extract files from S3 (i.e., the input files used in transformations) can be downloaded to a local MinIO server running in a Docker container. [MinIO is an object storage solution that provides an Amazon Web Services S3-compatible API and supports all core S3 features](https://min.io/docs/minio/kubernetes/upstream/). Downloading extract files improves the runtime of a diff by reducing the number of requests sent to S3 and avoiding repeated downloads of the same extract files.

1. Configure your `.env` file. In addition to the [required environment variables](#required), the following environment variables must also be set:
Note: There are additional variables required by the local MinIO server (see variables prefixed with "MINIO" in [optional environment variables](#optional)). Defaults for these variables are provided in [abdiff.config](abdiff/config.py).

2. Create an AWS profile named `minio`. When prompted for an "AWS Access Key ID" and "AWS Secret Access Key", pass the values set for the `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD` environment variables, respectively.
```shell
aws configure --profile minio
```

3. Launch a local MinIO server via Docker container by running the Makefile command:
```shell
make start-minio-server
```

The API is accessible at: http://127.0.0.1:9000.
The WebUI is accessible at: http://127.0.0.1:9001.

4. In your browser, navigate to the WebUI and sign in to the local MinIO server. Create a bucket in the local MinIO server with the same name as the S3 bucket containing the TIMDEX extract files that will be used in the A/B Diff.

5. Proceed with A/B Diff CLI commands as needed!
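
Step 4 asks for a local bucket with the same name as the S3 bucket, presumably so that an extract file keeps the same bucket and key in both places. As a minimal sketch of that idea, the hypothetical helper below (`split_s3_uri` is an illustrative name, not part of `abdiff`, and the URI is made up) splits an S3 URI into the bucket and key that would need to exist on the local MinIO server:

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> tuple[str, str]:
    """Split an S3 URI into (bucket, key).

    Hypothetical helper for illustration only; not part of abdiff.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # The URI path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up bucket and key, used only to show the shape of the result.
bucket, key = split_s3_uri("s3://example-extract-bucket/source/extract.xml")
print(bucket)  # example-extract-bucket
print(key)     # source/extract.xml
```

Under this assumption, the bucket created in the WebUI would be `example-extract-bucket`, and the key would be unchanged.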

Once a diff run is complete, you can stop the local MinIO server using the Makefile command: `make stop-minio-server`. If you're planning to run another diff using the same files -- good news! All you have to do is restart the local MinIO server. Your data will persist as long as the files exist in the directory you specified for `MINIO_S3_LOCAL_STORAGE`.
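
The README documents two endpoint variables, `MINIO_S3_URL` for the MinIO server API and `MINIO_S3_CONTAINER_URL` for access from inside a Docker container. The sketch below shows one way a caller might choose between them. It is an assumption-laden illustration: `minio_endpoint` is a hypothetical name, and the real defaults live in `abdiff/config.py`, which may be organized differently. Only the two default URLs are taken from the README.

```python
import os
from collections.abc import Mapping

# Defaults quoted from the README's optional environment variables.
DEFAULT_MINIO_S3_URL = "http://localhost:9000/"
DEFAULT_MINIO_S3_CONTAINER_URL = "http://host.docker.internal:9000/"


def minio_endpoint(in_container: bool, env: Mapping[str, str] = os.environ) -> str:
    """Return the MinIO API endpoint for the host or for a Docker container.

    Hypothetical helper for illustration only; not part of abdiff.
    """
    if in_container:
        # Inside a container, "localhost" refers to the container itself,
        # so a separate URL is used to reach the MinIO server.
        return env.get("MINIO_S3_CONTAINER_URL", DEFAULT_MINIO_S3_CONTAINER_URL)
    return env.get("MINIO_S3_URL", DEFAULT_MINIO_S3_URL)


# With no overrides set, the documented defaults are returned.
print(minio_endpoint(in_container=False, env={}))
print(minio_endpoint(in_container=True, env={}))
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.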

## Concepts

A **Job** in `abdiff` represents the A/B test for comparing the results from two versions of Transmogrifier. When a job is first created, a working directory and a JSON file `job.json` with an initial set of configurations are created.
@@ -90,6 +122,11 @@ AWS_SESSION_TOKEN=# passed to Transmogrifier containers for use
### Optional

```text
MINIO_S3_LOCAL_STORAGE=# full file system path to the directory where MinIO stores its object data on the local disk
MINIO_S3_URL=# endpoint for MinIO server API; default is "http://localhost:9000/"
MINIO_S3_CONTAINER_URL=# endpoint for the MinIO server when accessed from inside a Docker container; default is "http://host.docker.internal:9000/"
MINIO_ROOT_USER=# username for root user account for MinIO server
MINIO_ROOT_PASSWORD=# password for root user account for MinIO server
WEBAPP_HOST=# host for flask webapp
WEBAPP_PORT=# port for flask webapp
TRANSMOGRIFIER_MAX_WORKERS=# max number of Transmogrifier containers to run in parallel; default is 6
```