Commit 2477924

Allow failed Transmogrifier containers
Why these changes are being introduced:

It was suggested in a previous PR that we raise an exception if ANY Transmogrifier containers fail. This appeared to make sense from a data integrity POV, ensuring that all input files were properly transformed by Transmogrifier for analysis.

In retrospect, this fell short in two ways:

1. For very large runs, a very small number of failed containers may be admissible, given that the run will still contain valuable data and should be allowed to continue.
2. More subtly, a failed Transmogrifier container may not be an intermittent bug, but potentially an indication that a code change in Transmog is problematic. The responsibility of this application should be to surface this in a meaningful way during analysis (out of scope here), not to halt the run completely.

How this addresses that need:
* Creates a new env var, ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS, that defaults to 'true' in the Config class, to allow failed containers
* Continues with the run even if failed containers are present

Side effects of this change:
* Runs with failed Transmog containers will continue.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-383
1 parent 8c5fe22 commit 2477924

3 files changed: 10 additions and 1 deletion

README.md

Lines changed: 1 addition & 0 deletions
@@ -138,6 +138,7 @@ TRANSMOGRIFIER_MAX_WORKERS=# max number of Transmogrifier containers to run in p
 TRANSMOGRIFIER_TIMEOUT=# timeout for a single Transmogrifier container; default is 5 hours
 TIMDEX_BUCKET=# when using CLI command 'timdex-sources-csv', this is required to know what TIMDEX bucket to use
 PRESERVE_ARTIFACTS=# if 'true', intermediate artifacts like transformed files, collated records, etc., will not be automatically removed
+ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS=# if 'true' (default), the run will continue even if some Transmogrifier containers failed to complete successfully
 ```
 
 ## CLI commands

abdiff/config.py

Lines changed: 8 additions & 0 deletions
@@ -22,6 +22,7 @@ class Config:
         "TRANSMOGRIFIER_TIMEOUT",
         "TIMDEX_BUCKET",
         "PRESERVE_ARTIFACTS",
+        "ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS",
     )
 
     def __getattr__(self, name: str) -> Any:  # noqa: ANN401
@@ -88,6 +89,13 @@ def preserve_artifacts(self) -> bool:
             self.PRESERVE_ARTIFACTS and self.PRESERVE_ARTIFACTS.strip().lower() == "true"
         )
 
+    @property
+    def allow_failed_transmogrifier_containers(self) -> bool:
+        return bool(
+            self.ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS
+            and self.ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS.strip().lower() == "true"
+        )
+
 
 def configure_logger(logger: logging.Logger, *, verbose: bool) -> str:
     if verbose:
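
The new property treats only the string 'true' (case-insensitive, surrounding whitespace trimmed) as truthy. The following is a minimal sketch of that parsing rule, not the project's code: `env_flag` is a hypothetical stand-in, and it assumes `Config.__getattr__` (whose body is not shown in this diff) returns the raw environment value, or `None` when the variable is unset.

```python
import os


def env_flag(name: str) -> bool:
    """Hypothetical stand-in for the property: True only for the string 'true'."""
    # Assumption: Config.__getattr__ behaves like os.getenv (None when unset).
    value = os.getenv(name)
    return bool(value and value.strip().lower() == "true")


FLAG = "ALLOW_FAILED_TRANSMOGRIFIER_CONTAINERS"

# Unset variable: `None and ...` short-circuits, so the result is False.
os.environ.pop(FLAG, None)
unset_result = env_flag(FLAG)

# Padding and casing are normalized before the comparison.
os.environ[FLAG] = " TRUE "
padded_result = env_flag(FLAG)

# Other common truthy spellings such as "1" are not accepted.
os.environ[FLAG] = "1"
numeric_result = env_flag(FLAG)

print(unset_result, padded_result, numeric_result)
```

Under that assumption an unset variable evaluates to `False`, so the 'true' default described in the README would have to be supplied somewhere outside the lines shown in this diff.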

abdiff/core/run_ab_transforms.py

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ def run_ab_transforms(
     # process results
     log_file = aggregate_logs(run_directory, containers)
     logger.info(f"Log file created: {log_file}")
-    if exceptions:
+    if not CONFIG.allow_failed_transmogrifier_containers and exceptions:
         raise RuntimeError(  # noqa: TRY003
             f"{len(exceptions)} / {len(containers)} containers failed "
             "to complete successfully."
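
The changed condition can be exercised in isolation. This is a sketch under stated assumptions rather than the application's code: `check_container_results` is a hypothetical helper, the container names are made up, and the warning log is an illustrative addition that is not part of this commit.

```python
import logging

logger = logging.getLogger(__name__)


def check_container_results(
    exceptions: list[str], containers: list[str], *, allow_failed: bool
) -> int:
    """Hypothetical helper mirroring the new condition; returns the failure count."""
    # Same gate as the diff: raise only when failures are disallowed AND present.
    if not allow_failed and exceptions:
        raise RuntimeError(
            f"{len(exceptions)} / {len(containers)} containers failed "
            "to complete successfully."
        )
    if exceptions:
        # Not in the commit: an illustrative warning so tolerated failures stay visible.
        logger.warning(
            "%d / %d containers failed; continuing run.",
            len(exceptions),
            len(containers),
        )
    return len(exceptions)


containers = ["a-1", "b-1", "a-2"]  # made-up container names
failed = ["b-1"]

# Failures tolerated: the run continues and the count is reported.
tolerated = check_container_results(failed, containers, allow_failed=True)

# Failures disallowed: the pre-commit behavior, a RuntimeError.
try:
    check_container_results(failed, containers, allow_failed=False)
except RuntimeError as exc:
    strict_message = str(exc)
```

With the flag off, behavior matches what the code did before this commit; with it on, a failure no longer halts the run.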
