TIMX 387 - Fix bug in Transmogrifier output filename #62
Purpose and background context
This PR addresses a small bug where the helper function run_ab_transforms.get_transformed_filename() was modifying a dictionary twice when given the same dictionary for the A and B records. This resulted in an extra underscore being added to the output filenames for the B version of the record, specifically when the file had an index (e.g. _01 for Alma input files).
Before:
After:
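To illustrate the class of bug described above (this is not the Before/After output, and not the actual code from run_ab_transforms), here is a minimal sketch of how mutating a shared dictionary in place can add a second underscore on the second call. The function names, dictionary keys, and filename pattern are all hypothetical and chosen only for the illustration.

```python
# Hypothetical sketch: names, keys, and the filename pattern are assumptions,
# not the real get_transformed_filename() implementation.


def buggy_filename(details: dict, version: str) -> str:
    # Mutates the caller's dict: the index gains a leading underscore in place,
    # so a second call with the same dict prefixes it again.
    if details.get("index"):
        details["index"] = f"_{details['index']}"
    suffix = details.get("index") or ""
    return f"{version}-{details['source']}-{details['date']}{suffix}.json"


def fixed_filename(details: dict, version: str) -> str:
    # Builds the suffix in a local variable and leaves the caller's dict untouched.
    index = details.get("index")
    suffix = f"_{index}" if index else ""
    return f"{version}-{details['source']}-{details['date']}{suffix}.json"


# The same dict is passed for the A and B versions, as in the bug report.
shared = {"source": "alma", "date": "2023-06-07", "index": "01"}
buggy_a = buggy_filename(shared, "a")
buggy_b = buggy_filename(shared, "b")

shared = {"source": "alma", "date": "2023-06-07", "index": "01"}
fixed_a = fixed_filename(shared, "a")
fixed_b = fixed_filename(shared, "b")

print(buggy_a, buggy_b)
print(fixed_a, fixed_b)
```

In this sketch the buggy B filename ends in `__01.json` while the A filename ends in `_01.json`, so the two would no longer pair up by name. Computing the suffix locally (or passing a copy of the dictionary) avoids the side effect.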
How can a reviewer manually see the effects of these changes?
The following small job should now be fully successful; previously it would fail a validation check because all "B" records would be missed during joining.
NOTE: while this small example would fail outright, a larger job would not fail validation, because some records still had both A and B versions. There, the bug revealed itself as an unusually high number of A or B records being NULL after Transmogrifier, which was not accurate or representative.
1- Set AWS prod credentials
2- Create job
3- Run diff for input file that has underscore (index) in filename
pipenv run abdiff --verbose run-diff \
  -d output/jobs/fnamebug \
  -m "testing 3 small alma files" \
  --download-files \
  -i s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_01.xml,s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_02.xml,s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_03.xml
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)