Skip to content

Conversation

@ghukill
Copy link
Contributor

@ghukill ghukill commented Nov 12, 2024

Purpose and background context

This PR addresses a small bug where the helper function run_ab_transforms.get_transformed_filename() was modifying a dictionary twice given the same dictionary for A and B records.

This was resulting in an extra underscore getting added to the output filenames for the B version of the record, specifically when the file had an index (e.g. _01 for Alma input files).

Before:

alma-2023-03-09-full-transformed-records-to-index_07.json  # A version
alma-2023-03-09-full-transformed-records-to-index__07.json # B version

After:

alma-2023-03-09-full-transformed-records-to-index_07.json  # A version
alma-2023-03-09-full-transformed-records-to-index_07.json # B version

How can a reviewer manually see the effects of these changes?

The following small job should be fully successful, where prior it would fail a validation check because all "B" records would be missed during joining.

NOTE: while this example would fully fail, in larger jobs, it would not fail validation because an entire subset of records were not missing A or B versions. It revealed itself as an unusually high number of A or B records being NULL after Transmogrifier, which was also not accurate or representative.

1- Set AWS prod credentials

2- Create job

pipenv run abdiff --verbose init-job -d output/jobs/fnamebug -a 008e20c -b 3f77c7c

3- Run diff for input file that has underscore (index) in filename

pipenv run abdiff --verbose run-diff \
-d output/jobs/fnamebug -m "testing 3 small alma files" --download-files \
-i s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_01.xml,s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_02.xml,s3://timdex-extract-prod-300442551476/alma/alma-2023-06-07-daily-extracted-records-to-index_03.xml

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

The output filenames were getting an extra underscore for the B version
of records.

How this addresses that need:
* Avoids modifying the filename details dictionary in place,
where it was getting modified twice for the 2nd file

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-387
"""Get transformed filename using extract filename details."""
filename_details.update(
stage="transformed",
index=f"_{sequence}" if (sequence := filename_details["index"]) else "",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the function was reworked slightly for brevity, this was the line that modified the dictionary in place, where the 2nd time it was passed, it was modified again.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected!

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@ghukill ghukill merged commit 082b9f3 into main Nov 13, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants