Skip to content

Conversation

@ghukill
Copy link
Contributor

@ghukill ghukill commented Nov 4, 2024

Purpose and background context

Originally, the library jsondiff was used for getting a diff between the A and B version of a JSON record. This was selected based on prior use and experience. For the most part it was a nice fit, but lacked the ability to ignore changes in array order as a difference to report. This resulted in high numbers of records showing a diff, when the only change was a differently ordered array.

The solution was to utilize another JSON diffing library, DeepDiff.

How this addresses that need:

  • DeepDiff library has built-in support to ignore_order when generating diffs, allowing us to programatically ignore order in arrays as a diff to report
  • DeepDiff is as fast, or faster, than jsondiff making it a performant option
  • Our primary, current use of diffs is to highlight records where a TIMDEX field changed, which is conveniently a "root" field on the JSON object. DeepDiff provides the attribute .affected_root_keys on the diff object that returns an explicit list of modified "root" fields, which is just what we need, further simplifying some logic in ABDiff to parse this information. Here is where that is applied in our code. Note in this commit how additional diff parsing logic is no longer needed.

Overall, DeepDiff is a more configurable library, with built-in options that align closely with our use case.

How can a reviewer manually see the effects of these changes?

See the following tests to understand how array order is an optional difference to report (where we default to not reporting it):

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

Originally, the library jsondiff was used for getting a diff between
the A and B version of a record.  This was selected based on prior
use and experience.  For the most part, it was a nice fit, but lacked
the ability to ignore changes in array order as a diff to report.

This resulted in high numbers of records showing diff, when the only
change was a differently ordered array.

How this addresses that need:
* The DeepDiff library has built-in support to 'ignore_order' when
generating diffs, making it a great fit
* DeepDiff is equally fast for generating diffs
* Our primary, current use of diffs is to highlight records where a
TIMDEX field changed, which is conveniently a "root" field on the
JSON object.  DeepDiff provides an attribute on the diff object
that returns an explicit list of modified "root" fields, which is
just what we need, further simplifying some logic in ABDiff to
parse this information.

Overall, DeepDiff is a more configurable library, with built-in
options that align closely with our use case.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-373
Why these changes are being introduced:

Now that the DeepDiff library provides an explicit list of
modified "root" fields -- i.e. TIMDEX fields -- as a built-in
property, we no longer need any additional logic to parse the diff
and surface what fields were modified.

How this addresses that need:
* This removes the helper function generate_field_diff_bools_for_record()
and any tests related to it.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-373
@coveralls
Copy link

Pull Request Test Coverage Report for Build 11666616180

Details

  • 17 of 17 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.02%) to 91.184%

Totals Coverage Status
Change from base Build 11636162593: -0.02%
Covered Lines: 693
Relevant Lines: 760

💛 - Coveralls

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, but this is a great update!

@ghukill ghukill requested a review from ehanson8 November 4, 2024 19:59
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! Awesome job. 🎉

Comment on lines -110 to -135
def generate_field_diff_bools_for_record(diff_data: dict) -> dict:
"""Function to return dictionary of fields that have a diff.
Determining if a field had a diff is as straight-forward as looking to see if it shows
up in the parsed diff JSON. The fields may be at the root of the diff, or they could
be nested under "$insert" or "$delete" nodes in the diff.
If a field from the original A/B records are not in the diff at all, then they did not
have changes, and therefore will not receive a 1 here to indicate a diff.
"""
fields_with_diffs = {}

for key in diff_data:

# identify modified fields nested in $insert or $delete blocks
if key in ("$insert", "$delete"):
for subfield in diff_data[key]:
fields_with_diffs[subfield] = 1

# identified modified fields at root of diff
else:
fields_with_diffs[key] = 1

return fields_with_diffs


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo! ✨

@ghukill ghukill merged commit cea0a91 into main Nov 5, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants