-
Notifications
You must be signed in to change notification settings - Fork 0
Use DeepDiff library for a/b record diffing #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: Originally, the library jsondiff was used for getting a diff between the A and B version of a record. This was selected based on prior use and experience. For the most part, it was a nice fit, but lacked the ability to ignore changes in array order as a diff to report. This resulted in high numbers of records showing diff, when the only change was a differently ordered array. How this addresses that need: * The DeepDiff library has built-in support to 'ignore_order' when generating diffs, making it a great fit * DeepDiff is equally fast for generating diffs * Our primary, current use of diffs is to highlight records where a TIMDEX field changed, which is conveniently a "root" field on the JSON object. DeepDiff provides an attribute on the diff object that returns an explicit list of modified "root" fields, which is just what we need, further simplifying some logic in ABDiff to parse this information. Overall, DeepDiff is a more configurable library, with built-in options that align closely with our use case. Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-373
Why these changes are being introduced: Now that the DeepDiff library provides an explicit list of modified "root" fields -- i.e. TIMDEX fields -- as a built-in property, we no longer need any additional logic to parse the diff and surface what fields were modified. How this addresses that need: * This removes the helper function generate_field_diff_bools_for_record() and any tests related to it. Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-373
Pull Request Test Coverage Report for Build 11666616180Details
💛 - Coveralls |
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, but this is a great update!
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find! Awesome job. 🎉
| def generate_field_diff_bools_for_record(diff_data: dict) -> dict: | ||
| """Function to return dictionary of fields that have a diff. | ||
| Determining if a field had a diff is as straight-forward as looking to see if it shows | ||
| up in the parsed diff JSON. The fields may be at the root of the diff, or they could | ||
| be nested under "$insert" or "$delete" nodes in the diff. | ||
| If a field from the original A/B records are not in the diff at all, then they did not | ||
| have changes, and therefore will not receive a 1 here to indicate a diff. | ||
| """ | ||
| fields_with_diffs = {} | ||
|
|
||
| for key in diff_data: | ||
|
|
||
| # identify modified fields nested in $insert or $delete blocks | ||
| if key in ("$insert", "$delete"): | ||
| for subfield in diff_data[key]: | ||
| fields_with_diffs[subfield] = 1 | ||
|
|
||
| # identified modified fields at root of diff | ||
| else: | ||
| fields_with_diffs[key] = 1 | ||
|
|
||
| return fields_with_diffs | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo! ✨
Purpose and background context
Originally, the library
jsondiffwas used for getting a diff between the A and B version of a JSON record. This was selected based on prior use and experience. For the most part it was a nice fit, but lacked the ability to ignore changes in array order as a difference to report. This resulted in high numbers of records showing a diff, when the only change was a differently ordered array.The solution was to utilize another JSON diffing library,
DeepDiff.How this addresses that need:
DeepDifflibrary has built-in support toignore_orderwhen generating diffs, allowing us to programatically ignore order in arrays as a diff to reportDeepDiffis as fast, or faster, thanjsondiffmaking it a performant optionDeepDiffprovides the attribute.affected_root_keyson the diff object that returns an explicit list of modified "root" fields, which is just what we need, further simplifying some logic in ABDiff to parse this information. Here is where that is applied in our code. Note in this commit how additional diff parsing logic is no longer needed.Overall,
DeepDiffis a more configurable library, with built-in options that align closely with our use case.How can a reviewer manually see the effects of these changes?
See the following tests to understand how array order is an optional difference to report (where we default to not reporting it):
test_calc_record_diff_array_by_default_order_not_a_difftest_calc_record_diff_array_set_flag_order_is_a_diffIncludes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)