Commit bc1ab6a

Merge pull request #245 from seperman/dev
5.5.0
2 parents 4ae6170 + 94ee32c commit bc1ab6a

31 files changed, +1053 -75 lines changed

AUTHORS.md

+2-1
Original file line number | Diff line number | Diff line change
@@ -35,4 +35,5 @@ Authors in order of the contributions:
3535
- Florian Klien [flowolf](https://github.com/flowolf) for adding math_epsilon
3636
- Tim Klein [timjklein36](https://github.com/timjklein36) for retaining the order of multiple dictionary items added via Delta.
3737
- Wilhelm Schürmann[wbsch](https://github.com/wbsch) for fixing the typo with yml files.
38-
- [lyz-code](https://github.com/lyz-code) for adding support for regular expressions in DeepSearch.
38+
- [lyz-code](https://github.com/lyz-code) for adding support for regular expressions in DeepSearch and strict_checking feature in DeepSearch.
39+
- [dtorres-sf](https://github.com/dtorres-sf) for adding the option for custom compare function

CHANGELOG.md

+3-1
@@ -1,6 +1,8 @@
11
# DeepDiff Change log
22

3-
- v5-3-0: add support for regular expressions in DeepSearch
3+
- v5-5-0: adding iterable_compare_func for DeepDiff, adding output_format of list for path() in tree view.
4+
- v5-4-0: adding strict_checking for numbers in DeepSearch.
5+
- v5-3-0: add support for regular expressions in DeepSearch.
46
- v5-2-3: Retaining the order of multiple dictionary items added via Delta. Fixed the typo with yml files in deep cli. Fixing Grep RecursionError where using non UTF-8 character. Allowing kwargs to be passed to to_json method.
57
- v5-2-2: Fixed Delta serialization when None type is present.
68
- v5-2-0: Removed Murmur3 as the preferred hashing method. Using SHA256 by default now. Added commandline for deepdiff. Added group_by. Added math_epsilon. Improved ignoring of NoneType.

README.md

+15-15
@@ -1,4 +1,4 @@
1-
# DeepDiff v 5.3.0
1+
# DeepDiff v 5.5.0
22

33
![Downloads](https://img.shields.io/pypi/dm/deepdiff.svg?style=flat)
44
![Python Versions](https://img.shields.io/pypi/pyversions/deepdiff.svg?style=flat)
@@ -18,11 +18,11 @@ Tested on Python 3.6+ and PyPy3.
1818

1919
**NOTE: The last version of DeepDiff to work on Python 3.5 was DeepDiff 5-0-2**
2020

21-
- [Documentation](https://zepworks.com/deepdiff/5.3.0/)
21+
- [Documentation](https://zepworks.com/deepdiff/5.5.0/)
2222

2323
## What is new?
2424

25-
Deepdiff 5.3.0 comes with regular expressions in the DeepSearch and grep modules:
25+
Deepdiff 5.5.0 comes with regular expressions in the DeepSearch and grep modules:
2626

2727
```python
2828
>>> from deepdiff import grep
@@ -66,13 +66,13 @@ Note: if you want to use DeepDiff via commandline, make sure to run `pip install
6666

6767
DeepDiff gets the difference of 2 objects.
6868

69-
> - Please take a look at the [DeepDiff docs](https://zepworks.com/deepdiff/5.3.0/diff.html)
70-
> - The full documentation of all modules can be found on <https://zepworks.com/deepdiff/5.3.0/>
69+
> - Please take a look at the [DeepDiff docs](https://zepworks.com/deepdiff/5.5.0/diff.html)
70+
> - The full documentation of all modules can be found on <https://zepworks.com/deepdiff/5.5.0/>
7171
> - Tutorials and posts about DeepDiff can be found on <https://zepworks.com/tags/deepdiff/>
7272
7373
## A few Examples
7474

75-
> Note: This is just a brief overview of what DeepDiff can do. Please visit <https://zepworks.com/deepdiff/5.3.0/> for full documentation.
75+
> Note: This is just a brief overview of what DeepDiff can do. Please visit <https://zepworks.com/deepdiff/5.5.0/> for full documentation.
7676
7777
### List difference ignoring order or duplicates
7878

@@ -276,8 +276,8 @@ Example:
276276
```
277277

278278

279-
> - Please take a look at the [DeepDiff docs](https://zepworks.com/deepdiff/5.3.0/diff.html)
280-
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.3.0/>
279+
> - Please take a look at the [DeepDiff docs](https://zepworks.com/deepdiff/5.5.0/diff.html)
280+
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.5.0/>
281281
282282

283283
# Deep Search
@@ -309,17 +309,17 @@ And you can pass all the same kwargs as DeepSearch to grep too:
309309
{'matched_paths': {"root['somewhere']": 'around'}, 'matched_values': {"root['long']": 'somewhere'}}
310310
```
311311

312-
> - Please take a look at the [DeepSearch docs](https://zepworks.com/deepdiff/5.3.0/dsearch.html)
313-
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.3.0/>
312+
> - Please take a look at the [DeepSearch docs](https://zepworks.com/deepdiff/5.5.0/dsearch.html)
313+
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.5.0/>
314314
315315
# Deep Hash
316316
(New in v4-0-0)
317317

318318
DeepHash is designed to give you hash of ANY python object based on its contents even if the object is not considered hashable!
319319
DeepHash is supposed to be deterministic in order to make sure 2 objects that contain the same data, produce the same hash.
320320

321-
> - Please take a look at the [DeepHash docs](https://zepworks.com/deepdiff/5.3.0/deephash.html)
322-
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.3.0/>
321+
> - Please take a look at the [DeepHash docs](https://zepworks.com/deepdiff/5.5.0/deephash.html)
322+
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.5.0/>
323323
324324
Let's say you have a dictionary object.
325325

@@ -367,8 +367,8 @@ Which you can write as:
367367
At first it might seem weird why DeepHash(obj)[obj] but remember that DeepHash(obj) is a dictionary of hashes of all other objects that obj contains too.
368368

369369

370-
> - Please take a look at the [DeepHash docs](https://zepworks.com/deepdiff/5.3.0/deephash.html)
371-
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.3.0/>
370+
> - Please take a look at the [DeepHash docs](https://zepworks.com/deepdiff/5.5.0/deephash.html)
371+
> - The full documentation can be found on <https://zepworks.com/deepdiff/5.5.0/>
372372
373373

374374
# Using DeepDiff in unit tests
@@ -421,7 +421,7 @@ And here is more info: <http://zepworks.com/blog/diff-it-to-digg-it/>
421421

422422
# ChangeLog
423423

424-
Please take a look at the [changelog](changelog.md) file.
424+
Please take a look at the [CHANGELOG](CHANGELOG.md) file.
425425

426426
# Releases
427427

conftest.py

+18
@@ -62,3 +62,21 @@ def nested_b_t2():
6262
def nested_b_result():
6363
with open(os.path.join(FIXTURES_DIR, 'nested_b_result.json')) as the_file:
6464
return json.load(the_file)
65+
66+
67+
@pytest.fixture(scope='class')
68+
def compare_func_t1():
69+
with open(os.path.join(FIXTURES_DIR, 'compare_func_t1.json')) as the_file:
70+
return json.load(the_file)
71+
72+
73+
@pytest.fixture(scope='class')
74+
def compare_func_t2():
75+
with open(os.path.join(FIXTURES_DIR, 'compare_func_t2.json')) as the_file:
76+
return json.load(the_file)
77+
78+
79+
@pytest.fixture(scope='class')
80+
def compare_func_result1():
81+
with open(os.path.join(FIXTURES_DIR, 'compare_func_result1.json')) as the_file:
82+
return json.load(the_file)

deepdiff/__init__.py

+1-1
@@ -1,6 +1,6 @@
11
"""This module offers the DeepDiff, DeepSearch, grep, Delta and DeepHash classes."""
22
# flake8: noqa
3-
__version__ = '5.3.0'
3+
__version__ = '5.5.0'
44
import logging
55

66
if __name__ == '__main__':

deepdiff/delta.py

+21-4
@@ -260,9 +260,14 @@ def _del_elem(self, parent, parent_to_obj_elem, parent_to_obj_action,
260260
value=obj, action=parent_to_obj_action)
261261

262262
def _do_iterable_item_added(self):
263-
iterable_item_added = self.diff.get('iterable_item_added')
263+
iterable_item_added = self.diff.get('iterable_item_added', {})
264+
iterable_item_moved = self.diff.get('iterable_item_moved')
265+
if iterable_item_moved:
266+
added_dict = {v["new_path"]: v["value"] for k, v in iterable_item_moved.items()}
267+
iterable_item_added.update(added_dict)
268+
264269
if iterable_item_added:
265-
self._do_item_added(iterable_item_added)
270+
self._do_item_added(iterable_item_added, insert=True)
266271

267272
def _do_dictionary_item_added(self):
268273
dictionary_item_added = self.diff.get('dictionary_item_added')
@@ -274,7 +279,7 @@ def _do_attribute_added(self):
274279
if attribute_added:
275280
self._do_item_added(attribute_added)
276281

277-
def _do_item_added(self, items, sort=True):
282+
def _do_item_added(self, items, sort=True, insert=False):
278283
if sort:
279284
# sorting items by their path so that the items with smaller index
280285
# are applied first (unless `sort` is `False` so that order of
@@ -289,6 +294,11 @@ def _do_item_added(self, items, sort=True):
289294
elements, parent, parent_to_obj_elem, parent_to_obj_action, obj, elem, action = elem_and_details
290295
else:
291296
continue # pragma: no cover. Due to cPython peephole optimizer, this line doesn't get covered. https://github.com/nedbat/coveragepy/issues/198
297+
298+
# Insert is only true for iterables, make sure it is a valid index.
299+
if(insert and elem < len(obj)):
300+
obj.insert(elem, None)
301+
292302
self._set_new_value(parent, parent_to_obj_elem, parent_to_obj_action,
293303
obj, elements, path, elem, action, new_value)
294304

@@ -397,7 +407,14 @@ def _do_item_removed(self, items):
397407
self._do_verify_changes(path, expected_old_value, current_old_value)
398408

399409
def _do_iterable_item_removed(self):
400-
iterable_item_removed = self.diff.get('iterable_item_removed')
410+
iterable_item_removed = self.diff.get('iterable_item_removed', {})
411+
412+
iterable_item_moved = self.diff.get('iterable_item_moved')
413+
if iterable_item_moved:
414+
# These will get added back during items_added
415+
removed_dict = {k: v["value"] for k, v in iterable_item_moved.items()}
416+
iterable_item_removed.update(removed_dict)
417+
401418
if iterable_item_removed:
402419
self._do_item_removed(iterable_item_removed)
403420

deepdiff/diff.py

+92-9
@@ -21,7 +21,7 @@
2121
number_to_string, datetime_normalize, KEY_TO_VAL_STR, booleans,
2222
np_ndarray, get_numpy_ndarray_rows, OrderedSetPlus, RepeatedTimer,
2323
TEXT_VIEW, TREE_VIEW, DELTA_VIEW,
24-
np, get_truncate_datetime, dict_)
24+
np, get_truncate_datetime, dict_, CannotCompare)
2525
from deepdiff.serialization import SerializationMixin
2626
from deepdiff.distance import DistanceMixin
2727
from deepdiff.model import (
@@ -139,6 +139,7 @@ def __init__(self,
139139
truncate_datetime=None,
140140
verbose_level=1,
141141
view=TEXT_VIEW,
142+
iterable_compare_func=None,
142143
_original_type=None,
143144
_parameters=None,
144145
_shared_parameters=None,
@@ -154,7 +155,8 @@ def __init__(self,
154155
"view, hasher, hashes, max_passes, max_diffs, "
155156
"cutoff_distance_for_pairs, cutoff_intersection_for_pairs, log_frequency_in_sec, cache_size, "
156157
"cache_tuning_sample_size, get_deep_distance, group_by, cache_purge_level, "
157-
"math_epsilon, _original_type, _parameters and _shared_parameters.") % ', '.join(kwargs.keys()))
158+
"math_epsilon, iterable_compare_func, _original_type, "
159+
"_parameters and _shared_parameters.") % ', '.join(kwargs.keys()))
158160

159161
if _parameters:
160162
self.__dict__.update(_parameters)
@@ -182,6 +184,7 @@ def __init__(self,
182184
self.ignore_string_case = ignore_string_case
183185
self.exclude_obj_callback = exclude_obj_callback
184186
self.number_to_string = number_to_string_func or number_to_string
187+
self.iterable_compare_func = iterable_compare_func
185188
self.ignore_private_variables = ignore_private_variables
186189
self.ignore_nan_inequality = ignore_nan_inequality
187190
self.hasher = hasher
@@ -558,6 +561,71 @@ def _diff_iterable(self, level, parents_ids=frozenset(), _original_type=None):
558561
else:
559562
self._diff_iterable_in_order(level, parents_ids, _original_type=_original_type)
560563

564+
def _compare_in_order(self, level):
565+
"""
566+
Default compare if `iterable_compare_func` is not provided.
567+
This will compare in sequence order.
568+
"""
569+
570+
return [((i, i), (x, y)) for i, (x, y) in enumerate(
571+
zip_longest(
572+
level.t1, level.t2, fillvalue=ListItemRemovedOrAdded))]
573+
574+
def _get_matching_pairs(self, level):
575+
"""
576+
Given a level, get matching pairs. This returns a list of two-tuples in the form:
577+
[
578+
(t1 index, t2 index), (t1 item, t2 item)
579+
]
580+
581+
This will compare using the passed in `iterable_compare_func` if available.
582+
Defaults to comparing in order.
583+
"""
584+
585+
if(self.iterable_compare_func is None):
586+
# Match in order if there is no compare function provided
587+
return self._compare_in_order(level)
588+
try:
589+
matches = []
590+
y_matched = set()
591+
y_index_matched = set()
592+
for i, x in enumerate(level.t1):
593+
x_found = False
594+
for j, y in enumerate(level.t2):
595+
596+
if(j in y_index_matched):
597+
# This ensures a one-to-one relationship of matches from t1 to t2.
598+
# If this index in t2 has already been matched to another x,
599+
# it cannot have another match, so just continue.
600+
continue
601+
602+
if(self.iterable_compare_func(x, y, level)):
603+
deep_hash = DeepHash(y,
604+
hashes=self.hashes,
605+
apply_hash=True,
606+
**self.deephash_parameters,
607+
)
608+
y_index_matched.add(j)
609+
y_matched.add(deep_hash[y])
610+
matches.append(((i, j), (x, y)))
611+
x_found = True
612+
break
613+
614+
if(not x_found):
615+
matches.append(((i, -1), (x, ListItemRemovedOrAdded)))
616+
for j, y in enumerate(level.t2):
617+
618+
deep_hash = DeepHash(y,
619+
hashes=self.hashes,
620+
apply_hash=True,
621+
**self.deephash_parameters,
622+
)
623+
if(deep_hash[y] not in y_matched):
624+
matches.append(((-1, j), (ListItemRemovedOrAdded, y)))
625+
return matches
626+
except CannotCompare:
627+
return self._compare_in_order(level)
628+
561629
def _diff_iterable_in_order(self, level, parents_ids=frozenset(), _original_type=None):
562630
# We're handling both subscriptable and non-subscriptable iterables. Which one is it?
563631
subscriptable = self._iterables_subscriptable(level.t1, level.t2)
@@ -566,10 +634,7 @@ def _diff_iterable_in_order(self, level, parents_ids=frozenset(), _original_type
566634
else:
567635
child_relationship_class = NonSubscriptableIterableRelationship
568636

569-
for i, (x, y) in enumerate(
570-
zip_longest(
571-
level.t1, level.t2, fillvalue=ListItemRemovedOrAdded)):
572-
637+
for (i, j), (x, y) in self._get_matching_pairs(level):
573638
if self._count_diff() is StopIteration:
574639
return # pragma: no cover. This is already covered for addition.
575640

@@ -586,10 +651,22 @@ def _diff_iterable_in_order(self, level, parents_ids=frozenset(), _original_type
586651
notpresent,
587652
y,
588653
child_relationship_class=child_relationship_class,
589-
child_relationship_param=i)
654+
child_relationship_param=j)
590655
self._report_result('iterable_item_added', change_level)
591656

592657
else: # check if item value has changed
658+
659+
if (i != j):
660+
# Item moved
661+
change_level = level.branch_deeper(
662+
x,
663+
y,
664+
child_relationship_class=child_relationship_class,
665+
child_relationship_param=i,
666+
child_relationship_param2=j
667+
)
668+
self._report_result('iterable_item_moved', change_level)
669+
593670
item_id = id(x)
594671
if parents_ids and item_id in parents_ids:
595672
continue
@@ -738,6 +815,7 @@ def _get_rough_distance_of_hashed_objs(
738815
_shared_parameters=self._shared_parameters,
739816
view=DELTA_VIEW,
740817
_original_type=_original_type,
818+
iterable_compare_func=self.iterable_compare_func,
741819
)
742820
_distance = diff._get_rough_distance()
743821
if cache_key and self._stats[DISTANCE_CACHE_ENABLED]:
@@ -788,6 +866,10 @@ def _get_most_in_common_pairs_in_iterables(
788866
pre_calced_distances = self._precalculate_numpy_arrays_distance(
789867
hashes_added, hashes_removed, t1_hashtable, t2_hashtable, _original_type)
790868

869+
if hashes_added and hashes_removed and self.iterable_compare_func and len(hashes_added) > 1 and len(hashes_removed) > 1:
870+
pre_calced_distances = self._precalculate_distance_by_custom_compare_func(
871+
hashes_added, hashes_removed, t1_hashtable, t2_hashtable, _original_type)
872+
791873
for added_hash in hashes_added:
792874
for removed_hash in hashes_removed:
793875
added_hash_obj = t2_hashtable[added_hash]
@@ -797,9 +879,10 @@ def _get_most_in_common_pairs_in_iterables(
797879
if id(removed_hash_obj.item) in parents_ids:
798880
continue
799881

882+
_distance = None
800883
if pre_calced_distances:
801-
_distance = pre_calced_distances["{}--{}".format(added_hash, removed_hash)]
802-
else:
884+
_distance = pre_calced_distances.get("{}--{}".format(added_hash, removed_hash))
885+
if _distance is None:
803886
_distance = self._get_rough_distance_of_hashed_objs(
804887
added_hash, removed_hash, added_hash_obj, removed_hash_obj, _original_type)
805888
# Left for future debugging
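
Review note on `_get_matching_pairs` above: the matching is a greedy, one-to-one pairing of `t1` items to `t2` items, with a fallback to positional pairing when the compare function raises `CannotCompare`. The following is a simplified, standalone sketch of that logic. The names `MISSING`, `in_order_pairs` and `matching_pairs` are hypothetical, the local `CannotCompare` class is a stand-in for the one imported from `deepdiff.helper`, and the compare function here takes `(x, y)` rather than `(x, y, level)`. One deliberate difference: unmatched `t2` items are tracked by index, whereas the diff tracks them by `DeepHash`, so the two can differ when `t2` contains duplicate values.

```python
from itertools import zip_longest

MISSING = object()  # stand-in for ListItemRemovedOrAdded


class CannotCompare(Exception):
    """Local stand-in for deepdiff.helper.CannotCompare."""


def in_order_pairs(t1, t2):
    # Positional pairing: item i of t1 against item i of t2.
    return [((i, i), (x, y))
            for i, (x, y) in enumerate(zip_longest(t1, t2, fillvalue=MISSING))]


def matching_pairs(t1, t2, compare=None):
    if compare is None:
        return in_order_pairs(t1, t2)
    try:
        matches = []
        matched_j = set()
        for i, x in enumerate(t1):
            for j, y in enumerate(t2):
                if j in matched_j:
                    continue  # keep the matching one-to-one
                if compare(x, y):
                    matched_j.add(j)
                    matches.append(((i, j), (x, y)))
                    break
            else:
                # No partner found in t2: treat x as removed.
                matches.append(((i, -1), (x, MISSING)))
        # Anything in t2 that was never matched counts as added.
        for j, y in enumerate(t2):
            if j not in matched_j:
                matches.append(((-1, j), (MISSING, y)))
        return matches
    except CannotCompare:
        return in_order_pairs(t1, t2)


def same_id(x, y):
    # Hypothetical compare function: items match when their "id" agrees.
    try:
        return x["id"] == y["id"]
    except (KeyError, TypeError):
        raise CannotCompare() from None


t1 = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
t2 = [{"id": 2, "v": "b"}, {"id": 3, "v": "c"}, {"id": 1, "v": "A"}]
pairs = matching_pairs(t1, t2, same_id)
print([indexes for indexes, _ in pairs])  # [(0, 2), (1, 0), (-1, 1)]
```

Pairs whose indices differ, such as `(0, 2)`, are the ones the new `i != j` branch reports as `iterable_item_moved`. The nested loop is quadratic in the list lengths, which is worth keeping in mind for large iterables.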
