Skip to content

Latest commit

 

History

History
49 lines (35 loc) · 1.83 KB

File metadata and controls

49 lines (35 loc) · 1.83 KB

dam_lev

Author: Daniel Walker
Version: 0.1.2
Date: 2023-01-05

Overview

The dam_lev package implements the Damerau–Levenshtein diff algorithm. That is, it will take two sequences and determine the minimum number of transpositions, substitutions, insertions, and deletions needed to transform the first sequence into the second.

Usage

The package exposes a single function, dam_lev.get_changes. It takes two sequences (i.e., they must implement the __len__ and __getitem__ methods) and returns a list of dam_lev.Mutation objects. There are four subclasses of dam_lev.Mutation corresponding to the four types of transformations. For example,

diffs = dam_lev.get_changes('abcdef', 'bcedxy')
print(diffs) # [Deletion(at=0), Transposition(at=3), Substitution(at=5, at2=4), Insertion(at=6, at2=5)]

We see that the sequence of transformations is:

  • Delete the item at index 0 ('a')
  • Transpose the item at index 3 ('d') with its successor
  • Substitute the item at index 5 ('f') with the item from the second sequence at index 4 ('x')
  • Insert at index 6 the item from the second sequence at index 5 ('y')

Note the index for the transposition. Even though, after the deletion, the 'd' is at index 2, it's at index 3 in the original version of the sequence. Likewise for the successive mutations.

Key function

You can also pass a callable as the key keyword argument to dam_lev.get_changes. Similar to list.sort, this callable will be used to compare the elements of the sequences. For example,

diffs = dam_lev.get_changes('aBc', 'AbC', key=str.upper)
print(diffs) # []