@@ -287,6 +287,7 @@ which provides the following fields:
287287* ``min_length``: The minimum match length (from start to end) required to
288288 successfully match this expression.
289289* ``edit_distance``: Match this expression within a given Levenshtein distance.
290+ * ``hamming_distance``: Match this expression within a given Hamming distance.
290291
291292These parameters either allow the set of matches produced by a pattern to be
292293constrained at compile time (rather than relying on the application to process
@@ -299,10 +300,15 @@ and a ``max_offset`` of 15 will not produce matches when scanned against
299300streams ``foo0123bar`` or ``foo0123456bar``.
300301
301302Similarly, the pattern :regexp:`/foobar/` when given an ``edit_distance`` of 2
302- will produce matches when scanned against ``foobar``, ``fooba``, ``fobr``,
303- ``fo_baz``, ``foooobar``, and anything else that lies within edit distance of 2
304- (as defined by Levenshtein distance). For more details, see the
305- :ref:`approximate_matching` section.
303+ will produce matches when scanned against ``foobar``, ``f00bar``, ``fooba``,
304+ ``fobr``, ``fo_baz``, ``foooobar``, and anything else that lies within edit
305+ distance of 2 (as defined by Levenshtein distance).
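The edit-distance examples above can be checked by hand with a small, self-contained sketch. It is not Hyperscan code: the helper names ``min3`` and ``levenshtein`` are hypothetical, and the function is only the textbook dynamic-programming definition of Levenshtein distance applied to two literal strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between two NUL-terminated strings: the minimum
 * number of insertions, removals and substitutions that turn a into b.
 * Illustrative helper only; not part of the Hyperscan API. */
size_t levenshtein(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    size_t *row = malloc((lb + 1) * sizeof *row);
    assert(row != NULL);

    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++) {
        row[j] = j;
    }
    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0]; /* value for prefixes (i-1, 0) */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j]; /* value for prefixes (i-1, j) */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min3(up + 1,          /* removal from a */
                          row[j - 1] + 1,  /* insertion into a */
                          diag + cost);    /* substitution or match */
            diag = up;
        }
    }
    size_t d = row[lb];
    free(row);
    return d;
}
```

Under this definition, each of the strings listed above (``fooba``, ``fobr``, ``fo_baz``, ``foooobar``, ``f00bar``) lies at distance 2 or less from ``foobar``.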
306+
307+ When the same pattern :regexp:`/foobar/` is given a ``hamming_distance`` of 2,
308+ it will produce matches when scanned against ``foobar``, ``boofar``,
309+ ``f00bar``, and anything else with at most two characters substituted from the
310+ original pattern. For more details, see the :ref:`approximate_matching`
311+ section.
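As a rough illustration of the Hamming distance examples above, the following sketch counts differing positions between two literal strings. The function name ``hamming_distance`` and its ``-1`` return convention for unequal lengths are assumptions made for this example; they are not part of the Hyperscan API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of positions at which two NUL-terminated strings differ.
 * Returns -1 when the lengths differ, because Hamming distance is only
 * defined for strings of equal length (no insertions or removals).
 * Illustrative helper only; not part of the Hyperscan API. */
long hamming_distance(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t len = strlen(a);
    if (len != strlen(b)) {
        return -1;
    }
    long diff = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            diff++;
        }
    }
    return diff;
}
```

With this definition, ``boofar`` and ``f00bar`` are both at distance 2 from ``foobar``, while ``fooba`` has no Hamming distance to ``foobar`` at all because it is one character shorter.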
306312
307313=================
308314Prefiltering Mode
@@ -377,7 +383,7 @@ The :c:type:`hs_platform_info_t` structure has two fields:
377383#. ``cpu_features``: This allows the application to specify a mask of CPU
378384 features that may be used on the target platform. For example,
379385 :c:member:`HS_CPU_FEATURES_AVX2` can be specified for Intel\ |reg| Advanced
380- Vector Extensions + 2 (Intel\ |reg| AVX2) instruction set support. If a flag
386+ Vector Extensions 2 (Intel\ |reg| AVX2) instruction set support. If a flag
381387 for a particular CPU feature is specified, the database will not be usable on
382388 a CPU without that feature.
383389
@@ -398,13 +404,20 @@ follows:
398404
399405#. **Edit distance** is defined as Levenshtein distance. That is, there are
400406 three possible edit types considered: insertion, removal and substitution.
401- More formal description can be found on
402- `Wikipedia <https://en.wikipedia.org/wiki/Levenshtein_distance>`_.
407+ A more formal description can be found on
408+ `Wikipedia <https://en.wikipedia.org/wiki/Levenshtein_distance>`__.
409+
410+ #. **Hamming distance** is the number of positions at which two strings of
411+ equal length differ. That is, it is the number of substitutions required to
412+ convert one string to the other. There are no insertions or removals when
413+ approximately matching within a Hamming distance. A more formal description can
414+ be found on
415+ `Wikipedia <https://en.wikipedia.org/wiki/Hamming_distance>`__.
403416
404- #. **Approximate matching** will match all *corpora* within a given edit
405- distance. That is, given a pattern, approximate matching will match anything
406- that can be edited to arrive at a corpus that exactly matches the original
407- pattern.
417+ #. **Approximate matching** will match all *corpora* within a given edit or
418+ Hamming distance. That is, given a pattern, approximate matching will match
419+ anything that can be edited to arrive at a corpus that exactly matches the
420+ original pattern.
408421
409422#. **Matching semantics** are exactly the same as described in :ref:`semantics`.
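To make the definition of approximate matching more concrete, the sketch below handles the simplest possible case: a literal pattern scanned against a buffer within a Hamming distance. It is a naive illustration under stated assumptions, not how Hyperscan implements approximate matching; the name ``count_hamming_matches`` is hypothetical, and general regular expressions are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the windows of buf, each strlen(pat) bytes long, that can be
 * turned into pat with at most max_dist substitutions.
 * Literal patterns only; illustrative helper, not part of Hyperscan. */
size_t count_hamming_matches(const char *pat, const char *buf,
                             size_t max_dist) {
    assert(pat != NULL && buf != NULL);
    size_t plen = strlen(pat), blen = strlen(buf);
    size_t matches = 0;

    if (plen == 0 || plen > blen) {
        return 0;
    }
    for (size_t start = 0; start + plen <= blen; start++) {
        size_t diff = 0;
        /* Stop comparing as soon as the window exceeds the budget. */
        for (size_t i = 0; i < plen && diff <= max_dist; i++) {
            if (buf[start + i] != pat[i]) {
                diff++;
            }
        }
        if (diff <= max_dist) {
            matches++;
        }
    }
    return matches;
}
```

For example, scanning ``xxf00barxx`` for ``foobar`` finds one matching window with a distance of 2 and none with a distance of 1.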
410423
@@ -437,7 +450,9 @@ matching support. Here they are, in a nutshell:
437450 reduce to so-called "vacuous" patterns (patterns that match everything). For
438451 example, pattern :regexp:`/foo/` with edit distance 3, if implemented,
439452 would reduce to matching zero-length buffers. Such patterns will result in a
440- "Pattern cannot be approximately matched" compile error.
453+ "Pattern cannot be approximately matched" compile error. Approximate
454+ matching within a Hamming distance does not remove symbols, so it will not
455+ reduce to a vacuous pattern.
441456 * Finally, due to the inherent complexities of defining matching behavior,
442457 approximate matching implements a reduced subset of regular expression
443458 syntax. Approximate matching does not support UTF-8 (and other