Soundex is a phonetic algorithm -> http://en.wikipedia.org/wiki/Soundex
I used Key collision metaphone3 method, which is a way to transform tokens into the way they are pronounced.
Example:
Parque nacional de gama
Parque nacional do iguacu
Parque nacional do itatiaia
Parque Nacional de Itatiaia
Kolmogorov complexity -> http://en.wikipedia.org/wiki/Kolmogorov_complexity
to estimate 'similarity' between strings and has been widely applied to the comparison of strings originating from DNA sequencing.
Example:
Podocarpus National Park, Cajanuma at Casa de Pedesur
Podocarpus National Park, Cajanuma, at Casa de predesur
levenshtein -> http://en.wikipedia.org/wiki/Levenshtein_distance
Nearest neighbor, distance function
Example:
Salto Iguazu
Salto tguazu
Soundex is a phonetic algorithm -> http://en.wikipedia.org/wiki/Soundex
I used Key collision metaphone3 method, which is a way to transform tokens into the way they are pronounced.
Example:
Parque nacional de gama
Parque nacional do iguacu
Parque nacional do itatiaia
Parque Nacional de Itatiaia
Kolmogorov complexity -> http://en.wikipedia.org/wiki/Kolmogorov_complexity
to estimate 'similarity' between strings and has been widely applied to the comparison of strings originating from DNA sequencing.
Example:
Podocarpus National Park, Cajanuma at Casa de Pedesur
Podocarpus National Park, Cajanuma, at Casa de predesur
levenshtein -> http://en.wikipedia.org/wiki/Levenshtein_distance
Nearest neighbor, distance function
Example:
Salto Iguazu
Salto tguazu