Open
Description
RLTK returns an incorrect score for the Jaro-Winkler similarity metric on the following pair: ("United Kingdom", "Sengenia (United Kingdom)"). Three other libraries agree on a different value.
Test:
import jaro
import rltk.similarity as sim
import py_stringmatching.similarity_measure.jaro_winkler
import py_stringmatching.similarity_measure.jaro
import rapidfuzz.distance
key = "United Kingdom"
query = "Sengenia (United Kingdom)"
print(sim.jaro_winkler_similarity(key, query, threshold=0.7, scaling_factor=0.1, prefix_len=4))
jw = py_stringmatching.similarity_measure.jaro_winkler.JaroWinkler()
print(jw.get_raw_score(key, query))
print(jaro.jaro_winkler_metric(key, query))
print(rapidfuzz.distance.JaroWinkler.similarity(key, query, score_cutoff=0.7, prefix_weight=0.1))
Output:
0.5054761904761905
0.7342857122421265
0.7342857142857143
0.7342857142857143
The plain Jaro metric is incorrect as well.
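For an independent cross-check, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler definitions. It is not RLTK's code: the function names `reference_jaro` and `reference_jaro_winkler` are hypothetical, and the parameter names only mirror the call in the test above. It assumes a match window of `max(len) // 2 - 1` and applies the prefix boost only when the Jaro score exceeds the threshold.

```python
def reference_jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (hypothetical helper, not RLTK's implementation)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumption: characters match only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            # Take the first still-unmatched equal character inside the window.
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Matched characters in their original order; each out-of-order pair
    # counts as half a transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


def reference_jaro_winkler(
    s1: str,
    s2: str,
    threshold: float = 0.7,
    scaling_factor: float = 0.1,
    prefix_len: int = 4,
) -> float:
    """Jaro-Winkler: Jaro plus a bonus for a shared prefix (hypothetical helper)."""
    jaro = reference_jaro(s1, s2)
    if jaro <= threshold:
        return jaro  # assumption: no prefix boost at or below the threshold

    prefix = 0
    for a, b in zip(s1[:prefix_len], s2[:prefix_len]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scaling_factor * (1.0 - jaro)


key = "United Kingdom"
query = "Sengenia (United Kingdom)"
print(reference_jaro(key, query))
print(reference_jaro_winkler(key, query))
```

Worked by hand for this pair, all 14 characters of the key fall inside the window of 11 and find a match, with 5 transpositions, so Jaro = (14/14 + 14/25 + 9/14) / 3 ≈ 0.7343. The strings share no common prefix ("U" vs "S"), so Jaro-Winkler is the same value. That agrees with py_stringmatching, jaro, and rapidfuzz rather than with RLTK's 0.5055.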