Python: fuzzywuzzy not working for foreign characters


When I try a simple fuzzywuzzy expression with foreign characters, I get erroneous results from the process.extractOne method (I've tried it both with and without the u prefix):

```python
>>> choices = [u"הלכות חנוכה", u"הלכות פורים", u"הלכות סוכה"]
>>> process.extractOne("הלכות סוכה", choices)
(u'\u05d4\u05dc\u05db\u05d5\u05ea \u05d7\u05e0\u05d5\u05db\u05d4', 0)
```

Yet it runs smoothly with fuzz.ratio:

```python
>>> fuzz.ratio("הלכות ראש השנה", "הלכות תעניות")
69
```

And the same code works great with regular (ASCII) characters:

```python
>>> choices = ['this', 'that', 'those']
>>> process.extractOne("these", choices)
('those', 80)
```

What might the problem be?

Pass fuzz.ratio as the scorer= argument, and add the u prefix in front of the string you're trying to match.

The following works:

```python
>>> from fuzzywuzzy import fuzz, process
>>> choices = [u"הלכות חנוכה", u"הלכות פורים", u"הלכות סוכה"]
>>> process.extractOne(u"הלכות סוכה", choices, scorer=fuzz.ratio)
(u'\u05d4\u05dc\u05db\u05d5\u05ea \u05e1\u05d5\u05db\u05d4', 100)
```
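To see how the default call can end up returning the first choice with a score of 0, here is a minimal pure-Python sketch. It is not fuzzywuzzy's code: `ratio` approximates `fuzz.ratio` with `difflib.SequenceMatcher`, and `ascii_only` is a hypothetical preprocessor standing in for the assumption that the default scorer (`fuzz.WRatio`) strips non-ASCII characters before comparing (check its `force_ascii` option in your installed version). If every Hebrew string is reduced to an empty string, every choice scores 0 and the first one wins the tie. fuzzywuzzy also ships `fuzz.UWRatio`, which calls `WRatio` with `force_ascii=False`; passing it as `scorer=` may be another option worth trying.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Approximates fuzz.ratio: a 0-100 similarity score from difflib."""
    if not a or not b:
        return 0  # an empty string cannot match anything
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def ascii_only(s):
    """Hypothetical preprocessor (assumption): drop every non-ASCII character."""
    return "".join(ch for ch in s if ord(ch) < 128).strip()


def extract_one(query, choices, scorer=ratio, processor=None):
    """Return (best_choice, score); ties keep the earliest choice."""
    q = processor(query) if processor else query
    best = None
    for choice in choices:
        c = processor(choice) if processor else choice
        score = scorer(q, c)
        if best is None or score > best[1]:
            best = (choice, score)
    return best


choices = ["הלכות חנוכה", "הלכות פורים", "הלכות סוכה"]

# With ASCII stripping, every Hebrew string becomes "" and all scores are 0,
# so the first choice comes back, mirroring the symptom in the question.
print(extract_one("הלכות סוכה", choices, processor=ascii_only))  # ('הלכות חנוכה', 0)

# Comparing the Unicode strings directly finds the exact match.
print(extract_one("הלכות סוכה", choices))  # ('הלכות סוכה', 100)
```

The strict `>` comparison is what makes a tie of zeros resolve to the first element, which matches the output shown in the question.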

process.extract with the same scorer also gives sensible scores for the other choices:

```python
>>> choices = [u"הלכות חנוכה", u"הלכות פורים", u"הלכות סוכה"]
>>> process.extract(u"הלכות סוכה", choices, scorer=fuzz.ratio)
[(u'\u05d4\u05dc\u05db\u05d5\u05ea \u05e1\u05d5\u05db\u05d4', 100),
 (u'\u05d4\u05dc\u05db\u05d5\u05ea \u05d7\u05e0\u05d5\u05db\u05d4', 86),
 (u'\u05d4\u05dc\u05db\u05d5\u05ea \u05e4\u05d5\u05e8\u05d9\u05dd', 67)]
```
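process.extract scores every choice and returns the best matches in descending order of score, with its `limit` argument capping how many come back. A minimal standard-library stand-in, again using a difflib-based `ratio` in place of `fuzz.ratio`, might look like the following. It is a sketch, not fuzzywuzzy's implementation, so exact scores may differ from the ones above (for example when python-Levenshtein is installed).

```python
import heapq
from difflib import SequenceMatcher


def ratio(a, b):
    """Approximates fuzz.ratio with difflib; returns an int in 0..100."""
    if not a or not b:
        return 0
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract(query, choices, scorer=ratio, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    # nlargest behaves like sorted(..., reverse=True)[:limit] but avoids a full sort.
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


choices = ["הלכות חנוכה", "הלכות פורים", "הלכות סוכה"]
for choice, score in extract("הלכות סוכה", choices):
    print(score, choice)
```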

Tested with fuzzywuzzy 0.7.0 and Python 2.7.x.
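The u prefix matters on Python 2.7 because, in a UTF-8 terminal or source file, a Hebrew literal without it is a UTF-8 byte string in which each Hebrew letter occupies two bytes, so a matcher ends up comparing bytes rather than letters, and a byte-string query does not line up with unicode choices. On Python 3 every `str` is Unicode, so the prefix is optional. The Python 3 sketch below uses `.encode("utf-8")` to produce the Python 2-style byte string and difflib as a stand-in for `fuzz.ratio`; it only illustrates the representation difference and is not fuzzywuzzy's code.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Approximates fuzz.ratio with difflib; works on str or bytes."""
    if not a or not b:
        return 0
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


word = "הלכות"
encoded = word.encode("utf-8")  # roughly what a Python 2 literal without u holds

print(len(word), len(encoded))  # 5 letters, 10 bytes

# Scores computed over letters and over UTF-8 bytes need not agree.
a, b = "הלכות ראש השנה", "הלכות תעניות"
print(ratio(a, b), ratio(a.encode("utf-8"), b.encode("utf-8")))
```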

