indexing - Lucene search with dashes does not return consistent results


Hi, I have a problem where a Lucene search does not return consistent results. Indexing is done with StandardAnalyzer, and the Lucene version is 3.0.

An example entry in the database:

a1bc-1-12345678 - au-01 / 123456 - no.1 abc defg xx-yyy example data  

If I search for the whole string, it does not return any results.

If I take out the single dashes and slashes and search for

a1bc-1-12345678 au-01 123456 no.1 abc defg xx-yyy example data 

it does not return any results.

If I replace the dash in xx-yyy with whitespace and search for

a1bc-1-12345678 au-01 123456 no.1 abc defg xx yyy example data  

it returns a result!

Now, if I include the dashes and the slash again, but still replace the dash in xx-yyy with whitespace, and search for

a1bc-1-12345678 - au-01 / 123456 - no.1 abc defg xx yyy example data 

it does not return any results.

Finally, if I replace the dash in both au-01 and xx-yyy with whitespace and search for

a1bc-1-12345678 au 01 123456 no.1 abc defg xx yyy example data 

it does not return any results.

In conclusion, "xx-yyy" is not valid but "au-01" is valid, while "xx yyy" is valid and "au 01" is not valid. What seems to be the problem?

What can I do to solve this?

I think I've got an answer to this. According to the Lucene documentation, StandardAnalyzer uses StandardTokenizer, which builds the index tokens based on these rules:

  • Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
  • Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.

I think a WhitespaceAnalyzer with a lowercase filter will suit my needs.
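
To see why "au-01" and "xx-yyy" end up behaving differently, here is a rough plain-Java approximation of the two rules above. It is not Lucene's tokenizer and does not call any Lucene API. The class name HyphenRuleSketch and its tokenize method are made up for illustration, it only handles ASCII letters and digits, and the lowercasing mimics what StandardAnalyzer does after tokenizing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a simplified imitation of the documented rules,
// NOT the real StandardTokenizer grammar.
final class HyphenRuleSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            // Lowercase, then drop punctuation at the edges of the chunk,
            // so a stand-alone "-" or "/" disappears completely.
            String word = chunk.toLowerCase(Locale.ROOT)
                    .replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            if (word.isEmpty()) {
                continue;
            }
            boolean hasDigit = word.chars().anyMatch(Character::isDigit);
            if (hasDigit) {
                // "Product number" case: the token is kept whole, hyphens included.
                tokens.add(word);
            } else {
                // No digit: split at hyphens and other punctuation, but keep inner dots.
                for (String part : word.split("[^a-z0-9.]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }
}
```

Under these assumptions, the example entry becomes the tokens a1bc-1-12345678, au-01, 123456, no.1, abc, defg, xx, yyy, example, data. So "au-01" stays one token because it contains a digit, while "xx-yyy" is indexed as two separate tokens.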
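
For comparison, this is roughly what whitespace tokenizing plus lowercasing would produce, again as a plain-Java sketch rather than real Lucene code. The class name WhitespaceLowercaseSketch is hypothetical. In Lucene itself the equivalent would presumably be a custom Analyzer that wraps a WhitespaceTokenizer in a LowerCaseFilter, which I have not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: imitates "split on whitespace, then lowercase".
final class WhitespaceLowercaseSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (!chunk.isEmpty()) {
                // Hyphens, dots and slashes are left untouched inside the token.
                tokens.add(chunk.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

With this approach "xx-yyy" and "au-01" are both kept as single tokens, so they are treated the same way. One thing to watch out for: the stand-alone "-" and "/" in the example entry also become tokens of their own, so the query has to be tokenized the same way as the indexed text.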

