Topic: Why does the search seem so fuzzy?
When I search on a Kete site, I get results I really don't expect.
With the Coming Home project swinging into action, we tried searching the ketekawarau Kete site for "war". We got 30 results out of 37.
None of the results have "war" in them. They do have "was", so the search is perhaps too fuzzy.
So I thought, why not search for some porn? I did this on www.old.kete.net.nz, and you can imagine my delight when I discovered that there are 9 topics and 5 discussions on porn. But no. There's stuff about port, however.
How about "lol cat"? Hmm, 49 topics on old.kete.net.nz, that's great. What? No cute pictures of fuzzy kitties...
So how does the search work? It's getting too fuzzy, and if it does fuzzy matching, could it reply with "X matching, there are Y similar sounding items"?
Once we have a Kete with thousands of topic items and many, many images, it may be difficult for people to find information if the search gives similar results before it gives exact matches. Water can be Wazer, and I'm guessing lol cat is actually location?
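As a rough illustration of the "X matching, Y similar" idea, here is a minimal Python sketch, not Kete or Zebra code. It assumes a simplified fuzzy rule (same-length words that differ by exactly one character, as with war/was and porn/port) and plain-text items keyed by title; `differs_by_one` and `grouped_search` are hypothetical names. Exact matches are listed before similar ones.

```python
import re


def differs_by_one(a: str, b: str) -> bool:
    # Hypothetical fuzzy rule: same length, exactly one differing character.
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def grouped_search(query: str, items: dict[str, str]) -> tuple[str, list[str]]:
    # Split results into exact word matches and "similar sounding" ones,
    # then return a summary line plus titles with exact matches first.
    q = query.lower()
    exact, similar = [], []
    for title, text in items.items():
        tokens = re.findall(r"\w+", text.lower())
        if q in tokens:
            exact.append(title)
        elif any(differs_by_one(q, t) for t in tokens):
            similar.append(title)
    summary = f"{len(exact)} matching, there are {len(similar)} similar sounding items"
    return summary, exact + similar


if __name__ == "__main__":
    items = {
        "Port Chalmers": "Ships in the port.",
        "Letters": "He went to war in 1915.",
        "Diary": "It was raining.",
    }
    summary, ordered = grouped_search("war", items)
    print(summary)  # 1 matching, there are 1 similar sounding items
    print(ordered)  # ['Letters', 'Diary']
```

Listing exact matches first means a large Kete would still surface "war" items at the top, with the "was" items pushed below them rather than mixed in.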
I should also say that Kete includes a number of ways to make searches more exact, for example:
- quotes to demarcate an entire phrase you want to match, e.g. "world war"
- boolean operators, e.g. entering installation not documentation in the search field will match items with "installation" in their content, but exclude results that also contain the word "documentation"
- browsing or searching within a particular basket
- browsing or searching within the list of items contributed by a user, items tagged with a given tag, or related items
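To show why quotes and "not" narrow the results, here is a hypothetical Python sketch of phrase matching with excluded words over plain-text items. It is not how Zebra evaluates queries; `words`, `contains_phrase` and `exact_search` are illustrative names, and items are assumed to be a title-to-text mapping.

```python
import re


def words(text: str) -> list[str]:
    # Lower-cased word tokens; \w also matches letters such as "ā".
    return re.findall(r"\w+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    # True only if the phrase's words appear next to each other, in order.
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def exact_search(items: dict[str, str], phrase: str, excluded=()) -> list[str]:
    # Keep items that contain the exact phrase and none of the excluded words,
    # roughly what "world war" or installation not documentation ask for.
    banned = {w.lower() for w in excluded}
    return [
        title
        for title, text in items.items()
        if contains_phrase(text, phrase) and banned.isdisjoint(words(text))
    ]


if __name__ == "__main__":
    items = {
        "Install guide": "Installation steps for Kete.",
        "Docs index": "Installation documentation and more.",
        "WWI": "Stories from the world war.",
    }
    print(exact_search(items, "installation", excluded=["documentation"]))
    print(exact_search(items, "world war"))
```

Because the phrase check requires consecutive words, "world war" would not match an item that only mentions "war" somewhere and "world" somewhere else.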
Well, it might be a good time to revisit our Zebra configuration generally. I've discovered (admittedly, I should have spotted this earlier) that we are getting unexpected issues with Unicode characters. For technical background, see these two tickets:
http://kete.lighthouseapp.com/projects/14288/tickets/15
http://kete.lighthouseapp.com/projects/14288/tickets/114
This has to do with standards for XML handling of Unicode characters. For a site's Zebra instance to be populated properly with the actual Unicode characters, we'll probably have to do some work on its XSLT transformations of incoming search records.
I raise this because, while we are looking at our Zebra configuration for that, it might be worth hashing out what people expect from other search configuration options.
Cheers,
Walter
Tags: Zebra, search, macrons, xml, xslt, configuration
Walter McGinnis
said Re: fuzzy searches
The fuzziness that Kete uses in its searches comes from a combination of factors. It handles truncation on either side of a word, so a search for "excite" will also match items containing the word "excited". It also allows one character within a word to be different. Thus, a search for "war" will also return results that contain only "was".
An example of where this "one character different" matching is useful is when someone searches for "māori" but there are only entries with "maori". So there are definite benefits, but it's a tradeoff.
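The two behaviours described above can be approximated with a short Python sketch. This is a simplified illustration under stated assumptions, not Zebra's actual matching code: truncation is modelled as "the query appears anywhere inside a word", the one-character rule as a single substitution between equal-length words, and macrons are assumed to be stored as precomposed characters (a decomposed "ā" would make the word one character longer). `one_substitution`, `fuzzy_word_match` and `fuzzy_search` are hypothetical names.

```python
import re


def one_substitution(a: str, b: str) -> bool:
    # Same length and at most one differing character ("war"/"was", "maori"/"māori").
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def fuzzy_word_match(query: str, word: str) -> bool:
    query, word = query.lower(), word.lower()
    # Truncation on either side: the query may sit anywhere inside the word,
    # so "excite" matches "excited" (and, less helpfully, "war" matches "award").
    if query in word:
        return True
    # Otherwise allow one character within the word to be different.
    return one_substitution(query, word)


def fuzzy_search(query: str, items: dict[str, str]) -> list[str]:
    # Titles of items containing at least one word that fuzzily matches the query.
    return [
        title
        for title, text in items.items()
        if any(fuzzy_word_match(query, w) for w in re.findall(r"\w+", text))
    ]


if __name__ == "__main__":
    items = {
        "Letter home": "He was at the front in 1917.",
        "Excited crowds": "The crowd was excited.",
        "Te reo": "Māori place names",
    }
    print(fuzzy_search("war", items))    # ['Letter home', 'Excited crowds']
    print(fuzzy_search("maori", items))  # ['Te reo']
```

The same rule that lets "maori" find "Māori" is what lets "war" find "was", which is the tradeoff in a nutshell.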
Cheers,
Walter
Tags: Zebra, search, fuzzy