Class | Ferret::Search::FuzzyQuery |
In: | ext/r_search.c |
Parent: | Ferret::Search::Query |
FuzzyQuery uses the Levenshtein distance formula to measure the similarity between two terms. For example, "weak" and "week" differ by one letter and are four characters long, so the similarity is 75%, or 0.75. You can use this query to match terms that are very close to the search term.
FuzzyQuery can be quite useful for finding documents that wouldn't normally be found because of typos.
FuzzyQuery.new(:field, "google", :min_similarity => 0.6, :prefix_length => 2) # matches => "gogle", "goggle", "googol", "googel"
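The similarity arithmetic described above can be reproduced with a minimal sketch in plain Ruby. The helper names +levenshtein+ and +similarity+ are hypothetical and are not part of Ferret. The choice to normalize by the length of the shorter term is also an assumption; the description above only covers terms of equal length, where the choice makes no difference.

```ruby
# Classic dynamic-programming Levenshtein edit distance between two strings.
# Hypothetical helper for illustration; Ferret computes this internally in C.
def levenshtein(a, b)
  # prev[j] is the distance between the processed prefix of a and the
  # first j characters of b. Initially the processed prefix of a is empty.
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # distance from the first i chars of a to the empty string
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost].min # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

# ASSUMPTION: similarity = 1 - distance / length of the shorter term.
def similarity(a, b)
  shorter = [a.length, b.length].min
  return (a == b ? 1.0 : 0.0) if shorter.zero?
  1.0 - levenshtein(a, b).to_f / shorter
end

puts similarity("weak", "week") # prints 0.75
```

Under this assumption a candidate that differs a lot from the search term can receive a negative score, which simply means it falls below any sensible +:min_similarity+.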
Set the default value for +:prefix_length+
Create a new FuzzyQuery that will match terms with a similarity of at least +:min_similarity+ to term. Similarity is scored using the Levenshtein edit distance formula. See en.wikipedia.org/wiki/Levenshtein_distance
If a +:prefix_length+ > 0 is specified, a common prefix of that length is also required.
You can also set +:max_terms+ to prevent memory overflow problems. By default it is set to 512.
FuzzyQuery.new(:content, "levenshtein", :min_similarity => 0.8, :prefix_length => 5, :max_terms => 1024)
field: | field to search |
term: | term to search for, including its close matches |
:min_similarity: | Default: 0.5. Minimum Levenshtein similarity score required for a match |
:prefix_length: | Default: 0. Length of the common prefix a term must share with the search term before Levenshtein distance is measured. This parameter is used to improve performance. With a +:prefix_length+ of 0, all terms in the index must be checked, which can be quite a performance hit. By setting the prefix length to a larger number you minimize the number of terms that need to be checked. Even 1 will cut down the work by a factor of about 26, depending on your character set and the first letter. |
:max_terms: | Limits the number of terms that can be added to the query when it is expanded as a MultiTermQuery. This is not usually a problem with FuzzyQueries unless you set +:min_similarity+ to a very low value. |
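How the three options interact can be illustrated with a rough sketch in plain Ruby. This is not Ferret's implementation: +expand_fuzzy+ and +levenshtein+ are hypothetical helpers, the term list stands in for the index's term dictionary, and both the normalization by the shorter term's length and the policy of keeping the best-scoring terms when +:max_terms+ is exceeded are assumptions.

```ruby
# Hypothetical helper: Levenshtein edit distance via dynamic programming.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical sketch of fuzzy term expansion (NOT Ferret's actual code).
# Returns [matching_terms, number_of_terms_that_needed_a_distance_check].
def expand_fuzzy(terms, term, min_similarity: 0.5, prefix_length: 0, max_terms: 512)
  prefix = term[0, prefix_length]
  checked = 0
  matches = []
  terms.each do |candidate|
    # Cheap test first: terms without the required prefix are skipped
    # before any edit distance is computed.
    next unless candidate.start_with?(prefix)
    checked += 1
    shorter = [candidate.length, term.length].min
    next if shorter.zero?
    # ASSUMPTION: score = 1 - distance / length of the shorter term.
    score = 1.0 - levenshtein(candidate, term).to_f / shorter
    matches << [candidate, score] if score >= min_similarity
  end
  # ASSUMPTION: when more than max_terms match, keep the best-scoring ones.
  # Ties are broken alphabetically so the result is deterministic.
  best = matches.sort_by { |name, score| [-score, name] }.first(max_terms)
  [best.map(&:first), checked]
end

index_terms = %w[gogle goggle googol googel good yahoo giggle]

found, checked = expand_fuzzy(index_terms, "google",
                              min_similarity: 0.6, prefix_length: 2)
p found   # prints ["goggle", "gogle", "googel", "googol"]
p checked # prints 5
```

With +prefix_length: 0+ all seven terms in this toy list would need a distance check, and "giggle" would also match, so in this sketch the prefix requirement narrows the results as well as the amount of work.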