Class | Ferret::Analysis::AsciiWhiteSpaceAnalyzer |
In: | ext/r_analysis.c |
Parent: | Ferret::Analysis::Analyzer |
The AsciiWhiteSpaceAnalyzer recognizes tokens as maximal strings of non-whitespace characters. If implemented in Ruby, the AsciiWhiteSpaceAnalyzer would look like this:
  class AsciiWhiteSpaceAnalyzer
    def initialize(lower = true)
      @lower = lower
    end

    def token_stream(field, str)
      if @lower
        return AsciiLowerCaseFilter.new(AsciiWhiteSpaceTokenizer.new(str))
      else
        return AsciiWhiteSpaceTokenizer.new(str)
      end
    end
  end
As you can see, it makes use of the AsciiWhiteSpaceTokenizer. Use WhiteSpaceAnalyzer instead if you want to recognize multibyte encodings such as "UTF-8".
Create a new AsciiWhiteSpaceAnalyzer which downcases tokens by default but can optionally leave case as is. Lowercasing is applied only to ASCII characters.
lower: | set to false if you don't want the field's tokens to be downcased |
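
The effect of the lower parameter can be illustrated without Ferret installed. The following is a minimal plain-Ruby sketch, not Ferret's implementation: ascii_whitespace_tokens is a hypothetical helper, and the set of characters it treats as whitespace (space, tab, newline, vertical tab, form feed, carriage return) is an assumption about what "ASCII whitespace" means here.

```ruby
# Hypothetical stand-in for illustration only; this is not part of Ferret's API.
# It mimics the documented behaviour: tokens are maximal runs of
# non-whitespace characters, optionally downcased (ASCII letters only).
def ascii_whitespace_tokens(str, lower = true)
  # Assumption: "whitespace" means the six ASCII whitespace characters.
  tokens = str.split(/[ \t\n\x0B\f\r]+/).reject(&:empty?)
  return tokens unless lower

  # tr with ASCII ranges touches only A-Z; non-ASCII letters are left as they are.
  tokens.map { |token| token.tr("A-Z", "a-z") }
end

p ascii_whitespace_tokens("Hello  World\tFOO")         # => ["hello", "world", "foo"]
p ascii_whitespace_tokens("Hello  World\tFOO", false)  # => ["Hello", "World", "FOO"]
p ascii_whitespace_tokens("ÉCOLE Normale")             # => ["École", "normale"]
```

The last call shows why ASCII-only lowercasing matters: the non-ASCII "É" keeps its case while the remaining letters are downcased, which is the kind of result that makes WhiteSpaceAnalyzer the better choice for multibyte text.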