Class Ferret::Analysis::StandardTokenizer
In: ext/r_analysis.c
Parent: Ferret::Analysis::TokenStream

The standard tokenizer is an advanced tokenizer that handles most words correctly and also recognizes tokens such as email addresses, web addresses, and phone numbers.

Example

  "Dave's résumé, at http://www.davebalmain.com/ 1234"
    => ["Dave's", "résumé", "at", "http://www.davebalmain.com", "1234"]

Methods

new  

Public Class methods

Create a new StandardTokenizer which optionally downcases tokens. Downcasing is done according to the current locale.

lower: set to false if you don't wish to downcase tokens
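
The behaviour described above can be approximated in plain Ruby. The sketch below does not call Ferret and is not its implementation: `TOKEN_PATTERN` and `sketch_tokenize` are hypothetical names, the regular expression is a rough assumption about what counts as a URL, email address, or word, and downcasing uses Ruby's built-in `String#downcase` rather than the C locale. It only illustrates the effect of a `lower` option on the example text from this page.

```ruby
# Illustrative stand-in only: NOT Ferret's tokenizer, just a regex
# approximation of the token types mentioned in the description.
TOKEN_PATTERN = %r{
    [a-z][a-z0-9+.\-]*://[^\s,;"<>]+     # web addresses with a scheme
  | [\w.+\-]+@[\w\-]+(?:\.[\w\-]+)+      # email addresses
  | [[:alnum:]]+(?:'[[:alnum:]]+)*       # words, keeping inner apostrophes
}xi

# Hypothetical helper: split text into tokens, optionally downcasing them.
def sketch_tokenize(text, lower: true)
  # Drop a trailing slash so a URL token matches the form shown in the example.
  tokens = text.scan(TOKEN_PATTERN).map { |t| t.sub(%r{/+\z}, "") }
  lower ? tokens.map(&:downcase) : tokens
end

text = "Dave's résumé, at http://www.davebalmain.com/ 1234"
p sketch_tokenize(text, lower: false)
p sketch_tokenize(text)
```

With `lower: false` the tokens keep their original case, as in the example output at the top of this page; with the default, "Dave's" becomes "dave's" and the other tokens are unchanged.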
