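The standard tokenizer is an advanced tokenizer that handles most words correctly and also recognizes tokens such as email addresses, web addresses, and phone numbers. Because this variant is ASCII-only, non-ASCII characters act as token boundaries, as the "résumé" in the following example shows.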
"Dave's résumé, at http://www.davebalmain.com/ 1234" => ["Dave's", "r", "sum", "at", "http://www.davebalmain.com", "1234"]
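The behaviour in the example above can be approximated in plain Ruby. The sketch below is not Ferret's implementation (the real tokenizer is written in C); the regular expressions and the `ascii_tokenize_sketch` name are hypothetical, chosen only to illustrate why the accented characters split "résumé" and why the URL survives as a single token without its trailing slash.

```ruby
# Hypothetical approximation of ASCII-only "standard" tokenization.
# This is NOT Ferret's code; the patterns are illustrative assumptions.

# URL: scheme plus host only, so a trailing "/" is left out of the token.
URL_SKETCH   = %r{[a-z]+://[A-Za-z0-9.\-]+}
# Email: a simple local@domain.tld shape.
EMAIL_SKETCH = /[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]+/
# Word: ASCII letters and digits, with an optional apostrophe suffix ("Dave's").
WORD_SKETCH  = /[A-Za-z0-9]+(?:'[A-Za-z]+)?/

# Alternatives are tried in order, so URLs and emails win over plain words.
TOKEN_SKETCH = Regexp.union(URL_SKETCH, EMAIL_SKETCH, WORD_SKETCH)

# Returns the tokens found in +text+. Any character outside the ASCII
# classes above (such as "é") matches nothing and so acts as a separator.
def ascii_tokenize_sketch(text)
  text.scan(TOKEN_SKETCH)
end

p ascii_tokenize_sketch("Dave's résumé, at http://www.davebalmain.com/ 1234")
```

Since "é" falls outside every character class, "résumé" is broken into "r" and "sum", which matches the documented output. A Unicode-aware tokenizer would be needed to keep such words whole.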
Create a new AsciiStandardTokenizer
static VALUE
frb_a_standard_tokenizer_init(VALUE self, VALUE rstr)
{
    return get_wrapped_ts(self, rstr, standard_tokenizer_new());
}