Module Stemmable
In: lib/stemmer/porter.rb
$Id: stemmable.rb,v 1.2 2003/02/01 02:07:30 condit Exp $
See example usage at the end of this file.
STEP_2_LIST = { 'ational'=>'ate', 'tional'=>'tion', 'enci'=>'ence', 'anci'=>'ance', 'izer'=>'ize', 'bli'=>'ble', 'alli'=>'al', 'entli'=>'ent', 'eli'=>'e', 'ousli'=>'ous', 'ization'=>'ize', 'ation'=>'ate', 'ator'=>'ate', 'alism'=>'al', 'iveness'=>'ive', 'fulness'=>'ful', 'ousness'=>'ous', 'aliti'=>'al', 'iviti'=>'ive', 'biliti'=>'ble', 'logi'=>'log' }
STEP_3_LIST = { 'icate'=>'ic', 'ative'=>'', 'alize'=>'al', 'iciti'=>'ic', 'ical'=>'ic', 'ful'=>'', 'ness'=>'' }
SUFFIX_1_REGEXP = /( ational | tional | enci | anci | izer | bli | alli | entli | eli | ousli | ization | ation | ator | alism | iveness | fulness | ousness | aliti | iviti | biliti | logi)$/x
SUFFIX_2_REGEXP = /( al | ance | ence | er | ic | able | ible | ant | ement | ment | ent | ou | ism | ate | iti | ous | ive | ize)$/x
C = "[^aeiou]"
V = "[aeiouy]"
CC = "#{C}(?>[^aeiouy]*)"
VV = "#{V}(?>[aeiou]*)"
MGR0 = /^(#{CC})?#{VV}#{CC}/o
MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/o
MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/o
VOWEL_IN_STEM = /^(#{CC})?#{V}/o
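The last four constants encode Porter's measure m: any word or stem can be written [C](VC)^m[V], where C is a run of consonants and V a run of vowels. MGR0 matches stems with m > 0, MEQ1 stems with m = 1, MGR1 stems with m > 1, and VOWEL_IN_STEM any stem containing a vowel. The minimal sketch below checks this against the example words given in Porter's paper. The constants are copied from the listing above; `measure_bucket` and `EXAMPLES` are hypothetical names used only for illustration and are not part of Stemmable.

```ruby
# Constants copied verbatim from the Stemmable listing.
C  = "[^aeiou]"            # single consonant
V  = "[aeiouy]"            # single vowel
CC = "#{C}(?>[^aeiouy]*)"  # run of consonants
VV = "#{V}(?>[aeiou]*)"    # run of vowels

MGR0 = /^(#{CC})?#{VV}#{CC}/o                # m > 0
MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/o       # m = 1
MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/o      # m > 1
VOWEL_IN_STEM = /^(#{CC})?#{V}/o             # stem contains a vowel

# Hypothetical helper: returns 0, 1, or 2 (meaning "2 or more")
# using only the measure regexps.
def measure_bucket(stem)
  return 2 if MGR1.match?(stem)   # at least two VC pairs
  return 1 if MEQ1.match?(stem)   # exactly one VC pair
  0                               # no VC pair at all
end

# Example words from Porter's paper, grouped by their measure.
EXAMPLES = {
  0 => %w[tr ee tree y by],
  1 => %w[trouble oats trees ivy],
  2 => %w[troubles private oaten orrery]
}

EXAMPLES.each do |m, words|
  words.each { |w| puts "#{w}: #{measure_bucket(w)} (paper: #{m})" }
end
```

The words "by" and "ivy" exercise the special handling of y, which C admits as a consonant and V as a vowel. The steps of the algorithm only ask the yes/no questions that MGR0, MEQ1, MGR1, and VOWEL_IN_STEM answer, so the exact value of m is never computed.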
Porter stemmer in Ruby.
This is the Porter stemming algorithm, ported to Ruby from the version coded up in Perl. It's easy to follow against the rules in the original paper:
Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14, no. 3, pp. 130-137.
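Steps 2 to 4 of the paper map directly onto the constants listed above. The following is a minimal sketch that applies them in order, assuming the conditions stated in the paper: steps 2 and 3 rewrite a suffix only when the remaining stem has m > 0, and step 4 deletes a suffix only when m > 1, with -ion deleted only after s or t. `steps_2_to_4` and `STEP_3_REGEXP` are hypothetical names for illustration; this is not the module's own implementation.

```ruby
# Measure regexps, copied from the Stemmable listing.
C  = "[^aeiou]"
V  = "[aeiouy]"
CC = "#{C}(?>[^aeiouy]*)"
VV = "#{V}(?>[aeiou]*)"
MGR0 = /^(#{CC})?#{VV}#{CC}/o             # m > 0
MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/o   # m > 1

STEP_2_LIST = {
  'ational'=>'ate', 'tional'=>'tion', 'enci'=>'ence', 'anci'=>'ance',
  'izer'=>'ize', 'bli'=>'ble', 'alli'=>'al', 'entli'=>'ent', 'eli'=>'e',
  'ousli'=>'ous', 'ization'=>'ize', 'ation'=>'ate', 'ator'=>'ate',
  'alism'=>'al', 'iveness'=>'ive', 'fulness'=>'ful', 'ousness'=>'ous',
  'aliti'=>'al', 'iviti'=>'ive', 'biliti'=>'ble', 'logi'=>'log'
}
STEP_3_LIST = {
  'icate'=>'ic', 'ative'=>'', 'alize'=>'al', 'iciti'=>'ic',
  'ical'=>'ic', 'ful'=>'', 'ness'=>''
}

# /x ignores the line breaks and spaces inside these alternations.
SUFFIX_1_REGEXP = /(ational|tional|enci|anci|izer|bli|alli|entli|eli|
                    ousli|ization|ation|ator|alism|iveness|fulness|
                    ousness|aliti|iviti|biliti|logi)$/x
SUFFIX_2_REGEXP = /(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|
                    ou|ism|ate|iti|ous|ive|ize)$/x
# Hypothetical: the Step 3 keys as an anchored alternation.
STEP_3_REGEXP = /(icate|ative|alize|iciti|ical|ful|ness)$/

def steps_2_to_4(word)
  w = word

  # Step 2: map a double suffix to a single one if the stem has m > 0.
  if (m = SUFFIX_1_REGEXP.match(w)) && MGR0.match?(m.pre_match)
    w = m.pre_match + STEP_2_LIST[m[1]]
  end

  # Step 3: same condition, using the Step 3 table.
  if (m = STEP_3_REGEXP.match(w)) && MGR0.match?(m.pre_match)
    w = m.pre_match + STEP_3_LIST[m[1]]
  end

  # Step 4: drop the suffix outright if the stem has m > 1.
  if (m = SUFFIX_2_REGEXP.match(w))
    w = m.pre_match if MGR1.match?(m.pre_match)
  elsif (m = /(s|t)(ion)$/.match(w))
    stem = m.pre_match + m[1]   # keep the s or t, drop "ion"
    w = stem if MGR1.match?(stem)
  end

  w
end

%w[relational conditional hopeful electrical airliner].each do |word|
  puts "#{word} -> #{steps_2_to_4(word)}"
end
```

Because each alternation is anchored with `$`, Ruby reports the leftmost match, which is the longest matching suffix, as the paper requires. If the stem condition then fails, shorter suffixes are not retried. Steps 1 and 5 are left out, so some words stop short of the full algorithm's result. For example, "relational" ends here as "relate", and step 5 would go on to remove the final e to give "relat".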
See also www.tartarus.org/~martin/PorterStemmer
Send comments to raypereda@hotmail.com