Fuzzy search

Ispell

When mnoGoSearch is used with Ispell support, words are normalized at search time, which makes it possible to find different grammatical forms of the same word. During indexing, all words are stored "as is" in the database. During the search, all forms of each given keyword are selected and taken into account. For example, the search front-end will also try to find the word "test" if "testing" or "tests" is given in the search query.

Two types of ispell files

mnoGoSearch understands two types of Ispell files: affix files and dictionaries. An Ispell affix file contains word-formation rules and has approximately the following format:


Flag V:
       E   > -E, IVE      # As in create > creative
      [^E] > IVE          # As in prevent > preventive
Flag *N:
       E   > -E, ION      # As in create > creation
       Y   > -Y, ICATION  # As in multiply > multiplication
     [^EY] > EN           # As in fall > fallen
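
As an illustration only, the way such a rule derives a new word form can be sketched in Python. This is not mnoGoSearch's own code: the names SuffixRule, parse_rule and apply_rule are hypothetical, and the sketch assumes the usual Ispell reading of a rule, where the left side is a condition on the end of the word, "-X" means "strip X", and the remaining text is appended.

```python
import re
from typing import NamedTuple, Optional


class SuffixRule(NamedTuple):
    condition: str  # pattern the end of the word must match, e.g. "E" or "[^EY]"
    strip: str      # ending removed before appending ("" if nothing is removed)
    append: str     # ending added to the stem


def parse_rule(line: str) -> SuffixRule:
    """Parse one rule line such as 'E > -E, ION  # As in create > creation'."""
    body = line.split("#", 1)[0]  # drop the trailing comment first
    condition, replacement = (part.strip() for part in body.split(">", 1))
    strip, append = "", replacement
    if replacement.startswith("-"):  # '-E, ION' means: strip E, then append ION
        strip, append = (part.strip() for part in replacement[1:].split(",", 1))
    return SuffixRule(condition, strip, append)


def apply_rule(word: str, rule: SuffixRule) -> Optional[str]:
    """Return the derived form, or None if the rule's condition does not match."""
    word = word.upper()
    if not re.search(rule.condition + "$", word):
        return None
    stem = word[: len(word) - len(rule.strip)] if rule.strip else word
    return stem + rule.append


rule = parse_rule("E   > -E, ION      # As in create > creation")
print(apply_rule("create", rule))                    # CREATION
print(apply_rule("fall", parse_rule("[^EY] > EN")))  # FALLEN
```

Normalizing a query word works in the opposite direction: the appended ending is removed, the stripped ending is restored, and the result is looked up in the dictionary.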

An Ispell dictionary file contains the words themselves and has the following format:


wop/S
word/DGJMS
wordage/S
wordbook
wordily
wordless/P
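
A minimal sketch of reading such entries, assuming that everything after the slash is a list of single-character flags (as in the sample above). The function name parse_dictionary is hypothetical and is not part of mnoGoSearch.

```python
from typing import Dict, Iterable, Set


def parse_dictionary(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the set of affix flags listed after the slash."""
    entries: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        word, _, flags = line.partition("/")
        # Assumption: each flag is one character; words without '/' get no flags.
        entries.setdefault(word, set()).update(flags)
    return entries


sample = ["wop/S", "word/DGJMS", "wordage/S", "wordbook", "wordily", "wordless/P"]
entries = parse_dictionary(sample)
print(sorted(entries["word"]))  # ['D', 'G', 'J', 'M', 'S']
print(entries["wordbook"])      # set()
```

Each flag refers to a "Flag" section of the affix file, so the entry word/DGJMS allows the word to take the endings defined by flags D, G, J, M and S.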

Using Ispell

To enable Ispell support in mnoGoSearch, specify Affix and Spell commands in the search.htm file. The commands have the following format:


Affix [lang] [charset] [ispell affixes file name]
Spell [lang] [charset] [ispell dictionary filename]

The first parameter of both commands is a two-letter language abbreviation. The second is the charset of the Ispell files. The third is the file name. File names are relative to mnoGoSearch's /etc directory. Absolute paths can also be specified.

Note: Simultaneous loading of several languages is supported, e.g.:


Affix en iso-8859-1 en.aff
Spell en iso-8859-1 en.dict
Affix de iso-8859-1 de.aff
Spell de iso-8859-1 de.dict

...will load support for both English and German languages.
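
To check a configuration like this before using it, the commands can be grouped by language with a short script. The following is only a sketch: IspellCommand, parse_ispell_commands and files_by_language are hypothetical names, and the parser assumes command names are written exactly as "Affix" and "Spell" and ignores every other line.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class IspellCommand(NamedTuple):
    kind: str      # "Affix" or "Spell"
    lang: str      # two-letter language abbreviation
    charset: str   # charset of the Ispell file
    filename: str  # relative or absolute file name


def parse_ispell_commands(text: str) -> List[IspellCommand]:
    """Collect Affix and Spell commands from template or config text."""
    commands = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # at most four whitespace-separated fields
        if len(parts) == 4 and parts[0] in ("Affix", "Spell"):
            kind, lang, charset, filename = parts
            commands.append(IspellCommand(kind, lang, charset, filename.strip()))
    return commands


def files_by_language(commands: List[IspellCommand]) -> Dict[str, Dict[str, List[str]]]:
    """Group file names as {lang: {"Affix": [...], "Spell": [...]}}."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for command in commands:
        grouped[command.lang].setdefault(command.kind, []).append(command.filename)
    return dict(grouped)


config = """\
Affix en iso-8859-1 en.aff
Spell en iso-8859-1 en.dict
Affix de iso-8859-1 de.aff
Spell de iso-8859-1 de.dict
"""
for lang, files in files_by_language(parse_ispell_commands(config)).items():
    print(lang, files)
```

A language that appears with only one of the two command kinds is easy to spot in the grouped output.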

If you use searchd, add the same commands to searchd.conf.

When mnoGoSearch is used with Ispell support, it is recommended to use searchd, especially when several languages are loaded. Otherwise the startup time of search.cgi increases.

Customizing dictionaries

Your site may contain some rare words that are not in the Ispell dictionaries. You can list such words in a plain text file, one word per line, in the following format:


rare.dict:
----------
webmaster
intranet
.......
www
http
---------
			

You may also use Ispell flags in this file (see the Ispell documentation for the flags). This saves you from listing the same word with different endings, for example "webmaster" and "webmasters". Pick a word from an existing Ispell dictionary that changes in the same way and copy its flags. For example, the English dictionary has this line:

postmaster/MS

So, webmaster with MS flags will probably be OK:

webmaster/MS
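
Candidate words for such a file can be collected automatically. The sketch below is only an assumption about how one might do it and is not a mnoGoSearch tool: find_unknown_words is a hypothetical helper that compares plain text against a list of known base words. It does not apply affix rules, so inflected forms of known words are also reported and need manual review.

```python
import re
from typing import Iterable, List, Set


def find_unknown_words(text: str, known_words: Iterable[str]) -> List[str]:
    """Return words from text that are not in known_words, in order of first use."""
    known: Set[str] = {word.lower() for word in known_words}
    seen: Set[str] = set()
    unknown: List[str] = []
    for token in re.findall(r"[A-Za-z]+", text):  # ASCII letters only, for simplicity
        word = token.lower()
        if word not in known and word not in seen:
            seen.add(word)
            unknown.append(word)
    return unknown


known = ["the", "updated", "page"]
print(find_unknown_words("The webmaster updated the intranet page", known))
# ['webmaster', 'intranet']
```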

Then copy this file to the /etc directory of mnoGoSearch and add it by using the Spell command in the Ispell tab of mnoGoSearch.

During the next reindexing of all documents, the new words will be treated as correctly spelled. Only the words that are really incorrect will remain.

Synonyms

Starting with mnoGoSearch version 3.2, synonym-based fuzzy search is supported.

Synonyms files are installed into the etc/synonym subdirectory of mnoGoSearch's installation.

To enable synonyms, add Synonym <filename> commands to the search.htm template, e.g.:


Synonym synonym/english.syn
Synonym synonym/russian.syn

Filenames are relative to the etc directory of mnoGoSearch's installation, or absolute if they begin with /.

If you use searchd, add the same commands to searchd.conf.

Please feel free to send us your own synonym lists. You may take the English synonyms file as an example. At the beginning of the list, please specify the following two commands:


Language: en
Charset:  us-ascii