Lexification Wizard Help

E-Mail Comments to: doc@cyc.com
Copyright© 2000, 2001 Cycorp. All rights reserved.

Background

Currently at Cycorp, concepts are labelled in a two-phase process. First, when the knowledge engineer creates a concept, a phrase is associated with the concept using a #$termStrings assertion. This is suitable for retrieving the concept using the same phrase, but it doesn't allow for recognition of simple variants, for example, those based on pluralization. Furthermore, these labels are not suitable for generation purposes, where the form might change based on the context (e.g., 'how to swim' vs. 'swimming advice'). Thus, there is a second phase in which the plain labels are converted into lexical assertions, which encode syntactic information suitable for recognizing variants and for proper generation. Such assertions are generally made only by the NL staff, although KE'ers with substantial experience can also make them. As a result, there is a bottleneck in the process of ensuring that proper "lexifications" are associated with Cyc concepts.

As a first step in addressing this problem, the Open Directory linking tool introduced a lexical wizard that guided users through the decisions needed to perform the lexification and made the appropriate assertions for them. However, that wizard is only applicable to concepts created during external concept linking. Also, only a limited range of lexifications was supported.

Introduction

The Cycorp Lexification Wizard is an extension of the Open Directory linking tool's lexification wizard that is integrated into the browser. The interface is designed to have the feel of a Microsoft Windows-based wizard. In particular, options are specified one at a time, with the next phase invoked via the NEXT-> button. Defaults are filled in based on the selection, and the same page is then brought up for further input. When the tool has determined the settings for all options needed for the lexification, a LEXIFY button replaces the NEXT-> button. After the proposed assertion is diagnosed OK, the FINISH button appears, signalling that the actual assertion can be made.
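The button sequence described above behaves like a small state machine. The following is a minimal sketch of that logic, assuming the page only needs to know which options are required, which are already set, and whether diagnosis has passed; the function and parameter names are hypothetical and are not taken from the wizard's code.

```python
from enum import Enum
from typing import Set


class Button(Enum):
    # Button labels taken from the help text; the enum itself is illustrative.
    NEXT = "NEXT->"
    LEXIFY = "LEXIFY"
    FINISH = "FINISH"


def visible_button(required: Set[str], specified: Set[str], diagnosed_ok: bool) -> Button:
    """Pick the button a wizard page shows (hypothetical sketch).

    required     -- options the tool has decided this lexification needs
    specified    -- options already filled in by the user or by defaults
    diagnosed_ok -- whether the proposed assertion has passed diagnosis
    """
    if not required <= specified:
        # At least one option is still open: keep collecting input page by page.
        return Button.NEXT
    if not diagnosed_ok:
        # Everything is set; the proposed assertion still has to be checked.
        return Button.LEXIFY
    # Diagnosis succeeded: the actual assertion can now be made.
    return Button.FINISH


# Usage: walk through the three stages for a two-option lexification.
needed = {"Term", "Phrase"}
print(visible_button(needed, {"Term"}, False).value)            # NEXT->
print(visible_button(needed, {"Term", "Phrase"}, False).value)  # LEXIFY
print(visible_button(needed, {"Term", "Phrase"}, True).value)   # FINISH
```

Keeping the decision in one function makes it easy to see that LEXIFY is only reachable once every required option has a value.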

General Lexification

For all terms to be lexified, the following options need to be specified.

Term - The Cyc concept to be lexified
Phrase - A word or phrase serving as a label for the term
Is preferred? - Whether the phrase is the most suitable form of reference
Is proper noun? - Whether the phrase is a proper name
MT - Microtheory to use for the lexification (usually #$EnglishMt, except for relations, which use #$EnglishParaphraseMt)
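The five general options map naturally onto a record, and the MT rule is a simple default. The sketch below assumes that an explicitly chosen MT should win over the default; the field and function names are hypothetical, and #$Dog is used only as an illustrative term.

```python
from dataclasses import dataclass


@dataclass
class GeneralLexOptions:
    # Field names are hypothetical; they mirror the five general options.
    term: str             # the Cyc concept to be lexified
    phrase: str           # word or phrase serving as its label
    is_preferred: bool    # most suitable form of reference?
    is_proper_noun: bool  # is the phrase a proper name?
    mt: str = ""          # microtheory; empty means "use the default"


def default_mt(term_is_relation: bool) -> str:
    """Default microtheory: #$EnglishParaphraseMt for relations, else #$EnglishMt."""
    return "#$EnglishParaphraseMt" if term_is_relation else "#$EnglishMt"


def resolve_mt(opts: GeneralLexOptions, term_is_relation: bool) -> str:
    """Keep an explicitly chosen MT; otherwise fall back to the default (assumed behaviour)."""
    return opts.mt or default_mt(term_is_relation)


opts = GeneralLexOptions(term="#$Dog", phrase="dog", is_preferred=True, is_proper_noun=False)
print(resolve_mt(opts, term_is_relation=False))  # #$EnglishMt
```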

Proper Name Lexification

Proper name lexification involves selecting a predicate that best expresses the relationship of a name to a Cyc constant. Often this will just be #$nameString, unless a specialization is more appropriate (e.g., #$scientificName). Therefore, there is a single option to be selected in this case:

Proper-name-predicate - Name of the predicate to use (see #$properNameStrings)
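To make the single option concrete, the sketch below renders a proper-name lexification as CycL text, defaulting to #$nameString and allowing a specialization such as #$scientificName. It assumes the argument order (PREDICATE TERM "NAME"), which should be confirmed against the predicate's definition in the KB; the terms #$ExampleCity and #$ExampleSpecies are hypothetical.

```python
DEFAULT_NAME_PREDICATE = "#$nameString"


def proper_name_assertion(term: str, name: str,
                          predicate: str = DEFAULT_NAME_PREDICATE) -> str:
    """Render a proper-name lexification as CycL text (sketch).

    Assumes the argument order (PREDICATE TERM "NAME").
    """
    if not predicate.startswith("#$"):
        raise ValueError("predicate must be a Cyc constant, e.g. #$nameString")
    if '"' in name:
        # String escaping is out of scope for this sketch.
        raise ValueError("names containing double quotes are not handled here")
    return f'({predicate} {term} "{name}")'


# Usage with hypothetical terms: the default predicate, then a specialization.
print(proper_name_assertion("#$ExampleCity", "Springfield"))
print(proper_name_assertion("#$ExampleSpecies", "Canis lupus", "#$scientificName"))
```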

Denotational Lexification

"Denotational" lexification involves selecting the options needed for a #$denotation assertion or for one of the multi-word lexification predicates (e.g., #$multiWordString). The relevant options follow:

Headword position - Position of the headword in the phrase (i.e., the word that determines the grammatical function of the entire phrase)
Headword part of speech - Grammatical part of speech of the headword (see #$SpeechPart)
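The headword position splits a phrase into the headword and the words around it, which is what a multi-word lexification has to record. A minimal sketch, assuming positions are counted from 1 (the wizard's actual convention may differ); the function name is hypothetical.

```python
from typing import List, Tuple


def split_at_headword(phrase: str, headword_position: int) -> Tuple[List[str], str, List[str]]:
    """Split a phrase into (words before, headword, words after).

    headword_position counts words from 1; that convention is an assumption.
    """
    words = phrase.split()
    if not 1 <= headword_position <= len(words):
        raise ValueError(f"headword position {headword_position} not in 1..{len(words)}")
    i = headword_position - 1
    return words[:i], words[i], words[i + 1:]


# 'advice' determines the grammatical function of 'swimming advice'.
print(split_at_headword("swimming advice", 2))  # (['swimming'], 'advice', [])
```

A one-word phrase yields empty lists on both sides, which corresponds to the simple #$denotation case.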

Often selecting between the #$SimpleNoun and #$MassNoun parts of speech is not straightforward. For tips on making this decision, see file://localhost/cyc/projects/od/gen-info/odFAQ.htm#massSimple.

Whenever the selected headword for a phrase being lexified doesn't have an entry in the Cyc Lexicon, the "No suitable parts of speech" error message is displayed and the ADD MAPPING button is shown, so that the syntactic information can be added via Wales. For documentation on using Wales for this purpose, see file:////home/tom/cvs/head/cycorp/cyc/doc/help/lexwiz-wales.html.
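The lookup behaviour described above can be sketched as a check against a lexicon table. Here a small dictionary stands in for the Cyc Lexicon; its contents (including treating 'advice' as a #$MassNoun) are hypothetical toy data, and the function name is illustrative only.

```python
from typing import Dict, Set

NO_POS_ERROR = "No suitable parts of speech"

# Toy stand-in for the Cyc Lexicon (hypothetical data): headword -> parts of speech.
TOY_LEXICON: Dict[str, Set[str]] = {
    "advice": {"#$MassNoun"},
    "dog": {"#$SimpleNoun"},
}


def headword_choices(headword: str, lexicon: Dict[str, Set[str]] = TOY_LEXICON) -> dict:
    """Describe what the page would show for this headword (sketch)."""
    parts = sorted(lexicon.get(headword.lower(), ()))
    if not parts:
        # No lexicon entry: report the error and offer ADD MAPPING (via Wales).
        return {"error": NO_POS_ERROR, "add_mapping": True, "choices": []}
    return {"error": None, "add_mapping": False, "choices": parts}


print(headword_choices("dog"))
print(headword_choices("zyzzyva"))
```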

Relation Lexification

During lexification of relation terms, the term itself is not lexified, since relations are only of interest when instantiated. Instead, relational lexification involves creating a #$genFormat statement, which indicates how a particular instance of the relation, along with its arguments, can be paraphrased into English. The relevant options follow:

Relation template - Template for relational lexifications (via #$genFormat)
Argument specification - Argument specification for the relational template
Skip arity checks? - Whether to bypass the relation arity checks
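The three relation options can be combined into a single #$genFormat assertion. The sketch below assumes a layout of (#$genFormat RELATION "TEMPLATE" ARGS), with ~A slots in the template and each argument written either as a bare position or as (POSITION :keyword ...); it also assumes that "arity checks" means every referenced position must lie within the relation's arity. None of this is confirmed by the help text, and #$exampleOwnsPet is a hypothetical relation.

```python
import re
from typing import List, Tuple

# One entry per ~A slot: (argument position, keywords), e.g. (2, [":a"]).
ArgSpec = Tuple[int, List[str]]


def gen_format_assertion(relation: str, template: str, arg_specs: List[ArgSpec],
                         arity: int, skip_arity_checks: bool = False) -> str:
    """Render a #$genFormat assertion as CycL text (sketch, assumed layout)."""
    slots = len(re.findall(r"~a", template, flags=re.IGNORECASE))
    if slots != len(arg_specs):
        raise ValueError(f"template has {slots} ~A slots but {len(arg_specs)} argument specs")
    if not skip_arity_checks:
        for pos, _ in arg_specs:
            if not 1 <= pos <= arity:
                raise ValueError(f"argument {pos} is outside 1..{arity} for {relation}")
    # A bare position stays bare; a position with keywords becomes (POS :kw ...).
    items = [str(pos) if not kws else f"({' '.join([str(pos), *kws])})"
             for pos, kws in arg_specs]
    return f'(#$genFormat {relation} "{template}" ({" ".join(items)}))'


print(gen_format_assertion("#$exampleOwnsPet", "~A owns ~A", [(1, []), (2, [":a"])], arity=2))
# (#$genFormat #$exampleOwnsPet "~A owns ~A" (1 (2 :a)))
```

The slot-count check always runs; only the position-versus-arity check is bypassed when skip_arity_checks is set.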

An introduction to the use of #$genFormat can be found in file:////home/baxter/cvs/head/cycorp/cyc/doc/ref/generation.text.

Keywords commonly used in the argument specifications:

:quote - Use the concept name rather than an NL paraphrase
:plural-generic - Use a plural noun form
:singular-generic - Use a singular noun form
:mass-number-generic - Use a mass (aka non-count) noun form such as 'sand'
:non-plural-generic - Use either a mass or a singular form
:non-singular-generic - Use either a mass or a plural form
:a - Include the determiner 'a' or 'an' as appropriate
:the - Include the determiner 'the'
:gerund - Use a progressive verb form such as 'walking'
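One way to read these keywords is as instructions to a renderer: pick a word form, then optionally add a determiner. The sketch below mimics that with a hand-made form table. The table, the concept names, the order in which alternative forms are tried, and the spelling-based rule for choosing 'a' or 'an' are all assumptions; this is not how Cyc's generator works internally.

```python
from typing import Dict, List

# Hypothetical word-form table; real forms would come from the Cyc Lexicon.
FORMS: Dict[str, Dict[str, str]] = {
    "apple": {"singular": "apple", "plural": "apples"},
    "dog": {"singular": "dog", "plural": "dogs"},
    "sand": {"mass": "sand"},
    "walk": {"gerund": "walking"},
}

# Form slots each keyword accepts, tried in order (the ordering is an assumption).
KEYWORD_FORMS: Dict[str, List[str]] = {
    ":plural-generic": ["plural"],
    ":singular-generic": ["singular"],
    ":mass-number-generic": ["mass"],
    ":non-plural-generic": ["mass", "singular"],
    ":non-singular-generic": ["mass", "plural"],
    ":gerund": ["gerund"],
}


def indefinite_article(word: str) -> str:
    # Spelling-based heuristic; words like 'hour' or 'user' would need real phonology.
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"


def render(concept: str, word: str, keywords: List[str]) -> str:
    """Apply argument-specification keywords to one word (illustrative sketch)."""
    if ":quote" in keywords:
        return concept  # the concept name itself, not an NL paraphrase
    form = word
    for kw in keywords:
        if kw not in KEYWORD_FORMS:
            continue  # determiner keywords are handled below
        available = FORMS.get(word, {})
        for slot in KEYWORD_FORMS[kw]:
            if slot in available:
                form = available[slot]
                break
        else:
            raise ValueError(f"no form of {word!r} satisfies {kw}")
    if ":a" in keywords:
        form = f"{indefinite_article(form)} {form}"
    elif ":the" in keywords:
        form = f"the {form}"
    return form


print(render("#$Dog", "dog", [":plural-generic"]))              # dogs
print(render("#$Apple", "apple", [":singular-generic", ":a"]))  # an apple
print(render("#$Sand", "sand", [":non-plural-generic"]))        # sand
print(render("#$Dog", "dog", [":quote"]))                       # #$Dog
```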

The following points should be kept in mind: