XML is a good context for exploring field searching. The following examples make use of the xml document type, which supports nested fields (i.e., fields within fields).
Suppose we index the following XML data, contained in a file called jones.xml:
<Document>
  <Author>
    <Name>
      <FirstName> Tom </FirstName>
      <LastName> Jones </LastName>
    </Name>
  </Author>
</Document>
with the following command:
$ af -i -d mydb -C -t xml -v jones.xml
The xml document type views this document as containing two words, `Tom' and `Jones', each located at a specific field path within the document:
/Document/_c/Author/_c/Name/_c/FirstName/_c/Tom
/Document/_c/Author/_c/Name/_c/LastName/_c/Jones
The `/' character separates the field names, and in this case each field except for `_c' corresponds to an XML element. (Below we shall see an example in which a field corresponds to an XML attribute.) The `_c' is a special field defined by xml that means “element content.” Thus the following search:
$ af -s -d mydb -q '/Document/_c/Author/_c/Name/_c/LastName/_c/Jones'
will return jones.xml as a matching result. The following queries also match:
$ af -s -d mydb -q '/.../Document/_c/Author/_c/Name/_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../_c/Author/_c/Name/_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../Author/_c/Name/_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../_c/Name/_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../Name/_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../_c/LastName/_c/Jones'
$ af -s -d mydb -q '/.../LastName/_c/Jones'
$ af -s -d mydb -q '/.../_c/Jones'
$ af -s -d mydb -q '/.../Jones'
$ af -s -d mydb -q 'Jones'
The `...' is defined by Amberfish as “a sequence of any zero or more fields.” A `/.../' that begins a field path can be left out completely. For example, these two queries yield the same results:
$ af -s -d mydb -q '/.../LastName/_c/Jones'
$ af -s -d mydb -q 'LastName/_c/Jones'
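The matching rule for `...' can be modeled in a few lines of Python. This is only an illustrative sketch of the behavior described here, not Amberfish's implementation: the function name match_field_path is hypothetical, the search word is treated as the last component of the path, and words are compared exactly, with none of the normalization a real indexer may apply.

```python
def match_field_path(query, path):
    """Return True if the query pattern matches the full indexed field path.

    Hypothetical model of the rules in this section, not Amberfish code.
    """
    pattern = [p for p in query.split("/") if p]
    # A query without a leading '/' behaves as if it began with '/.../'.
    if not query.startswith("/"):
        pattern = ["..."] + pattern
    fields = [f for f in path.split("/") if f]

    def match(i, j):
        # Pattern exhausted: succeed only if the whole path was consumed.
        if i == len(pattern):
            return j == len(fields)
        if pattern[i] == "...":
            # '...' stands for zero or more fields: try every possible length.
            return any(match(i + 1, k) for k in range(j, len(fields) + 1))
        # Otherwise the field name (or final word) must match exactly.
        return j < len(fields) and pattern[i] == fields[j] and match(i + 1, j + 1)

    return match(0, 0)
```

Under this model, with path set to '/Document/_c/Author/_c/Name/_c/LastName/_c/Jones', the patterns '/.../LastName/_c/Jones', 'LastName/_c/Jones', and 'Jones' all match, while 'FirstName/_c/Jones' does not.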
The `...' can be used anywhere within a field path. For example, the following queries match jones.xml:
$ af -s -d mydb -q '/Document/_c/Author/_c/Name/.../Jones'
$ af -s -d mydb -q 'Name/.../LastName/.../Jones'
The first of the two examples above will match `Jones' anywhere within the author's name, not necessarily only his last name. The second matches only a last name of Jones, but it need not be the author; for example, it would match a document containing the following fragment:
<Bibliography>
  <Reference Type="book">
    <Title> Text searching the old fashioned way. </Title>
    <Name>
      <FirstName> Indiana </FirstName>
      <LastName> Jones </LastName>
    </Name>
  </Reference>
</Bibliography>
Other queries that would match the above fragment:
$ af -s -d mydb -q 'Reference/_a/Type/book'
$ af -s -d mydb -q 'Reference/_a/.../book'
$ af -s -d mydb -q 'Reference/.../book'
The `_a' is another special field defined by xml that means “attribute content.” Thus `_c' and `_a' allow one to distinguish between attribute and element searching if desired. In constructing queries for this document type, it is always necessary to specify `_c', `_a', or `...' after an element field name and before the next field name or the search word.
Phrase searching with fields is done this way:
$ af -s -d mydb -q 'Title/.../"text searching"'
or in a multiple term expression:
$ af -s -d mydb -Q 'Title/.../"text searching" & Name/.../Indiana & Name/.../Jones'
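When such queries are generated from a script, the nested quoting is easy to get wrong. The sketch below builds the command lines shown above as strings; it does not run af. The helper names field_query and search_command are hypothetical, and the only options used are the ones that appear in this section (-s, -d, -q, and -Q), with -Q chosen for multiple-term expressions as in the example above.

```python
import shlex


def field_query(*fields, term):
    """Join field names and a search term into one field path query.

    A term containing a space is wrapped in double quotes as a phrase.
    """
    word = f'"{term}"' if " " in term else term
    return "/".join([*fields, word])


def search_command(db, query, expression=False):
    """Return a shell-quoted af search command line as a string.

    Assumption taken from the examples in this section: -q carries a
    single query and -Q carries a multiple-term expression.
    """
    flag = "-Q" if expression else "-q"
    return shlex.join(["af", "-s", "-d", db, flag, query])


phrase = field_query("Title", "...", term="text searching")
terms = [
    phrase,
    field_query("Name", "...", term="Indiana"),
    field_query("Name", "...", term="Jones"),
]
print(search_command("mydb", phrase))
print(search_command("mydb", " & ".join(terms), expression=True))
```

The two printed lines reproduce the phrase search and the multiple-term search shown above, without the `$' prompt.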