The xml document type contains several features related to the structure of XML documents.
Amberfish treats an XML document as a hierarchy of nested documents. This can be useful because it increases the resolution of search results. The behavior is controlled by the af option --dlevel, which limits how many levels of nesting are processed during indexing. For example:
$ af -i -d mydb -C -t xml --dlevel 2 -v medline.xml
Specifying --dlevel 1 defines each XML file as a single document, bounded by the outermost XML element. (This is the default.) Specifying --dlevel 2 descends one more level, to the children of the outermost element, and treats these as documents nested within the outer document; each higher value descends one level further. Use this feature cautiously: disk space and processing time increase dramatically with each higher --dlevel value.
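To make the nesting concrete, here is a small Python sketch that lists which elements would count as documents for a given --dlevel value. It is not Amberfish's code; it assumes, as described above, that the outermost element is level 1, its children are level 2, and so on. The function name documents_at_dlevel and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def documents_at_dlevel(xml_text, dlevel):
    """Return tag paths of the elements treated as documents, down to depth dlevel.

    Hypothetical illustration only: depth 1 is the outermost element,
    depth 2 its children, etc. This is not Amberfish's implementation.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, path, depth):
        path = path + "/" + elem.tag if path else elem.tag
        paths.append(path)          # this element is one (nested) document
        if depth < dlevel:          # descend only while within the limit
            for child in elem:
                walk(child, path, depth + 1)

    walk(root, "", 1)
    return paths

# Hypothetical sample resembling a small MEDLINE file.
sample = """<MedlineCitationSet>
  <MedlineCitation><ArticleTitle>Adipose tissue study</ArticleTitle></MedlineCitation>
  <MedlineCitation><ArticleTitle>Liver study</ArticleTitle></MedlineCitation>
</MedlineCitationSet>"""

for level in (1, 2, 3):
    print(level, len(documents_at_dlevel(sample, level)))
```

Even in this tiny file the number of documents grows from 1 to 3 to 5 as the level rises; in a real collection with many children per element, that growth is what drives the disk and processing cost mentioned above.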
The result of using the --dlevel option is that search results can be very specific. The results returned by af -s consist of the innermost documents (as allowed by --dlevel) that match the query. The --style option can be used with af -s to print the lineage of each result document, for example:
$ af -s -d mydb -q 'ArticleTitle/.../"adipose tissue"' --style=lineage
This produces an indented hierarchy of “ancestor” documents above each result document.
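The idea of returning only the innermost matching documents, together with their ancestors, can be sketched in Python. This is an assumption-laden illustration, not Amberfish's matching logic or its exact --style=lineage output: a case-insensitive substring test stands in for the query language, and the ArticleTitle/... field restriction from the example above is ignored. The names innermost_match_lineages and print_lineage are hypothetical.

```python
import xml.etree.ElementTree as ET

def innermost_match_lineages(root, phrase, dlevel):
    """Return the ancestor chain (list of tags) of each innermost matching document.

    Hypothetical sketch: an element "matches" if its text contains phrase;
    a match is reported only if no child within dlevel also matches.
    """
    phrase = phrase.lower()
    results = []

    def contains(elem):
        return phrase in "".join(elem.itertext()).lower()

    def walk(elem, lineage, depth):
        lineage = lineage + [elem.tag]
        if not contains(elem):
            return False
        found_deeper = False
        if depth < dlevel:
            for child in elem:
                if walk(child, lineage, depth + 1):
                    found_deeper = True
        if not found_deeper:        # no nested document matched, so this is innermost
            results.append(lineage)
        return True

    walk(root, [], 1)
    return results

def print_lineage(lineage):
    """Print each ancestor on its own line, indented by nesting depth."""
    for depth, tag in enumerate(lineage):
        print("  " * depth + tag)

# Hypothetical sample data.
sample = """<MedlineCitationSet>
  <MedlineCitation><ArticleTitle>Adipose tissue study</ArticleTitle></MedlineCitation>
  <MedlineCitation><ArticleTitle>Liver study</ArticleTitle></MedlineCitation>
</MedlineCitationSet>"""

root = ET.fromstring(sample)
for lineage in innermost_match_lineages(root, "adipose tissue", 3):
    print_lineage(lineage)
```

With a limit of 3 the sketch reports the ArticleTitle element beneath its MedlineCitation and MedlineCitationSet ancestors; with a limit of 2 the innermost result is the MedlineCitation itself, mirroring how --dlevel bounds the specificity of results.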