
4. Administrating Zebra

To administrate Zebra, you run the zebraidx program. This program supports a number of options, which are preceded by a minus sign (-), and a few commands, which are not.

Both the Zebra administrative tool and the Z39.50 server share a set of index files and a global configuration file. The name of the configuration file defaults to zebra.cfg. The configuration file specifies how to index various kinds of records and where the other configuration files are located. zebrasrv and zebraidx must be run in the directory where the configuration file lives unless you give the location of the configuration file with the -c option.

4.1 Record Types

Indexing is a per-record process. Before a record is indexed, search keys are extracted from it, whatever the layout of the original record (SGML, HTML, text, etc.). The Zebra system currently supports two fundamental types of records: structured and simple text. To specify a particular extraction process, use either the command line option -t or a recordType setting in the configuration file.

4.2 The Zebra Configuration File

The Zebra configuration file, read by zebraidx and zebrasrv, defaults to zebra.cfg unless another file is specified with the -c option.

You can edit the configuration file with a normal text editor. Parameter names and values are separated by colons in the file. Lines starting with a hash sign (#) are treated as comments.
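The file format is simple enough to read with a few lines of code. The following Python sketch is not Zebra's own parser; it assumes one name: value pair per line, splits each line on the first colon only (because a value such as profilePath may itself contain colons), skips blank and comment lines, and collects repeated names such as attset into a list. The function name parse_zebra_cfg is hypothetical.

```python
from typing import Dict, List


def parse_zebra_cfg(text: str) -> Dict[str, List[str]]:
    """Parse zebra.cfg-style text into a mapping of name -> list of values.

    Sketch only (not Zebra's parser): one "name: value" pair per line,
    lines whose first non-blank character is '#' are comments, and a
    repeated name such as attset accumulates values instead of overwriting.
    """
    settings: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first colon only: a value such as profilePath
        # may itself contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {raw!r}")
        settings.setdefault(name.strip(), []).append(value.strip())
    return settings


if __name__ == "__main__":
    sample = """
# unprefixed default
recordType: text
public.recordType: grs.sgml
profilePath: /usr/lib/yaz/tab:/usr/lib/zebra/tab
attset: explain.att
attset: bib1.att
"""
    cfg = parse_zebra_cfg(sample)
    print(cfg["profilePath"])  # ['/usr/lib/yaz/tab:/usr/lib/zebra/tab']
    print(cfg["attset"])       # ['explain.att', 'bib1.att']
```

Keeping every value as a list costs little and avoids silently losing the first of two attset lines.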

If you manage different sets of records that share common characteristics, you can organize the configuration settings for each type into "groups". When you run zebraidx and wish to address a given group, specify the group name with the -g option. In this case, zebraidx uses the settings that have the group name as their prefix. If no -g option is specified, the settings with no prefix are used.

In the configuration file, the group name is placed before the option name itself, separated by a dot (.). For instance, to set the record type for group public to grs.sgml (the SGML-like format for structured records), you would write:

public.recordType: grs.sgml

To set the default record type to text, write:

recordType: text
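The group rule can be sketched as a lookup function. This is an illustration rather than Zebra's code: lookup is a hypothetical name, the settings are assumed to be parsed already into a dictionary, and falling back to the unprefixed value when a group has no setting of its own is an assumption, since the text above only says that unprefixed settings apply when -g is omitted.

```python
from typing import Dict, Optional


def lookup(settings: Dict[str, str], name: str,
           group: Optional[str] = None) -> Optional[str]:
    """Return the value of a setting as seen for a given group.

    With a group (as given by -g), the "group.name" entry is preferred.
    ASSUMPTION: if the group has no entry of its own, fall back to the
    unprefixed default.
    """
    if group is not None:
        prefixed = settings.get(f"{group}.{name}")
        if prefixed is not None:
            return prefixed
    # No -g option (or no group-specific entry): use the unprefixed setting
    return settings.get(name)
```

With the two lines shown above, lookup(settings, "recordType", "public") yields "grs.sgml", while lookup(settings, "recordType") yields "text".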

The available configuration settings are summarized below. They will be explained further in the following sections.

group.recordType[.name]

Specifies how records with the file extension name should be handled by the indexer. This option may also be specified as a command line option (-t). Note that if you do not specify a name, the setting applies to all files. In general, the record type specifier consists of the elements fundamental-type, file-read-type and arguments, separated by dots. Currently, two fundamental types exist: text and grs.

group.recordId

Specifies how the records are to be identified when updated. See section Locating Records.

group.database

Specifies the Z39.50 database name.

group.storeKeys

Specifies whether key information should be saved for a given group of records. If you plan to update or delete records of this type later, set this to 1; otherwise, leave it at 0 (the default) to save register space.

group.storeData

Specifies whether the records should be stored internally in the Zebra system files. If you want to maintain the raw records yourself, this option should be false (0). If you want Zebra to take care of the records for you, it should be true (1).

lockDir

Directory in which various lock files are stored.

keyTmpDir

Directory in which temporary files used during the update phase of zebraidx are stored.

setTmpDir

Specifies the directory that the server uses for temporary result sets. If not specified, /tmp is used.

profilePath

Specifies the location of profile specification files.

attset

Specifies the filename(s) of attribute set files for use in searching. At least the Bib-1 set should be loaded (bib1.att). The profilePath setting is used to look for the specified files. See section The Attribute Set Files.

memMax

Specifies the amount of internal memory used by the zebraidx program. The amount is given in megabytes; the default is 4 (4 MB).
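As a sketch of how a recordType[.name] entry might be chosen for a file and then split into its elements, the Python below prefers recordType.<extension> and otherwise uses a bare recordType, which applies to all files. It assumes group prefixes have already been resolved, and that a specifier's elements appear in the order fundamental-type, file-read-type, arguments, with everything after the second dot treated as arguments. The names record_type_for, split_record_type and RecordType are hypothetical.

```python
import os
from typing import Dict, NamedTuple, Optional


class RecordType(NamedTuple):
    fundamental: str          # e.g. "text" or "grs"
    file_read: Optional[str]  # e.g. "sgml" in "grs.sgml"
    arguments: Optional[str]  # everything after the second dot, if any


def record_type_for(settings: Dict[str, str], filename: str) -> Optional[str]:
    """Choose the recordType value for a file from its extension.

    'recordType.<ext>' takes precedence; a bare 'recordType' applies to
    all files. Group prefixes are assumed to be resolved already.
    """
    ext = os.path.splitext(filename)[1].lstrip(".")
    if ext and f"recordType.{ext}" in settings:
        return settings[f"recordType.{ext}"]
    return settings.get("recordType")


def split_record_type(spec: str) -> RecordType:
    """Split a specifier such as 'grs.sgml' into its dot-separated elements."""
    head, *rest = spec.split(".", 2)
    file_read = rest[0] if len(rest) > 0 else None
    arguments = rest[1] if len(rest) > 1 else None
    return RecordType(head, file_read, arguments)
```

For example, with settings {"recordType": "text", "recordType.sgml": "grs.sgml"}, record_type_for(settings, "paper.sgml") returns "grs.sgml", and split_record_type("grs.sgml") gives RecordType(fundamental='grs', file_read='sgml', arguments=None).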

4.3 Locating Records

The default behaviour of the Zebra system is to reference the records from their original location, i.e. where they were found when you ran zebraidx. That is, when a client wishes to retrieve a record following a search operation, the files are accessed from the place where you originally put them. If you remove the files (without running zebraidx again), the client will receive a diagnostic message.

If your input files are not permanent (for example, if you retrieve your records from an outside source, or if they were temporarily mounted on a CD-ROM drive), you may want Zebra to make an internal copy of them. To do this, set storeData to 1 (true). When the Z39.50 server retrieves the records, they will then be read from the internal file structures of the system.
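The effect of storeData on retrieval can be modelled in a few lines. In this hypothetical Python sketch, internal_store stands in for Zebra's internal file structures and original_paths for the file locations recorded by zebraidx; neither is a real Zebra interface, and the diagnostic text is invented for illustration.

```python
from typing import Dict, Optional, Tuple


def fetch_record(record_id: str,
                 store_data: bool,
                 internal_store: Dict[str, bytes],
                 original_paths: Dict[str, str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (record, diagnostic) for a record hit by a search.

    Hypothetical model: internal_store stands in for Zebra's internal
    file structures, original_paths for the locations where zebraidx
    found the files. Neither is a Zebra interface.
    """
    if store_data:
        # storeData: 1 -> serve the copy made when the record was indexed
        return internal_store[record_id], None
    # Default behaviour -> read the file from its original location
    path = original_paths[record_id]
    try:
        with open(path, "rb") as f:
            return f.read(), None
    except FileNotFoundError:
        # The file was removed after indexing: report a diagnostic instead
        return None, f"record file not found: {path}"
```

Removing a file after indexing only affects the second branch, which is why an internal copy suits records from temporary media.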

4.4 Indexing example

Consider a system in which you have a group of text files called simple. That group of records should belong to a Z39.50 database called textbase. The following zebra.cfg file will suffice:

profilePath: /usr/lib/yaz/tab:/usr/lib/zebra/tab
attset: explain.att
attset: bib1.att
simple.recordType: text
simple.database: textbase
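In this example, profilePath lists two directories separated by a colon, and the attset lines name files to be looked for along that path. A minimal Python sketch of such a search follows; it assumes the directories are tried from left to right and that the first match wins, an order this section does not spell out. find_profile_file and missing_attsets are hypothetical helpers, handy for checking before running zebraidx that every attset file can be found.

```python
import os
from typing import List, Optional


def find_profile_file(profile_path: str, filename: str) -> Optional[str]:
    """Look for filename in each directory of a colon-separated profilePath.

    ASSUMPTION: directories are searched left to right, first match wins.
    """
    for directory in profile_path.split(":"):
        if not directory:
            continue  # tolerate empty entries such as a trailing colon
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def missing_attsets(profile_path: str, attsets: List[str]) -> List[str]:
    """Return the attset files that cannot be found on profile_path."""
    return [name for name in attsets
            if find_profile_file(profile_path, name) is None]
```

Called as missing_attsets("/usr/lib/yaz/tab:/usr/lib/zebra/tab", ["explain.att", "bib1.att"]), an empty result means both files were found.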

