As mentioned, the Zebra indexer and server always look for the file zebra.cfg in their current working directory (unless they are told to look for it elsewhere with the -c option). The example file in the test directory represents little more than the bare minimum for such a file. We find the following to be a powerful setup for a GILS-like database (everything following a hash mark (#) is ignored by the software):
#
# Sample configuration file for GILS database
#
# Where are the configuration files located?
profilePath: /usr/local/lib/zebra
# Load attribute sets for searching
attset: bib1.att
attset: gils.att
# Records are identified by their path in the file system
recordId: file
# Store information about records to allow deletion and updating
storeKeys: 1
# Records are structured
recordType: grs
# Where to store the indexes
register: /datadisk/index:500M
# Where to store temporary data while merging with register
shadow: /datadisk/shadow:500M
If you like, you can paste this text straight into a zebra.cfg file ready for your own use (after adjusting the pathnames). In the following, we explain the individual settings. For the full story on the zebra.cfg file and the configuration options of Zebra, read the general documentation.
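To make the file format concrete, here is a minimal Python sketch of how a program might read settings written in this style. It is a hypothetical reader, not Zebra's own parser: it assumes one "name: value" setting per line, treats the colon as optional, handles only whole-line # comments, and collects repeated settings such as attset into a list.

```python
import re
from collections import defaultdict

# Hypothetical reader for zebra.cfg-style text; NOT Zebra's own parser.
# Assumes "name: value" per line; the colon is treated as optional here.
SETTING = re.compile(r"^([A-Za-z][\w.]*):?\s+(.*\S)\s*$")


def read_settings(text):
    """Return a dict mapping each setting name to the list of its values."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = SETTING.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        name, value = match.groups()
        settings[name].append(value)  # repeated names (e.g. attset) accumulate
    return dict(settings)


SAMPLE = """\
# Where are the configuration files located?
profilePath: /usr/local/lib/zebra
attset: bib1.att
attset: gils.att
recordId: file
storeKeys: 1
recordType: grs
register: /datadisk/index:500M
shadow: /datadisk/shadow:500M
"""

if __name__ == "__main__":
    for name, values in read_settings(SAMPLE).items():
        print(name, values)
```

Running the script prints each setting with its values; attset, for instance, maps to both attribute-set files.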
The profilePath setting tells Zebra where to look for its configuration files. In the distribution, these files are located in the tab directory, but you may wish to put them elsewhere for convenience. If necessary, you can provide multiple directory paths, separated by colons (:).
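A colon-separated path list of this kind can be pictured with the following Python sketch. The helper name find_in_profile_path is hypothetical, and the first-match-wins search order is an assumption made for illustration, not a description of Zebra's internals.

```python
import os


def find_in_profile_path(filename, profile_path):
    """Return the first existing file called `filename` found in a
    colon-separated list of directories, or None if there is none.

    Hypothetical helper; assumes directories are searched left to right
    and the first match wins.
    """
    for directory in profile_path.split(":"):
        if not directory:
            continue  # ignore empty entries, e.g. from a trailing colon
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_in_profile_path("gils.att", "/usr/local/lib/zebra:/home/me/zebra/tab") would return the copy in /usr/local/lib/zebra if one exists there, and otherwise fall back to the second directory.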
The attset setting tells the Zebra server which attribute sets it should support for searching. You could get by with loading just the GILS set, but if you load BIB-1 as well, Zebra will support both sets for those GILS attributes that are inherited from BIB-1.
The recordId: file setting tells Zebra that individual records should be identified by the physical files in which they are located. In this mode, after each update operation your database reflects the contents of the indexed directory (or directories).
The storeKeys setting tells Zebra to store additional information about each record, to facilitate updating. In combination with the recordId: file setting, this is a very convenient maintenance option. If you maintain your records as individual files in a directory tree, you need only run zebraidx with the top-level directory as an argument. The next time you run zebraidx, new files are entered into the database, the indexes are changed accordingly for modified files, and the indexes are also updated correctly for files that have been deleted from the filesystem (or renamed).
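The combination of recordId: file and storeKeys is what lets a rerun of zebraidx work out what has changed. The following Python sketch models that bookkeeping under stated assumptions: records are keyed by their file path, and a file's modification time serves as the change signal. The helper names (snapshot, plan_update) and the use of modification times are hypothetical; the sketch does not describe what Zebra actually stores or how it detects changes.

```python
import os


def snapshot(root):
    """Map every regular file under `root` to its modification time (ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def plan_update(stored, current):
    """Compare the snapshot saved after the last run with the current one.

    Returns (added, modified, deleted) as sorted lists of paths. Because a
    record is identified by its path, a renamed file appears as one
    deletion plus one addition.
    """
    added = sorted(p for p in current if p not in stored)
    deleted = sorted(p for p in stored if p not in current)
    modified = sorted(p for p in current if p in stored and current[p] != stored[p])
    return added, modified, deleted
```

Keying records by path is what makes deletions detectable at all: a file that is missing from the current snapshot maps directly to a record that should leave the index.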
The recordType setting selects the type of processing that takes place when a record is accessed by the indexer or the Z39.50 server. GRS stands for Generic Record Syntax, and signals that the records are structured.
The register setting controls where the index files are stored. In the first test above, you may have noticed that zebraidx created a number of files in the working directory. Some of these files, which contain the indexing information for the database, can grow quite large, and it is sometimes useful to place them in a separate directory or file system. Provide the path of the directory, followed by a colon (:), followed by the maximum amount of disk space, in megabytes (M) or kilobytes (K), that Zebra is allowed to use in that directory. If you specify more than one directory:size combination on the same line, Zebra fills up the directories from left to right. This feature is essential if your database is so large that the registers cannot fit into a single partition of your disk.
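The directory:size syntax and the left-to-right filling rule can be modelled with the following Python sketch. It assumes that multiple entries on one line are separated by whitespace and that K and M are binary multiples (1024 and 1024 squared). The helper names are hypothetical, and the sketch reproduces only the fill order described above, not Zebra's real allocation of register files.

```python
# Hypothetical helpers; the unit sizes and the entry separator are assumptions.
UNITS = {"K": 1024, "M": 1024 ** 2}


def parse_area(spec):
    """Split one 'directory:size' entry, e.g. '/datadisk/index:500M'."""
    directory, _, size = spec.rpartition(":")  # split on the LAST colon
    unit = size[-1:]
    if not directory or unit not in UNITS or not size[:-1].isdigit():
        raise ValueError(f"bad area spec: {spec!r}")
    return directory, int(size[:-1]) * UNITS[unit]


def parse_areas(value):
    """Parse a register (or shadow) value holding one or more entries."""
    return [parse_area(spec) for spec in value.split()]


def fill_left_to_right(needed, areas):
    """Return (directory, bytes) pairs, filling each area before the next."""
    plan = []
    for directory, capacity in areas:
        if needed == 0:
            break
        take = min(needed, capacity)
        plan.append((directory, take))
        needed -= take
    if needed > 0:
        raise RuntimeError(f"register areas are {needed} bytes too small")
    return plan
```

For example, with register: /disk1/index:500M /disk2/index:200M, a 600 MB register would occupy all 500 MB of the first area and 100 MB of the second, while anything over 700 MB would not fit.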
The shadow setting has the same format as the register setting above. If you provide one or more directories for the "shadow system", you enable the safe updating system of the Zebra indexer. When changes to the records are merged into the register files, the files are not changed immediately. Instead, the changes are written into separate files, or "shadow files". At the end of the merging process, or in a separate operation, the changes are "committed", that is, written into the register files themselves. This final step is carried out by the command zebraidx commit; the commit directive can also be given at the end of the same command line as the update directive. The shadow file system can consume a lot of disk space, particularly in a large update operation that involves almost all of the index, but the benefits are substantial. If the system crashes during an update procedure, or the process is otherwise interrupted, the registers are left in an unknown state and are effectively rendered useless, which can be unfortunate if the index is very large. The shadow system greatly reduces the risk of an index being damaged in this way. Further, when the shadow system is enabled, your clients can access the Zebra server without interruption throughout the update and commit procedures: Zebra ensures that the parts of the register accessed by the server are always consistent.
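The idea behind the shadow system can be illustrated with a toy Python model. ShadowRegister is a hypothetical class that keeps register blocks in memory; it shows only the principle (updates go to a separate shadow layer, readers see the committed state, and an interrupted update leaves the committed state intact), not Zebra's file formats, commit procedure or locking.

```python
class ShadowRegister:
    """Toy, single-threaded model of shadow updating (hypothetical)."""

    def __init__(self, blocks=None):
        self.committed = dict(blocks or {})  # the state the server reads
        self.shadow = {}                     # pending changes from an update

    def write(self, block_no, data):
        # An update never touches the committed register directly.
        self.shadow[block_no] = data

    def read(self, block_no):
        # Readers see only committed data, never a half-finished update.
        return self.committed.get(block_no)

    def commit(self):
        # Apply all pending changes, then discard the shadow layer.
        self.committed.update(self.shadow)
        self.shadow.clear()

    def crash(self):
        # An interrupted update loses only its pending changes.
        self.shadow.clear()


if __name__ == "__main__":
    reg = ShadowRegister({1: "old"})
    reg.write(1, "new")
    print(reg.read(1))  # still the committed value
    reg.commit()
    print(reg.read(1))  # the update is now visible
```

The design choice this illustrates is that the expensive, crash-prone merge work happens entirely in the shadow layer, so the committed register is only ever replaced by a complete set of changes.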