Zebra has been deployed in numerous applications in both the academic and commercial worlds, in domains as diverse as bibliographic catalogues, geospatial information, structured vocabulary browsing, government information locators, civic information systems, environmental observations, museum information and web indexes.
Notable applications include the following:
DADS is a huge database of more than ten million records, totalling over ten gigabytes of data. The records are metadata about academic journal articles, primarily scientific; about 10% of these metadata records link to the full text of the articles they describe, a body of about a terabyte of information (although the full text is not indexed).
It allows students and researchers at DTU (Danmarks Tekniske Universitet, the Technical University of Denmark) to find and order articles from multiple databases in a single query. The database contains literature on all engineering subjects. It is available online through a web gateway, though currently only to registered users.
More information can be found at http://www.dtv.dk/help/dads/index_e.htm
Fernuniversität Hagen in Germany have developed a natural language interface for access to library databases (http://ki212.fernuni-hagen.de/nli/NLIintro.html). To evaluate this interface for recall and precision, they chose Zebra as the underlying retrieval engine. The Zebra server contains a copy of the GIRT database, consisting of more than 76,000 records in SGML format (bibliographic records from social science), which are mapped to MARC for presentation.
(GIRT is the German Indexing and Retrieval Testdatabase. It is a standard German-language test database for intelligent indexing and retrieval systems. See http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm)
Evaluation will take place as part of the TREC/CLEF campaign 2003 (see http://clef.iei.pi.cnr.it or http://www4.eurospider.ch/CLEF/).
For more information, contact Johannes Leveling <Johannes.Leveling@FernUni-Hagen.De>
The M25 Systems Team has created a union catalogue for the periodicals of the twenty-one constituent libraries of the University of London and the University of Westminster (http://www.m25lib.ac.uk/ULS/). They have achieved this using an unusual architecture, which they describe as a "non-distributed virtual union catalogue".
The member libraries send in data files representing their periodicals, including both brief bibliographic data and summary holdings. Then 21 individual Z39.50 targets are created, each using Zebra, and all mounted on a single hardware server. The live service provides a web gateway allowing Z39.50 searching of all of the targets or a selection of them. Zebra's small footprint allows a relatively modest system to comfortably host the 21 servers.
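The shape of this architecture can be illustrated with a minimal sketch. It is not the M25 team's code and it does not speak Z39.50: the `Target` class, the `union_search` function, the library names and the sample titles are all hypothetical, and a substring match stands in for a real query. It only shows the idea of several independent per-library targets hosted together, with a gateway that searches all of them or a chosen subset.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Stand-in for one library's Z39.50 target (hypothetical, in-memory)."""
    library: str
    records: list[str] = field(default_factory=list)

    def search(self, term: str) -> list[str]:
        # A case-insensitive substring match stands in for a real Z39.50 query.
        needle = term.lower()
        return [title for title in self.records if needle in title.lower()]


def union_search(targets, term, selected=None):
    """Query every target, or only the selected libraries, and merge the hits.

    Returns a list of (library, title) pairs in target order.
    """
    hits = []
    for target in targets:
        if selected is not None and target.library not in selected:
            continue
        hits.extend((target.library, title) for title in target.search(term))
    return hits


# All targets live in one process here, mirroring the single hardware server.
TARGETS = [
    Target("Library A", ["Journal of Hydrology", "Nature"]),
    Target("Library B", ["Nature", "Water Research Journal"]),
    Target("Library C", ["The Lancet"]),
]

if __name__ == "__main__":
    for library, title in union_search(TARGETS, "journal"):
        print(f"{library}: {title}")
```

Because each target keeps its own records, a library's data can be replaced without touching the others, while the gateway still presents one combined result list to the user.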
More information can be found at http://www.m25lib.ac.uk/ULS/
Zebra has been used by a variety of institutions to construct indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly to the engine of Google or AltaVista, but for a selected intranet or a subset of the whole Web.
For example, Liverpool University's web-search facility (available on the home page at http://www.liv.ac.uk/ and on many sub-pages) works by relevance-searching a Zebra database which is populated by the Harvest-NG web-crawling software.
For more information on Liverpool University's intranet search architecture, contact John Gilbertson <jgilbert@liverpool.ac.uk>
Kang-Jin Lee <lee@arco.de> has recently modified the Harvest web indexer to use Zebra as its native repository engine. His comments on the switch over from the old engine are revealing:
The first results after some testing with Zebra are very promising. The tests were done with around 220,000 SOIF files, which occupies 1.6GB of disk space.
Building the index from scratch takes around one hour with Zebra where [old-engine] needs around five hours. While [old-engine] blocks search requests when updating its index, Zebra can still answer search requests. [...] Zebra supports incremental indexing which will speed up indexing even further.
While the search time of [old-engine] varies from some seconds to some minutes depending how expensive the query is, Zebra usually takes around one to three seconds, even for expensive queries. [...] Zebra can search more than 100 times faster than [old-engine] and can process multiple search requests simultaneously.
I am very happy to see such nice software available under GPL.