Chapter 8. Supporting Tools

Table of Contents
Query Syntax Parsers
Object Identifiers
Nibble Memory

In support of the service API, primarily the ASN module, which provides the programmatic interface to the Z39.50 APDUs, YAZ contains a collection of tools that support the development of applications.

Query Syntax Parsers

Since the type-1 (RPN) query structure has no direct, useful string representation, every origin application needs to provide some form of mapping from a local query notation or representation to a Z_RPNQuery structure. Some programmers will prefer to construct the query manually, perhaps using odr_malloc() to simplify memory management. The YAZ distribution includes two separate, query-generating tools that may be of use to you.

Prefix Query Format

Since RPN or reverse polish notation is really just a fancy way of describing a suffix notation format (operator follows operands), it would seem that the confusion is total when we now introduce a prefix notation for RPN. The reason is one of simple laziness - it's somewhat simpler to interpret a prefix format, and this utility was designed for maximum simplicity, to provide a baseline representation for use in simple test applications and scripting environments (like Tcl). The demonstration client included with YAZ uses the PQF.

The PQF is defined by the pquery module in the YAZ library. There are two sets of functions with similar behavior. The first set operates on a PQF parser handle; the second does not. The first set is more flexible than the second. The second set is obsolete and is provided only for backwards compatibility.

The functions in the first set all operate on a PQF parser handle:


     #include <yaz/pquery.h>

     YAZ_PQF_Parser yaz_pqf_create (void);

     void yaz_pqf_destroy (YAZ_PQF_Parser p);

     Z_RPNQuery *yaz_pqf_parse (YAZ_PQF_Parser p, ODR o, const char *qbuf);

     Z_AttributesPlusTerm *yaz_pqf_scan (YAZ_PQF_Parser p, ODR o,
                          Odr_oid **attributeSetId, const char *qbuf);


     int yaz_pqf_error (YAZ_PQF_Parser p, const char **msg, size_t *off);
    

A PQF parser is created and destroyed by the functions yaz_pqf_create and yaz_pqf_destroy respectively. The function yaz_pqf_parse parses the query given by the string qbuf. If parsing succeeds, it returns a Z39.50 RPN query allocated on ODR stream o; if parsing fails, it returns a NULL pointer. The function yaz_pqf_scan takes a scan query in qbuf. If parsing succeeds, it returns a pointer to the attributes plus term and sets attributeSetId to the attribute set for the scan request, both allocated on ODR stream o; if parsing fails, it returns a NULL pointer. Error information for a bad query can be obtained by calling yaz_pqf_error, which returns an error code, sets *msg to point to an error description, and sets *off to the offset within the last query where parsing failed.
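An offset such as the one reported through *off is easiest to read when it is shown under the query text. The following is a minimal sketch of that idea only; caret_line is a hypothetical helper, not part of YAZ, and it does not call the parser itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a YAZ function): fill out with `off` spaces
 * followed by '^', truncating so the result always fits in outsz bytes.
 * Returns out so the call can be used directly in a printf. */
static char *caret_line(char *out, size_t outsz, size_t off)
{
    size_t max_spaces = outsz >= 2 ? outsz - 2 : 0;
    size_t n = off < max_spaces ? off : max_spaces;

    if (outsz == 0)
        return out;
    memset(out, ' ', n);
    if (outsz >= 2)
        out[n++] = '^';
    out[n] = '\0';
    return out;
}
```

Printing the query on one line and caret_line(buf, sizeof buf, off) on the next places the marker under the position where parsing stopped.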

The second set of functions are declared as follows:


     #include <yaz/pquery.h>

     Z_RPNQuery *p_query_rpn (ODR o, oid_proto proto, const char *qbuf);

     Z_AttributesPlusTerm *p_query_scan (ODR o, oid_proto proto,
                             Odr_oid **attributeSetP, const char *qbuf);

     int p_query_attset (const char *arg);
    

The function p_query_rpn() takes as arguments an ODR stream (see section The ODR Module) to provide a memory source (the structure created is released on the next call to odr_reset() on the stream), a protocol identifier (one of the constants PROTO_Z3950 and PROTO_SR), and finally a null-terminated string holding the query string.

If the parse went well, p_query_rpn() returns a pointer to a Z_RPNQuery structure which can be placed directly into a Z_SearchRequest. If parsing failed due to a syntax error, a NULL pointer is returned.

The function p_query_attset specifies which attribute set to use if the query doesn't specify one with the @attrset operator. It returns 0 if the argument is a valid attribute set specifier; otherwise it returns -1.

The grammar of the PQF is as follows:


     query ::= top-set query-struct.

     top-set ::= [ '@attrset' string ]

     query-struct ::= attr-spec | simple | complex | '@term' term-type

     attr-spec ::= '@attr' [ string ] string query-struct

     complex ::= operator query-struct query-struct.

     operator ::= '@and' | '@or' | '@not' | '@prox' proximity.

     simple ::= result-set | term.

     result-set ::= '@set' string.

     term ::= string.

     proximity ::= exclusion distance ordered relation which-code unit-code.

     exclusion ::= '1' | '0' | 'void'.

     distance ::= integer.

     ordered ::= '1' | '0'.

     relation ::= integer.

     which-code ::= 'known' | 'private' | integer.

     unit-code ::= integer.

     term-type ::= 'general' | 'numeric' | 'string' | 'oid' | 'datetime' | 'null'.
    

You will note that the syntax above is a fairly faithful representation of RPN, except for the Attribute, which has been moved a step away from the term, allowing you to associate one or more attributes with an entire query structure. The parser will automatically apply the given attributes to each term as required.
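To make the prefix shape of the grammar concrete, here is a small sketch that checks whether a string forms exactly one query-struct. It is not the YAZ parser and covers only a subset, under these assumptions: tokens are separated by whitespace (no quoted strings), @attr is used without the optional attribute set, and @attrset, @prox and @term are not handled. The function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Copy the next whitespace-delimited token into tok (truncated to sz-1
 * characters); return 0 when the input is exhausted. */
static int next_token(const char **p, char *tok, size_t sz)
{
    size_t n = 0;

    while (**p && isspace((unsigned char)**p))
        (*p)++;
    if (!**p)
        return 0;
    while (**p && !isspace((unsigned char)**p)) {
        if (n + 1 < sz)
            tok[n++] = **p;
        (*p)++;
    }
    tok[n] = '\0';
    return 1;
}

/* Consume one query-struct; return 1 on success, 0 on malformed input. */
static int query_struct(const char **p)
{
    char tok[64];

    if (!next_token(p, tok, sizeof tok))
        return 0;
    /* complex ::= operator query-struct query-struct */
    if (!strcmp(tok, "@and") || !strcmp(tok, "@or") || !strcmp(tok, "@not"))
        return query_struct(p) && query_struct(p);
    /* attr-spec (without attribute set) ::= '@attr' string query-struct */
    if (!strcmp(tok, "@attr"))
        return next_token(p, tok, sizeof tok) && query_struct(p);
    /* result-set ::= '@set' string */
    if (!strcmp(tok, "@set"))
        return next_token(p, tok, sizeof tok);
    /* anything else must be a plain term */
    return tok[0] != '@';
}

/* Return 1 if q is exactly one query-struct with no trailing tokens. */
static int pqf_shape_ok(const char *q)
{
    char tok[64];

    return query_struct(&q) && !next_token(&q, tok, sizeof tok);
}
```

Because each operator comes first and has a fixed number of operands, the checker needs no parentheses and no precedence rules, which is the simplicity the prefix format was chosen for.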

The @attr operator is followed by an attribute specification (attr-spec above). The specification consists of an optional attribute set, an attribute type-value pair and a sub query. The attribute type-value pair is packed in one string: an attribute type, an equals sign, and an attribute value, as in 1=4. The type is always an integer, but the value may be either an integer or a string (if it doesn't start with a digit character).
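The pairs written elsewhere in this chapter, such as 1=4 and 2=102, join type and value with an equals sign. The sketch below splits such a string under the rule stated above (numeric type; the value is an integer if it starts with a digit, otherwise a string). The struct and function names are hypothetical and are not taken from YAZ.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical result type for one attribute type-value pair. */
struct attr_pair {
    int type;               /* attribute type, always numeric          */
    int is_numeric;         /* 1: num_value is set, 0: str_value is set */
    int num_value;
    const char *str_value;  /* points into the caller's input string   */
};

/* Split "type=value"; return 0 on success, -1 if the string is malformed. */
static int split_attr_pair(const char *spec, struct attr_pair *out)
{
    char *end;
    long type;

    if (!isdigit((unsigned char)spec[0]))
        return -1;
    type = strtol(spec, &end, 10);
    if (*end != '=' || end[1] == '\0')
        return -1;
    end++;                              /* step over '=' */
    out->type = (int)type;
    if (isdigit((unsigned char)*end)) {
        out->is_numeric = 1;
        out->num_value = atoi(end);
        out->str_value = NULL;
    } else {
        out->is_numeric = 0;
        out->num_value = 0;
        out->str_value = end;
    }
    return 0;
}
```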

Version 3 of the Z39.50 specification defines various encodings of terms. Use @term type, where type is one of general, numeric, string (for InternationalString), oid, datetime or null, to select one. If no term type is given, the general form is used, which is the only encoding allowed in both version 2 and version 3 of the Z39.50 standard.

Common Command Language

Not all users enjoy typing in prefix query structures and numerical attribute values, even in a minimalistic test client. In the library world, the more intuitive Common Command Language (or ISO 8777) has enjoyed some popularity - especially before the widespread availability of graphical interfaces. It is still useful in applications where you need to provide a symbolic language for expressing boolean query structures.

The EUROPAGATE research project working under the Libraries programme of the European Commission's DG XIII has, amongst other useful tools, implemented a general-purpose CCL parser which produces an output structure that can be trivially converted to the internal RPN representation of YAZ (the Z_RPNQuery structure). Since the CCL utility - along with the rest of the software produced by EUROPAGATE - is made freely available under a liberal license, it is included as a supplement to YAZ.

CCL Syntax

The CCL parser obeys the following grammar for the FIND argument. The syntax is annotated in the lines prefixed by --.


      CCL-Find ::= CCL-Find Op Elements
                | Elements.

      Op ::= "and" | "or" | "not"
      -- The above means that Elements are separated by boolean operators.

      Elements ::= '(' CCL-Find ')'
                | Set
                | Terms
                | Qualifiers Relation Terms
                | Qualifiers Relation '(' CCL-Find ')'
                | Qualifiers '=' string '-' string
      -- Elements is either a recursive definition, a result set reference, a
      -- list of terms, qualifiers followed by terms, qualifiers followed
      -- by a recursive definition or qualifiers in a range (lower - upper).

      Set ::= 'set' '=' string
      -- Reference to a result set

      Terms ::= Terms Prox Term
             | Term
      -- Proximity of terms.

      Term ::= Term string
            | string
      -- This basically means that a term may include a blank

      Qualifiers ::= Qualifiers ',' string
                  | string
      -- Qualifiers is a list of strings separated by comma

      Relation ::= '=' | '>=' | '<=' | '<>' | '>' | '<'
      -- Relational operators. This really doesn't follow the ISO8777
      -- standard.

      Prox ::= '%' | '!'
      -- Proximity operator

     

The following queries are all valid:


      dylan

      "bob dylan"

      dylan or zimmerman

      set=1

      (dylan and bob) or set=1

     

Assuming that the qualifiers ti, au and date are defined we may use:


      ti=self portrait

      au=(bob dylan and slow train coming)

      date>1980 and (ti=((self portrait)))

     

CCL Qualifiers

Qualifiers are used to direct the search to a particular searchable index, such as title (ti) and author indexes (au). The CCL standard itself doesn't specify a particular set of qualifiers, but it does suggest a few short-hand notations. You can customize the CCL parser to support a particular set of qualifiers to reflect the current target profile. Traditionally, a qualifier would map to a particular use-attribute within the BIB-1 attribute set. However, you could also define qualifiers that would set, for example, the structure-attribute.

Consider a scenario where the target supports ranked searches in the title index. In this case, the user could specify


      ti,ranked=knuth computer
     

and ranked would map to relation=relevance (2=102), while ti would map to title (1=4).

A "profile" with a set of predefined CCL qualifiers can be read from a file. The YAZ client reads its CCL qualifiers from a file named default.bib. Each line in the file has the form:

qualifier-name type=val type=val ...

where qualifier-name is the name of the qualifier to be used (e.g. ti), type is a BIB-1 attribute type and val is the corresponding BIB-1 attribute value. The type can be given either as a number or as one of the letters u (use), r (relation), p (position), s (structure), t (truncation) or c (completeness). The qualifier name term has a special meaning: the types and values in its definition are used when no qualifiers are present.

Consider the following definition:


      ti       u=4 s=1
      au       u=1 s=1
      term     s=105
     

Two qualifiers are defined, ti and au. They both set the structure-attribute to phrase (1). ti sets the use-attribute to 4. au sets the use-attribute to 1. When no qualifiers are used in the query the structure-attribute is set to free-form-text (105).
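The line format above can be illustrated with a short parser sketch. This is not the code used by ccl_qual_file; the names are hypothetical, the letters are mapped to the BIB-1 attribute type numbers 1 (use) through 6 (completeness), and values are assumed to be integers as in the example definitions.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ATTRS 8

/* Hypothetical in-memory form of one profile line. */
struct qual_def {
    char name[32];
    int n;                  /* number of type=val pairs stored */
    int type[MAX_ATTRS];    /* BIB-1 attribute type numbers    */
    int value[MAX_ATTRS];   /* BIB-1 attribute values          */
};

/* Map a type letter to its BIB-1 attribute type number, or 0 if unknown. */
static int type_from_letter(char c)
{
    switch (c) {
    case 'u': return 1;  /* use          */
    case 'r': return 2;  /* relation     */
    case 'p': return 3;  /* position     */
    case 's': return 4;  /* structure    */
    case 't': return 5;  /* truncation   */
    case 'c': return 6;  /* completeness */
    default:  return 0;
    }
}

/* Parse "name type=val type=val ..."; return the number of pairs, or -1. */
static int parse_qual_line(const char *line, struct qual_def *q)
{
    char tstr[16];
    int used = 0, val = 0;

    q->n = 0;
    if (sscanf(line, "%31s%n", q->name, &used) != 1)
        return -1;
    line += used;
    while (q->n < MAX_ATTRS &&
           sscanf(line, " %15[^= \t\n]=%d%n", tstr, &val, &used) == 2) {
        int type = isdigit((unsigned char)tstr[0])
                       ? atoi(tstr)
                       : (tstr[1] == '\0' ? type_from_letter(tstr[0]) : 0);
        if (type == 0)
            return -1;      /* unknown type letter */
        q->type[q->n] = type;
        q->value[q->n] = val;
        q->n++;
        line += used;
    }
    return q->n;
}
```

Applied to the line "ti u=4 s=1", the sketch yields the pairs 1=4 and 4=1, which is the numeric form the same qualifier would take in a PQF @attr specification.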

CCL API

All public definitions can be found in the header file ccl.h. A profile is identified by a handle of type CCL_bibset, which must be created with a call to the function ccl_qual_mk.

To read a file containing qualifier definitions the function ccl_qual_file may be convenient. This function takes an already opened FILE handle pointer as argument along with a CCL_bibset handle.

To parse a simple string with a FIND query use the function


struct ccl_rpn_node *ccl_find_str (CCL_bibset bibset, const char *str,
                                   int *error, int *pos);
     

which takes the CCL profile (bibset) and query (str) as input. Upon successful completion the RPN tree is returned. If an error occurs, such as a syntax error, the integer pointed to by error holds the error code and the integer pointed to by pos holds the offset inside the query string at which parsing failed.

An English representation of the error may be obtained by calling the ccl_err_msg function. The error codes are listed in ccl.h.

To convert the CCL RPN tree (type struct ccl_rpn_node *) to the Z_RPNQuery of YAZ, the function ccl_rpn_query must be used. This function, which is part of YAZ, is implemented in yaz-ccl.c. After calling this function the CCL RPN tree is probably no longer needed. The function ccl_rpn_delete destroys the CCL RPN tree.

A CCL profile may be destroyed by calling the ccl_qual_rm function.

The token names for the CCL operators may be changed by setting the globals (all of type char *) ccl_token_and, ccl_token_or, ccl_token_not and ccl_token_set. An operator may have aliases, i.e. there may be more than one name for the operator. To define aliases, separate the names with a space character.
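As an illustration of what a space-separated alias list means, the sketch below tests whether a word equals any one of the names in such a string. It is not the parser's own matching code: token_matches is a hypothetical name, and the comparison here is case-sensitive purely to keep the example short.

```c
#include <string.h>

/* Return 1 if word equals one of the space-separated names, else 0. */
static int token_matches(const char *names, const char *word)
{
    size_t wlen = strlen(word);
    const char *p = names;

    if (wlen == 0)
        return 0;
    while (*p) {
        size_t len;

        while (*p == ' ')           /* skip separators */
            p++;
        len = strcspn(p, " ");      /* length of the current name */
        if (len == wlen && memcmp(p, word, len) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

With an alias string of "and +", both and and + would be accepted as the same operator, while a partial word such as an would not.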

CQL

CQL - Common Query Language - was defined for the SRW protocol. In many ways CQL has a syntax similar to CCL, but its objective is different. Where CCL aims to be an end-user language, CQL is the protocol query language for SRW.

Tip: If you are new to CQL, read the Gentle Introduction.

The CQL parser in YAZ provides the following:

  • It parses and validates a CQL query.

  • It generates a C structure that allows you to convert a CQL query to some other query language, such as SQL.

  • The parser converts a valid CQL query to PQF, thus providing a way to use CQL for both SRW/SRU servers and Z39.50 targets at the same time.

  • The parser converts CQL to XCQL. XCQL is an XML representation of CQL. XCQL is part of the SRW specification. However, since SRU supports CQL only, we don't expect XCQL to be widely used. Furthermore, CQL has the advantage over XCQL that it is easy to read.

CQL tree

If the query string is valid, the CQL parser generates a tree representing the structure of the CQL query.


struct cql_node *cql_parser_result(CQL_parser cp);
      
cql_parser_result returns a pointer to the root node of the resulting tree.

Each node in a CQL tree is represented by a struct cql_node. It is defined as follows:

#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
#define CQL_NODE_MOD 3
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *term;
            char *relation;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } boolean;
        struct {
            char *name;
            char *value;
            struct cql_node *next;
        } mod;
    } u;
};
      
There are three kinds of nodes, search term (ST), boolean (BOOL), and modifier (MOD).

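The search term node has five members:

  • index: the index for the search term.

  • term: the search term itself.

  • relation: the relation for the search term.

  • modifiers: relation modifiers for the search term, as a simple linked list of MOD nodes (NULL for last entry).

  • prefixes: index prefixes for the search term, as a simple linked list of MOD nodes (NULL for last entry).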

The boolean node represents and, or and not, as well as proximity.

  • value: the name of the operator.

  • left and right: the left and right operands, respectively.

  • modifiers: proximity arguments.

  • prefixes: index prefixes. The prefixes form a simple linked list (NULL for last entry). Each prefix node is of type MOD.

The modifier node is a "utility" node used for name-value pairs, such as prefixes, proximity arguments, etc.

  • name: name of mod node.

  • value: value of mod node.

  • next: pointer to next node which is always a mod node (NULL for last entry).
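The three node kinds can be walked with a plain recursive function. The sketch below repeats the structure definition from above so that it compiles on its own (in a real program you would include the YAZ CQL header instead); count_search_terms and mod_list_length are hypothetical helper names.

```c
#include <stddef.h>

#define CQL_NODE_ST 1
#define CQL_NODE_BOOL 2
#define CQL_NODE_MOD 3

/* Local copy of the node layout described above. */
struct cql_node {
    int which;
    union {
        struct {
            char *index;
            char *term;
            char *relation;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } st;
        struct {
            char *value;
            struct cql_node *left;
            struct cql_node *right;
            struct cql_node *modifiers;
            struct cql_node *prefixes;
        } boolean;
        struct {
            char *name;
            char *value;
            struct cql_node *next;
        } mod;
    } u;
};

/* Count the search term (ST) nodes reachable through boolean nodes. */
static int count_search_terms(const struct cql_node *n)
{
    if (!n)
        return 0;
    switch (n->which) {
    case CQL_NODE_ST:
        return 1;
    case CQL_NODE_BOOL:
        return count_search_terms(n->u.boolean.left) +
               count_search_terms(n->u.boolean.right);
    default:
        return 0;
    }
}

/* Length of a modifier or prefix list (MOD nodes chained through next). */
static int mod_list_length(const struct cql_node *m)
{
    int n = 0;

    for (; m && m->which == CQL_NODE_MOD; m = m->u.mod.next)
        n++;
    return n;
}
```

A converter to another query language would follow the same pattern: switch on which, emit something for ST nodes, and recurse into left and right for BOOL nodes.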

CQL to PQF conversion

Conversion to PQF (and Z39.50 RPN) is complicated by the fact that the resulting RPN depends on the Z39.50 target capabilities (combinations of supported attributes). In addition, CQL and SRW operate on index prefixes (URIs or strings), whereas RPN uses Object Identifiers for attribute sets.

The CQL library of YAZ defines a cql_transform_t type. It represents a particular mapping between CQL and RPN. This handle is created and destroyed by the functions:

cql_transform_t cql_transform_open_FILE (FILE *f);
cql_transform_t cql_transform_open_fname(const char *fname);
void cql_transform_close(cql_transform_t ct);
      
The first two functions create a transformation handle from either an already open FILE or from a filename respectively.

The handle is destroyed by cql_transform_close, after which no further reference to the handle is allowed.

When a cql_transform_t handle has been created you can convert to RPN.

int cql_transform_buf(cql_transform_t ct,
                      struct cql_node *cn, char *out, int max);
      
This function converts the CQL tree cn using handle ct. For the resulting PQF, you supply a buffer out which must be able to hold at least max characters.

If conversion failed, cql_transform_buf returns a non-zero error code; otherwise zero is returned (conversion successful).

If you wish to be able to produce a PQF result in a different way, there are two alternatives.

void cql_transform_pr(cql_transform_t ct,
                      struct cql_node *cn,
                      void (*pr)(const char *buf, void *client_data),
                      void *client_data);

int cql_transform_FILE(cql_transform_t ct,
                       struct cql_node *cn, FILE *f);
      
The former function produces output to a user-defined output stream. The latter writes the result to an already open FILE.
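A callback for the user-defined stream only has to match the pr signature shown above. The sketch below collects the pieces into a fixed-size buffer passed through client_data. It assumes each buf is a NUL-terminated string (the signature carries no length); struct out_buf and collect_pr are hypothetical names, and the example exercises the callback directly rather than through the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator handed to the callback via client_data. */
struct out_buf {
    char data[256];
    size_t len;
    int truncated;      /* set to 1 if some output did not fit */
};

/* Matches void (*pr)(const char *buf, void *client_data). */
static void collect_pr(const char *buf, void *client_data)
{
    struct out_buf *ob = (struct out_buf *)client_data;
    size_t n = strlen(buf);
    size_t room = sizeof ob->data - 1 - ob->len;

    if (n > room) {
        n = room;
        ob->truncated = 1;
    }
    memcpy(ob->data + ob->len, buf, n);
    ob->len += n;
    ob->data[ob->len] = '\0';
}
```

With a zero-initialized struct out_buf ob, the call would take the form cql_transform_pr(ct, cn, collect_pr, &ob), after which ob.data holds the PQF text and ob.truncated tells you whether the buffer was large enough.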

Specification of CQL to RPN mapping

The file supplied to the functions cql_transform_open_FILE and cql_transform_open_fname follows a structure found in many Unix utilities. It consists of mapping specifications - one per line. Lines starting with # are ignored (comments).

Each line is of the form


       CQL pattern =   RPN equivalent
      

An RPN pattern is a simple attribute list. Each attribute pair takes the form:


       [set] type=value
      

The attribute set is optional. The type is the attribute type, value the attribute value.
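The line structure can be sketched as a split at the first = sign, which is consistent with = being the separator between the CQL pattern and the RPN equivalent. This is not the YAZ implementation: split_mapping and copy_trimmed are hypothetical names, treating blank lines and indented # lines as ignorable is an assumption, and the sample lines used with it are illustrative only.

```c
#include <ctype.h>
#include <string.h>

/* Copy [b, e) into dst without surrounding whitespace, truncating to fit.
 * dstsz must be at least 1. */
static void copy_trimmed(char *dst, size_t dstsz, const char *b, const char *e)
{
    size_t n;

    while (b < e && isspace((unsigned char)*b))
        b++;
    while (e > b && isspace((unsigned char)e[-1]))
        e--;
    n = (size_t)(e - b);
    if (n >= dstsz)
        n = dstsz - 1;
    memcpy(dst, b, n);
    dst[n] = '\0';
}

/* Split "CQL pattern = RPN equivalent".
 * Returns 1 for a mapping, 0 for a comment or blank line, -1 if malformed. */
static int split_mapping(const char *line, char *pat, size_t patsz,
                         char *rpn, size_t rpnsz)
{
    const char *eq;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 0;
    eq = strchr(line, '=');         /* first '=' separates the two sides */
    if (!eq)
        return -1;
    copy_trimmed(pat, patsz, line, eq);
    copy_trimmed(rpn, rpnsz, eq + 1, eq + 1 + strlen(eq + 1));
    return pat[0] ? 1 : -1;
}
```

Because the split happens at the first =, a pattern name can never contain that character itself, while the right-hand side is free to contain attribute pairs such as 1=4.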

The following CQL patterns are recognized:

qualifier.set.name

This pattern is invoked when a CQL qualifier, such as dc.title, is converted. set and name are the index set and qualifier name respectively. Typically, the RPN specifies an equivalent use attribute.

For terms not bound by a qualifier the pattern qualifier.srw.serverChoice is used. Here, the prefix srw is defined as http://www.loc.gov/zing/cql/srw-indexes/v1.0/. If this pattern is not defined, the mapping will fail.

relation.relation

This pattern specifies how a CQL relation is mapped to RPN. The relation part of the pattern is the name of the relation operator. Since = is used as the separator between CQL pattern and RPN, CQL relations that include = cannot be used directly. To avoid a conflict, the names ge, eq and le must be used for the CQL operators greater-than-or-equal, equal and less-than-or-equal respectively. The RPN pattern is supposed to include a relation attribute.

For terms not bound by a relation, the pattern relation.scr is used. If the pattern is not defined, the mapping will fail.

The special pattern, relation.* is used when no other relation pattern is matched.

relationModifier.mod

This pattern specifies how a CQL relation modifier is mapped to RPN. The RPN pattern is usually a relation attribute.

structure.type

This pattern specifies how a CQL structure is mapped to RPN. Note that this CQL pattern is somewhat similar to the CQL pattern relation. The type is a CQL relation.

The pattern, structure.* is used when no other structure pattern is matched. Usually, the RPN equivalent specifies a structure attribute.

position.type

This pattern specifies how the anchor (position) of CQL is mapped to RPN. The type is one of first, any, last, firstAndLast.

The pattern, position.* is used when no other position pattern is matched.

set.prefix

This specification defines a CQL index set for a given prefix. The value on the right hand side is the URI for the set - not RPN. All prefixes used in qualifier patterns must be defined this way.