Introduction

GigaBASE is an object-relational database system with the same programming interface as the FastDB main-memory DBMS, but it uses a page pool instead of mapping the database file into memory. That is why GigaBASE can handle databases whose size exceeds the size of the computer's virtual memory. Like FastDB, GigaBASE provides a convenient and efficient C++ interface. GigaBASE doesn't support a client-server architecture and provides concurrent access to the database only for different threads within one process. GigaBASE is most efficient for applications that fetch records using indices or direct object references. High query execution speed comes from the elimination of data transfer overhead and an efficient locking implementation. Synchronization of concurrent database access is implemented by means of atomic instructions, adding almost no overhead to query processing. GigaBASE uses modified B+tree indices to provide fast access to disk-resident data with a minimal number of disk read operations.

GigaBASE supports transactions, online backups and automatic recovery after a system crash. The transaction commit protocol is based on a shadow pages algorithm, which updates the database atomically. Recovery is very fast, providing high availability for critical applications. Moreover, the elimination of transaction logs improves overall system performance and leads to more effective usage of system resources.
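
The idea behind a shadow-page commit can be illustrated with a small in-memory sketch. This is a generic illustration of the technique and not GigaBASE's actual code: the class ShadowStore and all of its members are hypothetical, pages are plain strings, and the "atomic switch" is modeled by copying a page map. Old page versions are never reclaimed here, which a real engine would have to do.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of shadow paging: a logical page is never overwritten
// in place while the last committed state still refers to it.
class ShadowStore {
  public:
    explicit ShadowStore(std::size_t nPages)
        : physical_(nPages), committed_(nPages), current_(nPages) {
        assert(nPages > 0);
        std::iota(committed_.begin(), committed_.end(), std::size_t(0));
        current_ = committed_;
    }

    void write(std::size_t page, std::string data) {
        if (current_[page] == committed_[page]) {
            // First update in this transaction: write to a fresh (shadow) page.
            physical_.push_back(std::move(data));
            current_[page] = physical_.size() - 1;
        } else {
            // Page already has a private copy in this transaction.
            physical_[current_[page]] = std::move(data);
        }
    }

    const std::string& read(std::size_t page) const {
        return physical_[current_[page]];
    }

    // Stands in for the single atomic switch to the new page map.
    void commit() { committed_ = current_; }

    // Rollback or crash recovery: fall back to the last committed map.
    void recover() { current_ = committed_; }

  private:
    std::vector<std::string> physical_;   // physical pages
    std::vector<std::size_t> committed_;  // logical -> physical, last commit
    std::vector<std::size_t> current_;    // logical -> physical, in progress
};
```

Because the committed page map is never touched until commit(), recovery in this model needs no log replay: it only has to discard the uncommitted map.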

GigaBASE is an application-oriented database. Database tables are constructed using information about application classes. GigaBASE supports automatic schema evolution, allowing you to make changes in only one place: your application classes. GigaBASE provides a flexible and convenient interface for retrieving data from the database. An SQL-like query language is used to specify queries, and post-relational capabilities such as non-atomic fields, nested arrays, user-defined types and methods, and direct inter-object references simplify the design of database applications and make them more efficient.

GigaBASE can efficiently handle databases with several million objects and up to a terabyte in size, even on computers with little physical memory. A page pool with an LRU page replacement strategy and B+tree indices minimize the number of disk operations and so provide high system performance.
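
To make the LRU replacement strategy concrete, here is a minimal sketch of a fixed-size page pool. It is an illustration only, not GigaBASE's implementation: the class PagePool, its methods and the disk-read counter are all hypothetical, and pages are represented by their numbers alone.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical fixed-capacity page pool with LRU replacement.
class PagePool {
  public:
    explicit PagePool(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Touches a page. Returns true on a pool hit and false when the page
    // had to be loaded (counted as one disk read).
    bool access(long pageNo) {
        auto it = index_.find(pageNo);
        if (it != index_.end()) {
            // Hit: move the page to the most-recently-used position.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (lru_.size() == capacity_) {
            // Pool is full: evict the least recently used page.
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(pageNo);
        index_[pageNo] = lru_.begin();
        ++diskReads_;
        return false;
    }

    bool cached(long pageNo) const { return index_.count(pageNo) != 0; }
    std::size_t diskReads() const { return diskReads_; }

  private:
    std::size_t capacity_;
    std::size_t diskReads_ = 0;
    std::list<long> lru_;  // front = most recently used page
    std::unordered_map<long, std::list<long>::iterator> index_;
};
```

With a pool of two pages, the access sequence 1, 2, 1, 3 causes three disk reads and evicts page 2, because page 1 was touched more recently.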

Query language

GigaBASE supports a query language with SQL-like syntax. GigaBASE uses a notation more common in object-oriented programming than in relational databases. Table rows are considered object instances, and the table is the class of these objects. Unlike SQL, GigaBASE is oriented toward working with objects instead of SQL tuples, so the result of each query execution is a set of objects of one class. The main differences between the GigaBASE query language and standard SQL are:

  1. There are no joins of several tables and no nested subqueries. A query always returns a set of objects from one table.
  2. Standard C types are used for atomic table columns.
  3. There are no NULL values, except null references. I completely agree with C.J. Date's criticism of three-valued logic and his proposal to use default values instead.
  4. Structures and arrays can be used as record components. A special exists quantifier is provided for locating elements in arrays.
  5. User methods can be defined for table records (objects) as well as for record components.
  6. User functions with a single string or numeric argument can be defined by the application.
  7. References between objects are supported, including automatic support of inverse references.
  8. The construction start from ... follow by performs recursive traversal of records using references.
  9. Because the query language is deeply integrated with C++ classes, language identifiers as well as keywords are case sensitive.
  10. Integer and floating point values are not implicitly converted to their string representation. If such a conversion is needed, it should be done explicitly.

The following rules in BNF-like notation specify the grammar of the GigaBASE query language search predicate:

Grammar conventions
Example       Meaning
expression    non-terminals
not           terminals
|             disjoint alternatives
(not)         optional part
{1..9}        repeat zero or more times

select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction 
        | conjunction or disjunction
conjunction ::= comparison 
        | comparison and conjunction
comparison ::= operand = operand 
        | operand != operand 
        | operand <> operand 
        | operand < operand 
        | operand <= operand 
        | operand > operand 
        | operand >= operand 
        | operand (not) like operand 
        | operand (not) like operand escape string
        | operand (not) in operand
        | operand (not) in expressions-list
        | operand (not) between operand and operand
	| operand is (not) null
operand ::= addition
addition ::= multiplication 
        | addition +  multiplication
        | addition || multiplication
        | addition -  multiplication
multiplication ::= power 
        | multiplication * power
        | multiplication / power
power ::= term
        | term ^ power
term ::= identifier | number | string 
        | true | false | null 
	| current | first | last
	| ( expression ) 
        | not comparison
	| - term
	| term [ expression ] 
	| identifier . term 
	| function term
        | exists identifier : term
function ::= abs | length | lower | upper
        | integer | real | string | user-function
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= [length] field (asc | desc)
field ::= identifier { . identifier }
traverse ::= start from field ( follow by fields-list )
fields-list ::=  field { , field }
user-function ::= identifier

Identifiers are case sensitive, begin with an a..z, A..Z, '_' or '$' character, contain only a..z, A..Z, 0..9, '_' or '$' characters, and must not duplicate an SQL reserved word.

List of reserved words
abs       and       asc       between   by
current   desc      escape    exists    false
first     follow    from      in        integer
is        length    like      last      lower
not       null      or        real      start
string    true      upper

ANSI-standard comments may also be used. All characters from a double hyphen to the end of the line are ignored.

GigaBASE extends the ANSI standard SQL operations by supporting bit manipulation. The and/or operators can be applied not only to boolean operands but also to operands of integer type. The result of applying and/or to integer operands is an integer value whose bits are set by the bitwise AND/OR operation. Bit operations can be used for efficient implementation of small sets. GigaBASE also supports the raising-to-a-power operation ^ for integer and floating point types.
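
As an illustration of small sets stored in one integer column, the sketch below shows the C++ equivalents of two such predicates. The column name options, the flag names and the predicate strings in the comments are assumptions made for this example, not taken from GigaBASE.

```cpp
#include <cstdint>

// Hypothetical flags packed into a single integer field "options".
enum Option : std::int32_t {
    OPT_AIR  = 1 << 0,
    OPT_RAIL = 1 << 1,
    OPT_SEA  = 1 << 2
};

// C++ equivalent of a predicate such as "(options and mask) = mask":
// true when every member of mask is in the set.
bool hasAll(std::int32_t options, std::int32_t mask) {
    return (options & mask) == mask;
}

// C++ equivalent of a predicate such as "(options and mask) <> 0":
// true when at least one member of mask is in the set.
bool hasAny(std::int32_t options, std::int32_t mask) {
    return (options & mask) != 0;
}
```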

Structures

GigaBASE accepts structures as components of records. Fields of a structure can be accessed using standard dot notation: company.address.city

Structure fields can be indexed and used in the order by specification. Structures can contain other structures as their components, and there is no limit on the nesting level.

The programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such methods should have no arguments except the pointer to the object to which they belong (the this pointer in C++), and should return an atomic value (of boolean, numeric, string or reference type). A method should also not change the object instance (it should be an immutable method). If a method returns a string, the string should be allocated using the new char[] operator, because it will be deleted after its value is copied. User-defined methods can thus be used to create virtual components: components that are not stored in the database but are instead calculated from the values of other components. For example, the GigaBASE dbDateTime type contains only an integer timestamp component and such methods as dbDateTime::year(), dbDateTime::month()... So it is possible to specify queries like "delivery.year = 1999" in an application where the delivery record field has dbDateTime type. Methods are executed in the context of the application where they are defined, and are not available to other applications or to interactive SQL.

Arrays

GigaBASE accepts arrays with dynamic length as components of records. Multidimensional arrays are not supported, but it is possible to define an array of arrays. It is possible to sort records in the result set by the length of an array field. GigaBASE provides a set of special constructions for dealing with arrays:

  1. The number of elements in an array can be obtained with the length() function.
  2. Array elements can be fetched with the [] operator. If the index expression is out of the array range, an exception is raised.
  3. The in operator can be used to check whether an array contains the value specified by the left operand. This operation can be used only for arrays of atomic types: with boolean, numeric, reference or string components.
  4. Iteration through array elements is performed by the exists operator. The variable specified after the exists keyword can be used as an array index in the expression that follows the exists quantifier. This index variable iterates through all possible array index values until the expression becomes true or the index runs out of range. The condition
            exists i: (contract[i].company.location = 'US')
    
    will select all details that are shipped by companies located in the US, while the query
            not exists i: (contract[i].company.location = 'US')
    
    will select all details that are shipped only by companies outside the US.

    Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops over the corresponding index variables. For example, the query

            exists column: (exists row: (matrix[column][row] = 0))
    
    will select all records containing 0 in elements of the matrix field, which has type array of array of integer. This construction is equivalent to the following two nested loops:
           bool result = false;
           for (int column = 0; column < matrix.length(); column++) { 
                for (int row = 0; row < matrix[column].length(); row++) { 
                     if (matrix[column][row] == 0) { 
                         result = true;
                         break;
                     }
                }
           }
    
    The order in which the indices are used is significant! The result of executing the following query
            exists row: (exists column: (matrix[column][row] = 0))
    
    will be completely different from the result of the previous query. In the latter case the program can even hang in an infinite loop for empty matrices.
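
For readers who think in terms of the C++ standard library, the single-level exists, not exists and in constructions described above correspond roughly to the following sketch. The types Company and Contract are simplified stand-ins invented for this example (in the real schema company would be a reference), and std::vector stands in for a GigaBASE array.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified, hypothetical record types for illustration only.
struct Company { std::string location; };
struct Contract { Company company; };

// exists i: (contract[i].company.location = 'US')
bool anyFromUS(const std::vector<Contract>& contract) {
    return std::any_of(contract.begin(), contract.end(),
                       [](const Contract& c) {
                           return c.company.location == "US";
                       });
}

// not exists i: (contract[i].company.location = 'US')
bool noneFromUS(const std::vector<Contract>& contract) {
    return !anyFromUS(contract);
}

// value in array (atomic element types only)
bool contains(const std::vector<int>& array, int value) {
    return std::find(array.begin(), array.end(), value) != array.end();
}
```

Note that in this model an empty array satisfies the not exists form, since no element can make the inner condition true.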

Strings

All strings in GigaBASE have varying length, and the programmer need not worry about specifying a maximal length for character fields. All operations applicable to arrays are also applicable to strings. In addition, strings have a set of their own operations. First of all, strings can be compared with each other using the standard relational operators. At present GigaBASE supports only the ASCII character set (corresponding to type char in C) and byte-by-byte comparison of strings, ignoring locale settings.

The like construction can be used to match a string against a pattern containing the special wildcard characters '%' and '_'. The character '_' matches any single character, while the character '%' matches any number of characters (including 0). The extended form of the like operator with an escape part can be used to treat the characters '%' and '_' in the pattern as normal characters when they are preceded by the special escape character specified after the escape keyword.
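
The matching rules for like can be sketched in a few lines of C++. The function likeMatch below is a hypothetical helper written for this explanation, not GigaBASE's matcher; it uses simple backtracking, which is easy to follow but can be slow for patterns with many '%' characters.

```cpp
#include <cstddef>
#include <string>

// Returns true if text matches the SQL-style pattern.
// '%' matches any run of characters (including none), '_' matches exactly
// one character. If escape is non-zero, the pattern character that follows
// it is compared literally. The parameters t and p are the current positions
// in text and pattern and are used by the recursion.
bool likeMatch(const std::string& text, const std::string& pattern,
               char escape = '\0', std::size_t t = 0, std::size_t p = 0) {
    while (p < pattern.size()) {
        char c = pattern[p];
        if (escape != '\0' && c == escape && p + 1 < pattern.size()) {
            // Escaped character: must match literally.
            if (t >= text.size() || text[t] != pattern[p + 1]) return false;
            ++t;
            p += 2;
        } else if (c == '%') {
            // Try every possible length for the run matched by '%'.
            for (std::size_t skip = t; skip <= text.size(); ++skip) {
                if (likeMatch(text, pattern, escape, skip, p + 1)) return true;
            }
            return false;
        } else if (c == '_') {
            // Any single character, but there must be one.
            if (t >= text.size()) return false;
            ++t;
            ++p;
        } else {
            // Ordinary character: must match exactly.
            if (t >= text.size() || text[t] != c) return false;
            ++t;
            ++p;
        }
    }
    return t == text.size();
}
```

For example, likeMatch("50%", "50\\%", '\\') treats the percent sign in the pattern as a literal character, so it matches the text "50%" but not "500".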

It is possible to search for a substring within a string with the in operator. The expression ('blue' in color) will be true for all records whose color field contains the word 'blue'. If the length of the searched string is greater than some threshold value (currently 512), the Boyer-Moore substring search algorithm is used instead of a straightforward search implementation.

Strings can be concatenated with the + or || operators. The latter was added only for compatibility with the ANSI SQL standard. Since GigaBASE doesn't support implicit conversion to string type in expressions, the + operator can safely be given concatenation semantics for strings.

References

References can be dereferenced using the same dot notation that is used for accessing structure components. For example, the following query
        company.address.city = 'Chicago'
will access the record referenced by the company component of the Contract record and extract the city component of the address field of the referenced record from the Supplier table.

References can be checked for null with the is null or is not null predicates. References can also be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, GigaBASE raises an exception.

The special keyword current can be used to get a reference to the current record during a table search. Usually the current keyword is used to compare the current record identifier with other references or to locate it within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has dbArray< dbReference<Contract> > type):

        current not in supplier.canceledContracts

GigaBASE provides a special construction for recursive traversal of records by references:

     start from root-references
     ( follow by list-of-reference-fields )
The first part of this construction is used to specify root objects. The nonterminal root-references should be a variable of reference or array of reference type. The two special keywords first and last can be used here, locating the first and last record in the table respectively. If you want to check some condition for all records referenced by an array of references or a single reference field, this construction can be used without the follow by part.

If you specify the follow by part, GigaBASE will recursively traverse table records starting from the root references and using the list of reference fields list-of-reference-fields for transitions between records. list-of-reference-fields should consist of fields of reference or array of reference type. Traversal is done in depth-first top-left-right order (the parent node is visited first, then its children in left-to-right order). Recursion terminates when a null reference is accessed or an already visited record is referenced. For example, the following query will search tree records with weight larger than 1 in TLR order:

        "weight > 1 start from first follow by left, right"

For the following tree:

                              A:1.1
              B:2.0                             C:1.5
      D:1.3         E:1.8                F:1.2         G:0.8
result of query execution will be:
('A', 1.1), ('B', 2.0), ('D', 1.3), ('E', 1.8), ('C', 1.5), ('F', 1.2)
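
The traversal order and termination rules described above can be reproduced with a short sketch. The Node structure and the selectHeavy function are hypothetical and use raw pointers in place of database references; the filter weight > 1 is hard-coded to mirror the example query.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical in-memory stand-in for a tree record with two references.
struct Node {
    std::string name;
    double weight;
    const Node* left;
    const Node* right;
};

// Visits the node itself first, then follows the reference fields in the
// order they are listed ("left, right"). Recursion stops at a null
// reference or at a record that was already visited.
static void traverse(const Node* node,
                     std::unordered_set<const Node*>& visited,
                     std::vector<std::string>& out) {
    if (node == nullptr || !visited.insert(node).second) return;
    if (node->weight > 1) out.push_back(node->name);  // "weight > 1"
    traverse(node->left, visited, out);
    traverse(node->right, visited, out);
}

// Models "weight > 1 start from <root> follow by left, right".
std::vector<std::string> selectHeavy(const Node* root) {
    std::unordered_set<const Node*> visited;
    std::vector<std::string> out;
    traverse(root, visited, out);
    return out;
}
```

Applied to the tree shown above, this sketch yields the names A, B, D, E, C, F in that order; G is visited but filtered out because its weight is 0.8.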

Functions

Predefined functions
Name      Argument type   Return type   Description
abs       integer         integer       absolute value of the argument
abs       real            real          absolute value of the argument
integer   real            integer       conversion of real to integer
length    array           integer       number of elements in array
lower     string          string        lowercase string
real      integer         real          conversion of integer to real
string    integer         string        conversion of integer to string
string    real            string        conversion of real to string
upper     string          string        uppercase string

A GigaBASE application can define its own functions. A function should have a single argument of int8, real8 or char const* type and return a value of bool, int8, real8 or char* type. User functions should be registered with the USER_FUNC(f) macro, which creates a static object of the dbUserFunction class, binding the function pointer and the function name. For example, the following statements make it possible to use the sin function in SQL statements:

        #include <math.h>
	...
        USER_FUNC(sin);
Functions can be used only within the application where they are defined. They are not accessible from other applications or from interactive SQL. A function returning a string should allocate the returned value with operator new, because GigaBASE will delete it after copying the returned value.
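
The registration mechanism (a macro that creates a static object binding a function pointer to its name) can be modeled with the following sketch. Everything here is hypothetical and only mimics the idea: MINI_USER_FUNC, FuncRegistrar and registry() are invented names, and only functions taking and returning double are handled.

```cpp
#include <map>
#include <string>

// Hypothetical registry mapping a function name to a function pointer.
using RealFunc = double (*)(double);

inline std::map<std::string, RealFunc>& registry() {
    static std::map<std::string, RealFunc> funcs;  // created on first use
    return funcs;
}

// A static object of this type registers one function when it is constructed,
// that is, during static initialization before main() runs.
struct FuncRegistrar {
    FuncRegistrar(const char* name, RealFunc f) { registry()[name] = f; }
};

// Stringizes the function name and binds it to the function pointer.
#define MINI_USER_FUNC(f) static FuncRegistrar f##_registrar(#f, f)

// Example user function and its registration.
static double square(double x) { return x * x; }
MINI_USER_FUNC(square);
```

A query compiler built on such a registry could resolve an unknown identifier by looking it up by name, which also shows why registered functions are visible only inside the application that defines them.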

In GigaBASE a function argument may be enclosed in parentheses, but this is not required. So both of the following expressions are valid:

        '$' + string(abs(x))
	length string y

C++ interface

One of the primary goals of GigaBASE is to provide a flexible and convenient application language interface. Anyone who has had to use ODBC or similar SQL interfaces will understand what I mean. In GigaBASE a query can be written in C++ in the following way:

    dbQuery q; 
    dbCursor<Contract> contracts;
    dbCursor<Supplier> suppliers;
    int price, quantity;
    q = "(price >=",price,"or quantity >=",quantity,
        ") and delivery.year=1999";
    // input price and quantity values
    if (contracts.select(q) != 0) { 
        do { 
            printf("%s\n", suppliers.at(contracts->supplier)->company);
        } while (contracts.next());
    } 

Table

Data in GigaBASE is stored in tables, which correspond to C++ classes, while table records correspond to class instances. The following C++ types are accepted as atomic components of GigaBASE records:

Type             Description
bool             boolean type (true, false)
int1             one-byte signed integer (-128..127)
int2             two-byte signed integer (-32768..32767)
int4             four-byte signed integer (-2147483648..2147483647)
int8             eight-byte signed integer (-2**63..2**63-1)
real4            four-byte ANSI floating point type
real8            eight-byte ANSI double precision floating point type
char const*      zero-terminated string
dbReference<T>   reference to class T
dbArray<T>       dynamic array of elements of type T

In addition to the types specified in the table above, GigaBASE records can also contain nested structures of these components. GigaBASE doesn't support unsigned types, in order to simplify the query language, eliminate bugs caused by signed/unsigned comparison and reduce the size of the database engine.

Unfortunately C++ provides no way to get metainformation about a class at runtime (RTTI is not supported by all compilers and also doesn't provide enough information). That is why the programmer has to explicitly enumerate the class fields to be included in the database table (this also makes the mapping between classes and tables more flexible). GigaBASE provides a set of macros and classes to make such mapping as simple as possible.

Each C++ class or structure that will be used in the database should contain a special method describing its fields. The macro TYPE_DESCRIPTOR(field_list) constructs this method. The single argument of this macro is a parenthesized list of class field descriptors. If you want to define some methods for the class and make them available to the database, the macro CLASS_DESCRIPTOR(name, field_list) should be used instead of TYPE_DESCRIPTOR. The class name is needed to get references to member functions.

The following macros can be used to construct field descriptors:

FIELD(name)
    Non-indexed field with the specified name.
RAWFIELD(name)
    Field of raw binary type. The database knows nothing about the format of this field and treats it as a sequence of bytes. Raw binary fields cannot be indexed.
KEY(name, index_type)
    Indexed field. index_type should be a combination of the HASHED and INDEXED flags. The HASHED flag exists only for compatibility with FastDB; hash tables are not yet supported by GigaBASE. A B+tree index is used instead of a hash table, so specifying HASHED is equivalent to specifying the INDEXED index type. When the INDEXED flag is specified, GigaBASE creates a B+tree for the table using this field as a key. The length of an indexed key should not exceed 4Kb. The CASE_INSENSITIVE flag makes the index case insensitive (when an index search is performed, "aBc", "abc" and "Abc" are treated as the same value).
SUPERCLASS(name)
    Specifies information about the base class (parent) of the current class.
RELATION(reference, inverse_reference)
    Specifies a one-to-one, one-to-many or many-to-many relationship between classes (tables). Both the reference and inverse_reference fields should have reference or array of reference type. inverse_reference is the field of the referenced table containing the inverse reference(s) to the current table. Inverse references are automatically updated by GigaBASE and are also used for query optimization (see Inverse references).
OWNER(reference, inverse_reference)
    Specifies a one-to-many or many-to-many relationship between classes (tables) of owner-member type. When an owner record is removed, all referenced member records are also removed (cascade delete). If a member record has a reference to the owner class, it should be declared with the RELATION macro.
METHOD(name)
    Specifies a method of the class. The method should be an instance member function without any parameters, returning a boolean, numeric, reference or string type. Methods should be specified after all other attributes of the class.

Although only atomic fields can be indexed, an index type can also be specified for structures. An index will be created for a component of the structure only if that type of index is specified in the index type mask of the structure. This lets programmers enable or disable indices for structure fields depending on the role of the structure in the record.

The following example illustrates creation of type descriptor:

class dbDateTime { 
    int4 stamp;
  public:
 
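    int year() { 
        // Copy into a time_t first: casting int4* to time_t* is unsafe
        // when time_t is wider than four bytes.
        time_t t = stamp;
        return localtime(&t)->tm_year + 1900;
    }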
    ...

    CLASS_DESCRIPTOR(dbDateTime, 
		     (KEY(stamp,INDEXED), 
		      METHOD(year), METHOD(month), METHOD(day),
		      METHOD(dayOfYear), METHOD(dayOfWeek),
		      METHOD(hour), METHOD(minute), METHOD(second)));
};    

class Detail { 
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED), 
		     KEY(material, INDEXED), 
		     KEY(color, INDEXED),
		     KEY(weight, INDEXED),
		     RELATION(contracts, detail)));
};

class Contract { 
  public:
    dbDateTime            delivery;
    int4                  quantity;
    int8                  price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, INDEXED), 
		     KEY(quantity, INDEXED), 
		     KEY(price, INDEXED),
		     RELATION(detail, contracts),
		     RELATION(supplier, contracts)));
};
Type descriptors should be defined for all classes used in the database. In addition to defining type descriptors, it is necessary to establish a mapping between C++ classes and database tables. The macro REGISTER(name) does this. Unlike TYPE_DESCRIPTOR, the REGISTER macro should be used in an implementation file and not in a header file. It constructs the descriptor of the table associated with the class. If you are going to work with multiple databases from one application, you can register a table in a particular database by means of the REGISTER_IN(name,database) macro. The database parameter of this macro should be a pointer to a dbDatabase object. Below is an example of registering tables in the database:

REGISTER(Detail);
REGISTER(Supplier);
REGISTER(Contract);
A table (and the corresponding class) can be used with only one database at a time. When you open a database, GigaBASE imports all classes defined in the application into the database. If a class with the same name already exists in the database, its descriptor stored in the database is compared with the descriptor of this class in the application. If the class definitions differ, GigaBASE tries to convert the records of the table to the new format. Any kind of conversion between numeric types is allowed (integer to real, real to integer, with extension or truncation). Addition of new fields is also handled easily. But removing fields is only possible for empty tables (to avoid accidental data destruction).

After loading all class descriptors, GigaBASE checks whether all indices specified in the application class descriptors are already present in the database, constructing new indices and removing indices that are no longer used. Reformatting tables and adding or removing indices is only possible when no more than one application is accessing the database. So when the first application attaches to the database, it can perform table conversion. All other applications can only add new classes to the database, but cannot change existing ones.

There is one special predefined table in the database, Metatable, which contains information about the other tables in the database. A C++ programmer does not need to access this table, because the format of database tables is specified by C++ classes. But in the interactive SQL program it is possible to examine this table to get information about record fields.

Query

Class dbQuery is used for two purposes:
  1. construct query and bind query parameters
  2. cache compiled queries
GigaBASE provides overloaded = and , C++ operators to construct query statements with parameters. Parameters can be specified directly in the places where they are used, eliminating any mapping between parameter placeholders and C variables. In the following example, pointers to the parameters price and quantity are stored in the query, so that the query can be executed several times with different parameter values. Overloaded C++ functions make it possible to determine the parameter type automatically, requiring no extra information from the programmer (so the programmer cannot specify a mismatched type by mistake).
        dbQuery q;
        int price, quantity;
        q = "price >=",price,"or quantity >=",quantity;
Since the char* type can be used both for specifying part of the query text (such as "price >=") and for a parameter of string type, GigaBASE uses a special rule to resolve this ambiguity. This rule is based on the assumption that there is no reason to split query text into two strings like ("price ",">=") or to specify more than one parameter in sequence ("color=",color,color). So GigaBASE assumes the first string to be part of the query text and switches to operand mode after it. In operand mode GigaBASE treats a char* argument as a query parameter and switches back to query text mode, and so on... It is also possible not to use this "syntactic sugar" and to construct query elements explicitly with the dbQuery::append(dbQueryElement::ElementType type, void const* ptr) method. Before appending elements to the query, it is necessary to reset the query with the dbQuery::reset() method (operator = does this automatically).
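
The alternation between query text mode and operand mode can be modeled with a small self-contained class. MiniQuery below is a hypothetical toy written for this explanation, not the real dbQuery: it supports only int and string parameters, and its render() method merely prints the query with the current parameter values instead of compiling it.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model of the text/operand alternation rule.
class MiniQuery {
  public:
    MiniQuery& operator=(char const* text) {
        reset();  // assignment starts a new query
        return add(text);
    }
    MiniQuery& operator,(char const* s) { return add(s); }
    MiniQuery& operator,(int& v) {
        elements_.push_back(Element{IntParam, &v});  // stored by address
        operandMode_ = false;  // next string is query text again
        return *this;
    }

    void reset() {
        elements_.clear();
        operandMode_ = false;
    }

    // Shows the query with the values the parameters have right now.
    std::string render() const {
        std::string out;
        for (const Element& e : elements_) {
            if (!out.empty()) out += ' ';
            switch (e.kind) {
              case Text:
                out += static_cast<char const*>(e.ptr);
                break;
              case IntParam:
                out += std::to_string(*static_cast<int const*>(e.ptr));
                break;
              case StrParam:
                out += '\'';
                out += static_cast<char const*>(e.ptr);
                out += '\'';
                break;
            }
        }
        return out;
    }

  private:
    enum Kind { Text, IntParam, StrParam };
    struct Element { Kind kind; const void* ptr; };

    MiniQuery& add(char const* s) {
        // In operand mode a string is a parameter, otherwise query text.
        elements_.push_back(Element{operandMode_ ? StrParam : Text, s});
        operandMode_ = !operandMode_;
        return *this;
    }

    std::vector<Element> elements_;
    bool operandMode_ = false;
};
```

Because only addresses are stored, changing price after the query has been built changes what the next render() call shows, which mirrors how a compiled query can be re-executed with new parameter values.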

It is not possible to use C++ numeric constants as query parameters, because parameters are accessed by reference. But it is possible to use string constants, because strings are passed by value. There are two possible ways of specifying string parameters in a query: using a string buffer or using a pointer to a pointer to a string:

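     dbQuery q;
     char const* type;   // bound through a pointer to the pointer
     char name[256];     // bound as a string buffer
     q = "name=",name,"and type=",&type;

     scanf("%255s", name);   // limit input to the buffer size
     type = "A";     
     cursor.select(q);
     ...
     scanf("%255s", name);
     type = "B";     
     cursor.select(q);
     ...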

A query variable cannot be passed to a function as a parameter or be assigned to another variable. When GigaBASE compiles the query, it saves the compiled tree in this object. The next time the query is used, no compilation is needed and the already compiled tree is reused, saving the time needed for query compilation.

GigaBASE provides two approaches to integrating user-defined types into the database. The first, definition of class methods, was already mentioned. The other approach deals only with query construction. The programmer defines methods that do not perform the actual calculation but instead return an expression, in terms of predefined database types, that performs the necessary calculation. This is best described by example. GigaBASE has no built-in datetime type. Instead, the normal C++ class dbDateTime can be used by the programmer. This class defines methods that allow two dates to be compared using the normal relational operators and a datetime field to be specified in an order list:

class dbDateTime { 
    int4 stamp;
  public:
    ...
    dbQueryExpression operator == (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"=",stamp;
	return expr;
    }
    dbQueryExpression operator != (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<>",stamp;
	return expr;
    }
    dbQueryExpression operator < (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<",stamp;
	return expr;
    }
    dbQueryExpression operator <= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),"<=",stamp;
	return expr;
    }
    dbQueryExpression operator > (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">",stamp;
	return expr;
    }
    dbQueryExpression operator >= (char const* field) { 
	dbQueryExpression expr;
	expr = dbComponent(field,"stamp"),">=",stamp;
	return expr;
    }
    friend dbQueryExpression between(char const* field, dbDateTime& from,
				     dbDateTime& till)
    { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"between",from.stamp,"and",till.stamp;
	return expr;
    }

    static dbQueryExpression ascent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp");
	return expr;
    }	
    static dbQueryExpression descent(char const* field) { 
	dbQueryExpression expr;
	expr=dbComponent(field,"stamp"),"desc";
	return expr;
    }	
};
All these methods receive the name of a record field as their parameter. This name is used to construct the full name of the record's component. This is done by the class dbComponent, whose constructor takes the name of the structure field and the name of the structure's component and returns a compound name separated by the '.' symbol. The class dbQueryExpression is used to collect expression items. The expression is automatically enclosed in parentheses, eliminating conflicts with operator precedence.

So, assuming a record contains a field delivery of dbDateTime type, it is possible to construct queries like this:

        dbDateTime from, till;
        q1 = between("delivery", from, till),
	     "order by",dbDateTime::ascent("delivery");
        q2 = till >= "delivery"; 
Besides these methods, class-specific methods can also be defined in this way, for example a method overlaps for a region type. The benefit of this approach is that the database engine works with predefined types and is able to apply indices and other optimizations when processing such queries. At the same time, encapsulation of the class implementation is preserved, so the programmer does not have to rewrite all queries when the class representation changes.

Variables of the following C++ types can be used as query parameters:

int1     bool
int2     char const*
int4     char **
int8     char const**
real4    dbReference<T>
real8    dbArray< dbReference<T> >

Cursor

Cursors are used to access records returned by a select statement. GigaBASE provides typed cursors, i.e. cursors associated with particular tables. There are two kinds of cursors in GigaBASE: read-only cursors and cursors for update. Cursors in GigaBASE are represented by the C++ template class dbCursor<T>, where T is the name of the C++ class associated with the database table. The cursor type should be specified in the constructor of the cursor. By default a read-only cursor is created. To create a cursor for update, you should pass the parameter dbCursorForUpdate to the constructor.

A query is executed by the cursor's select(dbQuery& q) or select() methods. The latter can be used to iterate through all records in the table. Both methods return the number of selected records and set the current position to the first record (if available). Cursors can be scrolled in the forward or backward direction. The methods next(), prev(), first() and last() can be used to change the current position of the cursor. If the operation cannot be performed (no more records are available), these methods return NULL and the cursor position is not changed.

A cursor for class T contains an instance of class T, used for fetching the current record. That is why table classes should have a default constructor (a constructor without parameters) that has no side effects. GigaBASE optimizes fetching records from the database by copying only the data from the fixed part of the object. String bodies are not copied; instead the corresponding field points directly into the database. The same is true for arrays whose components have the same representation in the database as in the application (arrays of scalar types or arrays of nested structures of scalar components).

The application should not change elements of strings and arrays in the database directly. When an array method needs to update the array body, it creates an in-memory copy of the array and updates this copy. To update a string field, the programmer should assign a new value to the pointer rather than change the string directly in the database. It is recommended to use the char const* type instead of char* for string components, so that the compiler can detect illegal usage of strings.

The cursor class provides the get() method for obtaining a pointer to the current record (stored inside the cursor). The overloaded operator-> can also be used to access components of the current record. If the cursor is opened for update, the current record can be changed and stored in the database by the update() method, or it can be removed. If the current record is removed, the next record becomes current. If there is no next record, the previous record becomes current (if it exists). Method removeAll() removes all records in the table, and method removeAllSelected() removes all records selected by the cursor.

When records are updated, the database size can increase and the database section in virtual memory may need to be extended. As a result of such remapping, the base address of the section can change and all pointers to database fields kept by the application become invalid. GigaBASE automatically updates the current records in all opened cursors when the database section is remapped. So, when the database is updated, the programmer should access record fields only through the cursor's -> operator and should not use pointer variables.

Memory used for the current selection can be released by the reset() method. This method is called automatically by the select(), dbDatabase::commit() and dbDatabase::rollback() methods and by the cursor destructor, so in most cases there is no need to call reset() explicitly.

Cursors can also be used to access records by reference. Method at(dbReference<T> const& ref) sets the cursor to the record pointed to by the reference. In this case the selection consists of exactly one record, and the next() and prev() methods will always return NULL. Since cursors and references in GigaBASE are strictly typed, all necessary checking can be done statically by the compiler and no dynamic type checking is needed. The only check done at runtime is the check for a null reference. The object identifier of the current record in the cursor can be obtained by the currentId() method.

It is possible to restrict the number of records returned by a select statement. The cursor has two methods, setSelectionLimit(size_t lim) and unsetSelectionLimit(), which set and unset a limit on the number of records returned by a query. In some situations the programmer wants to receive only one record or only the first few records, so query execution time and memory consumption can be reduced by limiting the size of the selection. But if you specify an order for the selected records, a query limited to k records will not return the first k records with the smallest key values. Instead, an arbitrary k records will be taken and then sorted.
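The interaction between a selection limit and an order by clause can be illustrated with a small stand-alone model. This is only a sketch and does not use the GigaBASE API: the functions limitThenSort and sortThenLimit are hypothetical, and "arbitrary k records" is modelled here as "the first k records in scan order", which is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Models the behaviour described in the text: k records are taken first
// (here: the first k in scan order, an assumption), and only then sorted.
std::vector<int> limitThenSort(const std::vector<int>& scanOrder, std::size_t k) {
    const std::size_t n = std::min(k, scanOrder.size());
    std::vector<int> selection(scanOrder.begin(),
                               scanOrder.begin() + static_cast<std::ptrdiff_t>(n));
    std::sort(selection.begin(), selection.end());
    return selection;
}

// What one might expect instead: sort everything, then keep the k smallest keys.
std::vector<int> sortThenLimit(std::vector<int> keys, std::size_t k) {
    std::sort(keys.begin(), keys.end());
    keys.resize(std::min(k, keys.size()));
    return keys;
}
```

For keys stored in the order 50, 10, 40, 20, 30 and a limit of 2, the first function yields 10, 50 while the second yields 10, 20. If the k smallest keys are required, the limit has to be applied by the application after an unrestricted sorted selection.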

Thus all operations on database data are performed by means of cursors. The only exception is the insert operation. GigaBASE provides an overloaded insert function:

        template<class T>
        dbReference<T> insert(T const& record);
This function inserts the record at the end of the table and returns a reference to the created object. The order of insertion is strictly specified in GigaBASE, and applications can rely on this order of records in the table. Applications that use references extensively for navigation between objects need some root object from which traversal by references can start. A good candidate for such a root object is the first record in the table (it is also the oldest record in the table). This record can be accessed by executing the select() method without parameters. The current record in the cursor will then be the first record in the table.

The GigaBASE C++ API defines a special null variable of reference type. The null variable can be compared with references or assigned to a reference:

        void update(dbReference<Contract> c) {
            if (c != null) {
                dbCursor<Contract> contract(dbCursorForUpdate);
                contract.at(c);
                contract->supplier = null;
                contract.update(); // store the modified record in the database
            }
        }

Database

Class dbDatabase controls the interaction of the application with the database. It performs synchronization of concurrent accesses to the database, transaction management, memory allocation, error handling, and so on.

The constructor of dbDatabase allows the programmer to specify some database parameters:

    dbDatabase(dbAccessType type = dbAllAccess,
               size_t poolSize = 0, // autodetect size of available memory
               size_t dbExtensionQuantum = dbDefaultExtensionQuantum,
               size_t dbInitIndexSize = dbDefaultInitIndexSize,
               int nThreads = 1);
The database can be opened in read-only mode (dbDatabase::dbReadOnly access type) or in normal mode, which allows modification of the database (dbDatabase::dbAllAccess). When the database is opened in read-only mode, no new class definitions can be added to the database, and the definitions of existing classes and indices cannot be altered.

Parameter poolSize specifies the number of pages in the page pool used to optimize file IO. GigaBASE uses 8 KB pages. The size of the pool should not be larger than the amount of physical memory in the computer, and some memory should be reserved for the operating system and other application data structures. When the default value 0 is used, GigaBASE automatically selects the page pool size using information about the available physical memory in the system. The current algorithm for calculating the page pool size (subject to change in the future) is the following: GigaBASE takes the largest power of two that is less than the amount of physical memory in the system, unless the difference between the total amount of available physical memory and this number is greater than an unused memory threshold (currently 64 MB). If the difference is greater than the threshold, the size of the pool is taken as the size of available physical memory minus the threshold.
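The page pool size rule described above can be written down as a short stand-alone function. This is a sketch based only on the description in the text, not the actual GigaBASE source: the names choosePoolBytes and choosePoolPages are hypothetical, and "less than" is interpreted as strictly less, which is an assumption.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 8 * 1024;                       // 8 KB pages
constexpr std::uint64_t kUnusedMemThreshold = 64ULL * 1024 * 1024;  // 64 MB

// Returns the pool size in bytes for the given amount of physical memory.
std::uint64_t choosePoolBytes(std::uint64_t physMem) {
    if (physMem < 2) return 0;  // no power of two is strictly below 1 byte
    // Largest power of two strictly less than physMem (overflow-safe form
    // of "while (2 * pow2 < physMem)").
    std::uint64_t pow2 = 1;
    while (pow2 < physMem - pow2) pow2 *= 2;
    // Too much memory would stay unused: leave only the threshold free.
    if (physMem - pow2 > kUnusedMemThreshold) return physMem - kUnusedMemThreshold;
    return pow2;
}

// Same result expressed in pages, the unit of the poolSize parameter.
std::uint64_t choosePoolPages(std::uint64_t physMem) {
    return choosePoolBytes(physMem) / kPageSize;
}
```

With 300 MB of physical memory the power of two (256 MB) is used, because only 44 MB would remain unused. With 512 MB the largest smaller power of two is again 256 MB, which would leave 256 MB unused, so the pool becomes 512 MB minus 64 MB, that is 448 MB or 57344 pages.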

Parameter dbExtensionQuantum specifies the extension quantum of the memory allocation bitmap. Briefly, the value of this parameter specifies how much memory will be allocated sequentially without an attempt to reuse the space of deallocated objects. The default value of this parameter is 16 MB. See section Memory allocation for more details.

Parameter dbInitIndexSize specifies the initial index size. All objects in GigaBASE are accessed through the object index. There are two copies of the object index: current and committed. Object indices are reallocated on demand, so setting the initial index size only reduces (or increases) the number of reallocations. The default value of this parameter is 64K object identifiers.

The last parameter, nThreads, controls the level of query parallelization. If it is greater than 1, GigaBASE can execute some queries in parallel (including sorting of the result). In this case the GigaBASE engine spawns the specified number of parallel threads. Usually it makes no sense to set this parameter higher than the number of online CPUs in the system. It is also possible to pass zero as the value of the parameter; in this case GigaBASE automatically detects the number of online CPUs in the system. The number of threads can also be set at any time by the dbDatabase::setConcurrency method.

Class dbDatabase contains the static field dbParallelScanThreshold, which specifies the number of records in the table above which query parallelization is used. The default value of this parameter is 1000.

The database is opened by the open(char const* fileName = NULL) method. Unlike FastDB, GigaBASE does not need a database name; only the file name has to be specified. No suffixes are implicitly appended to the database file name. This is the only difference between the interfaces of FastDB and GigaBASE.

Since some operating systems limit the maximal file size, GigaBASE provides a way to split one logical data file into several physical segments (operating system files). Segments can be located on different partitions and file systems. In GigaBASE such a file is called a multifile. To create a multifile, put the @ symbol before the database file name. In this case GigaBASE treats the name as the name of a file with the description of the multifile segments. Each line of this file (except the last) should contain the name of the operating system file (or raw partition) corresponding to the multifile segment, and the size of the segment (in 8 KB pages). Only the last segment of the multifile can be dynamically extended when the database grows. That is why it is not necessary to specify the size of the last segment, so the last line should contain only the name of the file. Below is an example of a multifile description file consisting of two segments represented by physical disk partitions, the first of which has a size of 4 GB:

/dev/hdb1 524288 
/dev/hdc1 
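How such a description file can be read is sketched below. This is not the GigaBASE parser: the Segment structure and the functions parseMultifile and segmentBytes are hypothetical, and a page count of 0 is used here (as an assumption) to mark the last, extensible segment.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One multifile segment: file or raw partition name plus its size in pages.
struct Segment {
    std::string path;
    std::uint64_t pages;  // 0 means "size not specified" (last segment)
};

// Parses lines of the form "<name> [<size in 8 KB pages>]"; blank lines are skipped.
std::vector<Segment> parseMultifile(std::istream& in) {
    std::vector<Segment> segments;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Segment s{"", 0};
        if (!(fields >> s.path)) continue;  // empty line
        std::uint64_t n = 0;
        if (fields >> n) s.pages = n;       // size is optional
        segments.push_back(s);
    }
    return segments;
}

// Segment size in bytes, given the 8 KB page size stated in the text.
std::uint64_t segmentBytes(const Segment& s) {
    return s.pages * 8ULL * 1024;
}
```

For the two-line example above, the first segment has 524288 pages, which is 524288 * 8 KB = 4 GB, and the second segment has no explicit size.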

Method open returns true if the database was successfully opened and false if the open operation failed. In the latter case the database handleError method is called with the DatabaseOpenError error code. A database session is terminated by the close method, which implicitly commits the current transaction.

In a multithreaded application each thread that wants to access the database should first be attached to it. Method dbDatabase::attach() allocates thread-specific data and attaches the thread to the database. This method is called automatically by the open() method, so there is no reason to call attach() in the thread that opens the database. When a thread finishes its work with the database, it should call the dbDatabase::detach() method. Method close automatically invokes detach(). Method detach() implicitly commits the current transaction. An attempt to access the database from a detached thread causes an assertion failure.

GigaBASE is able to compile and execute queries in parallel, providing a significant increase in performance on multiprocessor systems. But concurrent updates of the database are not possible (this is the price of the efficient log-less transaction mechanism and zero-time recovery). When an application wants to modify the database (open a cursor for update or insert a new record in a table), it first locks the database in exclusive mode, prohibiting access to the database by other applications, even for read-only queries. So, to avoid blocking the database for a long time, modification transactions should be as short as possible. No blocking operations (such as waiting for input from the user) should be done within a transaction.

Using only shared and exclusive locks at the database level allows GigaBASE to almost eliminate the overhead of locking and to optimize the execution speed of non-conflicting operations. But if many applications simultaneously update different parts of the database, the approach used in GigaBASE will be very inefficient. That is why GigaBASE is most suitable for a single-application database access model or for multiple applications with a read-dominated access pattern.

Both cursor and query objects should be used by only one thread in a multithreaded application. If there is more than one thread in your application, use local variables for cursor and query objects in each thread. The dbDatabase object, in contrast, is shared between all threads and uses thread-specific data to perform query compilation and execution in parallel with minimal synchronization overhead. There are a few global things that require synchronization: the symbol table, the pool of tree nodes, and so on. But scanning, parsing and execution of a query can be done without any synchronization, providing a high level of concurrency on multiprocessor systems.

A database transaction is started by the first select or insert operation. If a cursor for update is used, the database is locked in exclusive mode, prohibiting access to the database by other applications and threads. If a read-only cursor is used, the database is locked in shared mode, preventing other applications and threads from modifying the database but allowing concurrent execution of read requests. A transaction should be explicitly terminated either by the dbDatabase::commit() method, which makes all changes done by the transaction permanent in the database, or by the dbDatabase::rollback() method, which undoes all modifications done by the transaction. Method dbDatabase::close() automatically commits the current transaction.

If several threads are concurrently updating the database, total performance can be increased by using a partial transaction commit. Method dbDatabase::precommit() neither flushes any changes to disk nor switches the object index. Instead, it only releases the locks held by the transaction, allowing other threads to proceed. All cursors opened by the thread are closed by the dbDatabase::precommit() method. When the thread accesses the database the next time, it has to obtain the database locks once again. Using precommit instead of commit eliminates disk operations and so dramatically increases performance. But remember that if an application or system fault takes place after the precommit method is executed, all changes made by the transaction will be lost.

If you start a transaction by performing a selection with a read-only cursor and then use a cursor for update to modify the database, the database is first locked in shared mode and then the lock is upgraded to exclusive. This can cause a deadlock if the database is simultaneously accessed by several applications. Imagine that application A starts a read transaction and application B also starts a read transaction. Both of them hold shared locks on the database. If both of them want to upgrade their locks to exclusive, they will block each other forever (an exclusive lock cannot be granted while a shared lock of another process exists). To avoid this situation, use a cursor for update at the beginning of the transaction or explicitly call the dbDatabase::lock() method. More information about the implementation of transactions in GigaBASE can be found in section Transactions.
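The upgrade deadlock can be reproduced with a tiny state model of a database-level shared/exclusive lock. The DbLockModel structure below is hypothetical and non-blocking: each try... method returns false where a real lock would make the caller wait. It is a sketch of the locking rule described in the text, not of the GigaBASE implementation.

```cpp
// Hypothetical, single-threaded model of a shared/exclusive database lock.
struct DbLockModel {
    int sharedHolders = 0;
    bool exclusiveHeld = false;

    // A shared lock is granted unless somebody holds the exclusive lock.
    bool tryShared() {
        if (exclusiveHeld) return false;
        ++sharedHolders;
        return true;
    }

    // Upgrade by a caller that already holds a shared lock: granted only
    // when no other shared lock exists.
    bool tryUpgrade() {
        if (exclusiveHeld || sharedHolders != 1) return false;
        sharedHolders = 0;
        exclusiveHeld = true;
        return true;
    }

    // Exclusive lock requested at the start of a transaction.
    bool tryExclusive() {
        if (exclusiveHeld || sharedHolders > 0) return false;
        exclusiveHeld = true;
        return true;
    }

    void releaseExclusive() { exclusiveHeld = false; }
};
```

If A and B both call tryShared() and then tryUpgrade(), both upgrades fail and neither caller ever releases its shared lock, which is the deadlock. If A instead starts with tryExclusive(), B's tryShared() simply waits until A calls releaseExclusive().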

The database can be locked explicitly by the lock() method. Locking is usually done automatically, and there are few cases in which you will want to use this method. It locks the database in exclusive mode until the end of the current transaction.

A backup of the database can be made by the following method:

        bool dbDatabase::backup(char const* backupFileName);
Backup locks the database in shared mode and flushes the image of the database in main memory to the specified file. Because a shadow object index is used, the database file is always in a consistent state, so recovery from the backup can be performed just by renaming the backup file (if the backup was written to tape, it should first be restored to disk). If a multifile was used as database storage, simple renaming or copying of the backup file is not possible. For this case GigaBASE provides the restore method:
        bool dbDatabase::restore(char const* backupFileName,
                                 char const* databaseFileName);
This method should be called before opening the database; restoring an online database is not possible. If databaseFileName contains @ in the first position, the rest of the name is treated as the name of the file with the description of the multifile segments (the same as used by the dbDatabase::open method). The database can also be restored using the restore command of the subsql utility.

Class dbDatabase is also responsible for handling various application errors, such as syntax errors during query compilation, or an out-of-range index or null reference access during query execution. The virtual method dbDatabase::handleError handles these errors:

        virtual void handleError(dbErrorClass error, 
                                 char const*  msg = NULL, 
                                 int          arg = 0);
The programmer can derive a subclass from the dbDatabase class and redefine the default reaction to errors.

Error classes and default handling
Class                 Description                                           Argument                   Default reaction
QueryError            query compilation error                               position in query string   abort compilation
ArithmeticError       arithmetic error during division or power operations  -                          terminate application
IndexOutOfRangeError  index is out of array bounds                          value of index             terminate application
DatabaseOpenError     error while opening database                          -                          open method will return false
FileError             failure of file IO operation                          error code                 terminate application
OutOfMemoryError      not enough memory for object allocation               requested allocation size  terminate application
Deadlock              upgrading lock causes deadlock                        -                          terminate application
NullReferenceError    null reference is accessed during query execution     -                          terminate application

Query optimization

To reduce query execution time GigaBASE uses indices, inverse references and query parallelization. The following sections supply more information about these optimizations.

Using indices in queries

Indices are the traditional approach to increasing RDBMS performance. GigaBASE uses B+trees to implement index access to the data. GigaBASE uses simple rules for applying indices, allowing the programmer to predict when an index will be used. The check for index applicability is done during each query execution, so the decision can depend on the values of the operands. The following rules describe the algorithm GigaBASE uses to apply indices:

If an index is used to search for the prefix of a like expression, and the suffix is not just the '%' character, then the index search can return more records than really match the pattern. In this case the output of the index search has to be filtered by applying the pattern match operation.
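The prefix search followed by filtering can be sketched with a sorted vector standing in for the index. Everything here is an illustration under stated assumptions: likeMatch, indexedLike and LikeResult are hypothetical names, the sorted vector replaces the B+tree, and the matcher implements only the usual '%' and '_' wildcards without escape characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// SQL-style matcher: '%' matches any run of characters, '_' exactly one.
bool likeMatch(const std::string& s, const std::string& p,
               std::size_t si = 0, std::size_t pi = 0) {
    if (pi == p.size()) return si == s.size();
    if (p[pi] == '%') {
        for (std::size_t k = si; k <= s.size(); ++k)
            if (likeMatch(s, p, k, pi + 1)) return true;
        return false;
    }
    if (si < s.size() && (p[pi] == '_' || p[pi] == s[si]))
        return likeMatch(s, p, si + 1, pi + 1);
    return false;
}

struct LikeResult {
    std::size_t candidates;            // keys returned by the prefix search
    std::vector<std::string> matches;  // keys that really match the pattern
};

// sortedKeys must be sorted; it plays the role of the index.
LikeResult indexedLike(const std::vector<std::string>& sortedKeys,
                       const std::string& pattern) {
    // Literal prefix: everything before the first wildcard.
    const std::string prefix = pattern.substr(0, pattern.find_first_of("%_"));
    LikeResult r{0, {}};
    auto it = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), prefix);
    for (; it != sortedKeys.end() &&
           it->compare(0, prefix.size(), prefix) == 0; ++it) {
        ++r.candidates;
        if (likeMatch(*it, pattern)) r.matches.push_back(*it);  // post-filter
    }
    return r;
}
```

For the keys abc, abd, abxc, b and the pattern ab%c, the prefix search on ab yields three candidates, and the filter keeps abc and abxc. For the pattern ab% the filter keeps every candidate, which is why it is only needed when the suffix is more than a single '%'.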

When the search condition is a disjunction of several subexpressions (the expression contains several alternatives combined by the or operator), several indices can be used for query execution. To avoid duplicate records in this case, a bitmap is used in the cursor to mark records already included in the selection.
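The duplicate elimination can be pictured as follows. This is a stand-alone sketch, not GigaBASE code: the function unionByBitmap is hypothetical, records are identified by small integer ids, and each inner vector stands for the output of one index search.

```cpp
#include <cstddef>
#include <vector>

// Merges the results of several index searches into one selection,
// using a bitmap to skip records that were already selected.
std::vector<std::size_t> unionByBitmap(
        const std::vector<std::vector<std::size_t>>& indexResults,
        std::size_t nRecords) {
    std::vector<bool> selected(nRecords, false);  // one bit per record
    std::vector<std::size_t> selection;
    for (const auto& hits : indexResults) {
        for (std::size_t id : hits) {
            if (id < nRecords && !selected[id]) {
                selected[id] = true;
                selection.push_back(id);
            }
        }
    }
    return selection;
}
```

If the first alternative finds records 1, 3, 5 and the second finds 3, 4, 5, the resulting selection is 1, 3, 5, 4: records 3 and 5 are included only once.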

If the search condition requires a sequential table scan, a B+tree index can still be used if the order by clause contains a single record field for which a B+tree index is defined. Since sorting is a very expensive operation, using an index instead of sorting significantly reduces query execution time.

It is possible to check which indices are used for query execution, and the number of probes done during index search, by compiling GigaBASE with the option -DDEBUG=DEBUG_TRACE. In this case GigaBASE dumps trace information about database operation, including information about indices.

Inverse references

Inverse references provide an efficient and reliable way of establishing relations between tables. GigaBASE uses information about inverse references when a record is inserted, updated or deleted, and also for query optimization. Relations between records can be of one of the following types: one-to-one, one-to-many and many-to-many.