The Architecture Of SQLite

(This page was last modified on 2000/09/29 13:30:55 GMT)

Introduction

Block Diagram Of SQLite

This file describes the architecture of the SQLite library. A block diagram showing the main components of SQLite and how they interrelate is shown at the right. The text that follows will provide a quick overview of each of these components.

Interface

Most of the public interface to the SQLite library is implemented by four functions found in the main.c source file. The sqlite_get_table() routine is implemented in table.c. The Tcl interface is implemented by tclsqlite.c. More information on the C interface to SQLite is available separately.

To avoid name collisions with other software, all external symbols in the SQLite library begin with the prefix sqlite. Those symbols that are intended for external use (as opposed to those which are for internal use only but which have to be exported due to limitations of the C linker's scoping mechanism) begin with sqlite_.
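The naming convention can be illustrated with a small sketch. All three function names below are hypothetical and are not actual SQLite symbols; the sketch only shows how the two prefixes, together with C's static keyword, separate public routines from internal ones.

```c
#include <assert.h>
#include <stddef.h>

/* File-local helper. "static" gives it internal linkage, so it never
** reaches the linker's global namespace and needs no prefix at all. */
static size_t countChars(const char *z){
  size_t n = 0;
  while( z[n] ) n++;
  return n;
}

/* Internal routine that other source files of the library would call.
** C has no "library-private" scope, so it must be an external symbol;
** the bare "sqlite" prefix marks it as internal. (Hypothetical name.) */
size_t sqliteExampleLength(const char *z){
  assert( z!=NULL );
  return countChars(z);
}

/* Routine intended for applications, hence the "sqlite_" prefix.
** (Hypothetical name.) It tolerates NULL because callers are untrusted. */
size_t sqlite_example_length(const char *z){
  return z ? sqliteExampleLength(z) : 0;
}
```

An application would call only sqlite_example_length(); the other external symbol is visible to the linker but is not part of the documented interface.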

Tokenizer

When a string containing SQL statements is to be executed, the interface passes that string to the tokenizer. The job of the tokenizer is to break the original string up into tokens and pass those tokens one by one to the parser. The tokenizer is hand-coded in C. (There is no "lex" code here.) All of the code for the tokenizer is contained in the tokenize.c source file.

Note that in this design, the tokenizer calls the parser. People who are familiar with YACC and BISON may be used to doing things the other way around -- having the parser call the tokenizer. This author has done it both ways, and finds that things generally work out nicer when the tokenizer calls the parser. YACC has it backwards.
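The control flow described above, where the tokenizer owns the loop and pushes each token into the parser, might look roughly like the following sketch. The token codes, the DemoParser structure, and every function name here are assumptions made for illustration; this is not the code in tokenize.c, and the stand-in "parser" merely counts what it receives.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes, for illustration only. */
enum { TK_ID = 1, TK_NUMBER, TK_PUNCT, TK_END };

/* Stand-in for a real parser: it only records what it is handed. */
typedef struct DemoParser {
  int nToken;     /* Tokens received, not counting TK_END */
  int nId;        /* Identifier or keyword tokens */
  int nNumber;    /* Numeric tokens */
  int finished;   /* Set once TK_END arrives */
} DemoParser;

/* The parser is passive: it is called once per token. */
static void demoParse(DemoParser *p, int type, const char *z, size_t n){
  (void)z; (void)n;   /* A real parser would use the token text. */
  if( type==TK_END ){ p->finished = 1; return; }
  p->nToken++;
  if( type==TK_ID ) p->nId++;
  if( type==TK_NUMBER ) p->nNumber++;
}

/* Return the length of the token that starts at z; store its type. */
static size_t demoGetToken(const char *z, int *pType){
  size_t i = 1;
  unsigned char c = (unsigned char)z[0];
  if( isalpha(c) || c=='_' ){
    while( isalnum((unsigned char)z[i]) || z[i]=='_' ) i++;
    *pType = TK_ID;
  }else if( isdigit(c) ){
    while( isdigit((unsigned char)z[i]) ) i++;
    *pType = TK_NUMBER;
  }else{
    *pType = TK_PUNCT;   /* Any other single character */
  }
  return i;
}

/* The tokenizer drives: split the input and push tokens to the parser. */
void demoRunTokenizer(const char *zSql, DemoParser *p){
  size_t i = 0;
  assert( zSql!=NULL && p!=NULL );
  while( zSql[i] ){
    int type;
    size_t n;
    if( isspace((unsigned char)zSql[i]) ){ i++; continue; }
    n = demoGetToken(&zSql[i], &type);
    demoParse(p, type, &zSql[i], n);
    i += n;
  }
  demoParse(p, TK_END, "", 0);   /* Tell the parser the input is done */
}
```

Because the parser never asks for input, it needs no knowledge of where the tokens come from, which is one plausible reason this arrangement tends to be simpler to work with.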

Parser

The parser is the piece that assigns meaning to tokens based on their context. The parser for SQLite is generated using the Lemon LALR(1) parser generator. Lemon does the same job as YACC/BISON, but it uses a different input syntax which is less error-prone. Lemon also generates a parser which is reentrant and thread-safe. And Lemon defines the concept of a non-terminal destructor so that it does not leak memory when syntax errors are encountered. The source file that drives Lemon is found in parse.y.

Because Lemon is a program not normally found on development machines, the complete source code to Lemon (just one C file) is included in the SQLite distribution in the "tool" subdirectory. Documentation on Lemon is found in the "doc" subdirectory of the distribution.

Code Generator

After the parser assembles tokens into complete SQL statements, it calls the code generator to produce virtual machine code that will do the work that the SQL statements request. There are seven files in the code generator: build.c, delete.c, expr.c, insert.c, select.c, update.c, and where.c. Most of the serious magic happens in these files. expr.c handles code generation for expressions. where.c handles code generation for WHERE clauses on SELECT, UPDATE and DELETE statements. The files delete.c, insert.c, select.c, and update.c handle the code generation for SQL statements with the same names. (Each of these files calls routines in expr.c and where.c as necessary.) All other SQL statements are coded out of build.c.
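To give a feel for what code generation for an expression involves, here is a minimal sketch that walks an expression tree and emits instructions for a stack machine. The structures, opcode names, and function names are all hypothetical; the sketch does not reproduce the routines in expr.c and handles only integer constants, addition, and multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a stack machine. */
enum { OP_Integer = 1, OP_Add, OP_Multiply };

#define DEMO_MAX_OPS 32

typedef struct DemoInstr { int opcode; int p1; } DemoInstr;

typedef struct DemoProgram {
  DemoInstr aOp[DEMO_MAX_OPS];  /* Instructions emitted so far */
  int nOp;                      /* Number of entries used in aOp[] */
} DemoProgram;

typedef struct DemoExpr DemoExpr;
struct DemoExpr {
  int op;                 /* OP_Integer for a leaf, else a binary operator */
  int value;              /* Constant value when op==OP_Integer */
  const DemoExpr *pLeft;  /* Operands of a binary operator */
  const DemoExpr *pRight;
};

/* Append one instruction. Return 0 on success, -1 if the program is full. */
static int demoEmit(DemoProgram *p, int opcode, int p1){
  if( p->nOp>=DEMO_MAX_OPS ) return -1;
  p->aOp[p->nOp].opcode = opcode;
  p->aOp[p->nOp].p1 = p1;
  p->nOp++;
  return 0;
}

/* Emit code that leaves the value of pExpr on top of the stack.
** Operands are coded first (post-order), then the operator.
** Return 0 on success, -1 if the program buffer overflows. */
int demoExprCode(DemoProgram *p, const DemoExpr *pExpr){
  assert( p!=NULL && pExpr!=NULL );
  if( pExpr->op==OP_Integer ){
    return demoEmit(p, OP_Integer, pExpr->value);
  }
  if( demoExprCode(p, pExpr->pLeft) ) return -1;
  if( demoExprCode(p, pExpr->pRight) ) return -1;
  return demoEmit(p, pExpr->op, 0);
}
```

For the tree representing 2+3*4, this emits two integer pushes for the multiplication operands after the push of 2, then the multiply, then the add, so operator precedence is already resolved by the shape of the tree the parser built.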

Virtual Machine

The program generated by the code generator is executed by the virtual machine. Additional information about the virtual machine is available separately. To summarize, the virtual machine implements an abstract computing engine specifically designed to manipulate database files. The machine has a stack which is used for intermediate storage. Each instruction contains an opcode and up to three additional operands.
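The general shape of such a machine, a stack for intermediate values and instructions made of an opcode plus up to three operands, can be sketched as follows. The opcode set, the operand names p1, p2 and p3, their types, and the function name are assumptions chosen for illustration; the real instruction set lives in vdbe.c and is far larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; a tiny subset invented for this sketch. */
enum { OP_Halt = 0, OP_Integer, OP_Add, OP_Goto };

/* One instruction: an opcode and up to three operands. */
typedef struct DemoOp {
  int opcode;
  int p1;           /* First operand (the constant for OP_Integer) */
  int p2;           /* Second operand (the jump target for OP_Goto) */
  const char *p3;   /* Third operand; unused by the opcodes shown here */
} DemoOp;

#define DEMO_STACK_SIZE 16

/* Execute a program. On OP_Halt, write the top of the stack to *pResult
** and return 0. Return -1 on stack overflow or underflow, an unknown
** opcode, or if execution runs off the end of the program. */
int demoVmRun(const DemoOp *aOp, int nOp, int *pResult){
  int stack[DEMO_STACK_SIZE];
  int sp = 0;   /* Number of values currently on the stack */
  int pc = 0;   /* Index of the next instruction to execute */
  assert( aOp!=NULL && pResult!=NULL );
  while( pc>=0 && pc<nOp ){
    const DemoOp *pOp = &aOp[pc++];
    switch( pOp->opcode ){
      case OP_Integer:              /* Push the constant p1 */
        if( sp>=DEMO_STACK_SIZE ) return -1;
        stack[sp++] = pOp->p1;
        break;
      case OP_Add:                  /* Pop two values, push their sum */
        if( sp<2 ) return -1;
        stack[sp-2] += stack[sp-1];
        sp--;
        break;
      case OP_Goto:                 /* Jump to instruction p2 */
        pc = pOp->p2;
        break;
      case OP_Halt:                 /* Stop and report the top of stack */
        if( sp<1 ) return -1;
        *pResult = stack[sp-1];
        return 0;
      default:
        return -1;
    }
  }
  return -1;
}
```

Most instructions use only one or two of the operand slots; keeping a fixed layout for every instruction keeps the dispatch loop simple.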

The virtual machine is entirely contained in a single source file, vdbe.c. The virtual machine also has its own header file, vdbe.h, that defines an interface between the virtual machine and the rest of the SQLite library.

Backend

The last layer in the design of SQLite is the backend. The backend implements an interface between the virtual machine and the underlying data file library -- GDBM in this case. The interface is designed to make it easy to substitute a different database library, such as Berkeley DB. The backend abstracts many of the low-level details to help reduce the complexity of the virtual machine.
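One common way to make a storage layer substitutable in C is a structure of function pointers that the caller uses without knowing which library sits behind it. The sketch below illustrates that idea with a toy in-memory store standing in for GDBM; every name in it is hypothetical, and it should not be read as the actual interface declared in dbbe.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: the only thing the caller sees. */
typedef struct DemoBackend DemoBackend;
struct DemoBackend {
  int (*xPut)(DemoBackend*, const char *zKey, const char *zData);
  const char *(*xFetch)(DemoBackend*, const char *zKey);
};

/* A toy in-memory implementation. It stores the pointers it is given,
** so the caller must keep the strings alive. */
#define DEMO_MAX_ROWS 8
typedef struct DemoMemBackend {
  DemoBackend base;   /* Must be first so DemoBackend* can be cast back */
  const char *azKey[DEMO_MAX_ROWS];
  const char *azData[DEMO_MAX_ROWS];
  int nRow;
} DemoMemBackend;

/* Insert or overwrite a record. Return 0 on success, -1 if full. */
static int memPut(DemoBackend *pBe, const char *zKey, const char *zData){
  DemoMemBackend *p = (DemoMemBackend*)pBe;
  int i;
  for(i=0; i<p->nRow; i++){
    if( strcmp(p->azKey[i], zKey)==0 ){ p->azData[i] = zData; return 0; }
  }
  if( p->nRow>=DEMO_MAX_ROWS ) return -1;
  p->azKey[p->nRow] = zKey;
  p->azData[p->nRow] = zData;
  p->nRow++;
  return 0;
}

/* Return the data stored under zKey, or NULL if there is none. */
static const char *memFetch(DemoBackend *pBe, const char *zKey){
  DemoMemBackend *p = (DemoMemBackend*)pBe;
  int i;
  for(i=0; i<p->nRow; i++){
    if( strcmp(p->azKey[i], zKey)==0 ) return p->azData[i];
  }
  return NULL;
}

void demoMemBackendInit(DemoMemBackend *p){
  assert( p!=NULL );
  p->base.xPut = memPut;
  p->base.xFetch = memFetch;
  p->nRow = 0;
}

/* Caller-side code: it touches only the DemoBackend interface, so any
** other implementation could be swapped in without changing it.
** Returns 1 if the record reads back as written, else 0. */
int demoStoreAndCheck(DemoBackend *pBe, const char *zKey, const char *zData){
  const char *z;
  if( pBe->xPut(pBe, zKey, zData) ) return 0;
  z = pBe->xFetch(pBe, zKey);
  return z!=NULL && strcmp(z, zData)==0;
}
```

Supporting a different database library under this scheme would mean writing another pair of xPut and xFetch routines and another init function, leaving demoStoreAndCheck() untouched.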

The backend is contained in the single source file dbbe.c. The backend also has a header file dbbe.h that defines the interface between the backend and the rest of the SQLite library.


