SQLAlchemy 0.3 Documentation

Version: 0.3.10 Last Updated: 07/20/07 17:20:04

The Engine is the starting point for any SQLAlchemy application. It's "home base" for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database and DBAPI combination.

The general structure is this:

                                     +-----------+                        __________
                                 /---|   Pool    |---\                   (__________)
             +-------------+    /    +-----------+    \     +--------+   |          |
connect() <--|   Engine    |---x                       x----| DBAPI  |---| database |
             +-------------+    \    +-----------+    /     +--------+   |          |
                                 \---|  Dialect  |---/                   |__________|
                                     +-----------+                       (__________)

Where above, a sqlalchemy.engine.Engine references both a sqlalchemy.engine.Dialect and sqlalchemy.pool.Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.

Creating an engine is just a matter of issuing a single call, create_engine():

engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

The above engine invokes the postgres dialect and a connection pool which references localhost:5432.

The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect() method:

connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print "username:", row['username']
connection.close()

The connection is an instance of sqlalchemy.engine.Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of sqlalchemy.engine.ResultProxy, which acts very much like a DBAPI cursor.
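
For example, in addition to iterating over the rows, cursor-like methods such as fetchone() and fetchall() can be used on the result; a brief sketch reusing the query above:

connection = engine.connect()
result = connection.execute("select username from users")
row = result.fetchone()           # fetch a single row, like cursor.fetchone()
print "username:", row['username']
rest = result.fetchall()          # fetch any remaining rows, like cursor.fetchall()
result.close()
connection.close()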

When you say engine.connect(), a new Connection object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close(), the DBAPI connection is returned to the pool; nothing is actually "closed" from the perspective of the database.

To execute some SQL more quickly, you can skip the Connection part and just say:

result = engine.execute("select username from users")
for row in result:
    print "username:", row['username']
result.close()

Where above, the execute() method on the Engine does the connect() part for you, and returns the ResultProxy directly. The actual Connection is inside the ResultProxy, waiting for you to finish reading the result. In this case, when you close() the ResultProxy, the underlying Connection is closed, which returns the DBAPI connection to the pool.

To summarize the above two examples, when you use a Connection object, it's known as explicit execution. When you don't see the Connection object, but you still use the execute() method on the Engine, it's called explicit, connectionless execution. A third variant of execution also exists, called implicit execution; this will be described later.

The Engine and Connection can do a lot more than what we illustrated above; plain SQL strings are only their most rudimentary function. Later chapters will describe how "constructed SQL" expressions can be used with engines; in many cases, you don't have to deal with the Engine at all after it's created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine in order to get at connections; that's also a case where you can often create the engine once, and then forget about it.

Supported Databases

Recall that the Dialect is used to describe how to talk to a specific kind of database. Dialects are included with SQLAlchemy for SQLite, Postgres, MySQL, MS-SQL, Firebird, Informix, and Oracle; these can each be seen as a Python module present in the sqlalchemy.databases package. Each dialect requires the appropriate DBAPI drivers to be installed separately.

Each DBAPI driver is downloaded separately from its own project site.

The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases; see the Database Notes page.


create_engine() URL Arguments

SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:

driver://username:password@host:port/database

Available driver names are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:

# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database
sqlite_db = create_engine('sqlite:///:memory:')  # the same

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_Oracle.makedsn()
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')

The Engine will ask the connection pool for a connection when the connect() or execute() methods are called. The default connection pool, QueuePool, as well as the default connection pool used with SQLite, SingletonThreadPool, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool will grow its pool of connections to a default size of five, and will allow a default "overflow" of ten. Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
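
Pool sizing can be tuned through keyword arguments to create_engine(); for example, the pool_size and max_overflow arguments set these limits explicitly (the values shown here simply restate the defaults described above):

db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase',
                   pool_size=5, max_overflow=10)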

Custom DBAPI connect() arguments

Custom arguments used when issuing the connect() call to the underlying DBAPI may be issued in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:

db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')

If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.

create_engine also takes an argument connect_args which is an additional dictionary that will be passed to connect(). This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:

db = create_engine('postgres://scott:tiger@localhost/test', connect_args={'argument1': 17, 'argument2': 'bar'})

The most customizable connection method of all is to pass a creator argument, which specifies a callable that returns a DBAPI connection:

import psycopg

def connect():
    return psycopg.connect(user='scott', host='localhost')

db = create_engine('postgres://', creator=connect)

Database Engine Options

Keyword options can also be specified to create_engine(), following the string URL as follows:

db = create_engine('postgres://...', encoding='latin1', echo=True)

A number of standard options can be passed this way, as well as several that are used only by particular database dialects.
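
As a brief sketch, several of the standard options (echo, echo_pool, convert_unicode, pool_size, max_overflow and pool_recycle) can be combined in a single call; the values shown below are illustrative only:

db = create_engine('mysql://localhost/test',
    echo=False,             # when True, log all executed SQL statements
    echo_pool=False,        # when True, log connection pool checkouts and checkins
    convert_unicode=False,  # when True, encode/decode unicode bind parameters and results
    pool_size=5,            # number of connections to keep open in the pool
    max_overflow=10,        # number of "overflow" connections allowed beyond pool_size
    pool_recycle=3600)      # replace connections older than this many seconds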


More On Connections

Recall from the beginning of this section that the Engine provides a connect() method which returns a Connection object. Connection is a proxy object which maintains a reference to a DBAPI connection instance. The close() method on Connection does not actually close the DBAPI connection, but instead returns it to the connection pool referenced by the Engine. Connection will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this method is usually called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.

The execute() methods on both Engine and Connection can also receive SQL clause constructs:

connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()

The above SQL construct is known as a select(). The full range of available SQL constructs is described in Constructing SQL Queries via Python Expressions.

Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, namely being able to call connect() to return a Connection object (Connection just returns itself), and being able to call execute() to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine as a parameter or attribute with which to execute SQL will also accept a Connection. As of SQLAlchemy 0.3.9, this argument is named bind.

Specify Engine or Connection
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(bind=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(bind=connection)

Connection facts:


Using Transactions with Connection

The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to rollback() or commit():

trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise

The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.

# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin() # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()                
method_a(conn)
conn.close()

Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method. This "nesting" behavior allows the creation of functions which "guarantee" that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.
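
The counter mechanism can be pictured with a small sketch (an illustration of the idea only, not SQLAlchemy's actual Transaction implementation):

class NestingConnection(object):
    """Illustrative only: a connection-like object whose begin()/commit()
    pairs nest, committing the real DBAPI connection only at the outermost level."""
    def __init__(self, dbapi_connection):
        self.dbapi_connection = dbapi_connection
        self.depth = 0
    def begin(self):
        self.depth += 1            # a nested begin() just increments the counter
        return self
    def commit(self):
        self.depth -= 1
        if self.depth == 0:        # only the outermost commit() really commits
            self.dbapi_connection.commit()
    def rollback(self):
        self.depth = 0             # any rollback() aborts the whole transaction
        self.dbapi_connection.rollback()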

Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.

Transaction Facts:

Understanding Autocommit

The above transaction example illustrates how to use Transaction so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction? The answer is autocommit. While many DBAPIs implement a flag called autocommit, the current SQLAlchemy behavior is such that it implements its own autocommit. This is achieved by searching the statement for strings like INSERT, UPDATE, DELETE, etc. and then issuing a COMMIT automatically if no transaction is in progress.

conn = engine.connect()
conn.execute("INSERT INTO users VALUES (1, 'john')")  # autocommits
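
The detection can be pictured roughly like this (an illustration of the idea, not SQLAlchemy's exact internals):

import re

AUTOCOMMIT_REGEX = re.compile(r'\s*(INSERT|UPDATE|DELETE|CREATE|DROP|ALTER)', re.I)

def should_autocommit(statement, in_transaction):
    # issue a COMMIT after executing the statement only if it looks like
    # data-modifying SQL and no Transaction is currently in progress
    return not in_transaction and AUTOCOMMIT_REGEX.match(statement) is not None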

Connectionless Execution, Implicit Execution

Recall from the first section that we mentioned executing with and without a Connection. Connectionless execution refers to calling the execute() method on an object which is not a Connection; this could be the Engine itself, or a constructed SQL object. When we say "implicit", we mean that we are calling the execute() method on an object which is neither a Connection nor an Engine object; this can only be used with constructed SQL objects which have their own execute() method, and can be "bound" to an Engine. A description of "constructed SQL objects" may be found in Constructing SQL Queries via Python Expressions.

A summary of all three methods follows below. First, assume the usage of the following MetaData and Table objects; while we haven't yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an "executable" SQL construct which issues a statement to the database. These objects are described in Database Meta Data.

meta = MetaData()
users_table = Table('users', meta, 
    Column('id', Integer, primary_key=True), 
    Column('name', String(50))
)

Explicit execution delivers the SQL text or constructed SQL expression to the execute() method of Connection:

engine = create_engine('sqlite:///file.db')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    # ....
connection.close()

Explicit, connectionless execution delivers the expression to the execute() method of Engine:

engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    # ....
result.close()

Implicit execution is also connectionless, and calls the execute() method on the expression itself, utilizing the fact that either an Engine or Connection has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):

engine = create_engine('sqlite:///file.db')
meta.connect(engine)
result = users_table.select().execute()
for row in result:
    # ....
result.close()

In both "connectionless" examples, the Connection is created behind the scenes; the ResultProxy returned by the execute() call references the Connection used to issue the SQL statement. When we issue close() on the ResultProxy, or if the result set object falls out of scope and is garbage collected, the underlying Connection is closed for us, resulting in the DBAPI connection being returned to the pool.

Using the Threadlocal Execution Strategy

With connectionless execution, each returned ResultProxy object references its own distinct DBAPI connection object. This means that multiple executions will result in multiple DBAPI connections being used at the same time; the example below illustrates this:

db = create_engine('mysql://localhost/test')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")
for row in r1:
    ...
for row in r2:
    ...
# release connection 1
r1.close()

# release connection 2
r2.close()

Where above, we have two result sets in scope at the same time, therefore we have two distinct DBAPI connections, both separately checked out from the connection pool, in scope at the same time.

An option exists to create_engine() called strategy="threadlocal", which changes this behavior. When this option is used, the Engine which is returned by create_engine() is a special subclass of Engine called TLEngine. When this engine creates the Connection used by a connectionless execution, it checks a threadlocal variable for an existing DBAPI connection that was already checked out from the pool within the current thread. If one exists, it uses that one.

The usage of "threadlocal" modifies the underlying behavior of our example above, as follows:

Threadlocal Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...
# close r1.  the connection is still held by r2.
r1.close()

# close r2.  with no more references to the underlying connection resources, they
# are returned to the pool.
r2.close()

Where above, we again have two result sets in scope at the same time, but because they are present in the same thread, there is only one DBAPI connection in use.

While the above distinction may not seem like much, it has several potentially desirable effects. One is that you can in some cases reduce the number of concurrent connections checked out from the connection pool, in the case that a ResultProxy is still open when a second statement is issued. A second advantage is that by limiting a thread to a single checked-out connection, you eliminate the possibility of deadlocks within a single thread, such as when connection A locks a table and connection B, in the same thread, attempts to read from that table and blocks waiting for connection A to release its lock.

A third advantage to the threadlocal strategy is that it allows the Transaction object to be used in combination with connectionless execution. Recall from the section on transactions that the Transaction is returned by the begin() method on a Connection; all statements which wish to participate in this transaction must be executed by the same Connection, thereby forcing the usage of an explicit connection. However, the TLEngine provides a Transaction that is local to the current thread; using it, one can issue many "connectionless" statements within a thread and they will all automatically partake in the current transaction, as in the example below:

threadlocal connection sharing
# get a TLEngine
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

engine.begin()
try:
    engine.execute("insert into users values (?, ?)", 1, "john")
    users.update(users.c.user_id==5).execute(name='ed')
    engine.commit()
except:
    engine.rollback()

Notice that no Connection needed to be used; the begin() method on TLEngine (which is not available on the regular Engine) created a Transaction as well as a Connection, and held onto both in a context corresponding to the current thread. Each execute() call made use of the same connection, allowing them all to participate in the same transaction.

Complex application flows can take advantage of the "threadlocal" strategy in order to allow many disparate parts of an application to take place in the same transaction automatically. The example below demonstrates several forms of "connectionless execution" as well as some specialized explicit ones:

threadlocal connection sharing
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

def dosomethingimplicit():
    table1.execute("some sql")
    table1.execute("some other sql")

def dosomethingelse():
    table2.execute("some sql")
    conn = engine.contextual_connect()
    # do stuff with conn
    conn.execute("some other sql")
    conn.close()

def dosomethingtransactional():
    conn = engine.contextual_connect()
    trans = conn.begin()
    # do stuff
    trans.commit()

engine.begin()
try:
    dosomethingimplicit()
    dosomethingelse()
    dosomethingtransactional()
    engine.commit()
except:
    engine.rollback()

In the above example, the program calls three functions: dosomethingimplicit(), dosomethingelse() and dosomethingtransactional(). All three functions use either connectionless execution, or a special function contextual_connect() which we will describe in a moment. These two styles of execution both indicate that all executions will use the same connection object. Additionally, the method dosomethingtransactional() begins and commits its own Transaction. Yet only one transaction is actually used; it's controlled completely by the engine.begin()/engine.commit() calls at the bottom. Recall that Transaction supports "nesting" behavior, whereby transactions begun on a Connection which already has a transaction open will "nest" into the enclosing transaction. Since the transaction opened in dosomethingtransactional() occurs using the same connection which already has a transaction begun, it "nests" into that transaction and therefore has no effect on the actual transaction scope (unless it calls rollback()).

Some of the functions in the above example make use of a method called engine.contextual_connect(). This method is available on both Engine and TLEngine, and returns the Connection that applies to the current connection context. When using the TLEngine, this is just another term for the "thread local connection" that is being used for all connectionless executions. When using just the regular Engine (i.e. the "default" strategy), contextual_connect() is synonymous with connect(). Below we illustrate that two connections opened via contextual_connect() at the same time both reference the same underlying DBAPI connection:

Contextual Connection
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
True

The basic idea of contextual_connect() is that it's the "connection used by connectionless execution". It's different from the connect() method in that connect() is always used when handling an explicit Connection, which will always reference a distinct DBAPI connection. Using connect() in combination with TLEngine allows one to "circumvent" the current thread local context, as in this example where a single statement issues data to the database externally to the current transaction:

engine.begin()
engine.execute("insert into users values (?, ?)", 1, "john")
connection = engine.connect()
connection.execute(users.update(users.c.user_id==5), name='ed')
engine.rollback()

In the above example, a thread-local transaction is begun, but is later rolled back. The statement insert into users values (?, ?) is executed without using a connection, therefore uses the thread-local transaction. So its data is rolled back when the transaction is rolled back. However, the users.update() statement is executed using a distinct Connection returned by the engine.connect() method, so it therefore is not part of the threadlocal transaction; it autocommits immediately.


Configuring Logging

As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo and echo_pool flags that are present on create_engine(), as well as the echo_uow flag used on Session, all interact with regular loggers.

This section assumes familiarity with the standard logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy namespace, as used by logging.getLogger('sqlalchemy'). When logging has been configured (e.g. via logging.basicConfig()), the general namespace of SA loggers that can be turned on includes sqlalchemy.engine (SQL statements and result sets), sqlalchemy.pool (connection pool activity), and sqlalchemy.orm (ORM functions, including sub-areas such as sqlalchemy.orm.unitofwork).

For example, to log SQL queries as well as unit of work debugging:

import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)

By default, the log level is set to logging.ERROR within the entire sqlalchemy namespace so that no log operations occur, even within an application that has logging enabled otherwise.

The echo flags present as keyword arguments to create_engine() and others, as well as the echo property on Engine, when set to True, will first attempt to ensure that logging is enabled. Unfortunately, the logging module provides no way of determining if output has already been configured (note we are referring to whether a logging configuration has been set up, not just whether the logging level is set). For this reason, any echo=True flags will result in a call to logging.basicConfig() using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration is applied in addition to any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines.
