SQLAlchemy 0.3 Documentation

Version: 0.3.10 Last Updated: 07/20/07 17:20:04

This tutorial provides a relatively simple walking tour through the basic concepts of SQLAlchemy. You may wish to skip it and dive into the main manual which is more reference-oriented. The examples in this tutorial comprise a fully working interactive Python session, and are guaranteed to be functioning courtesy of doctest.

Installation

Installing SQLAlchemy

Installing SQLAlchemy from scratch is most easily achieved with setuptools. Just run this from the command-line:

# easy_install SQLAlchemy

This command will download the latest version of SQLAlchemy from the Python Cheese Shop and install it to your system.

Otherwise, you can install from the distribution using the setup.py script:

# python setup.py install

Installing a Database API

SQLAlchemy is designed to operate with a DBAPI implementation built for a particular database, and includes support for the most popular databases. If you have one of the supported DBAPI implementations, you can proceed to the following section. Otherwise SQLite is an easy-to-use database to get started with, which works with plain files or in-memory databases.

SQLite is included with Python 2.5 and greater.

If you are working with Python 2.3 or 2.4, SQLite and pysqlite (the Python DBAPI for SQLite) must be installed separately.

Note that the SQLite library download is not required on Windows, as the Windows pysqlite binaries already include it linked in. Pysqlite and SQLite can also be installed on Linux or FreeBSD via pre-made packages or from source.


Getting Started

Imports

To start connecting to databases and begin issuing queries, we want to import the base of SQLAlchemy's functionality, which is provided under the module name of sqlalchemy. For the purposes of this tutorial, we will import its full list of symbols into our own local namespace.

>>> from sqlalchemy import *

Note that importing using the * operator pulls all the names from sqlalchemy into the local module namespace, which in a real application can produce name conflicts. Therefore it's recommended in practice to either import the individual symbols desired (i.e. from sqlalchemy import Table, Column) or to import under a distinct namespace (i.e. import sqlalchemy as sa).


Connecting to the Database

After our imports, the next thing we need is a handle to the desired database, represented by an Engine object. This object handles the business of managing connections and dealing with the specifics of a particular database. Below, we will make a SQLite connection to a file-based database called "tutorial.db".

>>> db = create_engine('sqlite:///tutorial.db')

Technically, the above statement did not make an actual connection to the SQLite database just yet. As soon as we begin working with the engine, it will start creating connections. In the case of SQLite, the tutorial.db file will actually be created at the moment it is first used, if the file does not exist already.

For full information on creating database engines, including those for SQLite and others, see Database Engines.


SQLAlchemy is Two Libraries in One

Now that the basics of installing SQLAlchemy and connecting to our database are established, we can start getting into actually doing something. But first, a little bit of explanation is required.

A central concept of SQLAlchemy is that it actually contains two distinct areas of functionality, one of which builds upon the other. One is a SQL Construction Language and the other is an Object Relational Mapper ("ORM" for short). The SQL construction language allows you to construct objects called ClauseElements which represent SQL expressions. These ClauseElements can then be executed against any database, where they are compiled into strings that are appropriate for the target database, and return an object called a ResultProxy, which is essentially a result set object that acts very much like a deluxe version of the DBAPI cursor object.

The Object Relational Mapper (ORM) is a set of tools completely distinct from the SQL Construction Language which serve the purpose of mapping Python object instances into database rows, providing a rich selection interface with which to retrieve instances from tables as well as a comprehensive solution to persisting changes on those instances back into the database. When working with the ORM, its underlying workings as well as its public API make extensive use of the SQL Construction Language; however, the general theory of operation is slightly different. Instead of working with database rows directly, you work with your own user-defined classes and object instances. Additionally, the method of issuing queries to the database is different, as the ORM handles the job of generating most of the SQL required, and instead requires more information about what kind of class instances you'd like to load and where you'd like to put them.

Where SA is somewhat unique, more powerful, and slightly more complicated is that the two areas of functionality can be mixed together in many ways. A key strategy for working with SA effectively is to have a solid awareness of these two distinct toolsets, and which concepts of SA belong to each - even some publications have confused the SQL Construction Language with the ORM. The key difference between the two is that when you're working with cursor-like result sets it's the SQL Construction Language, and when working with collections of your own class instances it's the Object Relational Mapper.

This tutorial will first focus on the basic configuration that is common to using both the SQL Construction Language as well as the ORM, which is to declare information about your database called table metadata. This will be followed by some constructed SQL examples, and then into usage of the ORM utilizing the same data we established in the SQL construction examples.


Working with Database Objects

Defining Metadata, Binding to Engines

Configuring SQLAlchemy for your database consists of creating objects called Tables, each of which represents an actual table in the database. A collection of Table objects resides in a MetaData object, which is essentially a table collection. We will create a MetaData and connect it to our Engine (connecting a schema object to an Engine is called binding):

>>> metadata = MetaData()
>>> metadata.bind = db

An equivalent operation is to create the MetaData object directly with the Engine:

>>> metadata = MetaData(db)

Now, when we tell "metadata" about the tables in our database, we can issue CREATE statements for those tables, as well as execute SQL statements derived from them, without needing to open or close any connections; that will be all done automatically.

Note that SQLAlchemy allows us to use explicit connection objects for everything, if we wanted to, and there are reasons why you might want to do this. But for the purposes of this tutorial, using bind removes the need for us to deal with explicit connections.


Creating a Table

With metadata as our established home for tables, let's make a Table for it:

>>> users_table = Table('users', metadata,
...     Column('user_id', Integer, primary_key=True),
...     Column('user_name', String(40)),
...     Column('password', String(15))
... )

As you might have guessed, we have just defined a table named users which has three columns: user_id (which is a primary key column), user_name and password. Currently it is just an object that doesn't necessarily correspond to an existing table in our database. To actually create the table, we use the create() method. To make it interesting, we will have SQLAlchemy echo the SQL statements it sends to the database, by setting the echo flag on the Engine associated with our MetaData:

>>> metadata.engine.echo = True
>>> users_table.create() 
CREATE TABLE users (
    user_id INTEGER NOT NULL,
    user_name VARCHAR(40),
    password VARCHAR(15),
    PRIMARY KEY (user_id)
)
...

Alternatively, the users table might already exist (such as if you're running examples from this tutorial for the second time), in which case you can just skip the create() method call. You can even skip defining the individual columns in the users table and ask SQLAlchemy to load its definition from the database:

>>> users_table = Table('users', metadata, autoload=True)
>>> list(users_table.columns)[0].name
'user_id'

Loading a table's columns from the database is called reflection. Documentation on table metadata, including reflection, is available in Database Meta Data.


Inserting Rows

Inserting is achieved via the insert() method, which defines a clause object (known as a ClauseElement) representing an INSERT statement:

>>> i = users_table.insert()
>>> i 
<sqlalchemy.sql._Insert object at 0x...>
>>> # the string form of the Insert object is a generic SQL representation
>>> print i
INSERT INTO users (user_id, user_name, password) VALUES (?, ?, ?)

Since we created this insert statement object from the users table which is bound to our Engine, the statement itself is also bound to the Engine, and supports executing itself. The execute() method of the clause object will compile the object into a string according to the underlying dialect of the Engine to which the statement is bound, and will then execute the resulting statement.

>>> # insert a single row
>>> i.execute(user_name='Mary', password='secure') 
INSERT INTO users (user_name, password) VALUES (?, ?)
['Mary', 'secure']
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

>>> # insert multiple rows simultaneously
>>> i.execute({'user_name':'Tom'}, {'user_name':'Fred'}, {'user_name':'Harry'}) 
INSERT INTO users (user_name) VALUES (?)
[['Tom'], ['Fred'], ['Harry']]
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

Note that the VALUES clause of each INSERT statement was automatically adjusted to correspond to the parameters sent to the execute() method. This is because the compilation step of a ClauseElement takes into account not just the constructed SQL object and the specifics of the type of database being used, but the execution parameters sent along as well.

When constructing clause objects, SQLAlchemy will bind all literal values into bind parameters. On the construction side, bind parameters are always treated as named parameters. At compilation time, SQLAlchemy will convert them into their proper format, based on the paramstyle of the underlying DBAPI. This works equally well for all named and positional bind parameter formats described in the DBAPI specification.
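
For example, values established at construction time are converted into bind parameters rather than being rendered inline. A minimal sketch (not part of the verified doctest session above):

# literal values given to insert() become bind parameters in the
# compiled statement, never inline SQL text
i = users_table.insert(values={'user_name':'Wendy'})
print i
# INSERT INTO users (user_name) VALUES (?)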

Documentation on inserting: Inserts.


Selecting

Let's check that the data we have put into the users table is actually there. The procedure is analogous to the insert example above, except you now call the select() method off the users table:

>>> s = users_table.select()
>>> print s
SELECT users.user_id, users.user_name, users.password 
FROM users
>>> r = s.execute()
SELECT users.user_id, users.user_name, users.password 
FROM users
[]

This time, we won't ignore the return value of execute(). It's an instance of ResultProxy, which is a result-holding object that behaves very similarly to the cursor object one deals with when using a database API directly:

>>> r 
<sqlalchemy.engine.base.ResultProxy object at 0x...>
>>> r.fetchone()
(1, u'Mary', u'secure')
>>> r.fetchall()
[(2, u'Tom', None), (3, u'Fred', None), (4, u'Harry', None)]

Query criteria for the select are specified using Python expressions, using the Column objects in the Table as a base. All expressions constructed from Column objects are themselves instances of ClauseElement, just like the Select, Insert, and Table objects themselves.

>>> r = users_table.select(users_table.c.user_name=='Harry').execute()
SELECT users.user_id, users.user_name, users.password 
FROM users 
WHERE users.user_name = ?
['Harry']
>>> row = r.fetchone()
>>> print row
(4, u'Harry', None)

Pretty much the full range of standard SQL operations is supported as constructed Python expressions, including joins, ordering, grouping, functions, correlated subqueries, unions, etc. Documentation on selecting: Simple Select.
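
For instance, criteria can be combined with the and_() and or_() functions and results ordered within the same construct. A minimal sketch (not part of the verified doctest session):

# select users whose name starts with 'M' or who have no password,
# ordered by user name
s = users_table.select(
    or_(users_table.c.user_name.like('M%'),
        users_table.c.password==None),
    order_by=[users_table.c.user_name])
result = s.execute()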


Working with Rows

You can see that when we print out the rows returned by an execution result, it prints the rows as tuples. These rows support both the list and dictionary interfaces. The dictionary interface allows the addressing of columns by string column name, or even the original Column object:

>>> row.keys()
['user_id', 'user_name', 'password']
>>> row['user_id'], row[1], row[users_table.c.password] 
(4, u'Harry', None)

Addressing the columns in a row based on the original Column object is especially handy, as it eliminates the need to work with literal column names altogether.

Result sets also support iteration. We'll show this with a slightly different form of select that allows you to specify the specific columns to be selected:

>>> for row in select([users_table.c.user_id, users_table.c.user_name]).execute(): 
...     print row
SELECT users.user_id, users.user_name
FROM users
[]
(1, u'Mary')
(2, u'Tom')
(3, u'Fred')
(4, u'Harry')

Table Relationships

Let's create a second table, email_addresses, which references the users table. To define the relationship between the two tables, we will use the ForeignKey construct. We will also issue the CREATE statement for the table:

>>> email_addresses_table = Table('email_addresses', metadata,
...     Column('address_id', Integer, primary_key=True),
...     Column('email_address', String(100), nullable=False),
...     Column('user_id', Integer, ForeignKey('users.user_id')))
>>> email_addresses_table.create() 
CREATE TABLE email_addresses (
    address_id INTEGER NOT NULL,
    email_address VARCHAR(100) NOT NULL,
    user_id INTEGER,
    PRIMARY KEY (address_id),
    FOREIGN KEY(user_id) REFERENCES users (user_id)
)
...

Above, the email_addresses table is related to the users table via the ForeignKey('users.user_id'). The ForeignKey constructor can take a Column object or a string representing the table and column name. When using the string argument, the referenced table must exist within the same MetaData object; that's where it looks for the other table!
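
For example, the same relationship could be expressed by passing the Column object directly instead of a string, which avoids the name lookup entirely. A minimal sketch (email_addresses2 is a hypothetical table used only for illustration):

email_addresses2 = Table('email_addresses2', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('email_address', String(100), nullable=False),
    Column('user_id', Integer, ForeignKey(users_table.c.user_id)))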

Next, let's put a few rows in:

>>> email_addresses_table.insert().execute(
...     {'email_address':'tom@tom.com', 'user_id':2},
...     {'email_address':'mary@mary.com', 'user_id':1}) 
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
[['tom@tom.com', 2], ['mary@mary.com', 1]]
COMMIT
<sqlalchemy.engine.base.ResultProxy object at 0x...>

With two related tables, we can now construct a join amongst them using the join method:

>>> r = users_table.join(email_addresses_table).select(order_by=users_table.c.user_id).execute()
SELECT users.user_id, users.user_name, users.password, email_addresses.address_id, email_addresses.email_address, email_addresses.user_id 
FROM users JOIN email_addresses ON users.user_id = email_addresses.user_id ORDER BY users.user_id
[]
>>> print [row for row in r]
[(1, u'Mary', u'secure', 2, u'mary@mary.com', 1), (2, u'Tom', None, 1, u'tom@tom.com', 2)]

The join method is also a standalone function in the sqlalchemy namespace. The join condition is figured out from the foreign keys of the Table objects given. The condition (also called the "ON clause") can be specified explicitly, such as in this example which creates a join representing all users that used their email address as their password:

>>> print join(users_table, email_addresses_table, 
...     and_(users_table.c.user_id==email_addresses_table.c.user_id, 
...     users_table.c.password==email_addresses_table.c.email_address)
...     )
users JOIN email_addresses ON users.user_id = email_addresses.user_id AND users.password = email_addresses.email_address
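
An outer join, which would also include users having no email address, is available via the outerjoin method or standalone function. A sketch (not part of the verified doctest session; the comment approximates the rendered SQL):

# LEFT OUTER JOIN version of the same join
print users_table.outerjoin(email_addresses_table)
# users LEFT OUTER JOIN email_addresses
#     ON users.user_id = email_addresses.user_id

# a join is selectable, just like a table
result = users_table.outerjoin(email_addresses_table).select().execute()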

Working with Object Mappers

Now that we have a little bit of Table and SQL operations covered, let's look into SQLAlchemy's ORM (object relational mapper). With the ORM, you associate Tables (and other Selectable units, like queries and table aliases) with Python classes, into units called Mappers. Then you can execute queries that return lists of object instances, instead of result sets. The object instances themselves are associated with an object called a Session, which automatically tracks changes on each object and supports a "save all at once" operation called a flush.

To start, we will import the names necessary to use SQLAlchemy's ORM, again using import * for simplicity's sake, even though we all know that in real life we should be importing individual names via "from sqlalchemy.orm import symbol1, symbol2, ..." or "import sqlalchemy.orm as orm":

>>> from sqlalchemy.orm import *

It should be noted that the above step is technically not needed when working with the 0.3 series of SQLAlchemy; all symbols from the orm package are also included in the sqlalchemy package. However, a future release (most likely the 0.4 series) will make the separate orm import required in order to use the object relational mapper, so it's good practice for now.

Creating a Mapper

A Mapper is usually created once per Python class, and at its core says, "objects of this class are to be stored as rows in this table". Let's create a class called User, which will represent a user object that is stored in our users table:

>>> class User(object):
...     def __repr__(self):
...        return "%s(%r,%r)" % (
...            self.__class__.__name__, self.user_name, self.password)

The class is a new style class (i.e. it extends object) and does not require a constructor (although one may be provided if desired). We just have one __repr__ method on it which will display basic information about the User. Note that the __repr__ method references the instance variables user_name and password, which otherwise aren't defined. While we are free to explicitly define these attributes and treat them normally, this is optional; SQLAlchemy's Mapper construct will manage them for us, since their names correspond to the names of columns in the users table. Let's create a mapper, and observe that these attributes are now defined:

>>> mapper(User, users_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>
>>> u1 = User()
>>> print u1.user_name
None
>>> print u1.password
None

The mapper function returns a new instance of Mapper. As it is the first Mapper we have created for the User class, it is known as the class's primary mapper. We generally don't need to hold onto the return value of the mapper function; SA can automatically locate this Mapper as needed when it deals with the User class.
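
If the Mapper is needed later, it can also be retrieved with the class_mapper() function rather than keeping a reference around. A minimal sketch:

# look up the primary mapper that was created for a class
m = class_mapper(User)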


Obtaining a Session

After you create a Mapper, all operations with that Mapper require the usage of an important object called a Session. All objects loaded or saved by the Mapper must be attached to a Session object, which represents a kind of "workspace" of objects that are loaded into memory. A particular object instance can only be attached to one Session at a time (but of course can be moved around or detached altogether).

By default, you have to create a Session object explicitly before you can load or save objects. There are several ways to manage sessions, but the most straightforward is to just create one, which we will do by calling create_session():

>>> session = create_session()
>>> session 
<sqlalchemy.orm.session.Session object at 0x...>
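
As noted earlier, an instance belongs to at most one Session at a time, but it can be detached and moved. A minimal sketch, assuming a mapped User class and an existing row (the query interface used here is introduced in the next section):

session_a = create_session()
session_b = create_session()

user = session_a.query(User).get(1)  # load the instance into session_a
session_a.expunge(user)              # detach it from session_a
session_b.update(user)               # attach the existing instance to session_b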

The Query Object

The Session has all kinds of methods on it to manage and inspect its collection of objects. The Session also provides an easy interface which can be used to query the database, by giving you an instance of a Query object corresponding to a particular Python class:

>>> query = session.query(User)
>>> print query.filter_by(user_name='Harry').all()
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name = ? ORDER BY users.oid
['Harry']
[User(u'Harry',None)]

All querying for objects is performed via an instance of Query. The various select methods on an instance of Mapper also use an underlying Query object to perform the operation. A Query is always bound to a specific Session.

Let's turn off the database echoing for a moment, and try out a few methods on Query. The two methods used to narrow results are filter() and filter_by(), and the two most common methods used to load results are all() and first(). The get() method is used for a quick lookup by primary key. filter_by() works with keyword arguments, and filter() works with ClauseElement objects, which are constructed by using Column objects inside of Python expressions, in the same way as we did with our SQL select example in the previous section of this tutorial. Using ClauseElement structures to query objects is more verbose but more flexible:

>>> metadata.engine.echo = False
>>> print query.filter(User.c.user_id==3).all()
[User(u'Fred',None)]
>>> print query.get(2)
User(u'Tom',None)
>>> print query.filter_by(user_name='Mary').first()
User(u'Mary',u'secure')
>>> print query.filter(User.c.password==None).first()
User(u'Tom',None)
>>> print query.count()
4

Notice that our User class has a special attribute c attached to it. This 'c' represents the columns on the User's mapper's Table object. Saying User.c.user_name is synonymous with saying users_table.c.user_name, recalling that User is the Python class and users_table is our Table object.


Making Changes

With a little experience in loading objects, let's see what it's like to make changes. First, let's create a new user "Ed". We do this by just constructing the new object. Then, we just add it to the session:

>>> ed = User()
>>> ed.user_name = 'Ed'
>>> ed.password = 'edspassword'
>>> session.save(ed)
>>> ed in session
True

Let's also make a few changes to some of the objects in the database. We will load them with our Query object, and then change some things.

>>> mary = query.filter_by(user_name='Mary').first()
>>> harry = query.filter_by(user_name='Harry').first()
>>> mary.password = 'marysnewpassword'
>>> harry.password = 'harrysnewpassword'

At the moment, nothing has been saved to the database; all of our changes are in memory only. What happens if some other part of the application also tries to load 'Mary' from the database and make some changes before we had a chance to save it? Assuming that the same Session is used, loading 'Mary' from the database a second time will issue a second query in order to locate the primary key of 'Mary', but will return the same object instance as the one already loaded. This behavior is due to an important property of the Session known as the identity map:

>>> mary2 = query.filter_by(user_name='Mary').first()
>>> mary is mary2
True

With the identity map, a single Session can be relied upon to keep all loaded instances straight.

As far as the issue of the same object being modified in two different Sessions, that's an issue of concurrency detection; SQLAlchemy does some basic concurrency checks when saving objects, with the option for a stronger check using version ids. See the mapper arguments section of the data mapping documentation for more details.
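
A minimal sketch of the stronger check, assuming a hypothetical versioned_users table carrying an integer version counter; the version_id_col mapper argument makes the mapper increment the counter on each UPDATE and include it in the WHERE clause, so that an UPDATE against a stale row can be detected:

versioned_users = Table('versioned_users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(40)),
    Column('version_id', Integer, nullable=False))

class VersionedUser(object):
    pass

mapper(VersionedUser, versioned_users,
    version_id_col=versioned_users.c.version_id)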


Saving

With a new user "ed" and some changes made on "Mary" and "Harry", let's also mark "Fred" as deleted:

>>> fred = query.filter_by(user_name='Fred').first()
>>> session.delete(fred)

Then to send all of our changes to the database, we flush() the Session. Let's turn echo back on to see this happen:

>>> metadata.engine.echo = True
>>> session.flush()
BEGIN
UPDATE users SET password=? WHERE users.user_id = ?
['marysnewpassword', 1]
UPDATE users SET password=? WHERE users.user_id = ?
['harrysnewpassword', 4]
INSERT INTO users (user_name, password) VALUES (?, ?)
['Ed', 'edspassword']
DELETE FROM users WHERE users.user_id = ?
[3]
COMMIT

Relationships

When our User object contains relationships to other kinds of information, such as a list of email addresses, we can indicate this by using a function called relation() when creating the Mapper. While there is a lot you can do with relations, we'll cover a simple one here. First, recall that our users table has a foreign key relationship to another table called email_addresses. A single row in email_addresses has a column user_id that references a row in the users table; since many rows in the email_addresses table can reference a single row in users, this is called a one to many relationship.

To illustrate this relationship, we will start with a new mapper configuration. Since our User class has a mapper assigned to it, we want to discard it and start over again. So we first call the clear_mappers() function, which removes all mapping associations from classes:

>>> clear_mappers()

When removing mappers, it is usually best to remove all mappings at the same time, since mappers usually have relationships to each other which will become invalid if only part of the mapper collection is removed. In practice, a particular mapping setup will usually remain throughout the lifetime of an application. Clearing out the mappers and making new ones is a practice that is generally limited to writing mapper unit tests and experimenting from the console.

Next, we want to create a class/mapping that corresponds to the email_addresses table. We will create a new class Address which represents a single row in the email_addresses table, and a corresponding Mapper which will associate the Address class with the email_addresses table:

>>> class Address(object):
...     def __init__(self, email_address):
...         self.email_address = email_address
...     def __repr__(self):
...         return "%s(%r)" % (
...            self.__class__.__name__, self.email_address)    
>>> mapper(Address, email_addresses_table) 
<sqlalchemy.orm.mapper.Mapper object at 0x...>

We then create a mapper for the User class which contains a relationship to the Address class using the relation() function:

>>> mapper(User, users_table, properties={ 
...    'addresses':relation(Address)
... })
<sqlalchemy.orm.mapper.Mapper object at 0x...>

The relation() function takes either a class or a Mapper as its first argument, and has many options to further control its behavior. When this mapping relationship is used, each new User instance will contain an attribute called addresses. SQLAlchemy will automatically determine that this relationship is a one-to-many relationship, and will subsequently create addresses as a list. When a new User is created, this list will begin as empty.
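
Among relation()'s many options is backref, which establishes the reverse attribute on the related class in the same step. A sketch of the same mapping with a backref added (this would be used in place of the mapper() call above):

mapper(User, users_table, properties={
    'addresses':relation(Address, backref='user')
})
# each Address instance now also has a 'user' attribute
# referencing its parent User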

The order in which the mapping definitions for User and Address are created is not significant. When the mapper() function is called, it creates an uncompiled mapping record corresponding to the given class/table combination. When the mappers are first used, the entire collection of mappers created up until that point will be compiled, which involves the establishment of class instrumentation as well as the resolution of all mapping relationships.

Let's try out this new mapping configuration, and see what we get for the email addresses already in the database. Since we have made a new mapping configuration, it's best that we clear out our Session, which is currently holding onto every User object we have already loaded:

>>> session.clear()

We can then treat the addresses attribute on each User object like a regular list:

>>> mary = query.filter_by(user_name='Mary').first() 
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name = ? ORDER BY users.oid 
LIMIT 1 OFFSET 0
['Mary']
>>> print [a for a in mary.addresses]
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address 
FROM email_addresses 
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[1]
[Address(u'mary@mary.com')]

Adding to the list is just as easy. New Address objects will be detected and saved when we flush the Session:

>>> mary.addresses.append(Address('mary2@gmail.com'))
>>> session.flush() 
BEGIN
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['mary2@gmail.com', 1]
COMMIT

Main documentation for using mappers: Data Mapping


Transactions

You may have noticed from the example above that when we say session.flush(), SQLAlchemy prints BEGIN and COMMIT, indicating a transaction with the database. The flush() method, since it may execute many statements in a row, will automatically use a transaction in order to execute these instructions. But what if we want to use flush() inside of a larger transaction? This is performed via the SessionTransaction object, which we can establish using session.create_transaction(). Below, we will perform a more complicated SELECT statement, make several changes to our collection of users and email addresses, and then create a new user with two email addresses, within the context of a transaction. We will perform a flush() in the middle of it to write the changes we have so far, and then allow the remaining changes to be written when we finally commit() the transaction. We enclose our operations within a try/except block to ensure that resources are properly freed:

>>> transaction = session.create_transaction()
>>> try: 
...     (ed, harry, mary) = session.query(User).filter(
...         User.c.user_name.in_('Ed', 'Harry', 'Mary')
...     ).order_by(User.c.user_name).all()
...     del mary.addresses[1]
...     harry.addresses.append(Address('harry2@gmail.com'))
...     session.flush()
...     print "***flushed the session***"
...     fred = User()
...     fred.user_name = 'fred_again'
...     fred.addresses.append(Address('fred@fred.com'))
...     fred.addresses.append(Address('fredsnewemail@fred.com'))
...     session.save(fred)
...     transaction.commit()
... except:
...     transaction.rollback()
...     raise
BEGIN
SELECT users.user_name AS users_user_name, users.password AS users_password, users.user_id AS users_user_id 
FROM users 
WHERE users.user_name IN (?, ?, ?) ORDER BY users.user_name
['Ed', 'Harry', 'Mary']
SELECT email_addresses.user_id AS email_addresses_user_id, email_addresses.address_id AS email_addresses_address_id, email_addresses.email_address AS email_addresses_email_address 
FROM email_addresses 
WHERE ? = email_addresses.user_id ORDER BY email_addresses.oid
[4]
UPDATE email_addresses SET user_id=? WHERE email_addresses.address_id = ?
[None, 3]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['harry2@gmail.com', 4]
***flushed the session***    
INSERT INTO users (user_name, password) VALUES (?, ?)
['fred_again', None]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['fred@fred.com', 6]
INSERT INTO email_addresses (email_address, user_id) VALUES (?, ?)
['fredsnewemail@fred.com', 6]
COMMIT

The SessionTransaction process above is due to be greatly simplified in version 0.4 of SQLAlchemy, where the Session will be able to wrap its whole lifespan in a transaction automatically.

Main documentation: Session / Unit of Work


Next Steps

That covers a quick tour through the basic idea of SQLAlchemy, in its simplest form. Beyond that, one should familiarize oneself with the basics of Sessions, the various patterns that can be used to define different kinds of Mappers and relations among them, the rudimentary SQL types that are available when constructing Tables, and the basics of Engines, SQL statements, and database Connections.

Database Engines

The Engine is the starting point for any SQLAlchemy application. It's "home base" for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database and DBAPI combination.

The general structure is this:

                                     +-----------+                        __________
                                 /---|   Pool    |---\                   (__________)
             +-------------+    /    +-----------+    \     +--------+   |          |
connect() <--|   Engine    |---x                       x----| DBAPI  |---| database |
             +-------------+    \    +-----------+    /     +--------+   |          |
                                 \---|  Dialect  |---/                   |__________|
                                     +-----------+                       (__________)

Where above, a sqlalchemy.engine.Engine references both a sqlalchemy.engine.Dialect and sqlalchemy.pool.Pool, which together interpret the DBAPI's module functions as well as the behavior of the database.

Creating an engine is just a matter of issuing a single call, create_engine():

engine = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

The above engine invokes the postgres dialect and a connection pool which references localhost:5432.

The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect() method:

connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print "username:", row['username']
connection.close()

The connection is an instance of sqlalchemy.engine.Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of sqlalchemy.engine.ResultProxy, which acts very much like a DBAPI cursor.

When you say engine.connect(), a new Connection object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close(), the DBAPI connection is returned to the pool; nothing is actually "closed" from the perspective of the database.

To execute some SQL more quickly, you can skip the Connection part and just say:

result = engine.execute("select username from users")
for row in result:
    print "username:", row['username']
result.close()

Where above, the execute() method on the Engine does the connect() part for you, and returns the ResultProxy directly. The actual Connection is inside the ResultProxy, waiting for you to finish reading the result. In this case, when you close() the ResultProxy, the underlying Connection is closed, which returns the DBAPI connection to the pool.

To summarize the above two examples, when you use a Connection object, it's known as explicit execution. When you don't see the Connection object, but you still use the execute() method on the Engine, it's called explicit, connectionless execution. A third variant of execution also exists called implicit execution; this will be described later.

The Engine and Connection can do a lot more than what we illustrated above; SQL strings are only their most rudimentary function. Later chapters will describe how "constructed SQL" expressions can be used with engines; in many cases, you don't have to deal with the Engine at all after it's created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine in order to get at connections; that's also a case where you can often create the engine once, and then forget about it.

Supported Databases

Recall that the Dialect is used to describe how to talk to a specific kind of database. Dialects are included with SQLAlchemy for SQLite, Postgres, MySQL, MS-SQL, Firebird, Informix, and Oracle; these can each be seen as a Python module present in the sqlalchemy.databases package. Each dialect requires the appropriate DBAPI drivers to be installed separately.

Downloads for each DBAPI are available from each driver's respective project site.

The SQLAlchemy Wiki contains a page of database notes, describing whatever quirks and behaviors have been observed. It's a good place to check for issues with specific databases. Database Notes


create_engine() URL Arguments

SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:

driver://username:password@host:port/database

Available driver names are sqlite, mysql, postgres, oracle, mssql, and firebird. For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database. The URL is typically sent as a string to the create_engine() function:

# postgres
pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')

# sqlite (note the four slashes for an absolute path)
sqlite_db = create_engine('sqlite:////absolute/path/to/database.txt')
sqlite_db = create_engine('sqlite:///relative/path/to/database.txt')
sqlite_db = create_engine('sqlite://')  # in-memory database
sqlite_db = create_engine('sqlite:///:memory:')  # the same

# mysql
mysql_db = create_engine('mysql://localhost/foo')

# oracle via TNS name
oracle_db = create_engine('oracle://scott:tiger@dsn')

# oracle will feed host/port/SID into cx_oracle.makedsn
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')

The Engine will ask the connection pool for a connection when the connect() or execute() methods are called. The default connection pool, QueuePool, as well as the default connection pool used with SQLite, SingletonThreadPool, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool will grow its pool of connections to a default size of five, and will allow a default "overflow" of ten. Since the Engine is essentially "home base" for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
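
The pool's size and overflow can be adjusted via keyword arguments to create_engine(). A minimal sketch with larger-than-default settings:

# allow up to 20 pooled connections, plus 10 more during bursts
db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase',
                   pool_size=20, max_overflow=10)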

Custom DBAPI connect() arguments

Custom arguments used when issuing the connect() call to the underlying DBAPI may be issued in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:

db = create_engine('postgres://scott:tiger@localhost/test?argument1=foo&argument2=bar')

If SQLAlchemy's database connector is aware of a particular query argument, it may convert its type from string to its proper type.

create_engine also takes an argument connect_args which is an additional dictionary that will be passed to connect(). This can be used when arguments of a type other than string are required, and SQLAlchemy's database connector has no type conversion logic present for that parameter:

db = create_engine('postgres://scott:tiger@localhost/test', connect_args = {'argument1':17, 'argument2':'bar'})

The most customizable connection method of all is to pass a creator argument, which specifies a callable that returns a DBAPI connection:

# psycopg (the era-appropriate Postgres DBAPI) must be importable
import psycopg

def connect():
    return psycopg.connect(user='scott', host='localhost')

db = create_engine('postgres://', creator=connect)

Database Engine Options

Keyword options can also be specified to create_engine(), following the string URL as follows:

db = create_engine('postgres://...', encoding='latin1', echo=True)

Standard options include echo, echo_pool, encoding, convert_unicode, pool_size, max_overflow, strategy, connect_args, and creator; several additional options are specific to particular database dialects.


More On Connections

Recall from the beginning of this section that the Engine provides a connect() method which returns a Connection object. Connection is a proxy object which maintains a reference to a DBAPI connection instance. The close() method on Connection does not actually close the DBAPI connection, but instead returns it to the connection pool referenced by the Engine. Connection will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. When using the standard C implementation of Python, this method is called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.

The execute() methods on both Engine and Connection can receive SQL clause constructs as well:

connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1==5))
for row in result:
    print row['col1'], row['col2']
connection.close()

The above SQL construct is known as a select(). The full range of SQL constructs available are described in Constructing SQL Queries via Python Expressions.

Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, namely being able to call connect() to return a Connection object (Connection just returns itself), and being able to call execute() to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine as a parameter or attribute with which to execute SQL will also accept a Connection. As of SQLAlchemy 0.3.9, this argument is named bind.

Specify Engine or Connection
engine = create_engine('sqlite:///:memory:')

# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))

# create the table with the Engine
table.create(bind=engine)

# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(bind=connection)

Connection facts:

* it is a proxy for a single DBAPI connection checked out from the Engine's connection pool
* calling close(), or letting the object be garbage collected, returns the DBAPI connection to the pool
* like Engine, it fulfills the Connectable interface, so it can be passed wherever a bind is accepted


Using Transactions with Connection

The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that it is guaranteed to rollback() or commit():

trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise

The Transaction object also handles "nested" behavior by keeping track of the outermost begin/commit pair. In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.

# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin() # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback() # this rolls back the transaction unconditionally
        raise

# open a Connection and call method_a
conn = engine.connect()                
method_a(conn)
conn.close()

Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method. This "nesting" behavior allows the creation of functions which "guarantee" that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.

Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in SessionTransaction.

Transaction facts:

* it is acquired from the begin() method of Connection
* every statement that should participate in the transaction must be executed with the same Connection
* begin()/commit() pairs can be nested; only the outermost pair actually commits, and a rollback() at any level rolls back the entire transaction

Understanding Autocommit

The above transaction example illustrates how to use Transaction so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction? The answer is autocommit. While many DBAPIs implement a flag called autocommit, the current SQLAlchemy behavior is to implement its own: it searches the statement for strings like INSERT, UPDATE, DELETE, etc. and then issues a COMMIT automatically if no transaction is in progress.

conn = engine.connect()
conn.execute("INSERT INTO users VALUES (1, 'john')")  # autocommits

Connectionless Execution, Implicit Execution

Recall from the first section we mentioned executing with and without a Connection. Connectionless execution refers to calling the execute() method on an object which is not a Connection, which could be the Engine itself, or could be a constructed SQL object. When we say "implicit", we mean that we are calling the execute() method on an object which is neither a Connection nor an Engine object; this can only be used with constructed SQL objects which have their own execute() method, and can be "bound" to an Engine. A description of "constructed SQL objects" may be found in Constructing SQL Queries via Python Expressions.

A summary of all three methods follows below. First, assume the usage of the following MetaData and Table objects; while we haven't yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an "executable" SQL construct which issues a statement to the database. These objects are described in Database Meta Data.

meta = MetaData()
users_table = Table('users', meta, 
    Column('id', Integer, primary_key=True), 
    Column('name', String(50))
)

Explicit execution delivers the SQL text or constructed SQL expression to the execute() method of Connection:

engine = create_engine('sqlite:///file.db')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    # ....
connection.close()

Explicit, connectionless execution delivers the expression to the execute() method of Engine:

engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    # ....
result.close()

Implicit execution is also connectionless, and calls the execute() method on the expression itself, utilizing the fact that either an Engine or Connection has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):

engine = create_engine('sqlite:///file.db')
meta.connect(engine)
result = users_table.select().execute()
for row in result:
    # ....
result.close()

In both "connectionless" examples, the Connection is created behind the scenes; the ResultProxy returned by the execute() call references the Connection used to issue the SQL statement. When we issue close() on the ResultProxy, or if the result set object falls out of scope and is garbage collected, the underlying Connection is closed for us, resulting in the DBAPI connection being returned to the pool.

Using the Threadlocal Execution Strategy

With connectionless execution, each returned ResultProxy object references its own distinct DBAPI connection object. This means that multiple executions will result in multiple DBAPI connections being used at the same time; the example below illustrates this:

db = create_engine('mysql://localhost/test')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
r2 = db.execute("select * from table2")
for row in r1:
    ...
for row in r2:
    ...
# release connection 1
r1.close()

# release connection 2
r2.close()

Where above, we have two result sets in scope at the same time, therefore we have two distinct DBAPI connections, both separately checked out from the connection pool, in scope at the same time.

An option exists to create_engine() called strategy="threadlocal", which changes this behavior. When this option is used, the Engine which is returned by create_engine() is a special subclass of engine called TLEngine. This engine, when it creates the Connection used by a connectionless execution, checks a threadlocal variable for an existing DBAPI connection that was already checked out from the pool, within the current thread. If one exists, it uses that one.

The usage of "threadlocal" modifies the underlying behavior of our example above, as follows:

Threadlocal Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

# execute one statement and receive results.  r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")

# execute a second statement and receive results.  r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")

for row in r1:
    ...
for row in r2:
    ...
# close r1.  the connection is still held by r2.
r1.close()

# close r2.  with no more references to the underlying connection resources, they
# are returned to the pool.
r2.close()

Where above, we again have two result sets in scope at the same time, but because they are present in the same thread, there is only one DBAPI connection in use.

While the above distinction may not seem like much, it has several potentially desirable effects. One is that you can in some cases reduce the number of concurrent connections checked out from the connection pool, in the case that a ResultProxy is still open when a second statement is issued. A second advantage is that by limiting the number of checked out connections in a thread to just one, you eliminate the issue of deadlocks within a single thread; for example, if connection A locks a table and connection B, in the same thread, attempts to read from the same table, B would "deadlock" waiting for connection A to release its lock. The threadlocal strategy eliminates this possibility.

A third advantage to the threadlocal strategy is that it allows the Transaction object to be used in combination with connectionless execution. Recall from the section on transactions, that the Transaction is returned by the begin() method on a Connection; all statements which wish to participate in this transaction must be executed by the same Connection, thereby forcing the usage of an explicit connection. However, the TLEngine provides a Transaction that is local to the current thread; using it, one can issue many "connectionless" statements within a thread and they will all automatically partake in the current transaction, as in the example below:

threadlocal connection sharing
# get a TLEngine
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

engine.begin()
try:
    engine.execute("insert into users values (?, ?)", 1, "john")
    users.update(users.c.user_id==5).execute(name='ed')
    engine.commit()
except:
    engine.rollback()

Notice that no Connection needed to be used; the begin() method on TLEngine (which is not available on the regular Engine) created a Transaction as well as a Connection, and held onto both in a context corresponding to the current thread. Each execute() call made use of the same connection, allowing them all to participate in the same transaction.

Complex application flows can take advantage of the "threadlocal" strategy in order to allow many disparate parts of an application to take place in the same transaction automatically. The example below demonstrates several forms of "connectionless execution" as well as some specialized explicit ones:

threadlocal connection sharing
engine = create_engine('mysql://localhost/test', strategy='threadlocal')

def dosomethingimplicit():
    table1.execute("some sql")
    table1.execute("some other sql")

def dosomethingelse():
    table2.execute("some sql")
    conn = engine.contextual_connect()
    # do stuff with conn
    conn.execute("some other sql")
    conn.close()

def dosomethingtransactional():
    conn = engine.contextual_connect()
    trans = conn.begin()
    # do stuff
    trans.commit()

engine.begin()
try:
    dosomethingimplicit()
    dosomethingelse()
    dosomethingtransactional()
    engine.commit()
except:
    engine.rollback()

In the above example, the program calls three functions dosomethingimplicit(), dosomethingelse() and dosomethingtransactional(). All three functions use either connectionless execution, or a special function contextual_connect() which we will describe in a moment. These two styles of execution both indicate that all executions will use the same connection object. Additionally, the method dosomethingtransactional() begins and commits its own Transaction. But only one transaction is used, too; it's controlled completely by the engine.begin()/engine.commit() calls at the bottom. Recall that Transaction supports "nesting" behavior, whereby transactions begun on a Connection which already has a transaction open will "nest" into the enclosing transaction. Since the transaction opened in dosomethingtransactional() occurs using the same connection which already has a transaction begun, it "nests" into that transaction and therefore has no effect on the actual transaction scope (unless it calls rollback()).

Some of the functions in the above example make use of a method called engine.contextual_connect(). This method is available on both Engine as well as TLEngine, and returns the Connection that applies to the current connection context. When using the TLEngine, this is just another term for the "thread local connection" that is being used for all connectionless executions. When using just the regular Engine (i.e. the "default" strategy), contextual_connect() is synonymous with connect(). Below we illustrate that two connections opened via contextual_connect() at the same time, both reference the same underlying DBAPI connection:

Contextual Connection
# threadlocal strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')

conn1 = db.contextual_connect()
conn2 = db.contextual_connect()

>>> conn1.connection is conn2.connection
True

The basic idea of contextual_connect() is that it's the "connection used by connectionless execution". It's different from the connect() method in that connect() is always used when handling an explicit Connection, which will always reference a distinct DBAPI connection. Using connect() in combination with TLEngine allows one to "circumvent" the current thread local context, as in this example where a single statement issues data to the database externally to the current transaction:

engine.begin()
engine.execute("insert into users values (?, ?)", 1, "john")
connection = engine.connect()
connection.execute(users.update(users.c.user_id==5), name='ed')
engine.rollback()

In the above example, a thread-local transaction is begun, but is later rolled back. The statement insert into users values (?, ?) is executed without using an explicit connection, and therefore uses the thread-local transaction; its data is rolled back when the transaction is rolled back. However, the users.update() statement is executed using a distinct Connection returned by the engine.connect() method, so it is not part of the threadlocal transaction; it autocommits immediately.


Configuring Logging

As of the 0.3 series of SQLAlchemy, Python's standard logging module is used to implement informational and debug log output. This allows SQLAlchemy's logging to integrate in a standard way with other applications and libraries. The echo and echo_pool flags that are present on create_engine(), as well as the echo_uow flag used on Session, all interact with regular loggers.

This section assumes familiarity with the above linked logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy namespace, as used by logging.getLogger('sqlalchemy'). When logging has been configured (i.e. such as via logging.basicConfig()), the general namespace of SA loggers that can be turned on includes sqlalchemy.engine, controlling SQL statement and parameter echoing (the echo flag), sqlalchemy.pool, controlling connection pool checkout/checkin logging (the echo_pool flag), and sqlalchemy.orm, controlling ORM logging, including sqlalchemy.orm.unitofwork for flush operations (the echo_uow flag).

For example, to log SQL queries as well as unit of work debugging:

import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)

By default, the log level is set to logging.ERROR within the entire sqlalchemy namespace so that no log operations occur, even within an application that has logging enabled otherwise.

The echo flags present as keyword arguments to create_engine() and others, as well as the echo property on Engine, when set to True, will first attempt to ensure that logging is enabled. Unfortunately, the logging module provides no way of determining if output has already been configured (note we are referring to whether a logging configuration has been set up, not just whether the logging level is set). For this reason, any echo=True flag will result in a call to logging.basicConfig() using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration is applied in addition to any existing logger configurations. Therefore, when using Python logging, ensure all echo flags are set to False at all times, to avoid getting duplicate log lines.

back to section top

Describing Databases with MetaData

The core of SQLAlchemy's query and object mapping operations is supported by database metadata, which is comprised of Python objects that describe tables and other schema-level objects. These objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema. There is also support for reflection of some entities, which means you specify only the name of the entities and their definitions are loaded from the database automatically.

A collection of metadata entities is stored in an object aptly named MetaData:

from sqlalchemy import *

metadata = MetaData()

To represent a Table, use the Table class:

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types as well as the global sqlalchemy namespace.

Foreign keys are most easily specified by the ForeignKey object within a Column object. For a composite foreign key, i.e. a foreign key that contains multiple columns referencing a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:

# a table with a composite primary key
invoices = Table('invoices', metadata, 
    Column('invoice_id', Integer, primary_key=True),
    Column('ref_num', Integer, primary_key=True),
    Column('description', String(60), nullable=False)
)

# a table with a composite foreign key referencing the parent table
invoice_items = Table('invoice_items', metadata, 
    Column('item_id', Integer, primary_key=True),
    Column('item_name', String(60), nullable=False),
    Column('invoice_id', Integer, nullable=False),
    Column('ref_num', Integer, nullable=False),
    ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num'])
)

Above, the invoice_items table will have ForeignKey objects automatically added to the invoice_id and ref_num Column objects as a result of the additional ForeignKeyConstraint object.

The MetaData object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:

>>> for t in metadata.table_iterator(reverse=False):
...    print t.name
users
user_prefs

And Table provides an interface to the table's properties as well as that of its columns:

employees = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    print c.name

# get the table's primary key columns
for primary_key in employees.primary_key:
    print primary_key.name

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    print fkey

# access the table's MetaData:
employees.metadata

# access the table's bound Engine or Connection, if its MetaData is bound:
employees.bind

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can 
# be any user-defined string:
employees.c.name.key

# access a column's table:
>>> employees.c.employee_id.table is employees
True

# get the table related by a foreign key
ftable = employees.c.employee_dept.foreign_key.column.table

Binding MetaData to an Engine or Connection

A MetaData object can be associated with an Engine or an individual Connection; this process is called binding. The term used to describe "an engine or a connection" is often referred to as a connectable. Binding allows the MetaData and the elements which it contains to perform operations against the database directly, using the connection resources to which it's bound. Common operations which are made more convenient through binding include being able to generate SQL constructs which know how to execute themselves, creating Table objects which query the database for their column and constraint information, and issuing CREATE or DROP statements.

To bind MetaData to an Engine, assign the engine to the MetaData's bind attribute:

engine = create_engine('sqlite://')

# create MetaData 
meta = MetaData()

# bind to an engine
meta.bind = engine

Once this is done, the MetaData and its contained Table objects can access the database directly:

meta.create_all()  # issue CREATE statements for all tables

# describe a table called 'users', query the database for its columns
users_table = Table('users', meta, autoload=True)

# generate a SELECT statement and execute
result = users_table.select().execute()

Note that the feature of binding engines is completely optional. All of the operations which take advantage of "bound" MetaData can also be given an Engine or Connection explicitly with which to perform the operation. The equivalent "non-bound" version of the above would be:

meta.create_all(engine)  # issue CREATE statements for all tables

# describe a table called 'users',  query the database for its columns
users_table = Table('users', meta, autoload=True, autoload_with=engine)

# generate a SELECT statement and execute
result = engine.execute(users_table.select())
back to section top

Reflecting Tables

A Table object can be created without specifying any of its contained attributes, using the argument autoload=True in conjunction with the table's name and possibly its schema (if not the database's "default" schema). This will issue the appropriate queries to the database in order to locate all properties of the table required for SQLAlchemy to use it effectively, including its column names and datatypes, foreign and primary key constraints, and in some cases its default-value generating attributes. To use autoload=True, the table's MetaData object must be bound to an Engine or Connection, or alternatively the autoload_with=<some connectable> argument can be passed. Below we illustrate autoloading a table and then iterating through the names of its columns:

>>> messages = Table('messages', meta, autoload=True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']

Note that if a reflected table has a foreign key referencing another table, the related Table object will be automatically created within the MetaData object if it does not exist already. Below, suppose table shopping_cart_items references a table shopping_carts. After reflecting, the shopping carts table is present:

>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload=True)
>>> 'shopping_carts' in meta.tables
True

To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. Table uses a special constructor that will return the already-created Table instance if it's already present:

shopping_carts = Table('shopping_carts', meta)

Of course, it's a good idea to use autoload=True with the above table regardless, so that if the table hadn't been loaded already, the operation will load it. The autoload operation occurs only if the table hasn't already been loaded; once loaded, new calls to Table with the same name will not re-issue any reflection queries.
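For example, a sketch of the recommended call; the first such call reflects the table, and subsequent calls return the cached instance:

shopping_carts = Table('shopping_carts', meta, autoload=True)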

Overriding Reflected Columns

Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.

>>> mytable = Table('mytable', meta,
... Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
... Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
... autoload=True)
back to section top

Specifying the Schema Name

Some databases support the concept of multiple schemas. A Table can reference this by specifying the schema keyword argument:

financial_info = Table('financial_info', meta,
    Column('id', Integer, primary_key=True),
    Column('value', String(100), nullable=False),
    schema='remote_banks'
)

Within the MetaData collection, this table will be identified by the combination of financial_info and remote_banks. If another table called financial_info is referenced without the remote_banks schema, it will refer to a different Table. ForeignKey objects can reference columns in this table using the form remote_banks.financial_info.id.
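As a sketch, a hypothetical account_transactions table (not part of the example schema above) referencing the schema-qualified column:

account_transactions = Table('account_transactions', meta,
    Column('transaction_id', Integer, primary_key=True),
    # the ForeignKey target includes the schema name
    Column('financial_info_id', Integer, ForeignKey('remote_banks.financial_info.id'))
)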

back to section top

ON UPDATE and ON DELETE

ON UPDATE and ON DELETE clauses to a table create are specified within the ForeignKeyConstraint object, using the onupdate and ondelete keyword arguments:

foobar = Table('foobar', meta,
    Column('id', Integer, primary_key=True),
    Column('lala', String(40)),
    ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))

Note that these clauses are not supported on SQLite, and require InnoDB tables when used with MySQL. They may also not be supported on other databases.

back to section top

Other Options

Tables may support database-specific options, such as MySQL's engine option that can specify "MyISAM", "InnoDB", and other backends for the table:

addresses = Table('engine_email_addresses', meta,
    Column('address_id', Integer, primary_key = True),
    Column('remote_user_id', Integer, ForeignKey(users.c.user_id)),
    Column('email_address', String(20)),
    mysql_engine='InnoDB'
)
back to section top

Creating and Dropping Database Tables

Creating and dropping individual tables can be done via the create() and drop() methods of Table; these methods take an optional bind parameter which references an Engine or a Connection. If not supplied, the Engine bound to the MetaData will be used; if the MetaData is not bound either, an error is raised:

meta = MetaData()
meta.bind = 'sqlite:///:memory:'

employees = Table('employees', meta, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)
employees.create()

Dropping a table works the same way via the drop() method:

employees.drop(bind=e)

The create() and drop() methods also support an optional keyword argument checkfirst which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:

employees.create(bind=e, checkfirst=True)
employees.drop(checkfirst=False)

Entire groups of Tables can be created and dropped directly from the MetaData object with create_all() and drop_all(). These methods always check for the existence of each table before creating or dropping. Each method takes an optional bind keyword argument which can reference an Engine or a Connection. If no engine is specified, the underlying bound Engine, if any, is used:

engine = create_engine('sqlite:///:memory:')

metadata = MetaData()

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

metadata.create_all(bind=engine)
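drop_all() works the same way in reverse; continuing the example above, this drops both tables, checking for their existence first:

metadata.drop_all(bind=engine)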
back to section top

Column Defaults and OnUpdates

SQLAlchemy includes flexible constructs for creating default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL statement is executed, a SQL expression or function to be pre-executed before the SQL statement is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default generated by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row has a primary key with which to locate it.

Pre-Executed Insert Defaults

A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:

# a function to create primary key ids
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", meta, 
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)

The "default" keyword can also take SQL expressions, including select statements or direct function calls:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'create_date' to default to now()
    Column('create_date', DateTime, default=func.now()),

    # define 'key' to pull its default from the 'keyvalues' table
    Column('key', String(20), default=keyvalues.select(keyvalues.c.type=='type1', limit=1))
    )

The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. This syntax is optional, but is required for other types of defaults, futher described below:

Column('mycolumn', String(30), ColumnDefault(func.get_data()))
back to section top

Pre-Executed OnUpdate Defaults

Similar to an on-insert default is an on-update default, which is most easily specified using the "onupdate" keyword to Column; it can likewise be a constant, plain Python function or SQL expression:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now())
    Column('last_updated', DateTime, onupdate=func.current_timestamp()),
)

To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:

Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))
back to section top

Inline Default Execution: PassiveDefault

A PassiveDefault indicates a column default that is executed by the database upon INSERT. This construct specifies a SQL expression that will be rendered as the DEFAULT value when creating tables.

t = Table('test', meta, 
    Column('mycolumn', DateTime, PassiveDefault("sysdate"))
)

A create call for the above table will produce:

CREATE TABLE test (
    mycolumn datetime default sysdate
)

PassiveDefault also sends a message to the Engine that data is available after an insert. The object-relational mapper system uses this information to post-fetch rows after the insert, so that instances can be refreshed with the new data. Below is a simplified version:

# table with passive defaults
mytable = Table('mytable', meta, 
    Column('my_id', Integer, primary_key=True),

    # an on-insert database-side default
    Column('data1', Integer, PassiveDefault("d1_func()")),
)
# insert a row
r = mytable.insert().execute(my_id=1)

# check the result: were there defaults fired off on that row ?
if r.lastrow_has_defaults():
    # postfetch the row based on primary key.
    # this only works for a table with primary key columns defined
    primary_key = r.last_inserted_ids()
    row = mytable.select(mytable.c.my_id == primary_key[0]).execute().fetchone()

When Tables are reflected from the database using autoload=True, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.

The Catch: Postgres Primary Key Defaults always Pre-Execute

Current Postgres support does not rely upon OIDs to determine the identity of a row. This is because the usage of OIDs has been deprecated with Postgres, and they are disabled by default for table creates as of PG version 8. Psycopg2's "cursor.lastrowid" function only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed, since SQLAlchemy would otherwise have no way of retrieving the row just inserted.

back to section top

Defining Sequences

A table with a sequence looks like:

table = Table("cartitems", meta, 
    Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)

The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created in the database by SQLAlchemy, the database sequence object is also created. Similarly, the database sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablename_columnname_seq" in order to retrieve new primary key values. Oracle, which has no "auto-increment" keyword, requires that a Sequence be specified for a table if automatic primary key generation is desired.

A Sequence object can be defined on a Table that is then also used with a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.

A sequence can also be specified with optional=True which indicates the Sequence should only be used on a database that requires an explicit sequence, and not those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".
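For example, a sketch of the cartitems table from above using an optional sequence:

table = Table("cartitems", meta, 
    # the sequence is used on Oracle; it is ignored on databases which
    # supply some other method of generating integer primary key values
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)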

back to section top

Defining Constraints and Indexes

UNIQUE Constraint

Unique constraints can be created anonymously on a single column using the unique keyword on Column. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column anonymous unique constraint
    Column('col1', Integer, unique=True),

    Column('col2', Integer),
    Column('col3', Integer),

    # explicit/composite unique constraint.  'name' is optional.
    UniqueConstraint('col2', 'col3', name='uix_1')
    )
back to section top

CHECK Constraint

Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column-level check constraints generally should only refer to the column on which they are placed, while table-level constraints can refer to any columns in the table.

Note that some databases, such as MySQL and SQLite, do not actively enforce check constraints.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column CHECK constraint
    Column('col1', Integer, CheckConstraint('col1>5')),

    Column('col2', Integer),
    Column('col3', Integer),

    # table level CHECK constraint.  'name' is optional.
    CheckConstraint('col2 > col3 + 5', name='check1')
    )
back to section top

Indexes

Indexes can be created anonymously (using an auto-generated name "ix_") for a single column using the inline index keyword on Column, which also modifies the usage of unique to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index construct, which requires a name.

Note that the Index construct is created externally to the table to which it corresponds, using Column objects and not strings.

meta = MetaData()
mytable = Table('mytable', meta,
    # an indexed column, with index "ix_mytable_col1"
    Column('col1', Integer, index=True),

    # a uniquely indexed column with index "ix_mytable_col2"
    Column('col2', Integer, index=True, unique=True),

    Column('col3', Integer),
    Column('col4', Integer),

    Column('col5', Integer),
    Column('col6', Integer),
    )

# place an index on col3, col4
Index('idx_col34', mytable.c.col3, mytable.c.col4)

# place a unique index on col5, col6
Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)

The Index objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:

# create a table
sometable.create()

# define an index
i = Index('someindex', sometable.c.col5)

# create the index, will use the table's bound connectable if the `bind` keyword argument not specified
i.create()
back to section top

Adapting Tables to Alternate Metadata

A Table object created against a specific MetaData object can be re-created against a new MetaData using the tometadata method:

# create two metadata
meta1 = MetaData('sqlite:///querytest.db')
meta2 = MetaData()

# load 'users' from the sqlite engine
users_table = Table('users', meta1, autoload=True)

# create the same Table object for the plain metadata
users_table_2 = users_table.tometadata(meta2)
back to section top

Note: This section describes how to use SQLAlchemy to construct SQL queries and receive result sets. It does not cover the object relational mapping capabilities of SQLAlchemy; that is covered later on in Data Mapping. However, both areas of functionality work similarly in how selection criterion is constructed, so if you are interested just in ORM, you should probably skim through basic WHERE Clause construction before moving on.

Once you have used the sqlalchemy.schema module to construct your tables and/or reflect them from the database, performing SQL queries using those table meta data objects is done via the sqlalchemy.sql package. This package defines a large set of classes, each of which represents a particular kind of lexical construct within a SQL query; all are descendants of the common base class sqlalchemy.sql.ClauseElement. A full query is represented via a structure of ClauseElements. A set of reasonably intuitive creation functions is provided by the sqlalchemy.sql package to create these structures; these functions are described in the rest of this section.

Executing a ClauseElement structure can be performed in two general ways. You can use an Engine or a Connection object's execute() method to which you pass the query structure; this is known as explicit style. Or, if the ClauseElement structure is built upon Table metadata which is bound to an Engine directly, you can simply call execute() on the structure itself, known as implicit style. In both cases, the execution returns a cursor-like object (more on that later). The same clause structure can be executed repeatedly. The ClauseElement is compiled into a string representation by an underlying Compiler object which is associated with the Engine via its Dialect.

The examples below all include a dump of the generated SQL corresponding to the query object, as well as a dump of the statement's bind parameters. In all cases, bind parameters are shown as named parameters using the colon format (i.e. ':name'). When the statement is compiled into a database-specific version, the named-parameter statement and its bind values are converted to the proper paramstyle for that database automatically.

For this section, we will mostly use the implicit style of execution, meaning the Table objects are associated with a bound instance of MetaData, and constructed ClauseElement objects support self-execution. Assume the following configuration:

from sqlalchemy import *
metadata = MetaData('sqlite:///mydb.db', echo=True)

# a table to store users
users = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', metadata,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)

Simple Select

A select is done by constructing a Select object with the proper arguments, adding any extra arguments if desired, then calling its execute() method.

Basic Select
from sqlalchemy import *

# use the select() function defined in the sql package
s = select([users])

# or, call the select() method off of a Table object
s = users.select()

# then, call execute on the Select object:
result = s.execute()

# the SQL text of any clause object can also be viewed via the str() call:
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

Explicit Execution

As mentioned above, ClauseElement structures can also be executed with a Connection object explicitly:

engine = create_engine('sqlite:///myfile.db')
conn = engine.connect()

result = conn.execute(users.select())
conn.close()
back to section top

Binding ClauseElements to Engines

For queries that don't contain any "bound" tables, ClauseElements that represent a fully executable statement support a bind keyword parameter which can bind the object to an Engine or Connection, thereby allowing implicit execution:

# select using a table
select([users], bind=myengine).execute()

# select a literal
select(["current_time"], bind=myengine).execute()

# select a function
select([func.now()], bind=db).execute()
back to section top

Getting Results

The object returned by execute() is a sqlalchemy.engine.ResultProxy object, which acts much like a DBAPI cursor object in the context of a result set, except that the rows returned can address their columns by ordinal position, column name, or even column object:

Using the ResultProxy
# select rows, get resulting ResultProxy object
result = users.select().execute()
# get one row
row = result.fetchone()

# get the 'user_id' column via integer index:
user_id = row[0]

# or column name
user_name = row['user_name']

# or column object
password = row[users.c.password]

# or column accessor
password = row.password

# ResultProxy object also supports fetchall()
rows = result.fetchall()

# or get the underlying DBAPI cursor object
cursor = result.cursor

# after an INSERT, return the last inserted primary key value
# returned as a list of primary key values for *one* row 
# (a list since primary keys can be composite)
id = result.last_inserted_ids()

# close the result.  If the statement was implicitly executed 
# (i.e. without an explicit Connection), this will
# return the underlying connection resources back to 
# the connection pool.  de-referencing the result
# will also have the same effect.  if an explicit Connection was 
# used, then close() just closes the underlying cursor object.
result.close()
back to section top

Using Column Labels

A common need when writing statements that reference multiple tables is to create labels for columns, thereby separating columns from different tables with the same name. The Select construct supports automatic generation of column labels via the use_labels=True parameter:

use_labels Flag
c = select([users, addresses], 
    users.c.user_id==addresses.c.address_id, 
    use_labels=True).execute()

The table name part of the label is affected if you use a construct such as a table alias:

use_labels with an Alias
person = users.alias('person')
c = select([person, addresses], 
    person.c.user_id==addresses.c.address_id, 
    use_labels=True).execute()

Labels are also generated in such a way as to never go beyond 30 characters. Most databases impose a limit on the length of identifiers, such as Postgres, and particularly Oracle, which has a rather short limit of 30:

use_labels Generates Abbreviated Labels
long_named_table = users.alias('this_is_the_person_table')
c = select([long_named_table], use_labels=True).execute()

You can also specify custom labels on a per-column basis using the label() function:

label() Function on Column
c = select([users.c.user_id.label('id'), 
           users.c.user_name.label('name')]).execute()
back to section top

Table/Column Specification

Calling select off a table automatically generates a column clause which includes all the table's columns, in the order they are specified in the source Table object.

But in addition to selecting all the columns off a single table, any set of columns can be specified, as well as full tables, and any combination of the two:

Specify Columns to Select
# individual columns
c = select([users.c.user_id, users.c.user_name]).execute()

# full tables
c = select([users, addresses]).execute()

# combinations
c = select([users, addresses.c.zip]).execute()
back to section top

WHERE Clause

The WHERE condition is the named keyword argument whereclause, or the second positional argument to the select() constructor and the first positional argument to the select() method of Table.

WHERE conditions are constructed using column objects, literal values, and functions defined in the sqlalchemy.sql module. Column objects override the standard Python operators to provide clause compositional objects, which compile down to SQL operations:

Basic WHERE Clause
c = users.select(users.c.user_id == 7).execute()

Notice that the literal value "7" was broken out of the query and placed into a bind parameter. Databases such as Oracle must parse incoming SQL and create a "plan" when new queries are received, which is an expensive process. By using bind parameters, the same query with various literal values can have its plan compiled only once, and used repeatedly with less overhead.
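The bind parameter is visible when printing the statement; the parameter name shown below assumes 0.3's naming scheme of combining the table and column names:

>>> str(users.select(users.c.user_id == 7))
SELECT users.user_id, users.user_name, users.password FROM users WHERE users.user_id = :users_user_id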

More where clauses:

# another comparison operator
c = select([users], users.c.user_id>7).execute()

# OR keyword
c = users.select(or_(users.c.user_name=='jack', users.c.user_name=='ed')).execute()

# AND keyword
c = users.select(and_(users.c.user_name=='jack', users.c.password=='dog')).execute()

# NOT keyword
c = users.select(not_(
        or_(users.c.user_name=='jack', users.c.password=='dog')
    )).execute()

# IN clause
c = users.select(users.c.user_name.in_('jack', 'ed', 'fred')).execute()

# join users and addresses together
c = select([users, addresses], users.c.user_id==addresses.c.address_id).execute()

# join users and addresses together, but don't specify "addresses" in the 
# selection criterion.  The WHERE criterion adds it to the FROM list 
# automatically.
c = select([users], and_(
                users.c.user_id==addresses.c.user_id,
                users.c.user_name=='fred'
            )).execute()

Select statements can also generate a WHERE clause based on the parameters you pass to execute(). If a given parameter name matches the name of a column or its "label" (the combined tablename + "_" + columnname), and does not already correspond to a bind parameter in the select object, it will be added as an equality comparison against that column. This is a shortcut to creating a full WHERE clause:

# specify a match for the "user_name" column
c = users.select().execute(user_name='ed')

# specify a full where clause for the "user_name" column, as well as a
# comparison for the "user_id" column
c = users.select(users.c.user_name=='ed').execute(user_id=10)

Operators

Supported column operators so far are all the numerical comparison operators, i.e. '==', '>', '>=', etc., as well as like(), startswith(), endswith(), between(), and in_(). Boolean operators include not_(), and_() and or_(), which also can be used inline via '~', '&', and '|'. Math operators are '+', '-', '*', '/'. Any custom operator can be specified via the op() function shown below.

# "like" operator
users.select(users.c.user_name.like('%ter'))

# equality operator
users.select(users.c.user_name == 'jane')

# in operator
users.select(users.c.user_id.in_(1,2,3))

# and_, endswith, equality operators
users.select(and_(addresses.c.street.endswith('green street'),
                addresses.c.zip=='11234'))

# & operator subsituting for 'and_'
users.select(addresses.c.street.endswith('green street') & (addresses.c.zip=='11234'))

# + concatenation operator
select([users.c.user_name + '_name'])

# NOT operator
users.select(~(addresses.c.street == 'Green Street'))

# any custom operator
select([users.c.user_name.op('||')('_category')])

# "null" comparison via == (converts to IS)
users.select(users.c.user_name==None).execute()
# or via explicit null() construct
users.select(users.c.user_name==null()).execute()
back to section top

Functions

Functions can be specified using the func keyword:

select([func.count(users.c.user_id)]).execute()

users.select(func.substr(users.c.user_name, 1) == 'J').execute()

Functions also are callable as standalone values:

# call the "now()" function
time = func.now(bind=myengine).scalar()

# call myfunc(1,2,3)
myvalue = func.myfunc(1, 2, 3, bind=db).execute()

# or call them off the engine
db.func.now().scalar()
back to section top

Literals

You can drop in a literal value anywhere there isn't a column to attach to, via the literal keyword:

select([literal('foo') + literal('bar'), users.c.user_name]).execute()

# literals have all the same comparison functions as columns
select([literal('foo') == literal('bar')], bind=myengine).scalar()

Literals also take an optional type parameter to give literals a type. This can sometimes be significant, for example when using the "+" operator with SQLite, the String type is detected and the operator is converted to "||":

select([literal('foo', type=String) + 'bar'], bind=e).execute()
back to section top

Order By

The ORDER BY clause of a select statement can be specified as a list of columns to order by, passed via the order_by parameter, with optional usage of the asc() and desc() functions:

# straight order by
c = users.select(order_by=[users.c.user_name]).execute()

# descending/ascending order by on multiple columns
c = users.select(
    users.c.user_name>'J', 
    order_by=[desc(users.c.user_id), asc(users.c.user_name)]).execute()
back to section top

DISTINCT, LIMIT and OFFSET

These are specified as keyword arguments:

c = select([users.c.user_name], distinct=True).execute()

c = users.select(limit=10, offset=20).execute()

The Oracle driver does not support LIMIT and OFFSET directly, but instead wraps the generated query into a subquery and uses the "rownum" variable to control the rows selected (this is somewhat experimental). Similarly, the Firebird and MSSQL drivers convert LIMIT into queries using FIRST and TOP, respectively.

back to section top

Inner and Outer Joins

As some of the examples indicated above, a regular inner join can be implicitly stated, just like in a SQL expression, by just specifying the tables to be joined as well as their join conditions:

addresses.select(addresses.c.user_id==users.c.user_id).execute()

There is also an explicit join constructor, which can be embedded into a select query via the from_obj parameter of the select statement:

addresses.select(from_obj=[
    addresses.join(users, addresses.c.user_id==users.c.user_id)
]).execute()

The join constructor can also be used by itself:

join(users, addresses, users.c.user_id==addresses.c.user_id).select().execute()

The join criterion in a join() call is optional. If not specified, the condition will be derived from the foreign key relationships of the two tables. If no criterion can be constructed, an exception will be raised.

join(users, addresses).select().execute()

Notice that this is the first example where the FROM criterion of the select statement is explicitly specified. In most cases, the FROM criterion is automatically determined from the columns requested as well as the WHERE clause. The from_obj keyword argument indicates a list of explicit FROM clauses to be used in the statement.

A join can be created on its own using the join or outerjoin functions, or can be created off of an existing Table or other selectable unit via the join or outerjoin methods:

outerjoin(users, addresses, 
           users.c.user_id==addresses.c.address_id).select().execute()

users.select(keywords.c.name=='running', from_obj=[
        users.join(
            userkeywords, userkeywords.c.user_id==users.c.user_id).join(
                keywords, keywords.c.keyword_id==userkeywords.c.keyword_id)
        ]).execute()

Joins also provide a keyword argument fold_equivalents on the select() function which allows the column list of the resulting select to be "folded" to the minimal list of columns, based on those columns that are known to be equivalent from the "onclause" of the join. This saves the effort of constructing column lists manually in conjunction with databases like Postgres which can be picky about "ambiguous columns". In this example, only the "users.user_id" column, but not the "addresses.user_id" column, shows up in the column clause of the resulting select:

users.join(addresses).select(fold_equivalents=True).execute()

The fold_equivalents argument will recursively apply to "chained" joins as well, i.e. a.join(b).join(c)....
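As a sketch, assuming the foreign key relationships defined earlier (so that the join conditions can be derived automatically), the chained version folds the repeated user_id and keyword_id columns as well:

# fold_equivalents applies through the whole chain of joins
users.join(userkeywords).join(keywords).select(fold_equivalents=True).execute()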

back to section top

Table Aliases

Aliases are used primarily when you want to use the same table more than once as a FROM expression in a statement:

address_b = addresses.alias('addressb')
# select users who have an address on Green street as well as Orange street
users.select(and_(
    users.c.user_id==addresses.c.user_id,
    addresses.c.street.like('%Green%'),
    users.c.user_id==address_b.c.user_id,
    address_b.c.street.like('%Orange%')
)).execute()
back to section top

Subqueries

SQLAlchemy allows the creation of select statements from not just Table objects, but from a whole class of objects that implement the Selectable interface. This includes Tables, Aliases, Joins and Selects. Therefore, if you have a Select, you can select from the Select:

>>> s = users.select()
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users

>>> s = s.select()
>>> str(s)
SELECT user_id, user_name, password
FROM (SELECT users.user_id, users.user_name, users.password FROM users)

Any Select, Join, or Alias object supports the same column accessors as a Table:

>>> s = users.select()
>>> [c.key for c in s.columns]
['user_id', 'user_name', 'password']

When you use use_labels=True in a Select object, the label version of the column names become the keys of the accessible columns. In effect you can create your own "view objects":

s = select([users, addresses], users.c.user_id==addresses.c.user_id, use_labels=True)
select([
    s.c.users_user_name, s.c.addresses_street, s.c.addresses_zip
], s.c.addresses_city=='San Francisco').execute()

To specify a SELECT statement as one of the selectable units in a FROM clause, it usually should be given an alias.

s = users.select().alias('u')
select([addresses, s]).execute()

Select objects can be used in a WHERE condition, in operators such as IN:

# select user ids for all users whose name starts with a "p"
s = select([users.c.user_id], users.c.user_name.like('p%'))

# now select all addresses for those users
addresses.select(addresses.c.user_id.in_(s)).execute()

The sql package supports embedding select statements into other select statements as the criterion in a WHERE condition, or as one of the "selectable" objects in the FROM list of the query. It does not at the moment directly support embedding a SELECT statement as one of the column criterion for a statement, although this can be achieved via direct text insertion, described later.

Scalar Column Subqueries

Subqueries can be used in the column clause of a select statement by specifying the scalar=True flag:

select([table2.c.col1, table2.c.col2, 
                select([table1.c.col1], table1.c.col2==7, scalar=True)])
back to section top

Correlated Subqueries

When a select object is embedded inside of another select object, and both objects reference the same table, SQLAlchemy makes the assumption that the table should be correlated from the child query to the parent query. To disable this behavior, specify the flag correlate=False to the Select statement.

# make an alias of a regular select.   
s = select([addresses.c.street], addresses.c.user_id==users.c.user_id).alias('s')
>>> str(s)
SELECT addresses.street FROM addresses, users 
WHERE addresses.user_id = users.user_id

# now embed that select into another one.  the "users" table is removed from
# the embedded query's FROM list and is instead correlated to the parent query
s2 = select([users, s.c.street])
>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses
WHERE addresses.user_id = users.user_id) s
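To illustrate the flag itself, a sketch of the same embedded select constructed with correlate=False; the users table then remains in the subquery's own FROM list rather than being correlated to the enclosing query:

# disable correlation; the embedded query keeps its own FROM list
s3 = select([addresses.c.street], addresses.c.user_id==users.c.user_id,
            correlate=False).alias('s3')
s4 = select([users, s3.c.street])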

EXISTS Clauses

An EXISTS clause can function as a higher-scaling version of an IN clause, and is usually used in a correlated fashion:

# find all users who have an address on Green street:
users.select(
    exists(
        [addresses.c.address_id], 
        and_(
            addresses.c.user_id==users.c.user_id, 
            addresses.c.street.like('%Green%')
        )
    )
)
back to section top

Unions

Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:

union(
    addresses.select(addresses.c.street=='123 Green Street'),
    addresses.select(addresses.c.street=='44 Park Ave.'),
    addresses.select(addresses.c.street=='3 Mill Road'),
    order_by=[addresses.c.street]
).execute()
users.select(
    users.c.user_id==7
  ).union_all(
      users.select(
          users.c.user_id==9
      ), 
      order_by=[users.c.user_id]   # order_by is an argument to union_all()
  ).execute()
back to section top

Custom Bind Parameters

Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The bind parameters, shown here in the "named" format, will be converted to the appropriate named or positional style according to the database implementation being used.

Custom Bind Params
s = users.select(users.c.user_name==bindparam('username'))

# execute implicitly
s.execute(username='fred')
# execute explicitly
conn = engine.connect()
conn.execute(s, username='fred')

executemany() is also available by supplying multiple dictionary arguments instead of keyword arguments to the execute() method of ClauseElement or Connection. Examples can be found later in the sections on INSERT/UPDATE/DELETE.

Precompiling a Query

By calling the compile() method on any query object, the query can be "compiled" by the Engine into a sqlalchemy.sql.Compiled object just once, and the resulting compiled object reused, which eliminates repeated internal compilation of the SQL string:

s = users.select(users.c.user_name==bindparam('username')).compile()
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
back to section top

Literal Text Blocks

The sql package tries to allow free textual placement in as many ways as possible. In the examples below, note that the from_obj parameter is used only when no other information exists within the select object with which to determine table metadata. Also note that in a query where there isn't even table metadata used, the Engine to be used for the query has to be explicitly specified:

# strings as column clauses
sqlselect(["user_id", "user_name"], from_obj=[users]).execute()
# strings for full column lists
sqlselect(
        ["user_id, user_name, password, addresses.*"], 
        from_obj=[users.alias('u'), addresses]).execute()
# functions, etc.
sqlselect([users.c.user_id, "process_string(user_name)"]).execute()
# where clauses
sqlusers.select(and_(users.c.user_id==7, "process_string(user_name)=27")).execute()
# subqueries
sqlusers.select(
    "exists (select 1 from addresses where addresses.user_id=users.user_id)").execute()
# custom FROM objects
sqlselect(
        ["*"], 
        from_obj=["(select user_id, user_name from users)"], 
        bind=db).execute()
# a full query
sqltext("select user_name from users", bind=db).execute()

Using Bind Parameters in Text Blocks

Use the format ':paramname' to define bind parameters inside of a text block. They will be converted to the appropriate format upon compilation:

t = text("select foo from mytable where lala=:hoho", bind=engine)
r = t.execute(hoho=7)

Bind parameters can also be explicit, which allows typing information to be added. Just specify them as a list with keys that match those inside the textual statement:

t = text("select foo from mytable where lala=:hoho", 
                bindparams=[bindparam('hoho', type=types.String)], bind=engine)
    r = t.execute(hoho="im hoho")

Result-row type processing can be added via the typemap argument, which is a dictionary of return columns mapped to types:

# specify DateTime type for the 'foo' column in the result set
# sqlite, for example, uses result-row post-processing to construct dates
t = text("select foo from mytable where lala=:hoho", 
        bindparams=[bindparam('hoho', type=types.String)],
        typemap={'foo':types.DateTime}, bind=engine
        )
r = t.execute(hoho="im hoho")

# 'foo' is a datetime
year = r.fetchone()['foo'].year
back to section top

Building Select Objects

One of the primary motivations for a programmatic SQL library is to allow the piecemeal construction of a SQL statement based on program variables. All the above examples typically show Select objects being created all at once. The Select object also includes "builder" methods to allow building up an object. The below example is a "user search" function, where users can be selected based on primary key, user name, street address, keywords, or any combination:

def find_users(id=None, name=None, street=None, keyword_names=None):
    statement = users.select()
    if id is not None:
        statement.append_whereclause(users.c.user_id==id)
    if name is not None:
        statement.append_whereclause(users.c.user_name==name)
    if street is not None:
        # append_whereclause joins "WHERE" conditions together with AND
        statement.append_whereclause(users.c.user_id==addresses.c.user_id)
        statement.append_whereclause(addresses.c.street==street)
    if keyword_names is not None:
        # the parameter is named keyword_names so that it does not shadow
        # the keywords Table used in the join below
        statement.append_from(
                users.join(userkeywords, users.c.user_id==userkeywords.c.user_id).join(
                        keywords, userkeywords.c.keyword_id==keywords.c.keyword_id))
        statement.append_whereclause(keywords.c.name.in_(*keyword_names))
        # to avoid multiple repeats, set query to be DISTINCT:
        statement.distinct=True
    return statement.execute()

find_users(id=7)
find_users(street='123 Green Street')
find_users(name='Jack', keyword_names=['jack','foo'])
back to section top

Inserts

An INSERT involves just one table. The Insert object is used via the insert() function, and the specified columns determine what columns show up in the generated SQL. If primary key columns are left out of the criterion, the SQL generator will try to populate them as specified by the particular database engine and sequences, i.e. relying upon an auto-incremented column or explicitly calling a sequence beforehand. Insert statements, as well as updates and deletes, can also execute multiple parameters in one pass via specifying an array of dictionaries as parameters.

The values to be populated for an INSERT or an UPDATE can be specified to the insert()/update() functions as the values named argument, or the query will be compiled based on the values of the parameters sent to the execute() method.

Using insert()
# basic insert
users.insert().execute(user_id=1, user_name='jack', password='asdfdaf')
# insert just user_name, NULL for others
# will auto-populate primary key columns if they are configured
# to do so
users.insert().execute(user_name='ed')
# INSERT with a list:
users.insert(values=(3, 'jane', 'sdfadfas')).execute()
# INSERT with user-defined bind parameters
i = users.insert(
    values={'user_name':bindparam('name'), 'password':bindparam('pw')}
    )
i.execute(name='mary', pw='adas5fs')
# INSERT many - if no explicit 'values' parameter is sent,
# the first parameter list in the list determines
# the generated SQL of the insert (i.e. what columns are present)
# executemany() is used at the DBAPI level
users.insert().execute(
    {'user_id':7, 'user_name':'jack', 'password':'asdfasdf'},
    {'user_id':8, 'user_name':'ed', 'password':'asdffcadf'},
    {'user_id':9, 'user_name':'fred', 'password':'asttf'},
)
back to section top

Updates

Updates work a lot like INSERTs, except there is an additional WHERE clause that can be specified.

Using update()
# change 'jack' to 'ed'
users.update(users.c.user_name=='jack').execute(user_name='ed')

# the above generates:
#   UPDATE users SET user_name=:user_name WHERE users.user_name = :users_user_name
#   {'users_user_name': 'jack', 'user_name': 'ed'}

# use bind parameters
u = users.update(users.c.user_name==bindparam('name'), 
                values={'user_name':bindparam('newname')})
u.execute(name='jack', newname='ed')
# update a column to another column
users.update(values={users.c.password:users.c.user_name}).execute()

# expressions OK too
users.update(values={users.c.user_id:users.c.user_id + 17}).execute()

# multi-update
users.update(users.c.user_id==bindparam('id')).execute(
        {'id':7, 'user_name':'jack', 'password':'fh5jks'},
        {'id':8, 'user_name':'ed', 'password':'fsr234ks'},
        {'id':9, 'user_name':'mary', 'password':'7h5jse'},
    )

Correlated Updates

A correlated update lets you update a table using selection from another table, or the same table:

s = select([addresses.c.city], addresses.c.user_id==users.c.user_id)
users.update(
    and_(users.c.user_id>10, users.c.user_id<20), 
    values={users.c.user_name:s}
).execute()
back to section top

Deletes

A delete is formulated like an update, except there's no values clause:

users.delete(users.c.user_id==7).execute()
users.delete(users.c.user_name.like(bindparam('name'))).execute(
        {'name':'%Jack%'},
        {'name':'%Ed%'},
        {'name':'%Jane%'},
    )
back to section top

Basic Data Mapping

Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.

When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects. This pattern is called a Unit of Work pattern.

back to section top

Synopsis

Starting with a Table definition and a minimal class construct, the two are associated with each other via the mapper() function, which generates an object called a Mapper. SA associates the class and all instances of that class with this particular Mapper, which is then stored in a global registry.

from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('fullname', String(100)),
    Column('password', String(20))
)

# class definition 
class User(object):
    pass

# create a mapper and associate it with the User class.
mapper(User, users_table)

That's all for configuration. Next, we will create an Engine and bind it to a Session, which represents a local collection of mapped objects to be operated upon.

# engine
engine = create_engine("sqlite://mydb.db")

# session
session = create_session(bind=engine)

The session represents a "workspace" which can load objects and persist changes to the database. Note also that the bind parameter is optional; if the underlying Table objects are bound as described in Binding MetaData to an Engine or Connection, it's not needed. A Session is best created as local to a particular set of related data operations, such as scoped within a function call, or within a single application request cycle. Next we illustrate a rudimentary query which will load a single object instance. We will modify one of its attributes and persist the change back to the database.

# select
user = session.query(User).filter_by(user_name='fred').first()
# modify
user.user_name = 'fred jones'

# flush - saves everything that changed
# within the scope of our Session
session.flush()

Things to note from the above include that the loaded User object has an attribute named user_name on it, which corresponds to the user_name column in users_table; this attribute was configured at the class level by the Mapper, as part of its post-initialization process (this process occurs normally when the mapper is first used). Our modify operation on this attribute caused the object to be marked as "dirty", which was picked up automatically within the subsequent flush() process. The flush() is the point at which all changes to objects within the Session are persisted to the database, and the User object is no longer marked as "dirty" until it is again modified.

back to section top

The Query Object

The method session.query(class_or_mapper) returns a Query object. Query implements methods which are used to produce and execute select statements tailored for loading object instances. It returns object instances in all cases; usually as a list, but in some cases scalar objects, or lists of tuples which contain multiple kinds of objects and sometimes individual scalar values.

A Query is created from the Session, relative to a particular class we wish to load.

# get a query from a Session based on class:
query = session.query(User)

Alternatively, an actual Mapper instance can be specified instead of a class:

# locate the mapper corresponding to the User class
usermapper = class_mapper(User)

# create query against the User mapper
query = session.query(usermapper)

A query which joins across multiple tables may also be used to request multiple entities, such as:

query = session.query(User, Address)

Once we have a query, we can start loading objects. The methods filter() and filter_by() handle narrowing results, and the methods all(), one(), and first() exist to return all, exactly one, or the first result of the total set of results. Note that these methods are generative: each call that doesn't actually execute results (such as filter()) returns a new Query instance rather than modifying the original.
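For example, a sketch illustrating the generative behavior; each narrowing call returns a new Query and leaves the original untouched:

q1 = session.query(User)

# filter() returns a brand new Query; q1 itself is unchanged
q2 = q1.filter(users_table.c.user_id > 224)

result1 = q1.all()   # loads all users
result2 = q2.all()   # loads users with user_id > 224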

The filter_by() method works with keyword arguments, which are combined together via AND:

result = session.query(User).filter_by(user_name='john', fullname='John Smith').all()

Whereas filter() works with constructed SQL expressions, i.e. those described in Constructing SQL Queries via Python Expressions:

result = session.query(User).filter(users_table.c.user_name=='john').all()

Sometimes, constructing SQL via expressions can be cumbersome. For quick SQL expressions, the filter() method can also accommodate straight text:

sqlresult = session.query(User).filter("user_id>224").all()

When using text, bind parameters can be specified the same way as in a text() clause, using a colon. To specify the bind parameter values, use the params() method:

sqlresult = session.query(User).filter("user_id>:value").params(value=224).all()

Multiple filter() and filter_by() expressions may be combined together. The resulting statement groups them using AND.

result = session.query(User).filter(users_table.c.user_id>224).filter_by(name='john').\
            filter(users_table.c.fullname=='John Smith').all()

filter_by()'s keyword arguments can also take mapped object instances as comparison arguments. We'll illustrate this later when we talk about object relationships.

Note that all conjunctions are available explicitly, such as and_() and or_(), when using filter():

result = session.query(User).filter(
    and_(users_table.c.user_id>224, or_(users_table.c.name=='john', users_table.c.name=='ed'))
    ).all()

It's also straightforward to use an entirely string-based statement, using from_statement(); just ensure that the columns clause of the statement contains the column names normally used by the mapper (here illustrated using an asterisk):

result = session.query(User).from_statement("SELECT * FROM users").all()

from_statement() can also accommodate select() constructs:

result = session.query(User).from_statement(
    select([users_table], users_table.c.name<'e', having=users_table.c.name==func.max(users_table.c.name), group_by=[c for c in users_table.c])
    ).all()

Any set of filter criteria (or none at all) can be distilled into a count of rows using count():

num = session.query(User).filter(users_table.c.user_id>224).count()

Rows are limited and offset using limit() and offset():

result = session.query(User).limit(20).offset(5).all()

And ordering is applied, using Column objects and related SQL constructs, with order_by():

result = session.query(User).order_by(desc(users_table.c.user_name)).all()

The first() and one() methods will also limit rows, and both will return a single object, instead of a list. In the case of first(), rows are limited to just one, and the result is returned as a scalar. In the case of one(), rows are limited to two; however, only one is returned. If two rows are matched, an exception is raised.

# load the first result
user = session.query(User).first()

# load exactly *one* result - if more than one result matches, an exception is raised
user = session.query(User).filter_by(name='jack').one()

The Query, when evaluated as an iterator, executes the query immediately, using whatever state has been built up:

result = list(session.query(User))

Array indexes and slices work too, adding the corresponding LIMIT and OFFSET clauses:

result = list(session.query(User)[1:3])

A scalar index returns a scalar result immediately:

user = session.query(User)[2]

There's also a way to combine scalar results with objects, using add_column(). This is often used for functions and aggregates. When add_column() (or its cousin add_entity(), described later) is used, tuples are returned:

result = session.query(User).add_column(func.max(users_table.c.name)).group_by([c for c in users_table.c]).all()
for r in result:
    print "user:", r[0]
    print "max name:", r[1]

Later in this chapter, we'll discuss how to configure relations between mapped classes. Once that's done, we'll discuss how to use table joins in Querying with Joins.

Loading by Primary Key

The get() method loads a single instance, given the primary key value of the desired entity:

# load user with primary key 15
user = query.get(15)

The get() method, because it has the actual primary key value of the instance, can return an already-loaded instance from the Session without performing any SQL. It is the only result-returning method on Query that does not issue SQL to the database in all cases.
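
A quick sketch of this behavior, assuming a row with primary key 15 exists:

# the first get() may issue SQL; the second returns the instance
# already present in the Session's identity map, with no SQL issued
user_a = query.get(15)
user_b = query.get(15)
assert user_a is user_b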

To issue a composite primary key to get(), use a tuple. The order of the arguments matches that of the primary key columns of the table:

myobj = query.get((27, 3, 'receipts'))

Another special method on Query is load(). This method has the same signature as get(), except it always refreshes the returned instance with the latest data from the database. This is in fact a unique behavior, since as we will see in the Session / Unit of Work chapter, most Query methods do not reload the contents of instances which are already present in the session.

back to section top

Column Objects Available via their Mapped Class

Some of the examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like that of a Table object to access the columns of the table:

userlist = session.query(User).filter(User.c.user_id==12).first()

In version 0.4 of SQLAlchemy, the "c" prefix will no longer be needed.

back to section top

Saving Objects

When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when an application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to ensure the proper number of rows were in fact affected.

The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.

When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session using the save() method:

mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User      
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database            
myuser3 = sess.query(User).filter_by(user_name='fred').all()[0]
myuser3.user_name = 'fredjones'

# save all changes            
sess.flush()

The mapped class can also specify whatever methods and/or constructor it wants:

class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (repr(self.user_id), 
            repr(self.user_name), repr(self.password))
mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sess.flush()
>>> u
User id 1 name 'john' password 'foo'

Note that the __init__() method is not called when the instance is loaded. This is so that classes can define operations that are specific to their initial construction which are not re-called when the object is restored from the database, and is similar in concept to how Python's pickle module calls __new__() when deserializing instances. To allow __init__() to be called at object load time, or to define any other sort of on-load operation, create a MapperExtension which supplies the create_instance() method (see Extending Mapper, as well as the example in the FAQ).
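
The sketch below illustrates the general idea; it assumes the 0.3-era create_instance() hook signature, and the loaded_from_db marker is purely illustrative:

from sqlalchemy.orm import MapperExtension

class OnLoad(MapperExtension):
    def create_instance(self, mapper, selectcontext, row, class_):
        # construct the instance ourselves rather than letting the
        # mapper bypass __init__; mapped attributes are still
        # populated from the row afterwards
        instance = class_.__new__(class_)
        instance.loaded_from_db = True  # illustrative on-load marker
        return instance

mapper(User, users_table, extension=OnLoad())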

back to section top

Defining and Using Relationships

So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation() function provided by the orm module.

One to Many

With our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:

from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the two tables in a parent-child relationship via the user_id column. When a Mapper wants to indicate a relation of one object to another, ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).

So then lets define two classes, the familiar User class, as well as an Address class:

class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip

And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument, properties, which is a dictionary relating the names of class attributes to database relationships; in this case, a relation object against a newly defined mapper for the Address class:

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address)
    }
  )

Let's do some operations with these classes and see what happens:

engine = create_engine('sqlite:///mydb.db')

# create tables
metadata.create_all(engine)

session = create_session(bind=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
INSERT INTO users (user_name, password) VALUES (:user_name, :password)
{'password': 'hihilala', 'user_name': 'jane'}

INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'big city', 'state': 'UT', 'street': '123 anywhere street', 'user_id':1, 'zip': '76543'}
INSERT INTO addresses (user_id, street, city, state, zip) VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'some other city', 'state': 'OK', 'street': '1 Park Place', 'user_id':1, 'zip': '83923'}

A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and also upon flush had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.

Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a Python property which will return an instance of sqlalchemy.orm.attributes.InstrumentedList. This is a generic collection-bearing object which can represent lists, sets, dictionaries, or any user-defined collection class. By default it represents a list:

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
UPDATE addresses SET user_id=:user_id
WHERE addresses.address_id = :addresses_address_id
[{'user_id': None, 'addresses_address_id': 2}]

INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}

Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.
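
For instance, the following two configurations are equivalent ways of expressing the same relation (use one or the other, since a class has only one primary mapper):

# target given as a class; its primary mapper is located
# automatically (assumes mapper(Address, addresses_table) was
# already called)
mapper(User, users_table, properties = {
    'addresses' : relation(Address)
})

# target given as a Mapper instance directly
mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table))
})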

back to section top

Lifecycle Relations

In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now, there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the cascade parameter of relation():

session.clear()  # clear session
clear_mappers()  # clear mappers from the previous example

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, cascade="all, delete-orphan")
    }
  )

# reload the user
u = session.query(User).get(u.user_id)

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

session.flush()
INSERT INTO addresses (user_id, street, city, state, zip)
VALUES (:user_id, :street, :city, :state, :zip)
{'city': 'Houston', 'state': 'TX', 'street': '27 New Place', 'user_id': 1, 'zip': '34839'}

DELETE FROM addresses WHERE addresses.address_id = :address_id
[{'address_id': 2}]

In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects is bounded by that of their parent User object.

Cascading is described fully in Cascade rules.

back to section top

Backreferences

By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
                'addresses' : relation(Address, backref='user')
            }
          )

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True

The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user':relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})

The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:

mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('address', uselist=False))
})
back to section top

Querying with Joins

When using mappers that have relationships to other mappers, the need to specify query criteria across multiple tables arises. SQLAlchemy provides several core techniques which offer this functionality.

One way is just to build up the join criterion yourself. This is easy to do using filter():

l = session.query(User).filter(users_table.c.user_id==addresses_table.c.user_id).\
             filter(addresses_table.c.street=='123 Green Street').all()

Above, we specified selection criteria that included columns from both the users and the addresses table. Note that in this case, we had to specify not just the matching condition to the street column on addresses, but also the join condition between the users and addresses table. If we didn't do that, we'd get a cartesian product of both tables. The Query object never "guesses" what kind of join you'd like to use, but makes it easy using the join() method, which we'll get to in a moment.

A way to specify joins very explicitly, using the SQL join() construct, is possible via the select_from() method on Query:

l = session.query(User).select_from(users_table.join(addresses_table)).\
        filter(addresses_table.c.street=='123 Green Street').all()

But the easiest way to join is automatically, using the join() method on Query. Just give this method the path from A to B, using the name of a mapped relationship directly:

l = session.query(User).join('addresses').\
        filter(addresses_table.c.street=='123 Green Street').all()

Each time join() is called on Query, the joinpoint of the query is moved to be that of the endpoint of the join. As above, when we joined from users_table to addresses_table, all subsequent criteria used by filter_by() are against the addresses table. If we want to filter back on the starting table again, we can use the reset_joinpoint() method:

l = session.query(User).join('addresses').\
        filter_by(street='123 Green Street').\
        reset_joinpoint().filter_by(user_name='ed').all()

With reset_joinpoint(), we can also issue new join()s which will start back from the root table.

In all cases, we can get the User and the matching Address objects back at the same time, by telling the session we want both. This returns the results as a list of tuples:

result = session.query(User, Address).join('addresses').\
        filter(addresses_table.c.street=='123 Green Street').all()
for r in result:
    print "User:", r[0]
    print "Address:", r[1]

The above syntax is shorthand for using the add_entity() method:

session.query(User).add_entity(Address).join('addresses').all()

To join across multiple relationships, specify them in a list. Below, we load a ShoppingCart, limiting its cartitems collection to the single item which has a price object whose amount column is 47.95:

cart = session.query(ShoppingCart).join(['cartitems', 'price']).filter_by(amount=47.95).one()

filter_by() can also generate joins in some cases, such as when comparing to an object instance:

# get an instance of Address. assume its primary key identity
# is 12.
someaddress = session.query(Address).filter_by(street='123 Green Street').one()

# look for User instances which have the 
# "someaddress" instance in their "addresses" collection
l = session.query(User).filter_by(addresses=someaddress).all()

You can also create joins in "reverse", that is, to find an object with a certain parent. This is accomplished using with_parent():

# load a user
someuser = session.query(User).get(2)

# load an address with that user as a parent and email address foo@bar.com
someaddresses = session.query(Address).with_parent(someuser).\
    filter_by(email_address="foo@bar.com").all()

Sometimes it's necessary to create repeated joins that are independent of each other, even though they reference the same tables. Using our one-to-many setup, an example is to locate users who have two particular email addresses. We can do this using table aliases:

ad1 = addresses_table.alias('ad1')
ad2 = addresses_table.alias('ad2')
result = session.query(User).filter(and_(
    ad1.c.user_id==users_table.c.user_id,
    ad1.c.email_address=='foo@bar.com',
    ad2.c.user_id==users_table.c.user_id,
    ad2.c.email_address=='lala@yahoo.com'
    )).all()

Version 0.4 of SQLAlchemy will include better ability to issue queries like the above with less verbosity.

back to section top

Loading Relationships

We've seen how the relation specifier affects the saving of an object and its child items, and also how it allows us to build joins. How do we get the actual related items loaded? By default, the relation() function indicates that the related property should have a lazy loader attached to it when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.

# define a user mapper
mapper(User, users_table, properties = {
      'addresses' : relation(Address)
    })

# define an address mapper
mapper(Address, addresses_table)

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = session.query(User).filter(User.c.user_name=='jane')[0]
# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)

Eager Loading

Eager Loading is another way for relationships to be loaded. It describes the loading of parent and child objects across a relation using a single query. The purpose of eager loading is strictly one of performance enhancement; eager loading has no impact on the results of a query, except that when traversing child objects within the results, lazy loaders will not need to issue separate queries to load those child objects.

With just a single parameter lazy=False specified to the relation object, the parent and child SQL queries can be joined together.

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
        'addresses' : relation(Address, lazy=False)
    }
  )

users = session.query(User).filter(User.c.user_name=='Jane').all()
for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)

Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.

Recall that eager loading has no impact on the results of the query. What if our query included our own join criterion? The eager loading query accommodates this using aliases, and is immune to the effects of additional joins being specified in the original query. Joining against the "addresses" table to locate users with a certain street results in this behavior:

users = session.query(User).join('addresses').filter_by(street='123 Green Street').all()

The join resulting from join('addresses') is separate from the join produced by the eager join, which is "aliasized" to prevent conflicts.

back to section top

Using Options to Change the Loading Strategy

The options() method on the Query object allows modifications to the underlying querying methodology. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():

# user mapper with lazy addresses
mapper(User, users_table, properties = {
             'addresses' : relation(mapper(Address, addresses_table))
         }
)

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.all()

# make another query that won't load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
back to section top

More Relationships

Previously, we've discussed how to set up a one-to-many relationship. This section will go over the remaining major types of relationships that can be configured. More detail on relationships as well as more advanced patterns can be found in Advanced Data Mapping.

One to One/Many to One

The above examples focused on the "one-to-many" relationship. Other forms of relationship are just as easy, as the relation() function can usually figure out what you want:

metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(), dict(user_name = 'fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)

mapper(User, users_table, properties = {
    'preferences':relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
})

# select
session = create_session(bind=engine)
user = session.query(User).filter_by(user_name='fred').one()
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
session.flush()
back to section top

Many to Many

The relation() function handles a basic many-to-many relationship when you specify an association table using the secondary argument:

metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association 
# table
mapper(Article, articles_table, properties = {
    'keywords':relation(Keyword, secondary=itemkeywords_table, lazy=False)
    }
)

session = create_session(bind=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)

session.flush()
# select articles based on a keyword.  
articles = session.query(Article).join('keywords').filter_by(keyword_name='politics').all()
a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
session.flush()
back to section top

Association Object

Many to Many can also be done with an association object, which adds additional information about how two items are related. In this pattern, the "secondary" option to relation() is no longer used; instead, the association object becomes a mapped entity itself, mapped to the association table. If the association table has no explicit primary key columns defined, you also have to tell the mapper what columns will compose its "primary key", which are typically the two (or more) columns involved in the association. Also, the relation between the parent and association mapping is typically set up with a cascade of all, delete-orphan. This is to ensure that when an association object is removed from its parent collection, it is deleted (otherwise, the unit of work tries to null out one of the foreign key columns, which raises an error condition since that column is also part of its "primary key").

from sqlalchemy import *
metadata = MetaData()

users_table = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name
class Article(object):
    pass
class KeywordAssociation(object):
    pass

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties={
    'keywords':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
    }
)

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False), 
        'user' : relation(User, lazy=False) 
    }
)

# user mapper
mapper(User, users_table)

# keyword mapper
mapper(Keyword, keywords_table)

session = create_session(bind=engine)
# select by keyword
alist = session.query(Article).join(['keywords', 'keyword']).filter_by(keyword_name='jacks_stories').all()
# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name

Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:

user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.save(article)

session.flush()

SQLAlchemy includes an extension module which can be used in some cases to decrease the explicitness of the association object pattern; this extension is described in associationproxy.
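
As a rough sketch of what that extension provides (the association_proxy() function is assumed here; see the associationproxy documentation for the exact API):

from sqlalchemy.ext.associationproxy import association_proxy

class Article(object):
    # exposes the Keyword objects behind the KeywordAssociation
    # instances in the 'keywords' relation defined above
    keyword_objects = association_proxy('keywords', 'keyword')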

Note that you should not combine the usage of a secondary relationship with an association object pattern against the same association table. This is because SQLAlchemy's unit of work will regard rows in the table tracked by the secondary argument as distinct from entities mapped into the table by the association mapper, causing unexpected behaviors when rows are changed by one mapping and not the other.

back to section top

Overview

The concept behind Unit of Work is to track modifications to a set of objects, and then be able to flush those changes to the database in a single operation. There are a lot of advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.

SQLAlchemy's unit of work provides these capabilities through the Session functions described throughout the rest of this chapter.

back to section top

Object States

When dealing with mapped instances with regards to Sessions, an instance may be attached or unattached to a particular Session. An instance also may or may not correspond to an actual row in the database. The product of these two binary conditions yields us four general states a particular instance can have within the perspective of the Session:

Transient - the instance is not attached to any Session and has no database identity.
Pending - the instance has been save()d to a Session, but has not yet been flushed to the database.
Persistent - the instance is attached to a Session and corresponds to a row in the database.
Detached - the instance corresponds to a row in the database, but is not attached to any Session.

back to section top

Acquiring a Session

A new Session object is constructed via the create_session() function:

session = create_session()

A common option used with create_session() is to specify a specific Engine or Connection to be used for all operations performed by this Session:

# create an engine
e = create_engine('postgres://some/url')

# create a Session that will use this engine for all operations.
# it will open and close Connections as needed.
session = create_session(bind=e)

# open a Connection
conn = e.connect()

# create a Session that will use this specific Connection for all operations
session = create_session(bind=conn)

The session to which an object is attached can be acquired via the object_session() function, which returns the appropriate Session if the object is pending or persistent, or None if the object is transient or detached:

session = object_session(obj)

We will now cover some of the key concepts used by Sessions and its underlying Unit of Work.

back to section top

Introduction to the Identity Map

A primary concept of the Session's underlying Unit of Work is that it keeps track of all persistent instances; recall that a persistent instance has a database identity and is attached to a Session. In particular, the Unit of Work must ensure that only one copy of a particular persistent instance exists within the Session at any given time. The UOW accomplishes this task using a dictionary known as an Identity Map.

When a Query is used to issue select or get requests to the database, it will in nearly all cases result in an actual SQL execution to the database, and a corresponding traversal of rows received from that execution. However, when the underlying mapper actually creates objects corresponding to the result set rows it receives, it will check the session's identity map first before instantiating a new object, and return the same instance already present in the identity map if it already exists, essentially ignoring the object state represented by that row. There are several ways to override this behavior and truly refresh an already-loaded instance which are described later, but the main idea is that once your instance is loaded into a particular Session, it will never change its state without your explicit approval, regardless of what the database says about it.

For example: below, two separate calls to load an instance with database identity "15" are issued, and the results are assigned to two separate variables. However, since the same Session was used, the two instances are the same instance:

mymapper = mapper(MyClass, mytable)

session = create_session()
obj1 = session.query(MyClass).filter(mytable.c.id==15).first()
obj2 = session.query(MyClass).filter(mytable.c.id==15).first()

>>> obj1 is obj2
True

The Identity Map is an instance of dict by default. As an option, you can specify the flag weak_identity_map=True to the create_session() function so that it will use a weakref.WeakValueDictionary; when an in-memory object falls out of scope, it will be removed automatically, thereby providing some automatic management of memory. However, this may not be instant if there are circular references on the object. To guarantee that an instance is removed from the identity map before removing references to it, use the expunge() method, described later, to remove it. Additionally, note that an object that has changes marked on it (i.e. "dirty") can still fall out of scope when using weak_identity_map.
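
Enabling the option looks like:

# identity map backed by a weakref.WeakValueDictionary
session = create_session(weak_identity_map=True)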

The Session supports an iterator interface in order to see all objects in the identity map:

for obj in session:
    print obj

As well as __contains__():

if obj in session:
    print "Object is present"

The identity map itself is accessible via the identity_map accessor:

>>> session.identity_map.values()
[<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]

The identity of each object instance is available via the _instance_key property attached to each object instance, and is a tuple consisting of the object's class and an additional tuple of primary key values, in the order that they appear within the table definition:

>>> obj._instance_key 
(<class 'test.tables.User'>, (7,))

At the moment that an object is assigned this key within a flush() operation, it is also added to the session's identity map.

The get() method on Query, which retrieves an object based on primary key identity, also checks in the Session's identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get() method is used as well, so scalar-based lazy loads may in some cases not query the database; this is particularly important for backreference relationships as it can save a lot of queries.

back to section top

What's Changed?

The next concept is that in addition to the Session storing a record of all objects loaded or saved, it also stores lists of all newly created (i.e. pending) objects and lists of all persistent objects that have been marked as deleted. These lists are used when a flush() call is issued to save all changes. During a flush operation, it also scans its list of persistent instances for changes which are marked as dirty.

These records are all tracked by collection functions that are also viewable off the Session as properties:

# pending objects recently added to the Session
session.new

# persistent objects which currently have changes detected
# (this collection is now created on the fly each time the property is called)
session.dirty

# persistent objects that have been marked as deleted via session.delete(obj)
session.deleted

Note that if a session is created with the weak_identity_map flag, an item which is marked as "dirty" will be silently removed from the session if the item falls out of scope in the user application. This is because the unit of work does not look for "dirty" changes except for within a flush operation (or any time the session.dirty collection is accessed).

As for objects inside of new and deleted, if you abandon all references to new or modified objects within a session, they are still present in either of those two lists, and will be saved on the next flush operation, unless they are removed from the Session explicitly (more on that later).

back to section top

The Session API

query()

The query() function takes one or more classes and/or mappers, along with an optional entity_name parameter, and returns a new Query object which will issue mapper queries within the context of this Session. For each mapper passed, the Query uses that mapper. For each class, the Query will locate the primary mapper for the class using class_mapper().

# query from a class
session.query(User).filter_by(name='ed').all()

# query with multiple classes, returns tuples
session.query(User, Address).join('addresses').filter_by(name='ed').all()

# query from a mapper
query = session.query(usermapper)
x = query.get(1)

# query from a class mapped with entity name 'alt_users'
q = session.query(User, entity_name='alt_users')
y = q.options(eagerload('orders')).all()

entity_name is an optional keyword argument sent with a class object, in order to further qualify which primary mapper to be used; this only applies if there was a Mapper created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name argument, so that a given class can be properly matched to the desired primary mapper.

All instances retrieved by the returned Query object will be stored as persistent instances within the originating Session.

back to section top

get()

Given a class or mapper, a scalar or tuple-based identity, and an optional entity_name keyword argument, creates a Query corresponding to the given mapper or class/entity_name combination, and calls the get() method with the given identity value. If the object already exists within this Session, it is simply returned, else it is queried from the database. If the instance is not found, the method returns None.

# get Employer primary key 5
employer = session.get(Employer, 5)

# get Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.get(Report, (7,12), entity_name='report_mapper_b')
back to section top

load()

load() is similar to get() except it will raise an exception if the instance does not exist in the database. It will also load the object's data from the database in all cases, and overwrite all changes on the object if it already exists in the session with the latest data from the database.

# load Employer primary key 5
employer = session.load(Employer, 5)

# load Report composite primary key 7,12, using mapper 'report_mapper_b'
report = session.load(Report, (7,12), entity_name='report_mapper_b')
back to section top

save()

save() is called with a single transient (unsaved, unattached) instance as an argument, which is then added to the Session and becomes pending. When the session is next flushed, the instance will be saved to the database, upon which it becomes persistent (saved, attached). If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.

user1 = User(name='user1')
user2 = User(name='user2')
session.save(user1)
session.save(user2)

session.flush()     # write changes to the database

save() is called automatically for new instances by the class's associated mapper, if a default Session context is in effect (such as a thread-local session), which means that newly created instances automatically become pending. If there is no default session available, then the instance remains transient (unattached) until it is explicitly added to a Session via the save() method.

A transient instance also can be automatically saved if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.

The save_or_update() method, covered later, is a convenience method which will call the save() or update() methods appropriately depending on whether or not the instance has a database identity (but the instance still must be unattached).

back to section top

flush()

This is the main gateway to what the Unit of Work does best, which is to save everything! It should be clear by now what a flush looks like:

session.flush()

It also can be called with a list of objects; in this form, the flush operation will be limited only to the objects specified in the list, as well as any child objects within private relationships for a delete operation:

# saves only user1 and address2.  all other modified
# objects remain present in the session.
session.flush([user1, address2])

This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.

Notes on Flush

A common misconception about the flush() operation is that once performed, the newly persisted instances will automatically have related objects attached to them, based on the values of primary key identities that have been assigned to the instances before they were persisted. An example would be, you create a new Address object, set address.user_id to 5, and then flush() the session. The erroneous assumption would be that there is now a User object of identity "5" attached to the Address object, but in fact this is not the case. If you were to refresh() the Address, invalidating its current state and re-loading, then it would have the appropriate User object present.
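
A sketch of the scenario just described, assuming the User/Address mapping with a user backref from Backreferences, and an existing user row with identity 5:

address = Address('123 anywhere street', 'big city', 'UT', '76543')
address.user_id = 5
session.save(address)
session.flush()

# address.user is not populated by the flush; re-loading the
# instance from the database brings in the related User
session.refresh(address)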

This misunderstanding is related to the observed behavior of backreferences (Backreferences), which automatically associates an instance "A" with another instance "B", in response to the manual association of instance "B" to instance "A" by the user. The backreference operation occurs completely externally to the flush() operation, and is pretty much the only example of a SQLAlchemy feature that manipulates the relationships of persistent objects.

The primary guideline for dealing with flush() is this: the developer is responsible for maintaining in-memory objects and their relationships to each other, while the unit of work is responsible for maintaining the database representation of those in-memory objects. The typical pattern is that the manipulation of objects is the way that changes get communicated to the unit of work, so that when the flush occurs, the objects are already in their correct in-memory representation and problems don't arise. The manipulation of identifier attributes like integer key values, as well as deletes in particular, are a frequent source of confusion.

back to section top

close()

This method first calls clear(), removing all objects from this Session, and then ensures that any transactional resources are closed.
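
Usage follows the same pattern as the other Session methods:

session.close()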

back to section top

delete()

The delete method places an instance into the Unit of Work's list of objects to be marked as deleted:

# mark two objects to be deleted
session.delete(obj1)
session.delete(obj2)

# flush
session.flush()

The delete operation will have an effect on instances that are attached to the deleted instance according to the cascade style of the relationship; cascade rules are described further in the following section. By default, associated instances may need to be updated in the database to reflect that they no longer are associated with the parent object, before the parent is deleted. If the relationship specifies cascade="delete", then the associated instance will also be deleted upon flush, assuming it is still attached to the parent. If the relationship additionally includes the delete-orphan cascade style, the associated instance will be deleted if it is still attached to the parent, or is unattached to any other parent.

The delete() operation has no relationship to the in-memory status of the instance, including usage of the del Python statement. An instance marked as deleted and flushed will still exist within memory until references to it are freed; similarly, removing an instance from memory via the del statement will have no effect, since the persistent instance will still be referenced by its Session. Obviously, if the instance is removed from the Session and then totally dereferenced, it will no longer exist in memory, but also won't exist in any Session and is therefore not deleted from the database.

Note that the "in-memory status" of an instance also refers to its presence in any other collection. SQLAlchemy does not track the collections to which an instance is a member, and will not remove an instance from its parent collections that were not directly involved in a deletion operation. The operational and memory overhead implied by this would be too great (such as, if an object belonged to hundreds of collections). This means if an object A is attached to both an object B and an object C, if you delete() A and flush, A still remains attached to both B and C in a deleted state and must be removed by the application. Similarly, if a delete on B cascades to A, this does not affect A still being present on C - again it must be manually removed.

back to section top

clear()

This method detaches all instances from the Session, sending them to the detached or transient state as applicable, and replaces the underlying UnitOfWork with a new one.

session.clear()

The clear() method is particularly useful with a "default context" session such as a thread-local session, which can stay attached to the current thread to handle a new set of objects without having to construct a new Session.

back to section top

refresh() / expire()

To assist with the Unit of Work's "sticky" behavior, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired" which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized, eager relationships will be repopulated. Any changes marked on the object are discarded:

# immediately re-load attributes on obj1, obj2
session.refresh(obj1)
session.refresh(obj2)

# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1)
session.expire(obj2)
back to section top

expunge()

Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:

session.expunge(obj1)

Use expunge() when you'd like to remove an object altogether from memory, such as before calling del on it; this prevents any "ghost" operations from occurring when the session is flushed.

back to section top

bind_mapper() / bind_table()

Both of these methods receive two arguments; in the case of bind_mapper(), it is a Mapper and an Engine or Connection instance; in the case of bind_table(), it is a Table instance or other Selectable (such as an Alias, Select, etc.), and an Engine or Connection instance.

engine1 = create_engine('sqlite:///file1.db')
engine2 = create_engine('mysql://localhost')

sqlite_connection = engine1.connect()

sess = create_session()

sess.bind_mapper(mymapper, sqlite_connection)  # bind mymapper operations to a single SQLite connection
sess.bind_table(email_addresses_table, engine2) # bind operations with the email_addresses_table to mysql

Normally, when a Session is created via create_session() with no arguments, the Session has no awareness of individual Engines, and when mappers use the Session to retrieve connections, the underlying MetaData each Table is associated with is expected to be "bound" to an Engine, else no engine can be located and an exception is raised. A second form of create_session() takes the argument bind=engine_or_connection, where all SQL operations performed by this Session use the single Engine or Connection (collectively known as a Connectable) passed to the constructor. With bind_mapper() and bind_table(), the operations of individual mapper and/or tables are bound to distinct engines or connections, thereby overriding not only the engine which may be "bound" to the underlying MetaData, but also the Engine or Connection which may have been passed to the create_session() function. Configurations which interact with multiple explicit database connections at one time must use either or both of these methods in order to associate Session operations with the appropriate connection resource.

Binding a Mapper to a resource takes precedence over a Table bind, meaning if mapper A is associated with table B, and the Session binds mapper A to connection X and table B to connection Y, an operation with mapper A will use connection X, not connection Y.

back to section top

update()

The update() method is used only with detached instances. A detached instance only exists if its Session was cleared or closed, or the instance was expunge()d from its session. update() will re-attach the detached instance with this Session, bringing it back to the persistent state, and allowing any changes on the instance to be saved when the Session is next flushed. If the instance is already attached to an existing Session, an exception is raised.

A detached instance also can be automatically updated if it is associated with a parent object which specifies save-update within its cascade rules, and that parent is already attached or becomes attached to a Session. For more information on cascade, see the next section.

The save_or_update() method is a convenience method which will call the save() or update() methods appropriately depending on whether or not the instance has a database identity (but the instance still must be unattached).

back to section top

save_or_update()

This method is a combination of the save() and update() methods, which will examine the given instance for a database identity (i.e. if it is transient or detached), and will call the implementation of save() or update() as appropriate. Use save_or_update() to add unattached instances to a session when you're not sure if they were newly created or not. Like save() and update(), save_or_update() cascades along the save-update cascade indicator, described in the cascade section below.
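
For example:

# works whether someobject is newly constructed (transient)
# or was detached from an earlier Session
session.save_or_update(someobject)
session.flush()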

back to section top

merge()

merge() is used to return the persistent version of an instance that is not attached to this Session. When passed an instance, if an instance with its database identity already exists within this Session, it is returned. If the instance does not exist in this Session, it is loaded from the database and then returned.

A future version of merge() will also update the Session's instance with the state of the given instance (hence the name "merge").

This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session:

# deserialize an object
myobj = pickle.loads(mystring)

# "merge" it.  if the session already had this object in the 
# identity map, then you get back the one from the current session.
myobj = session.merge(myobj)

Note that merge() does not associate the given instance with the Session; it remains detached (or attached to whatever Session it was already attached to).

back to section top

Cascade rules

Mappers support the concept of configurable cascade behavior on relation()s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all, delete, save-update, refresh-expire, merge, expunge, and delete-orphan.

Cascading is configured by setting the cascade keyword argument on a relation():

mapper(Order, order_table, properties={
    'items' : relation(Item, cascade="all, delete-orphan"),
    'customer' : relation(User, secondary=user_orders_table, cascade="save-update"),
})

The above mapper specifies two relations, items and customer. The items relationship specifies "all, delete-orphan" as its cascade value, indicating that all save, update, merge, expunge, refresh, delete and expire operations performed on a parent Order instance should also be performed on the child Item instances attached to it (save and update are cascaded using the save_or_update() method, so that the database identity of the instance doesn't matter). The delete-orphan cascade value additionally indicates that if an Item instance is no longer associated with an Order, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order and an Item object.

The customer relationship specifies only the "save-update" cascade value, indicating that most operations will not be cascaded from a parent Order instance to a child User instance, except when the Order is attached to a particular session via the save(), update(), or save_or_update() method.

Additionally, when a child item is attached to a parent item that specifies the "save-update" cascade value on the relationship, the child is automatically passed to save_or_update() (and the operation is further cascaded to the child item).

Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.

The default value for cascade on relation()s is save-update, and the private=True keyword argument is a synonym for cascade="all, delete-orphan".
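
As an illustration of the "all, delete-orphan" setting above (a sketch, assuming the Order/Item mapping shown earlier):

order = session.query(Order).get(1)
item = order.items[0]

# removing the Item from its Order leaves it an "orphan"
order.items.remove(item)

# at flush time, the orphaned Item's row is DELETEd
session.flush()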

back to section top

SessionTransaction

SessionTransaction is a multi-engine transaction manager, which aggregates one or more Engine/Connection pairs and keeps track of a Transaction object for each one. As the Session receives requests to execute SQL statements, it uses the Connection that is referenced by the SessionTransaction. At commit time, the underlying Session is flushed, and each Transaction is then committed.

Example usage is as follows:

sess = create_session()
trans = sess.create_transaction()
try:
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'
except:
    trans.rollback()
    raise
trans.commit()

The SessionTransaction object supports Python 2.5's with statement so that the example above can be written as:

sess = create_session()
with sess.create_transaction():
    item1 = sess.query(Item).get(1)
    item2 = sess.query(Item).get(2)
    item1.foo = 'bar'
    item2.bar = 'foo'

The create_transaction() method creates a new SessionTransaction object but does not declare any connection/transaction resources. At the point of the first get() call, a connection resource is opened off the engine that corresponds to the Item class's mapper and is stored within the SessionTransaction with an open Transaction. When trans.commit() is called, the flush() method is called on the Session and the corresponding update statements are issued to the database within the scope of the transaction already opened; afterwards, the underlying Transaction is committed, and connection resources are freed.

SessionTransaction, like the Transaction off of Connection, also supports "nested" behavior, and is safe to pass to other functions which then issue their own begin()/commit() pair; only the outermost begin()/commit() pair actually affects the transaction, and any call to rollback() within a particular call stack will issue a rollback.
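
A sketch of the nesting pattern described above (assuming the inner begin()/commit() pair behaves like Connection-level nested transactions):

def do_some_work(sess, trans):
    trans.begin()           # inner begin; only the outermost pair has any effect
    try:
        # ... perform work with sess ...
        trans.commit()      # a no-op here unless this is the outermost pair
    except:
        trans.rollback()    # a rollback anywhere cancels the entire transaction
        raise

sess = create_session()
trans = sess.create_transaction()
do_some_work(sess, trans)
trans.commit()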

Note that while SessionTransaction is capable of tracking multiple transactions across multiple databases, it currently is in no way a fully functioning two-phase commit engine; generally, when dealing with multiple databases simultaneously, there is the distinct possibility that a transaction can succeed on the first database and fail on the second, which for some applications may be an invalid state. If this is an issue, it's best to either refrain from spanning transactions across databases, or to look into some of the available technologies in this area, such as Zope which offers a two-phase commit engine; some users have already created their own SQLAlchemy/Zope hybrid implementations to deal with scenarios like these.


Using SQL with SessionTransaction

The SessionTransaction can interact with direct SQL queries in two general ways: either specific Connection objects can be associated with the SessionTransaction, which are then usable both for direct SQL as well as within flush() operations performed by the SessionTransaction, or the Connection object automatically referenced within the SessionTransaction can be accessed directly.

To associate a specific Connection with the SessionTransaction, use the add() method:

Associate a Connection with the SessionTransaction
connection = engine.connect()
trans = session.create_transaction()
try:
    trans.add(connection)
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
    session.flush() # flush() operation will use the same connection
except:
    trans.rollback()
    raise
trans.commit()

The add() method will key the Connection's underlying Engine to this SessionTransaction. When mapper operations are performed against this Engine, the Connection explicitly added will be used. This overrides any other Connection objects that the underlying Session was associated with, corresponding to the underlying Engine of that Connection. However, if the SessionTransaction itself is already associated with a Connection, then an exception is raised.

The other way is just to use the Connection referenced by the SessionTransaction. This is performed via the connection() method, and requires passing in a class or Mapper which indicates which underlying Connection should be returned (recall that different Mappers may use different underlying Engines). If the class_or_mapper argument is None, then the Session must be globally bound to a specific Engine when it was constructed, else the method returns None.

Get a Connection from the SessionTransaction
trans = session.create_transaction()
try:
    connection = trans.connection(UserClass)   # get the Connection used by the UserClass' Mapper
    connection.execute(mytable.update(), {'col1':4, 'col2':17})
except:
    trans.rollback()
    raise
trans.commit()

The connection() method also exists on the Session object itself, and can be called regardless of whether or not a SessionTransaction is in progress. If a SessionTransaction is in progress, it will return the connection referenced by the transaction. If an Engine is being used with threadlocal strategy, the Connection returned will correspond to the connection resources that are bound to the current thread, if any (i.e. it is obtained by calling contextual_connect()).
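
For example (a sketch; UserClass stands in for any mapped class):

# works whether or not a SessionTransaction is in progress
connection = session.connection(UserClass)
result = connection.execute(users_table.select())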

back to section top

Using Engine-level Transactions with Sessions

The transactions issued by SessionTransaction as well as internally by the Session's flush() operation use the same Transaction object off of Connection that is publicly available. Recall that this object supports "nestable" behavior, meaning any number of actors can call begin() off a particular Connection object, and they will all be managed within the scope of a single transaction. Therefore, the flush() operation can similarly take place within the scope of a regular Transaction:

Transactions with Sessions
connection = engine.connect()   # Connection
session = create_session(bind=connection) # Session bound to the Connection
trans = connection.begin()      # start transaction
try:
    stuff = session.query(MyClass).select()     # Session operation uses connection
    stuff[2].foo = 'bar'
    connection.execute(mytable.insert(), dict(id=12, value="bar"))    # use connection explicitly
    session.flush()     # Session flushes with "connection", using transaction "trans"
except:
    trans.rollback()    # or rollback
    raise
trans.commit()      # commit
back to section top

This section details all the options available to Mappers, as well as advanced patterns.

To start, here are the tables we will work with again:

from sqlalchemy import *

metadata = MetaData()

# a table to store users
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords_table = Table('userkeywords', metadata,
    Column('user_id', INT, ForeignKey("users.user_id")),
    Column('keyword_id', INT, ForeignKey("keywords.keyword_id"))
)

More On Mapper Properties

Overriding Column Names

When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:

user_mapper = mapper(User, users_table, properties={
    'id' : users_table.c.user_id,
    'name' : users_table.c.user_name,
})

When column names overlap in a mapper constructed against multiple tables, the columns may be referenced together as a list:

# join users and addresses
usersaddresses = sql.join(users_table, addresses_table, users_table.c.user_id == addresses_table.c.user_id)
m = mapper(User, usersaddresses,   
    properties = {
        'id' : [users_table.c.user_id, addresses_table.c.user_id],
    }
    )
back to section top

Overriding Properties

A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is how it would be done in any Python program; define your attribute with a different name, such as "_attribute", and use a property to get/set its value. The mapper just needs to be told of the special name:

class MyClass(object):
    def _set_email(self, email):
       self._email = email
    def _get_email(self):
       return self._email
    email = property(_get_email, _set_email)

mapper(MyClass, mytable, properties = {
   # map the '_email' attribute to the "email" column
   # on the table
   '_email': mytable.c.email
})

It is also possible to route the select_by and get_by functions on Query using the new property name, by establishing a synonym:

mapper(MyClass, mytable, properties = {
    # map the '_email' attribute to the "email" column
    # on the table
    '_email': mytable.c.email,

    # make a synonym 'email'
    'email' : synonym('_email')
})

# now you can select_by(email)
result = session.query(MyClass).select_by(email='john@smith.com')

Synonym can be established with the flag "proxy=True", to create a class-level proxy to the actual property. This has the effect of creating a fully functional synonym on class instances:

mapper(MyClass, mytable, properties = {
    '_email': mytable.c.email,
    'email' : synonym('_email', proxy=True)
})

x = MyClass()
x.email = 'john@doe.com'

>>> x._email
'john@doe.com'
back to section top

Custom List Classes

Feature Status: Alpha API

A one-to-many or many-to-many relationship results in a list-holding element being attached to all instances of a class. The actual list is an "instrumented" list, which transparently maintains a relationship to a plain Python list. The implementation of the underlying plain list can be changed to be any object that implements list-style append and __iter__ methods. A common need is for a list-based relationship to actually be a dictionary. This can be achieved by subclassing dict to have list-like behavior.

In this example, a class MyClass is defined, which is associated with a parent object MyParent. The collection of MyClass objects on each MyParent object will be a dictionary, storing each MyClass instance keyed to its name attribute.

# a class to be stored in the list
class MyClass(object):
    def __init__(self, name):
        self.name = name

# create a dictionary that will act like a list, and store
# instances of MyClass
class MyDict(dict):
    def append(self, item):
        self[item.name] = item
    def __iter__(self):
        return iter(self.values())

# parent class
class MyParent(object):
    pass

# mappers, constructed normally
mapper(MyClass, myclass_table)
mapper(MyParent, myparent_table, properties={
    'myclasses' : relation(MyClass, collection_class=MyDict)
})

# elements on 'myclasses' can be accessed via string keyname
myparent = MyParent()
myparent.myclasses.append(MyClass('this is myclass'))
myclass = myparent.myclasses['this is myclass']

Note: SQLAlchemy 0.4 has an overhauled and much improved implementation for custom list classes, with some slight API changes.

back to section top

Custom Join Conditions

When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation(), as in this example which creates a User object which has a relationship to all of its Addresses which are in Boston:

class User(object):
    pass
class Address(object):
    pass

mapper(Address, addresses_table)
mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='Boston'))
})

Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:

class User(object):
    pass
class Keyword(object):
    pass
mapper(Keyword, keywords_table)
mapper(User, users_table, properties={
    'keywords':relation(Keyword, secondary=userkeywords_table,
        primaryjoin=users_table.c.user_id==userkeywords_table.c.user_id,
        secondaryjoin=userkeywords_table.c.keyword_id==keywords_table.c.keyword_id
        )
})
back to section top

Lazy/Eager Joins Multiple Times to One Table

The previous example leads in to the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses:

mapper(User, users_table, properties={
    'boston_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='Boston')),
    'newyork_addresses' : relation(Address, primaryjoin=
                and_(users_table.c.user_id==addresses_table.c.user_id, 
                addresses_table.c.city=='New York')),
})

Both lazy and eager loading support multiple joins equally well.
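
For example, both collections can be loaded eagerly in one query via the eagerload() option (a sketch):

users = session.query(User).options(
    eagerload('boston_addresses'),
    eagerload('newyork_addresses')).select()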

back to section top

Deferred Column Loading

This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it is not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.

book_excerpts = Table('books', metadata, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in 
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})

Deferred columns can be placed into groups so that they load together:

book_excerpts = Table('books', metadata, 
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group.  when one photo is referenced,
# all three photos will be loaded in one SELECT statement.  The 'excerpt' will 
# be loaded separately when it is first referenced.
mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})

You can defer or undefer columns at the Query level with the options method:

query = session.query(Book)
query.options(defer('summary')).all()
query.options(undefer('excerpt')).all()
back to section top

Working with Large Collections

SQLAlchemy relations are generally simplistic; the lazy loader loads in the full list of child objects when accessed, and the eager load builds a query that loads the full list of child objects. Additionally, when you are deleting a parent object, SQLAlchemy ensures that it has loaded the full list of child objects so that it can mark them as deleted as well (or update their parent foreign key to NULL). It does not issue an en masse "delete from table where parent_id=?" type of statement in such a scenario. This is because the child objects themselves may also have further dependencies, and may additionally exist in the current session, in which case SA needs to know their identity so that their state can be properly updated.

So there are several techniques that can be used individually or combined together to address these issues, in the context of a large collection where you normally would not want to load the full list of relationships:

  • Use lazy=None to disable child object loading (i.e. noload)

    mapper(MyClass, table, properties={
        'children':relation(MyOtherClass, lazy=None)
    })
  • To load child objects, just use a query. Of particular convenience is that Query is a generative object, so you can return it as is, allowing additional criteria to be added as needed:

    class Organization(object):
        def __init__(self, name):
            self.name = name
        member_query = property(lambda self: object_session(self).query(Member).with_parent(self))
    
    myorg = sess.query(Organization).get(5)
    
    # get all members
    members = myorg.member_query.list()
    
    # query a subset of members using LIMIT/OFFSET
    members = myorg.member_query[5:10]
  • Use passive_deletes=True to disable child object loading on a DELETE operation, in conjunction with "ON DELETE (CASCADE|SET NULL)" on your database to automatically cascade deletes to child objects. Note that "ON DELETE" is not supported on SQLite, and requires InnoDB tables when using MySQL:

    mytable = Table('mytable', meta,
        Column('id', Integer, primary_key=True),
        )
    
    myothertable = Table('myothertable', meta,
        Column('id', Integer, primary_key=True),
        Column('parent_id', Integer),
        ForeignKeyConstraint(['parent_id'],['mytable.id'], ondelete="CASCADE"),
        )
    
    mapper(MyOtherClass, myothertable)
    
    mapper(MyClass, mytable, properties={
        'children':relation(MyOtherClass, passive_deletes=True)
    })
  • As an alternative to using "ON DELETE CASCADE", for very simple scenarios you can create a simple MapperExtension that will issue a DELETE for child objects before the parent object is deleted:

    class DeleteMemberExt(MapperExtension):
        def before_delete(self, mapper, connection, instance):
            connection.execute(member_table.delete(member_table.c.org_id==instance.org_id))
    
    mapper(Organization, org_table, extension=DeleteMemberExt(), properties = {
        'members' : relation(Member, lazy=None, passive_deletes=True, cascade="all, delete-orphan")
    })

Note that this approach is not nearly as efficient or general-purpose as "ON DELETE CASCADE", since the database itself can cascade the operation along any number of tables.

The latest distribution includes an example examples/collection/large_collection.py which illustrates most of these techniques.

back to section top

Relation Options

Options which can be sent to the relation() function. For arguments to mapper(), see Mapper Keyword Arguments.

  • association - Deprecated; as of version 0.3.0 the association keyword is synonymous with applying the "all, delete-orphan" cascade to a "one-to-many" relationship. SA can now automatically reconcile a "delete" and "insert" operation of two objects with the same "identity" in a flush() operation into a single "update" statement, which is the pattern that "association" used to indicate. See the updated example of association mappings in datamapping_association.
  • backref - indicates the name of a property to be placed on the related mapper's class that will handle this relationship in the other direction, including synchronizing the object attributes on both sides of the relation. Can also point to a backref() construct for more configurability. See Backreferences.
  • cascade - a string list of cascade rules which determines how persistence operations should be "cascaded" from parent to child. For a description of cascade rules, see Lifecycle Relations and Cascade rules.
  • collection_class - a class or function that returns a new list-holding object. will be used in place of a plain list for storing elements. See Custom List Classes.
  • foreign_keys - a list of columns which are to be used as "foreign key" columns. this parameter should be used in conjunction with explicit primaryjoin and secondaryjoin (if needed) arguments, and the columns within the foreign_keys list should be present within those join conditions. Normally, relation() will inspect the columns within the join conditions to determine which columns are the "foreign key" columns, based on information in the Table metadata. Use this argument when no ForeignKey objects are present in the join condition, or to override the table-defined foreign keys.
  • foreignkey - deprecated. use the foreign_keys argument for foreign key specification, or remote_side for "directional" logic.
  • lazy=True - specifies how the related items should be loaded. a value of True indicates they should be loaded lazily when the property is first accessed. A value of False indicates they should be loaded by joining against the parent object query, so parent and child are loaded in one round trip (i.e. eagerly). A value of None indicates the related items are not loaded by the mapper in any case; the application will manually insert items into the list in some other way. In all cases, items added or removed to the parent object's collection (or scalar attribute) will cause the appropriate updates and deletes upon flush(), i.e. this option only affects load operations, not save operations.

  • order_by - indicates the ordering that should be applied when loading these items. See the section Controlling Ordering for details.
  • passive_deletes=False - Indicates if lazy-loaders should not be executed during the flush() process, which normally occurs in order to locate all existing child items when a parent item is to be deleted. Setting this flag to True is appropriate when ON DELETE CASCADE rules have been set up on the actual tables so that the database may handle cascading deletes automatically. This strategy is useful particularly for handling the deletion of objects that have very large (and/or deep) child-object collections. See the example in Working with Large Collections.
  • post_update - this indicates that the relationship should be handled by a second UPDATE statement after an INSERT or before a DELETE. Currently, it also will issue an UPDATE after the instance was UPDATEd as well, although this technically should be improved. This flag is used to handle saving bi-directional dependencies between two individual rows (i.e. each row references the other), where it would otherwise be impossible to INSERT or DELETE both rows fully since one row exists before the other. Use this flag when a particular mapping arrangement will incur two rows that are dependent on each other, such as a table that has a one-to-many relationship to a set of child rows, and also has a column that references a single child row within that list (i.e. both tables contain a foreign key to each other). If a flush() operation returns an error that a "cyclical dependency" was detected, this is a cue that you might want to use post_update to "break" the cycle.
  • primaryjoin - a ClauseElement that will be used as the primary join of this child object against the parent object, or in a many-to-many relationship the join of the primary object to the association table. By default, this value is computed based on the foreign key relationships of the parent and child tables (or association table).
  • private=False - deprecated. setting private=True is the equivalent of setting cascade="all, delete-orphan", and indicates the lifecycle of child objects should be contained within that of the parent. See the example in Lifecycle Relations.
  • remote_side - used for self-referential relationships, indicates the column or list of columns that form the "remote side" of the relationship. See the examples in Self Referential Mappers.
  • secondary - for a many-to-many relationship, specifies the intermediary table. The secondary keyword argument should generally only be used for a table that is not otherwise expressed in any class mapping. In particular, using the Association Object Pattern is generally mutually exclusive against using the secondary keyword argument.
  • secondaryjoin - a ClauseElement that will be used as the join of an association table to the child object. By default, this value is computed based on the foreign key relationships of the association and child tables.
  • uselist=(True|False) - a boolean that indicates if this property should be loaded as a list or a scalar. In most cases, this value is determined automatically by relation(), based on the type and direction of the relationship - one to many forms a list, many to one forms a scalar, many to many is a list. If a scalar is desired where normally a list would be present, such as a bi-directional one-to-one relationship, set uselist to False.
  • viewonly=False - when set to True, the relation is used only for loading objects within the relationship, and has no effect on the unit-of-work flush process. Relations with viewonly can specify any kind of join conditions to provide additional views of related objects onto a parent object. Note that the functionality of a viewonly relationship has its limits - complicated join conditions may not compile into eager or lazy loaders properly. If this is the case, use an alternative method, such as those described in Working with Large Collections, Statement and Result-Set ORM Queries, or Mapping a Class against Arbitrary Selects.
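
A sketch combining several of the options above on a single relation():

mapper(User, users_table, properties={
    'addresses' : relation(Address,
        lazy=False,                              # load eagerly, via LEFT OUTER JOIN
        order_by=addresses_table.c.address_id,   # ordering of the collection
        cascade="all, delete-orphan",            # lifecycle ("private") relationship
        backref='user')                          # reverse many-to-one property
})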
back to section top

Controlling Ordering

By default, mappers will attempt to ORDER BY the "oid" column of a table, or the primary key column, when selecting rows. This can be modified in several ways.

The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:

# disable all ordering
mapper = mapper(User, users_table, order_by=None)

# order by a column
mapper = mapper(User, users_table, order_by=users_table.c.user_id)

# order by multiple items
mapper = mapper(User, users_table, order_by=[users_table.c.user_id, desc(users_table.c.user_name)])

"order_by" can also be specified with queries, overriding all other per-engine/per-mapper orderings:

# order by a column
l = query.filter(users_table.c.user_name=='fred').order_by(users_table.c.user_id).all()

# order by multiple criterion
l = query.filter(users_table.c.user_name=='fred').order_by([users_table.c.user_id, desc(users_table.c.user_name)]).all()

The "order_by" property can also be specified on a relation() which will control the ordering of the collection:

mapper(Address, addresses_table)

# order address objects by address id
mapper(User, users_table, properties = {
    'addresses' : relation(Address, order_by=addresses_table.c.address_id)
})

back to section top

Limiting Rows Combined with Eager Loads

As indicated in the docs on Query, you can limit rows using limit() and offset(). However, things get tricky when dealing with eager relationships, since a straight LIMIT of rows will interfere with the eagerly-loaded rows. So here is what SQLAlchemy will do when you use limit or offset with an eager relationship:

class User(object):
    pass
class Address(object):
    pass

mapper(User, users_table, properties={
    'addresses' : relation(mapper(Address, addresses_table), lazy=False)
})
r = session.query(User).filter(User.c.user_name.like('F%')).limit(20).offset(10).all()
SELECT users.user_id AS users_user_id, users.user_name AS users_user_name, 
users.password AS users_password, addresses.address_id AS addresses_address_id, 
addresses.user_id AS addresses_user_id, addresses.street AS addresses_street, 
addresses.city AS addresses_city, addresses.state AS addresses_state, 
addresses.zip AS addresses_zip 
FROM 
(SELECT users.user_id FROM users WHERE users.user_name LIKE %(users_user_name)s
ORDER BY users.oid LIMIT 20 OFFSET 10) AS rowcount, 
 users LEFT OUTER JOIN addresses ON users.user_id = addresses.user_id 
WHERE rowcount.user_id = users.user_id ORDER BY users.oid, addresses.oid
{'users_user_name': 'F%'}

The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result. This is something to keep in mind as it's a complex query which may be problematic on databases with poor support for LIMIT, such as Oracle which does not support it natively.

back to section top

Mapping a Class with Table Inheritance

Inheritance in databases comes in three forms: single table inheritance, where several types of classes are stored in one table, concrete table inheritance, where each type of class is stored in its own table, and joined table inheritance, where the parent/child classes are stored in their own tables that are joined together in a select.

There is also the ability to load "polymorphically", which is that a single query loads objects of multiple types at once.

SQLAlchemy supports all three kinds of inheritance. Additionally, true "polymorphic" loading is supported in a straightforward way for single table inheritance, and has some more manually-configured features that can make it happen for concrete and multiple table inheritance.

Working examples of polymorphic inheritance come with the distribution in the directory examples/polymorphic.

Here are the classes we will use to represent an inheritance relationship:

class Employee(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name

class Manager(Employee):
    def __init__(self, name, manager_data):
        self.name = name
        self.manager_data = manager_data
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.manager_data

class Engineer(Employee):
    def __init__(self, name, engineer_info):
        self.name = name
        self.engineer_info = engineer_info
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name + " " +  self.engineer_info

Each class supports a common name attribute, while the Manager class has its own attribute manager_data and the Engineer class has its own attribute engineer_info.

Single Table Inheritance

This will support polymorphic loading via the Employee mapper.

employees_table = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
    Column('engineer_info', String(50)),
    Column('type', String(20))
)

employee_mapper = mapper(Employee, employees_table, polymorphic_on=employees_table.c.type)
manager_mapper = mapper(Manager, inherits=employee_mapper, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, inherits=employee_mapper, polymorphic_identity='engineer')
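
A brief sketch of polymorphic loading against the above mappers:

session = create_session()
session.save(Manager('tom', 'knows how to manage things'))
session.save(Engineer('kurt', 'knows how to hack'))
session.flush()

# a query against the base mapper returns Manager and Engineer
# instances, discriminated on the 'type' column
employees = session.query(Employee).select()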
back to section top

Concrete Table Inheritance

Without polymorphic loading, you just define a separate mapper for each class.

Concrete Inheritance, Non-polymorphic
managers_table = Table('managers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('manager_data', String(50)),
)

engineers_table = Table('engineers', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('engineer_info', String(50)),
)

manager_mapper = mapper(Manager, managers_table)
engineer_mapper = mapper(Engineer, engineers_table)

With polymorphic loading, the SQL query to do the actual polymorphic load must be constructed, usually as a UNION. There is a helper function to create these UNIONs called polymorphic_union.

Concrete Inheritance, Polymorphic
pjoin = polymorphic_union({
    'manager':managers_table,
    'engineer':engineers_table
}, 'type', 'pjoin')

employee_mapper = mapper(Employee, pjoin, polymorphic_on=pjoin.c.type)
manager_mapper = mapper(Manager, managers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='manager')
engineer_mapper = mapper(Engineer, engineers_table, inherits=employee_mapper, concrete=True, polymorphic_identity='engineer')
back to section top

Joined Table Inheritance

Like concrete table inheritance, this can be done non-polymorphically, or with a little more complexity, polymorphically:

Multiple Table Inheritance, Non-polymorphic
employees = Table('employees', metadata, 
   Column('person_id', Integer, primary_key=True),
   Column('name', String(50)),
   Column('type', String(30)))

engineers = Table('engineers', metadata, 
   Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
   Column('engineer_info', String(50)),
  )

managers = Table('managers', metadata, 
   Column('person_id', Integer, ForeignKey('employees.person_id'), primary_key=True),
   Column('manager_data', String(50)),
   )

person_mapper = mapper(Employee, employees)
mapper(Engineer, engineers, inherits=person_mapper)
mapper(Manager, managers, inherits=person_mapper)

Polymorphically, joined-table inheritance is easier than concrete, as a simple outer join can usually work:

Joined Table Inheritance, Polymorphic
person_join = employees.outerjoin(engineers).outerjoin(managers)

person_mapper = mapper(Employee, employees, select_table=person_join, polymorphic_on=employees.c.type, polymorphic_identity='employee')
mapper(Engineer, engineers, inherits=person_mapper, polymorphic_identity='engineer')
mapper(Manager, managers, inherits=person_mapper, polymorphic_identity='manager')

In SQLAlchemy 0.4, the above mapper setup can load polymorphically without the join as well, by issuing distinct queries for each subclass's table.

The join condition in a joined table inheritance structure can be specified explicitly, using inherit_condition:

AddressUser.mapper = mapper(
        AddressUser,
        addresses_table, inherits=User.mapper, 
        inherit_condition=users_table.c.user_id==addresses_table.c.user_id
    )
back to section top

Mapping a Class against Multiple Tables

Mappers can be constructed against arbitrary relational units (called Selectables) as well as plain Tables. For example, the join function from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed to a mapper in place of a table.

# a class
class AddressUser(object):
    pass

# define a Join
j = join(users_table, addresses_table)

# map to it - the identity of an AddressUser object will be 
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j, properties={
    'user_id':[users_table.c.user_id, addresses_table.c.user_id]
})

A second example:

# many-to-many join on an association table
j = join(users_table, userkeywords_table, 
        users_table.c.user_id==userkeywords_table.c.user_id).join(keywords_table, 
           userkeywords_table.c.keyword_id==keywords_table.c.keyword_id)

# a class 
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j, properties={
    'user_id':[users_table.c.user_id, userkeywords_table.c.user_id],
    'keyword_id':[userkeywords_table.c.keyword_id, keywords_table.c.keyword_id]
})

In both examples above, "composite" columns were added as properties to the mappers; these are aggregations of multiple columns into one mapper property, which instructs the mapper to keep both of those columns set at the same value.
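
For example, with the AddressUser mapping above (a sketch):

au = session.query(AddressUser).selectfirst()

# both users.user_id and addresses.user_id receive the new value at flush time
au.user_id = 5
session.flush()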

back to section top

Mapping a Class against Arbitrary Selects

Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:

s = select([customers, 
            func.count(orders.c.customer_id).label('order_count'), 
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
            ).alias('somealias')
class Customer(object):
    pass

m = mapper(Customer, s)

Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
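
Usage might look like the following (a sketch; the name column is assumed to exist on the customers table):

session = create_session()
for c in session.query(Customer).select():
    print c.name, c.order_count, c.highest_order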

back to section top

Multiple Mappers for One Class

The first mapper created for a certain class is known as that class's "primary mapper." Other mappers can be created as well; these come in two varieties: non-primary mappers, created with the non_primary=True flag, and entity-name mappers, created with the entity_name argument.

example:

# primary mapper
mapper(User, users_table)

# make a secondary mapper to load User against a join
othermapper = mapper(User, users_table.join(someothertable), non_primary=True)

# select
result = session.query(othermapper).select()

example:

# primary mapper
mapper(User, users_table)

# make an entity name mapper that stores User objects in another table
mapper(User, alternate_users_table, entity_name='alt')

# make two User objects
user1 = User()
user2 = User()

# save one in the "users" table
session.save(user1)

# save the other in the "alternate_users_table"
session.save(user2, entity_name='alt')

session.flush()

# select from the alternate mapper
session.query(User, entity_name='alt').select()
back to section top

Self Referential Mappers

A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra work required to make it function.

# define a self-referential table
trees = Table('treenodes', metadata,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified.  it will point back to the primary 
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        cascade="all"
                     ),
        }
    )

This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.
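
Building a small tree with the above mapper might look like this (a sketch):

session = create_session()

root = TreeNode()
root.node_name = 'rootnode'

child = TreeNode()
child.node_name = 'child1'
root.children.append(child)

# the root row is INSERTed first; the child row then receives the
# generated parent_node_id value
session.save(root)
session.flush()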

A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:

# define a self-referential table with several relations
trees = Table('treenodes', metadata,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode, 
                        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
                        cascade="all",
                        backref=backref("parent", remote_side=[trees.c.node_id])
                     ),
        'root' : relation(
                TreeNode,
                primaryjoin=trees.c.root_node_id==trees.c.node_id, 
                remote_side=[trees.c.node_id],
                uselist=False
            )
        }
    )

The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter remote_side, pointing to a column or list of columns on the remote side of a relationship, is needed to indicate a "many-to-one" self-referring relationship (note the previous keyword argument foreignkey is deprecated). Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.

back to section top

Statement and Result-Set ORM Queries

Take any textual statement, constructed statement or result set and feed it into a Query to produce objects. Below, we define two class/mapper combinations, issue a SELECT statement, and send the result object to the instances() method on Query:

class User(object):
    pass

class Address(object):
    pass

mapper(User, users_table)

mapper(Address, addresses_table)

# select users and addresses in one query
# use_labels is so that the user_id columns in both tables are distinguished
s = select([users_table, addresses_table], users_table.c.user_id==addresses_table.c.user_id, use_labels=True)

# execute it, and process the results, asking for both User and Address objects
r = session.query(User, Address).instances(s.execute())

# result rows come back as tuples
for entry in r:
    user = entry[0]
    address = entry[1]

Alternatively, the from_statement() method may be used with either a textual string or SQL construct:

s = select([users_table, addresses_table], users_table.c.user_id==addresses_table.c.user_id, use_labels=True)

r = session.query(User, Address).from_statement(s).all()

for entry in r:
    user = entry[0]
    address = entry[1]

Combining Eager Loads with Statement/Result Set Queries

When full statement/result loads are used with Query, SQLAlchemy does not affect the SQL query itself, and therefore has no way of tacking on its own LEFT [OUTER] JOIN conditions that are normally used to eager load relationships. If the query being constructed is created in such a way that it returns rows not just from a parent table (or tables) but also returns rows from child tables, the result-set mapping can be notified as to which additional properties are contained within the result set. This is done using the contains_eager() query option, which specifies the name of the relationship to be eagerly loaded.

# mapping is the users->addresses mapping
mapper(User, users_table, properties={
    'addresses':relation(Address)
})

# define a query on USERS with an outer join to ADDRESSES
statement = users_table.outerjoin(addresses_table).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses'))

# get results normally
r = query.instances(statement.execute())

If the "eager" portion of the statement is "aliased", the alias keyword argument to contains_eager() may be used to indicate it. This is a string alias name or reference to an actual Alias object:

# use an alias of the addresses table
adalias = addresses_table.alias('adalias')

# define a query on USERS with an outer join to adalias
statement = users_table.outerjoin(adalias).select(use_labels=True)

# construct a Query object which expects the "addresses" results 
query = session.query(User).options(contains_eager('addresses', alias=adalias))

# get results normally
r = query.from_statement(statement).all()

In the case that the main table itself is also aliased, the contains_alias() option can be used:

# define an aliased UNION called 'ulist'
statement = users_table.select(users_table.c.user_id==7).union(users_table.select(users_table.c.user_id>7)).alias('ulist')

# add on an eager load of "addresses"
statement = statement.outerjoin(addresses_table).select(use_labels=True)

# create query, indicating "ulist" is an alias for the main table, "addresses" property should
# be eager loaded
query = create_session().query(User).options(contains_alias('ulist'), contains_eager('addresses'))

# results
r = query.instances(statement.execute())
back to section top

Mapper Keyword Arguments

Keyword arguments which can be used with the mapper() function. For arguments to relation(), see Relation Options.

back to section top

Extending Mapper

Mappers can have functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs. Methods that are not overridden return the special value sqlalchemy.orm.mapper.EXT_PASS, which indicates the operation should proceed normally.

class MapperExtension(object):
    """base implementation for an object that provides overriding behavior to various
    Mapper functions.  For each method in MapperExtension, a result of EXT_PASS indicates
    the functionality is not overridden."""
    def get_session(self):
        """called to retrieve a contextual Session instance with which to
        register a new object. Note: this is not called if a session is 
        provided with the __init__ params (i.e. _sa_session)"""
        return EXT_PASS
    def select_by(self, query, *args, **kwargs):
        """overrides the select_by method of the Query object"""
        return EXT_PASS
    def select(self, query, *args, **kwargs):
        """overrides the select method of the Query object"""
        return EXT_PASS
    def create_instance(self, mapper, selectcontext, row, class_):
        """called when a new object instance is about to be created from a row.  
        the method can choose to create the instance itself, or it can return 
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation

        selectcontext - SelectionContext corresponding to the instances() call

        row - the result row from the database

        class_ - the class we are mapping.
        """
        return EXT_PASS
    def append_result(self, mapper, selectcontext, row, instance, identitykey, result, isnew):
        """called when an object instance is being appended to a result list.

        If this method returns EXT_PASS, it is assumed that the mapper should do the appending, else
        if this method returns any other value or None, it is assumed that the append was handled by this method.

        mapper - the mapper doing the operation

        selectcontext - SelectionContext corresponding to the instances() call

        row - the result row from the database

        instance - the object instance to be appended to the result

        identitykey - the identity key of the instance

        result - list to which results are being appended

        isnew - indicates if this is the first time we have seen this object instance in the current result
        set.  if you are selecting from a join, such as an eager load, you might see the same object instance
        many times in the same result set.
        """
        return EXT_PASS
    def populate_instance(self, mapper, selectcontext, row, instance, identitykey, isnew):
        """called right before the mapper, after creating an instance from a row, passes the row
        to its MapperProperty objects which are responsible for populating the object's attributes.
        If this method returns EXT_PASS, it is assumed that the mapper should populate the instance, else
        if this method returns any other value or None, it is assumed that the population was handled by this method.

        Essentially, this method is used to have a different mapper populate the object:

            def populate_instance(self, mapper, selectcontext, instance, row, identitykey, isnew):
                othermapper.populate_instance(selectcontext, instance, row, identitykey, isnew, frommapper=mapper)
                return True
        """
        return EXT_PASS
    def before_insert(self, mapper, connection, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that arent handled otherwise."""
        return EXT_PASS
    def before_update(self, mapper, connection, instance):
        """called before an object instance is UPDATED"""
        return EXT_PASS
    def after_update(self, mapper, connection, instance):
        """called after an object instance is UPDATED"""
        return EXT_PASS
    def after_insert(self, mapper, connection, instance):
        """called after an object instance has been INSERTed"""
        return EXT_PASS
    def before_delete(self, mapper, connection, instance):
        """called before an object instance is DELETEed"""
        return EXT_PASS
    def after_delete(self, mapper, connection, instance):
        """called after an object instance is DELETEed"""
        return EXT_PASS

To use MapperExtension, make your own subclass of it and just send it off to a mapper:

m = mapper(User, users_table, extension=MyExtension())

Multiple extensions will be chained together and processed in order; they are specified as a list:

m = mapper(User, users_table, extension=[ext1, ext2, ext3])
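
For example, a hypothetical extension that timestamps rows before INSERT and UPDATE (a sketch; the created_at and updated_at columns are assumed to exist on the mapped table):

import datetime
from sqlalchemy.orm.mapper import EXT_PASS

class TimestampExtension(MapperExtension):
    def before_insert(self, mapper, connection, instance):
        instance.created_at = datetime.datetime.now()    # hypothetical column
        return EXT_PASS
    def before_update(self, mapper, connection, instance):
        instance.updated_at = datetime.datetime.now()    # hypothetical column
        return EXT_PASS

m = mapper(User, users_table, extension=TimestampExtension())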
back to section top

The package sqlalchemy.types defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.

Built-in Types

SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.

The standard set of generic types are:

package sqlalchemy.types
class String(TypeEngine):
    def __init__(self, length=None)

class Integer(TypeEngine)

class SmallInteger(Integer)

class Numeric(TypeEngine): 
    def __init__(self, precision=10, length=2)

class Float(Numeric):
    def __init__(self, precision=10)

# DateTime, Date and Time types deal with datetime objects from the Python datetime module
class DateTime(TypeEngine)

class Date(TypeEngine)

class Time(TypeEngine)

class Binary(TypeEngine): 
    def __init__(self, length=None)

class Boolean(TypeEngine)

# converts unicode strings to raw bytes
# as bind params, raw bytes to unicode as 
# rowset values, using the unicode encoding 
# setting on the engine (defaults to 'utf-8')
class Unicode(TypeDecorator):
    impl = String

# uses the pickle protocol to serialize data
# in/out of Binary columns
class PickleType(TypeDecorator):
    impl = Binary

More specific subclasses of these types are available, which various database engines may choose to implement specifically, allowing finer grained control over types:

class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)

When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine. There may eventually be more type objects that are defined for specific databases. An example of this would be Postgres' Array type.

Type objects are specified to table meta data using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:

mytable = Table('mytable', meta, 
    # define type using a class
    Column('my_id', Integer, primary_key=True), 

    # define type using an object instance
    Column('value', Numeric(7,4)) 
)
back to section top

Dialect Specific Types

Each dialect has its own set of types, many of which are available only within that dialect. For example, MySQL has a BigInteger type and Postgres has an Inet type. To use these, import them from the module explicitly:

from sqlalchemy.databases.mysql import MSEnum, MSBigInteger

table = Table('foo', meta,
    Column('enumerates', MSEnum('a', 'b', 'c')),
    Column('id', MSBigInteger)
)

Or some postgres types:

from sqlalchemy.databases.postgres import PGInet, PGArray

table = Table('foo', meta,
    Column('ipaddress', PGInet),
    Column('elements', PGArray(str))   # PGArray is available in 0.4, and takes a type argument
    )
back to section top

Creating your Own Types

User-defined types can be created, to support either database-specific types, or customized pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. To augment the behavior of a TypeEngine type, such as String, the TypeDecorator class is used:

import sqlalchemy.types as types

class MyType(types.TypeDecorator):
    """basic type that decorates String, prefixes values with "PREFIX:" on 
    the way in and strips it off on the way out."""
    impl = types.String
    def convert_bind_param(self, value, engine):
        return "PREFIX:" + value
    def convert_result_value(self, value, engine):
        return value[7:]

The PickleType class is an instance of TypeDecorator already and can be subclassed directly.

To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine:

import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision = 8):
        self.precision = precision
    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision
    def convert_bind_param(self, value, engine):
        return value
    def convert_result_value(self, value, engine):
        return value

Once you make your type, it's immediately usable:

table = Table('foo', meta,
    Column('id', Integer, primary_key=True),
    Column('data', MyType(16))
    )
back to section top

This section describes the connection pool module of SQLAlchemy. The Pool object it provides is normally embedded within an Engine instance. For most cases, explicit access to the pool module is not required. However, the Pool object can be used on its own, without the rest of SA, to manage DBAPI connections; this section describes that usage. Also, this section will describe in more detail how to customize the pooling strategy used by an Engine.

At the base of any database helper library is a system of efficiently acquiring connections to the database. Since the establishment of a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured to maintain a certain "size", which represents how many connections can be used simultaneously without resorting to creating more newly-established connections.

Establishing a Transparent Connection Pool

Any DBAPI module can be "proxied" through the connection pool using the following technique (note that the usage of 'psycopg2' is just an example; substitute whatever DBAPI module you'd like):

import sqlalchemy.pool as pool
import psycopg2 as psycopg
psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', user='scott', password='tiger')

This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function.

The connection proxy supports all of the methods on the original connection object, most of which are proxied via __getattr__(). The close() method will return the connection to the pool, and the cursor() method will return a proxied cursor object. Both the connection proxy and the cursor proxy will also return the underlying connection to the pool after they have both been garbage collected, which is detected via the __del__() method.
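
For example, a connection checked out from the proxy is used like any DBAPI connection (the query here is hypothetical):

# obtain a proxied cursor and execute a statement
cursor = connection.cursor()
cursor.execute("select * from mytable")
rows = cursor.fetchall()

# return the underlying DBAPI connection to the pool
connection.close()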

Additionally, when connections are returned to the pool, a rollback() is issued on the connection unconditionally. This is to release any locks still held by the connection that may have resulted from normal activity.

By default, the connect() method will return the same connection that is already checked out in the current thread. This allows a particular connection to be used in a given thread without needing to pass it around between functions. To disable this behavior, specify use_threadlocal=False to the manage() function.
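
A minimal sketch of disabling that behavior:

# with use_threadlocal=False, each call to connect() returns a
# distinct pooled connection, even within the same thread
psycopg = pool.manage(psycopg, use_threadlocal=False)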

back to section top

Connection Pool Configuration

The options are generally the same for all forms of Pool construction: the "transparent proxy" described in the previous section, an Engine created via create_engine(), or a pool constructed through direct class instantiation. Additional options may be available based on the specific subclass of Pool being used.

For a description of all pool classes, see the generated documentation.

Common options include:

* echo=False : if set to True, connections being pulled from and returned to the pool are logged to standard output.
* use_threadlocal=True : if set to True, repeated calls to connect() within the same application thread will return the same connection object.
* recycle=-1 : if set to a positive number of seconds, connections older than this are closed and replaced upon checkout.

QueuePool options include:

* pool_size=5 : the largest number of connections that will be kept persistently in the pool.
* max_overflow=10 : the number of additional connections which may be opened beyond pool_size during periods of high demand.
* timeout=30 : the number of seconds to wait for a connection to become available before giving up.
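
When the pool is created implicitly by create_engine(), pool options are passed as keyword arguments to create_engine() itself; a brief sketch with a hypothetical Postgres URL:

db = create_engine('postgres://scott:tiger@localhost/test',
                   pool_size=20, max_overflow=40, pool_recycle=3600)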

back to section top

Custom Pool Construction

Besides using the transparent proxy, instances of sqlalchemy.pool.Pool can be created directly. Constructing your own pool involves passing a callable which creates a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes initialization commands when first created.

Constructing a QueuePool
import sqlalchemy.pool as pool
import psycopg2

def getconn():
    c = psycopg2.connect(user='ed', host='127.0.0.1', dbname='test')
    # execute an initialization function on the connection before returning
    c.cursor().execute("setup_encodings()")
    return c

p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)

Or with SingletonThreadPool:

Constructing a SingletonThreadPool
import sqlalchemy.pool as pool
import sqlite

def getconn():
    return sqlite.connect(filename='myfile.db')

# SQLite connections require the SingletonThreadPool    
p = pool.SingletonThreadPool(getconn)
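
A manually constructed pool can also be handed to an Engine in place of the one create_engine() would build itself; a sketch reusing the QueuePool from above, assuming your version supports the pool keyword argument to create_engine():

# the Engine draws its connections from the custom pool
db = create_engine('postgres://', pool=p)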
back to section top

Plugins

SQLAlchemy has a variety of extensions and "mods" available which provide extra functionality to SA, either via explicit usage or by augmenting the core behavior. Several of these extensions are designed to work together.

SessionContext

Author: Daniel Miller

This plugin is used to instantiate and manage Session objects. It is the preferred way to provide thread-local session functionality to an application: the SessionContext creates Sessions as needed and maintains one Session per scope, which by default is the current thread.

Using the SessionContext in its most basic form involves just instantiating a SessionContext:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

class User(object):
    pass

# assumes users_table is a Table defined elsewhere
sqlalchemy.mapper(User, users_table)
u = User()

# the contextual session is referenced by the "current" property on SessionContext
ctx.current.save(u)
ctx.current.flush()

As this example suggests, the SessionContext itself typically lives at the module or application level. Since the Session is better suited to per-request or even per-function scope, the SessionContext provides an easy way to manage the scope of those Session objects.

The construction of each Session instance can be customized by providing a "creation function" which returns a new Session. A common customization is a Session which needs to explicitly bind to a particular Engine:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

# create an engine
someengine = sqlalchemy.create_engine('sqlite:///')

# a function to return a Session bound to our engine
def make_session():
    return sqlalchemy.create_session(bind_to=someengine)

# SessionContext
ctx = SessionContext(make_session)

# get the session bound to engine "someengine":
session = ctx.current

The above pattern is more succinctly expressed using Python lambdas:

ctx = SessionContext(lambda: sqlalchemy.create_session(bind_to=someengine))

The default creation function is simply:

ctx = SessionContext(sqlalchemy.create_session)

The "scope" to which the session is associated, which by default is a thread-local scope, can be customized by providing a "scope callable" which returns a hashable key that represents the current scope:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

# global declaration of "scope"
scope = "scope1"

# a function to return the current "session scope"
def global_scope_func():
    return scope

# create SessionContext with a custom "scopefunc"
ctx = SessionContext(sqlalchemy.create_session, scopefunc=global_scope_func)

# get the session corresponding to "scope1":
session = ctx.current

# switch the "scope"
scope = "scope2"

# get the session corresponding to "scope2":
session = ctx.current

Customized scopes can include per-user or per-request sessions, or even sub-elements of an application, such as a graphical application which maintains a single Session per application window (this was the original motivation to create SessionContext).

Using SessionContextExt

This is a MapperExtension which allows a Mapper to be automatically associated with a SessionContext. Newly constructed objects get save()d to the session automatically, and Query objects can be constructed without a session. The instance of SessionContextExt is provided by the SessionContext itself:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

class User(object):
    pass

sqlalchemy.mapper(User, users_table, extension=ctx.mapper_extension)

# 'u' is automatically added to the current session of 'ctx'
u = User()

assert u in ctx.current

# get the current session and flush
ctx.current.flush()

The MapperExtension can be configured either per-mapper as above, or on an application-wide basis using:

import sqlalchemy
from sqlalchemy.orm.mapper import global_extensions
from sqlalchemy.ext.sessioncontext import SessionContext

ctx = SessionContext(sqlalchemy.create_session)

global_extensions.append(ctx.mapper_extension)

SessionContextExt allows Query objects to be created against the mapped class without specifying a Session. Each Query will automatically make use of the current contextual session:

# create a Query from a class
query = Query(User)

# specify entity name
query = Query(User, entity_name='foo')

# create a Query from a mapper
query = Query(mapper)

# then use it
result = query.select()

When installed globally, all Mapper objects will contain a built-in association to the SessionContext. This means that once a mapped instance is created, creating a new Session and calling save() with the instance as an argument will raise an error stating that the instance is already associated with a different session. While you can always remove the object from its original session, SessionContextExt is probably convenient only for an application that does not need much explicit manipulation of sessions.

The user still has some control over which session is used at instance construction time: an instance can be redirected to a different Session by passing the keyword parameter _sa_session to its constructor, which is decorated by the mapper:

session = create_session()  # create a new session distinct from the contextual session
myuser = User(_sa_session=session)  # make a new User that is saved to this session

Similarly, the entity_name parameter, which specifies an alternate Mapper to be used when attaching this instance to the Session, can be specified via _sa_entity_name:

myuser = User(_sa_session=session, _sa_entity_name='altentity')

The decoration of mapped instances' __init__() method is similar to this example:

oldinit = class_.__init__   # the previous init method
def __init__(self, *args, **kwargs):
    session = kwargs.pop('_sa_session', None)
    entity_name = kwargs.pop('_sa_entity_name', None)
    if session is None:
        session = ext.get_session() # get Session from this Mapper's MapperExtension
        if session is EXT_PASS:
            session = None
    if session is not None:
        session.save(self, entity_name=entity_name)  # attach to the current session
    oldinit(self, *args, **kwargs)   # call previous init method
back to section top

SelectResults

Author: Jonas Borgström

NOTE: As of version 0.3.6 of SQLAlchemy, most behavior of SelectResults has been rolled into the base Query object. Explicit usage of SelectResults is therefore no longer needed.

SelectResults gives transformative behavior to the results returned from the select and select_by methods of Query.

from sqlalchemy.ext.selectresults import SelectResults

query = session.query(MyClass)
res = SelectResults(query)

res = res.filter(table.c.column == "something") # adds a WHERE clause (or appends to the existing via "and")
res = res.order_by([table.c.column]) # adds an ORDER BY clause

for x in res[:10]:  # fetch and print the top ten instances - adds OFFSET 0 LIMIT 10 or equivalent
    print x.column2

# evaluate as a list, which executes the query
x = list(res)

# count how many instances have column2 > 42
# and column == "something"
print res.filter(table.c.column2 > 42).count()

# select() is a synonym for filter()
session.query(MyClass).select(mytable.c.column=="something").order_by([mytable.c.column])[2:7]

An important facet of SelectResults is that the actual SQL execution does not occur until the object is used in a list or iterator context. This means you can call any number of transformative methods (including filter, order_by, list range expressions, etc) before any SQL is actually issued.
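
For example, using the hypothetical MyClass and table from above, no SQL is issued until the final loop begins iterating:

res = SelectResults(session.query(MyClass))

# each call returns a new SelectResults; nothing has executed yet
res = res.filter(table.c.column == "something")
res = res.order_by([table.c.column])
res = res[2:7]

# the SELECT statement is issued here
for x in res:
    print x.column2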

Configuration of SelectResults may be per-Query, per-Mapper, or per-application:

from sqlalchemy.ext.selectresults import SelectResults, SelectResultsExt

# construct a SelectResults for an individual Query
sel = SelectResults(session.query(MyClass))

# construct a Mapper where the Query.select()/select_by() methods will return a SelectResults:
mapper(MyClass, mytable, extension=SelectResultsExt())

# globally configure all Mappers to return SelectResults, using the "selectresults" mod
import sqlalchemy.mods.selectresults

SelectResults greatly enhances querying and is highly recommended. For example, here is a query constructed using a combination of joins and outerjoins:

mapper(User, users_table, properties={
    'orders':relation(mapper(Order, orders_table, properties={
        'items':relation(mapper(Item, items_table))
    }))
})
session = create_session()
query = SelectResults(session.query(User))

result = query.outerjoin_to('orders').outerjoin_to('items').select(or_(Order.c.order_id==None, Item.c.item_id==2))

For a full listing of methods, see the generated documentation.

back to section top

assignmapper

Author: Mike Bayer

This extension is used to decorate a mapped class with direct knowledge of its own Mapper and a contextual Session, as well as functions provided by the Query and Session objects. The methods automatically make use of a contextual session, with which all newly constructed objects are associated. assign_mapper operates as a MapperExtension and requires the usage of a SessionContext as well as SessionContextExt, described in SessionContext. It replaces the usage of the normal mapper function with its own version, which adds a SessionContext as the first argument:

import sqlalchemy
from sqlalchemy.ext.sessioncontext import SessionContext
from sqlalchemy.ext.assignmapper import assign_mapper

# session context
ctx = SessionContext(sqlalchemy.create_session)

# assign mapper to class MyClass using table 'sometable', getting
# Sessions from 'ctx'.
assign_mapper(ctx, MyClass, sometable, properties={...}, ...)

Above, all new instances of MyClass will be associated with the contextual session, ctx.current. Additionally, MyClass and instances of MyClass now contain a large set of methods, including get, select, flush and delete. The full list is as follows:

# Query methods:
['get', 'select', 'select_by', 'selectone', 'get_by', 'join_to', 'join_via', 'count', 'count_by']

# Session methods:
['flush', 'delete', 'expire', 'refresh', 'expunge', 'merge', 'save', 'update', 'save_or_update']

To continue the MyClass example:

# create a MyClass.  it will be automatically assigned to the contextual Session.
mc = MyClass()

# save MyClass - this will call flush() on the session, specifying 'mc' as the only
# object to be affected
mc.flush()

# load an object, using Query methods attached to MyClass
result = MyClass.get_by(id=5)

# delete it
result.delete()

# commit all changes
ctx.current.flush()

Note: while the flush() method is also available on individual object instances, the instance-local flush() does not flush dependent objects. For this reason, this method may be removed in a future release and replaced with a more explicit version.

back to section top

associationproxy

Author: Mike Bayer and Jason Kirtland

Version: 0.3.1 or greater

associationproxy is used to create a transparent proxy to the associated object in an association relationship, thereby decreasing the verbosity of the pattern in cases where explicit access to the association object is not required. The association relationship pattern is a richer form of a many-to-many relationship, described in the Association Object section of the data mapping documentation. It is strongly recommended to fully understand the association object pattern in its explicit form before using this extension; see the examples in the SQLAlchemy distribution under the directory examples/association/.

When dealing with association relationships, the association object refers to the object that maps to a row in the association table (i.e. the many-to-many table), while the associated object refers to the "endpoint" of the association, i.e. the ultimate object referenced by the parent. The proxy can return collections of objects attached to association objects, and can also create new association objects given only the associated object. An example using the Keyword mapping described in the data mapping documentation is as follows:

from sqlalchemy.ext.associationproxy import association_proxy

class User(object):
    pass

class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    # create "keywords" proxied association.
    # the collection is called 'keyword_associations', the endpoint
    # attribute of each association object is called 'keyword'.  the 
    # class itself of the association object will be figured out automatically  .
    keywords = association_proxy('keyword_associations', 'keyword')

class KeywordAssociation(object):
    pass

# create mappers normally
# note that we set up 'keyword_associations' on Article,
# and 'keyword' on KeywordAssociation.
mapper(Article, articles_table, properties={
    'keyword_associations':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
    }
)
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False), 
        'user' : relation(User, lazy=False) 
    }
)
mapper(User, users_table)
mapper(Keyword, keywords_table)

# now, Keywords can be attached to an Article directly;
# KeywordAssociation will be created by the association_proxy, and have the 
# 'keyword' attribute set to the new Keyword.
# note that these KeywordAssociation objects will not have a User attached to them.
article = Article()
article.keywords.append(Keyword('blue'))
article.keywords.append(Keyword('red'))
session.save(article)
session.flush()

# the "keywords" collection also returns the underlying Keyword objects
article = session.query(Article).get_by(id=12)
for k in article.keywords:
    print "Keyword:", k.keyword_name

# the original 'keyword_associations' relation exists normally with no awareness of the proxy
article.keyword_associations.append(KeywordAssociation())
print [ka for ka in article.keyword_associations]

Note that the above operations on the keywords collection are proxied to and from the keyword_associations collection, which exists normally and can be accessed directly. association_proxy will also detect whether the collection is list- or scalar-based and will configure the proxied property to act the same way.

For the common case where the association object's creation needs to be specified by the application, association_proxy accepts an optional callable creator(), which takes a single associated object as an argument and returns a new association object.

def create_keyword_association(keyword):
    ka = KeywordAssociation()
    ka.keyword = keyword
    return ka

class Article(object):
    # create "keywords" proxied association
    keywords = association_proxy('keyword_associations', 'keyword', creator=create_keyword_association)

Proxy properties are implemented by the AssociationProxy class, which is also available in the module. The association_proxy function is not present in SQLAlchemy versions 0.3.1 through 0.3.7; in those versions, instantiate the AssociationProxy class directly:

from sqlalchemy.ext.associationproxy import AssociationProxy

class Article(object):
    keywords = AssociationProxy('keyword_associations', 'keyword')
back to section top

orderinglist

Author: Jason Kirtland

orderinglist is a helper for mutable ordered relations. It will intercept list operations performed on a relation collection and automatically synchronize changes in list position with an attribute on the related objects. (See Custom List Classes for more information on the general pattern.)

Example: Two tables that store slides in a presentation. Each slide has a number of bullet points, displayed in order by the 'position' column on the bullets table. These bullets can be inserted and re-ordered by your end users, and you need to update the 'position' column of all affected rows when changes are made.

slides_table = Table('Slides', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String))

bullets_table = Table('Bullets', metadata,
                      Column('id', Integer, primary_key=True),
                      Column('slide_id', Integer, ForeignKey('Slides.id')),
                      Column('position', Integer),
                      Column('text', String))

class Slide(object):
    pass
class Bullet(object):
    pass

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet, order_by=[bullets_table.c.position])
})
mapper(Bullet, bullets_table)

The standard relation mapping will produce a list-like attribute on each Slide containing all related Bullets, but coping with changes in ordering is entirely your responsibility. If you insert a Bullet into that list, there is no magic: it won't have a position attribute unless you assign one, and you'll need to manually renumber all the subsequent Bullets in the list to accommodate the insert.

An orderinglist can automate this and manage the 'position' attribute on all related bullets for you.

from sqlalchemy.ext.orderinglist import ordering_list

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet,
                        collection_class=ordering_list('position'),
                        order_by=[bullets_table.c.position])
})
mapper(Bullet, bullets_table)

s = Slide()
s.bullets.append(Bullet())
s.bullets.append(Bullet())
s.bullets[1].position  # 1
s.bullets.insert(1, Bullet())
s.bullets[2].position  # 2

Use the ordering_list function to set up the collection_class on relations (as in the mapper example above). This implementation depends on the list starting in the proper order, so be SURE to put an order_by on your relation.

ordering_list takes the name of the related object's ordering attribute as an argument. By default, the zero-based integer index of the object's position in the ordering_list is synchronized with the ordering attribute: index 0 will get position 0, index 1 position 1, etc. To start numbering at 1 or some other integer, provide count_from=1.
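
For example, to number bullet positions starting at 1 instead of 0, the mapper configuration above becomes:

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet,
                        # positions are numbered 1, 2, 3, ...
                        collection_class=ordering_list('position', count_from=1),
                        order_by=[bullets_table.c.position])
})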

Ordering values are not limited to incrementing integers. Almost any scheme can be implemented by supplying a custom ordering_func that maps a Python list index to any value you require. See the module documentation for more information, and check out the unit tests for examples of stepped numbering, alphabetical, and Fibonacci numbering.
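
As a hedged sketch of a custom scheme, assuming (per the module documentation) that an ordering_func receives the list index and the collection and returns the value to store:

def stepped_numbering(index, collection):
    # map list indexes 0, 1, 2, ... to position values 0, 10, 20, ...
    return index * 10

mapper(Slide, slides_table, properties={
    'bullets': relation(Bullet,
                        collection_class=ordering_list('position',
                                                       ordering_func=stepped_numbering),
                        order_by=[bullets_table.c.position])
})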

back to section top

threadlocal

Author: Mike Bayer and Daniel Miller

threadlocal is an extension that was created primarily to provide backwards compatibility with the older SQLAlchemy 0.1 series. It combines three features which SQLAlchemy 0.2 and above provide separately: SessionContext, assign_mapper, and the TLEngine, which is the Engine used with the "threadlocal" create_engine() strategy. It is strongly recommended that these three features be understood individually before using threadlocal.

In SQLAlchemy 0.1, users never dealt with explicit connections and didn't have a very explicit Session interface, relying instead upon a more magical global object called objectstore. The objectstore idea was wildly popular with about half of SA's users, and completely unpopular with the other half. The threadlocal mod basically brings back objectstore, which is in fact just a SessionContext on which you can call Session methods directly, instead of saying context.current. For threadlocal to faithfully reproduce 0.1 behavior, it is invoked as a mod which globally installs the objectstore's mapper extension, such that all Mappers automatically assign new instances of mapped classes to the objectstore's contextual Session. Additionally, it changes the default engine strategy used by create_engine() to "threadlocal", which in normal practice does not affect much.

When you import threadlocal, what you get is:

* an objectstore object added to the sqlalchemy namespace, which is a SessionContext whose Session methods may be called directly
* an assign_mapper function added to the sqlalchemy namespace
* a global mapper extension which assigns all new instances of mapped classes to the objectstore's contextual Session
* the default create_engine() strategy changed to "threadlocal"

So an important point to understand is: don't use the threadlocal mod unless you are explicitly looking for that behavior. Unfortunately, the easy import of the "threadlocal" mod has found its way into several tutorials on external websites, producing application-wide behavior that is in conflict with the SQLAlchemy tutorial and data mapping documentation.

While "threadlocal" is only about 10 lines of code, it is strongly advised that users instead make usage of SessionContext and assign_mapper explictly to eliminate confusion. Additionally, the "threadlocal" strategy on create_engine() also exists primarily to provide patterns used in 0.1 and is probably not worth using either, unless you specifically need those patterns.

Basic usage of threadlocal involves importing the mod before any other usage of the sqlalchemy namespace, since threadlocal adds the "objectstore" and "assign_mapper" names to the sqlalchemy module.

To use objectstore:

import sqlalchemy.mods.threadlocal
from sqlalchemy import *

metadata = MetaData('sqlite:///')
user_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(50), nullable=False)
)

class User(object):
    pass
mapper(User, user_table)

# "user" object is added to the session automatically
user = User()

# flush the contextual session
objectstore.flush()

The actual Session is available as:

objectstore.get_session()

To use assign_mapper:

import sqlalchemy.mods.threadlocal
from sqlalchemy import *

metadata = MetaData('sqlite:///')
user_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(50), nullable=False)
)

class User(object):
    pass

# note that no "context" object is needed
assign_mapper(User, user_table)

# call up a user
user = User.selectfirst(user_table.c.user_id==7)

# call 'delete' on the user
user.delete()

# flush
objectstore.flush()
back to section top

ActiveMapper

Author: Jonathan LaCour

Please note that ActiveMapper has been deprecated in favor of Elixir, a more comprehensive solution to declarative mapping, of which Jonathan is a co-author.

ActiveMapper is a so-called "declarative layer" which allows the construction of a class, a Table, and a Mapper all in one step:

class Person(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        full_name   = column(String)
        first_name  = column(String)
        middle_name = column(String)
        last_name   = column(String)
        birth_date  = column(DateTime)
        ssn         = column(String)
        gender      = column(String)
        home_phone  = column(String)
        cell_phone  = column(String)
        work_phone  = column(String)
        prefs_id    = column(Integer, foreign_key=ForeignKey('preferences.id'))
        addresses   = one_to_many('Address', colname='person_id', backref='person')
        preferences = one_to_one('Preferences', colname='prefs_id', backref='person')

    def __str__(self):
        s =  '%s\n' % self.full_name
        s += '  * birthdate: %s\n' % (self.birth_date or 'not provided')
        s += '  * fave color: %s\n' % (self.preferences.favorite_color or 'Unknown')
        s += '  * personality: %s\n' % (self.preferences.personality_type or 'Unknown')

        for address in self.addresses:
            s += '  * address: %s\n' % address.address_1
            s += '             %s, %s %s\n' % (address.city, address.state, address.postal_code)

        return s

class Preferences(ActiveMapper):
    class mapping:
        __table__        = 'preferences'
        id               = column(Integer, primary_key=True)
        favorite_color   = column(String)
        personality_type = column(String)

class Address(ActiveMapper):
    class mapping:
        id          = column(Integer, primary_key=True)
        type        = column(String)
        address_1   = column(String)
        city        = column(String)
        state       = column(String)
        postal_code = column(String)
        person_id   = column(Integer, foreign_key=ForeignKey('person.id'))

More discussion on ActiveMapper can be found at Jonathan LaCour's Blog as well as the SQLAlchemy Wiki.

back to section top

SqlSoup

Author: Jonathan Ellis

SqlSoup creates mapped classes on the fly from tables, which are automatically reflected from the database based on name. It is essentially a nicer version of the "row data gateway" pattern.

>>> from sqlalchemy.ext.sqlsoup import SqlSoup
>>> db = SqlSoup('sqlite:///')

>>> db.users.select(order_by=[db.users.c.name])
[MappedUsers(name='Bhargan Basepair',email='basepair@example.edu',password='basepair',classname=None,admin=1),
 MappedUsers(name='Joe Student',email='student@example.edu',password='student',classname=None,admin=0)]

Full SqlSoup documentation is on the SQLAlchemy Wiki.

back to section top

ProxyEngine

Author: Jason Pellerin

The ProxyEngine is used to "wrap" an Engine; by subclassing ProxyEngine, one can instrument the functionality of an arbitrary Engine instance through the decorator pattern. It also provides a connect() method which will send all Engine requests to a different underlying engine. Its functionality in that regard is now largely superseded by MetaData, which is a better solution.

from sqlalchemy.ext.proxy import ProxyEngine
proxy = ProxyEngine()

proxy.connect('postgres://user:pw@host/db')
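
Since connect() re-points the proxy, the same ProxyEngine can later be switched to another database; the URLs here are hypothetical:

# subsequent Engine requests through the proxy go to the new database
proxy.connect('sqlite:///test.db')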
back to section top