Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.
The Mapper
's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper
is created to associate a Table
object with a class, all of the columns defined in the Table
object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects (known as a Unit of Work).
Two objects provide the primary interface for interacting with Mappers and the "unit of work" in SA 0.2, which are the Query
object and the Session
object. Query
deals with selecting objects from the database, whereas Session
provides a context for loaded objects and the ability to communicate changes on those objects back to the database.
The primary method on Query
for loading objects is its select()
method, which has similar arguments to a sqlalchemy.sql.Select
object. But this select method executes automatically and returns results, instead of awaiting an execute() call. Instead of returning a cursor-like object, it returns an array of objects.
The three configurational elements to be defined, i.e. the Table
metadata, the user-defined class, and the Mapper
, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
Also, keep in mind that the examples in this section deal with explicit Session
objects mapped directly to Engine
objects, which represents the most explicit style of using the ORM. Options exist for how this is configured, including binding Table
objects directly to Engines
(described in Binding MetaData to an Engine), as well as using the "Threadlocal" plugin which provides various code shortcuts by using an implicit Session associated to the current thread (described in threadlocal).
First, the metadata/mapper configuration code:
from sqlalchemy import * # metadata meta = MetaData() # table object users_table = Table('users', meta, Column('user_id', Integer, primary_key=True), Column('user_name', String(16)), Column('password', String(20)) ) # class definition class User(object): pass # create a mapper and associate it with the User class. # technically we dont really need the 'usermapper' variable. usermapper = mapper(User, users_table)
Note that no database definitions are required. Next we will define an Engine
and connect a Session
to it, and perform a simple select:
The method session.query(class_or_mapper)
returns a Query
object. Below is a synopsis of things you can do with Query
:
# get a query from a Session based on class: query = session.query(User) # get a query from a Session given a Mapper: query = session.query(usermapper) # select_by, which takes keyword arguments. the # keyword arguments represent property names and the values # represent values which will be compared via the = operator. # the comparisons are joined together via "AND". result = query.select_by(name='john', street='123 green street') # select_by can also combine ClauseElements with key/value properties. # all ClauseElements and keyword-based criterion are combined together # via "AND". result = query.select_by(users_table.c.user_name=='john', addresses_table.c.zip_code=='12345', street='123 green street') # get_by, which takes the same arguments as select_by # returns a single scalar result or None if no results user = query.get_by(id=12) # "dynamic" versions of select_by and get_by - everything past the # "select_by_" or "get_by_" is used as the key, and the function argument # as the value result = query.select_by_name('fred') u = query.get_by_name('fred') # get an object directly from its primary key. this will bypass the SQL # call if the object has already been loaded u = query.get(15) # get an object that has a composite primary key of three columns. # the order of the arguments matches that of the table meta data. myobj = query.get((27, 3, 'receipts')) # using a WHERE criterion result = query.select(or_(users_table.c.user_name == 'john', users_table.c.user_name=='fred')) # using a WHERE criterion to get a scalar u = query.selectfirst(users_table.c.user_name=='john') # selectone() is a stricter version of selectfirst() which # will raise an exception if there is not exactly one row u = query.selectone(users_table.c.user_name=='john') # using a full select object result = query.select(users_table.select(users_table.c.user_name=='john'))
Some of the above examples above illustrate the usage of the mapper's Table object to provide the columns for a WHERE Clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c
to the class itself, which can be used just like the table metadata to access the columns of the table:
userlist = session.query(User).select(User.c.user_id==12)
When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session
object. The changes are then written to the database when an application calls flush()
. This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush()
operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to insure the proper number of rows were in fact affected (not supported with the current MySQL drivers).
The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session
, however this can be made automatic by using threadlocal or SessionContext.
mapper(User, users_table) # create a new User myuser = User() myuser.user_name = 'jane' myuser.password = 'hello123' # create another new User myuser2 = User() myuser2.user_name = 'ed' myuser2.password = 'lalalala' # create a Session and save them sess = create_session() sess.save(myuser) sess.save(myuser2) # load a third User from the database sqlmyuser3 = sess.query(User).select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones' # save all changes sqlsession.flush()
The mapped class can also specify whatever methods and/or constructor it wants:
class User(object): def __init__(self, user_name, password): self.user_id = None self.user_name = user_name self.password = password def get_name(self): return self.user_name def __repr__(self): return "User id %s name %s password %s" % (repr(self.user_id), repr(self.user_name), repr(self.password)) mapper(User, users_table) sess = create_session() u = User('john', 'foo') sess.save(u) sqlsession.flush()
>>> u User id 1 name 'john' password 'foo'
SQLAlchemy will only put modified object attributes columns into the UPDATE statements generated upon flush. This is to conserve database traffic and also to successfully interact with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isnt loaded until referenced by the application.
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation
function provided by the orm
module.
So with our User class, lets also define the User has having one or more mailing addresses. First, the table metadata:
from sqlalchemy import * metadata = MetaData() # define user table users_table = Table('users', metadata, Column('user_id', Integer, primary_key=True), Column('user_name', String(16)), Column('password', String(20)) ) # define user address table addresses_table = Table('addresses', metadata, Column('address_id', Integer, primary_key=True), Column('user_id', Integer, ForeignKey("users.user_id")), Column('street', String(100)), Column('city', String(80)), Column('state', String(2)), Column('zip', String(10)) )
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper
wants to indicate a relation of one object to another, the ForeignKey
relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).
So then lets define two classes, the familiar User
class, as well as an Address
class:
class User(object): def __init__(self, user_name, password): self.user_name = user_name self.password = password class Address(object): def __init__(self, street, city, state, zip): self.street = street self.city = city self.state = state self.zip = zip
And then a Mapper
that will define a relationship of the User
and the Address
classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties
which is a dictionary relating the names of class attributes to database relationships, in this case a relation
object against a newly defined mapper for the Address class:
mapper(Address, addresses_table) mapper(User, users_table, properties = { 'addresses' : relation(Address) } )
Lets do some operations with these classes and see what happens:
engine = create_engine('sqlite:///mydb.db') metadata.create_all(engine) session = create_session(bind_to=engine) u = User('jane', 'hihilala') u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543')) u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923')) session.save(u) session.flush()
A lot just happened there! The Mapper
figured out how to relate rows in the addresses table to the users table, and also upon flush had to determine the proper order in which to insert rows. After the insert, all the User
and Address
objects have their new primary and foreign key attributes populated.
Also notice that when we created a Mapper
on the User
class which defined an addresses
relation, the newly created User
instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property function which returns an instance of sqlalchemy.util.HistoryArraySet
. This object fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:
del u.addresses[1] u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839')) session.flush()
Note that when creating a relation with the relation()
function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper
instance itself, as returned by the mapper()
function.
In the previous example, a single address was removed from the addresses
attribute of a User
object, resulting in the corresponding database row being updated to have a user_id of None
. But now, theres a mailing address with no user_id floating around in the database of no use to anyone. How can we avoid this ? This is acheived by using the cascade
parameter of relation
:
clear_mappers() # clear mappers from the previous example mapper(Address, addresses_table) mapper(User, users_table, properties = { 'addresses' : relation(Address, cascade="all, delete-orphan") } ) del u.addresses[1] u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839')) session.flush()
In this case, with the delete-orphan
cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan"
means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address
objects are bounded by that of their parent User
object.
Cascading is described fully in Cascade rules.
By creating relations with the backref
keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User
mapper is created with an addresses
property, and the corresponding Address
mapper receives a "backreference" to the User
object via the property name user
:
Address = mapper(Address, addresses_table) User = mapper(User, users_table, properties = { 'addresses' : relation(Address, backref='user') } ) u = User('fred', 'hi') a1 = Address('123 anywhere street', 'big city', 'UT', '76543') a2 = Address('1 Park Place', 'some other city', 'OK', '83923') # append a1 to u u.addresses.append(a1) # attach u to a2 a2.user = u # the bi-directional relation is maintained >>> u.addresses == [a1, a2] True >>> a1.user is user and a2.user is user True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()
) is placed on the child mapper. The default arguments to this property can be overridden using the backref()
function:
mapper(User, users_table) mapper(Address, addresses_table, properties={ 'user':relation(User, backref=backref('addresses', cascade="all, delete-orphan")) })
The backref()
function is often used to set up a bi-directional one-to-one relationship. This is because the relation()
function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref()
function can redefine the uselist
property to make it a scalar:
mapper(User, users_table) mapper(Address, addresses_table, properties={ 'user' : relation(User, backref=backref('address', uselist=False)) })
We've seen how the relation
specifier affects the saving of an object and its child items, how does it affect selecting them? By default, the relation keyword indicates that the related property should be attached a lazy loader when instances of the parent object are loaded from the database; this is just a callable function that when accessed will invoke a second SQL query to load the child objects of the parent.
# define a mapper mapper(User, users_table, properties = { 'addresses' : relation(mapper(Address, addresses_table)) }) # select users where username is 'jane', get the first element of the list # this will incur a load operation for the parent table sqluser = session.query(User).select(User.c.user_name=='jane')[0]
# iterate through the User object's addresses. this will incur an # immediate load of those child items sqlfor a in user.addresses:
print repr(a)
For mappers that have relationships, the select_by
method of the Query
object can create queries that include automatically created joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of one of its relationships:
sqll = session.query(User).select_by(street='123 Green Street')
The above example is shorthand for:
l = session.query(User).select(and_( Address.c.user_id==User.c.user_id, Address.c.street=='123 Green Street') )
All keyword arguments sent to select_by
are used to create query criterion. This means that familiar select
keyword options like order_by
and limit
are not directly available. To enable these options with select_by
, you can try the SelectResults extension which offers methods off the result of a select
or select_by
such as order_by()
and array slicing functions that generate new queries.
Also, select_by
will not create joins derived from Column
-based expressions (i.e. ClauseElement
objects); the reason is that a Column
-based expression may include many columns, and select_by
has no way to know which columns in the expression correspond to properties and which don't (it also prefers not to dig into column expressions which may be very complex). The next section describes some ways to combine Column
expressions with select_by
's auto-joining capabilities.
Feature Status: Alpha API
The join_to
method of Query
is a component of the select_by
operation, and is given a keyname in order to return a "join path" from the Query's mapper to the mapper which is referenced by a relation()
of the given name:
>>> q = session.query(User) >>> j = q.join_to('addresses') >>> print j users.user_id=addresses.user_id
join_to
can also be given the name of a column-based property, in which case it will locate a path to the nearest mapper which has that property as a column:
>>> q = session.query(User) >>> j = q.join_to('street') >>> print j users.user_id=addresses.user_id
Also available is the join_via
function, which is similar to join_to
, except instead of traversing through all properties to find a path to the given key, its given an explicit path to the target property:
>>> q = session.query(User) >>> j = q.join_via(['orders', 'items']) >>> print j users.c.user_id==orders.c.user_id AND orders.c.item_id==items.c.item_id
Expressions produced by join_to
and join_via
can be used with select
to create more complicated query criterion across multiple relations:
>>> l = q.select( (addresses_table.c.street=='some address') & (items_table.c.item_name=='item #4') & q.join_to('addresses') & q.join_via(['orders', 'items']) )
With just a single parameter lazy=False
specified to the relation object, the parent and child SQL queries can be joined together.
mapper(Address, addresses_table) mapper(User, users_table, properties = { 'addresses' : relation(Address, lazy=False) } ) sqlusers = session.query(User).select(User.c.user_name=='Jane')
for u in users: print repr(u) for a in u.addresses: print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
sqlusers = session.query(User).select_by(street='123 Green Street')
The join implied by passing the "street" parameter is stated as an additional join between the addresses
and users
tables. Also, since the eager join is "aliasized", no name conflict occurs.
The options
method on the Query
object provides an easy way to get alternate forms of a mapper query from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload()
, lazyload()
and noload()
:
# user mapper with lazy addresses mapper(User, users_table, properties = { 'addresses' : relation(mapper(Address, addresses_table)) } ) # query object query = session.query(User) # make an eager loading query eagerquery = query.options(eagerload('addresses')) u = eagerquery.select() # make another query that wont load the addresses at all plainquery = query.options(noload('addresses')) # multiple options can be specified myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members')) # to specify a relation on a relation, separate the property names by a "." myquery = oldquery.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. To do other forms of relationship is easy, as the relation
function can usually figure out what you want:
metadata = MetaData() # a table to store a user's preferences for a site prefs_table = Table('user_prefs', metadata, Column('pref_id', Integer, primary_key = True), Column('stylename', String(20)), Column('save_password', Boolean, nullable = False), Column('timezone', CHAR(3), nullable = False) ) # user table with a 'preference_id' column users_table = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), Column('password', String(20), nullable = False), Column('preference_id', Integer, ForeignKey("user_prefs.pref_id")) ) # engine and some test data engine = create_engine('sqlite:///', echo=True) metadata.create_all(engine) engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST')) engine.execute(users_table.insert(), dict(user_name = 'fred', password='45nfss', preference_id=1)) # classes class User(object): def __init__(self, user_name, password): self.user_name = user_name self.password = password class UserPrefs(object): pass mapper(UserPrefs, prefs_table) mapper(User, users_table, properties = dict( preferences = relation(UserPrefs, lazy=False, cascade="all, delete-orphan"), )) # select session = create_session(bind_to=engine) sqluser = session.query(User).get_by(user_name='fred')
save_password = user.preferences.save_password # modify user.preferences.stylename = 'bluesteel' # flush sqlsession.flush()
The relation
function handles a basic many-to-many relationship when you specify the association table:
metadata = MetaData() articles_table = Table('articles', metadata, Column('article_id', Integer, primary_key = True), Column('headline', String(150), key='headline'), Column('body', TEXT, key='body'), ) keywords_table = Table('keywords', metadata, Column('keyword_id', Integer, primary_key = True), Column('keyword_name', String(50)) ) itemkeywords_table = Table('article_keywords', metadata, Column('article_id', Integer, ForeignKey("articles.article_id")), Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")) ) engine = create_engine('sqlite:///') metadata.create_all(engine) # class definitions class Keyword(object): def __init__(self, name): self.keyword_name = name class Article(object): pass mapper(Keyword, keywords_table) # define a mapper that does many-to-many on the 'itemkeywords' association # table mapper(Article, articles_table, properties = dict( keywords = relation(Keyword, secondary=itemkeywords_table, lazy=False) ) ) session = create_session(bind_to=engine) article = Article() article.headline = 'a headline' article.body = 'this is the body' article.keywords.append(Keyword('politics')) article.keywords.append(Keyword('entertainment')) session.save(article) sqlsession.flush()
# select articles based on a keyword. select_by will handle the extra joins. sqlarticles = session.query(Article).select_by(keyword_name='politics')
a = articles[0] # clear out keywords with a new list a.keywords = [] a.keywords.append(Keyword('topstories')) a.keywords.append(Keyword('government')) # flush sqlsession.flush()
Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.
from sqlalchemy import * metadata = MetaData() users_table = Table('users', metadata, Column('user_id', Integer, primary_key = True), Column('user_name', String(16), nullable = False), ) articles_table = Table('articles', metadata, Column('article_id', Integer, primary_key = True), Column('headline', String(150), key='headline'), Column('body', TEXT, key='body'), ) keywords_table = Table('keywords', metadata, Column('keyword_id', Integer, primary_key = True), Column('keyword_name', String(50)) ) # add "attached_by" column which will reference the user who attached this keyword itemkeywords_table = Table('article_keywords', metadata, Column('article_id', Integer, ForeignKey("articles.article_id")), Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")), Column('attached_by', Integer, ForeignKey("users.user_id")) ) engine = create_engine('sqlite:///', echo=True) metadata.create_all(engine) # class definitions class User(object): pass class Keyword(object): def __init__(self, name): self.keyword_name = name class Article(object): pass class KeywordAssociation(object): pass mapper(User, users_table) mapper(Keyword, keywords_table) # mapper for KeywordAssociation # specify "primary key" columns manually mapper(KeywordAssociation, itemkeywords_table, primary_key = [itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id], properties={ 'keyword' : relation(Keyword, lazy = False), 'user' : relation(User, lazy = False) } ) # Article mapper, relates to Keyword via KeywordAssociation mapper(Article, articles_table, properties={ 'keywords':relation(KeywordAssociation, lazy=False, association=Keyword) } ) session = create_session(bind_to=engine) # select by keyword sqlalist = session.query(Article).select_by(keyword_name='jacks_stories')
# user is available for a in alist: for k in a.keywords: if k.keyword.name == 'jacks_stories': print k.user.user_name
Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:
user = User() user.user_name = 'some user' article = Article() assoc = KeywordAssociation() assoc.keyword = Keyword('blue') assoc.user = user assoc2 = KeywordAssociation() assoc2.keyword = Keyword('green') assoc2.user = user article.keywords.append(assoc) article.keywords.append(assoc2) session.flush()