Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes. When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "flushes" the current state of objects. This is known as the Unit of Work pattern.
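To make the idea concrete, here is a minimal sketch of attribute change tracking in plain modern Python. It is hypothetical (the class name `DirtyTracking` and the `_dirty` set are inventions for illustration) and is not how SQLAlchemy is actually implemented:

```python
# A toy illustration of dirty-attribute tracking for a Unit of Work.
# All names here are hypothetical; SQLAlchemy's real instrumentation
# is far more involved.

class DirtyTracking(object):
    """Records the names of attributes modified after construction."""
    def __init__(self):
        # write to __dict__ directly so __setattr__ is not triggered
        self.__dict__['_dirty'] = set()

    def __setattr__(self, name, value):
        # intercept every attribute write and remember it
        self._dirty.add(name)
        self.__dict__[name] = value

    def flush(self):
        """Pretend to persist the changes, then reset the dirty state."""
        changed = sorted(self._dirty)
        self._dirty.clear()
        return changed

u = DirtyTracking()
u.user_name = 'fred'
u.password = 'secret'
print(u.flush())   # -> ['password', 'user_name']
print(u.flush())   # -> [] (nothing is dirty after a flush)
```

The point of the sketch is only the shape of the pattern: writes are intercepted, accumulated, and persisted in one batch at flush time.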
Starting with a Table definition and a minimal class construct, the two are associated with each other via the mapper() function [api], which generates an object called a Mapper. SA associates the class and all instances of that class with this particular Mapper, which is then stored in a global registry.
from sqlalchemy import *

# metadata
meta = MetaData()

# table object
users_table = Table('users', meta,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('fullname', String(100)),
    Column('password', String(20))
)

# class definition
class User(object):
    pass

# create a mapper and associate it with the User class
mapper(User, users_table)
That's all for configuration. Next, we will create an Engine and bind it to a Session, which represents a local collection of mapped objects to be operated upon.
# engine
engine = create_engine('sqlite:///mydb.db')

# session
session = create_session(bind=engine)
The Session represents a "workspace" which can load objects and persist changes to the database. Note also that the bind parameter is optional; if the underlying Table objects are bound as described in Binding MetaData to an Engine or Connection, it's not needed. A Session [doc] [api] is best created local to a particular set of related data operations, such as scoped within a function call, or within a single application request cycle. Next we illustrate a rudimentary query which loads a single object instance. We will modify one of its attributes and persist the change back to the database.
# select
user = session.query(User).filter_by(user_name='fred').first()

# modify
user.user_name = 'fred jones'

# flush - saves everything that changed
# within the scope of our Session
session.flush()
Things to note from the above include that the loaded User object has an attribute named user_name on it, which corresponds to the user_name column in users_table; this attribute was configured at the class level by the Mapper, as part of its post-initialization process (this process occurs normally when the mapper is first used). Our modify operation on this attribute caused the object to be marked as "dirty", which was picked up automatically within the subsequent flush() process. The flush() is the point at which all changes to objects within the Session are persisted to the database, and the User object is no longer marked as "dirty" until it is again modified.
The method session.query(class_or_mapper) returns a Query object [api]. Query implements methods which are used to produce and execute SELECT statements tailored for loading object instances. It returns object instances in all cases; usually as a list, but in some cases as scalar objects, or as lists of tuples which contain multiple kinds of objects and sometimes individual scalar values.
A Query is created from the Session, relative to the particular class we wish to load.
# get a query from a Session based on class:
query = session.query(User)
Alternatively, an actual Mapper instance can be specified instead of a class:
# locate the mapper corresponding to the User class
usermapper = class_mapper(User)

# create query against the User mapper
query = session.query(usermapper)
A query which joins across multiple tables may also be used to request multiple entities, such as:
query = session.query(User, Address)
Once we have a query, we can start loading objects. The methods filter() and filter_by() handle narrowing results, and the methods all(), one(), and first() return all, exactly one, or the first result of the total set of results. Note that all these methods are generative, meaning that each call which doesn't return results produces a new Query instance.
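The "generative" idea is easy to sketch in plain Python. The class `ToyQuery` below is hypothetical, not SQLAlchemy API; it only shows why chained calls can refine criteria without mutating the query you started from:

```python
# A toy "generative" query: each refining call returns a NEW object,
# leaving the original untouched. ToyQuery and its methods are
# hypothetical names, not SQLAlchemy API.

class ToyQuery(object):
    def __init__(self, criteria=()):
        self.criteria = tuple(criteria)

    def filter(self, clause):
        # return a copy with the extra criterion appended
        return ToyQuery(self.criteria + (clause,))

    def all(self):
        # a real Query would emit SQL here; we just render the criteria
        return " AND ".join(self.criteria)

q1 = ToyQuery()
q2 = q1.filter("user_id > 224")
q3 = q2.filter("name = 'john'")

print(q1.criteria)  # -> () : q1 was never mutated
print(q3.all())     # -> user_id > 224 AND name = 'john'
```

Because each step is an independent object, a base query can safely be shared and refined in several directions at once.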
The filter_by() method works with keyword arguments, which are combined together via AND:
result = session.query(User).filter_by(name='john', fullname='John Smith').all()
Whereas filter() works with constructed SQL expressions, i.e. those described in Constructing SQL Queries via Python Expressions:
result = session.query(User).filter(users_table.c.name=='john').all()
Sometimes, constructing SQL via expressions can be cumbersome. For quick SQL expressions, the filter() method can also accommodate straight text:
result = session.query(User).filter("user_id>224").all()
When using text, bind parameters can be specified the same way as in a text() clause, using a colon. To specify the bind parameter values, use the params() method:
result = session.query(User).filter("user_id>:value").params(value=224).all()
Multiple filter() and filter_by() expressions may be combined together. The resulting statement groups them using AND.
result = session.query(User).filter(users_table.c.user_id>224).\
    filter_by(name='john').\
    filter(users_table.c.fullname=='John Smith').all()
The keyword arguments of filter_by() can also take mapped object instances as comparison arguments. We'll illustrate this later when we talk about object relationships.
Note that all conjunctions are available explicitly, such as and_() and or_(), when using filter():
result = session.query(User).filter(
    and_(users_table.c.user_id>224,
         or_(users_table.c.name=='john', users_table.c.name=='ed'))
).all()
It's also straightforward to use an entirely string-based statement, using from_statement(); just ensure that the columns clause of the statement contains the column names normally used by the mapper (here illustrated using an asterisk):
result = session.query(User).from_statement("SELECT * FROM users").all()
from_statement() can also accommodate select() constructs:
result = session.query(User).from_statement(
    select([users_table], users_table.c.name<'e',
        having=users_table.c.name==func.max(users_table.c.name),
        group_by=[c for c in users_table.c])
).all()
Any set of filter criteria (or none at all) can be distilled into a count of rows using count():
num = session.query(User).filter(users_table.c.user_id>224).count()
Rows are limited and offset using limit() and offset():
result = session.query(User).limit(20).offset(5).all()
And ordering is applied, using Column objects and related SQL constructs, with order_by():
result = session.query(User).order_by(desc(users_table.c.user_name)).all()
The first() and one() methods will also limit rows, and both will return a single object instead of a list. In the case of first(), rows are limited to just one, and the result is returned as a scalar. In the case of one(), rows are limited to two; however, only one is returned. If two rows are matched, an exception is raised.
# load the first result
user = session.query(User).first()

# load exactly *one* result - if more than one result matches,
# an exception is raised
user = session.query(User).filter_by(name='jack').one()
The Query, when evaluated as an iterator, executes immediately, using whatever state has been built up:
result = list(session.query(User))
Array indexes and slices work too, adding the corresponding LIMIT and OFFSET clauses:
result = list(session.query(User)[1:3])
A scalar index returns a scalar result immediately:
user = session.query(User)[2]
There's also a way to combine scalar results with objects, using add_column(). This is often used for functions and aggregates. When add_column() (or its cousin add_entity(), described later) is used, tuples are returned:
result = session.query(User).add_column(func.max(users_table.c.name)).\
    group_by([c for c in users_table.c]).all()

for r in result:
    print "user:", r[0]
    print "max name:", r[1]
Later in this chapter, we'll discuss how to configure relations between mapped classes. Once that's done, we'll discuss how to use table joins in Querying with Joins.
The get() method loads a single instance, given the primary key value of the desired entity:
# load user with primary key 15
user = query.get(15)
The get() method, because it has the actual primary key value of the instance, can return an already-loaded instance from the Session without performing any SQL. It is the only result-returning method on Query that does not issue SQL to the database in all cases.
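The identity-map check behind this behavior can be sketched in plain Python. Everything here (`ToySession`, `_identity_map`, `_load_from_db`) is a hypothetical stand-in, not SQLAlchemy internals:

```python
# A toy illustration of the identity-map shortcut used by get():
# if an instance with the requested primary key is already present,
# it is returned with no SQL issued. All names are hypothetical.

class ToySession(object):
    def __init__(self):
        self._identity_map = {}   # (class, pk) -> instance
        self.sql_issued = 0

    def _load_from_db(self, cls, pk):
        self.sql_issued += 1      # stand-in for a real SELECT
        obj = cls()
        obj.id = pk
        self._identity_map[(cls, pk)] = obj
        return obj

    def get(self, cls, pk):
        key = (cls, pk)
        if key in self._identity_map:
            return self._identity_map[key]   # no SQL needed
        return self._load_from_db(cls, pk)

class User(object):
    pass

sess = ToySession()
u1 = sess.get(User, 15)   # first access hits the "database"
u2 = sess.get(User, 15)   # second access is satisfied from the map
print(u1 is u2, sess.sql_issued)  # -> True 1
```

Keying the map on the primary key is also what guarantees that a given row is represented by at most one object per session.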
To issue a composite primary key to get(), use a tuple. The order of the arguments matches that of the primary key columns of the table:
myobj = query.get((27, 3, 'receipts'))
Another special method on Query is load(). This method has the same signature as get(), except it always refreshes the returned instance with the latest data from the database. This is in fact a unique behavior, since as we will see in the Session / Unit of Work chapter, most Query methods do not reload the contents of instances which are already present in the session.
Some of the above examples illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c to the class itself, which can be used just like that of a Table object to access the columns of the table:
userlist = session.query(User).filter(User.c.user_id==12).first()
In version 0.4 of SQLAlchemy, the "c" prefix will no longer be needed.
When objects corresponding to mapped classes are created or manipulated, all changes are logged by the Session object. The changes are then written to the database when the application calls flush(). This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The flush() operation batches its SQL statements into a transaction, and can also perform optimistic concurrency checks (using a version id column) to ensure the proper number of rows were in fact affected.
The Unit of Work is a powerful tool, and has some important concepts that should be understood in order to use it effectively. See the Session / Unit of Work section for a full description on all its operations.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes. New objects by default must be explicitly added to the Session using the save() method:
mapper(User, users_table)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# create a Session and save them
sess = create_session()
sess.save(myuser)
sess.save(myuser2)

# load a third User from the database
myuser3 = sess.query(User).filter_by(name='fred').all()[0]
myuser3.user_name = 'fredjones'

# save all changes
sess.flush()
The mapped class can also specify whatever methods and/or constructor it wants:
class User(object):
    def __init__(self, user_name, password):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (
            repr(self.user_id), repr(self.user_name), repr(self.password))

mapper(User, users_table)

sess = create_session()
u = User('john', 'foo')
sess.save(u)
sess.flush()
>>> u
User id 1 name 'john' password 'foo'
Note that the __init__() method is not called when the instance is loaded. This is so that classes can define operations that are specific to their initial construction which are not re-called when the object is restored from the database, and is similar in concept to how Python's pickle module calls __new__() when deserializing instances. To allow __init__() to be called at object load time, or to define any other sort of on-load operation, create a MapperExtension which supplies the create_instance() method (see Extending Mapper, as well as the example in the FAQ).
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is to define an object's relationships to other database-persisted objects. This is done via the relation function [doc][api] provided by the orm module.
With our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
from sqlalchemy import *

metadata = MetaData()

# define user table
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses_table = Table('addresses', metadata,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)
Of importance here is the addresses table's definition of a foreign key referencing the users table, relating the user_id column in a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, ForeignKey relationships are the default method by which the relationship is determined (options also exist to describe the relationships explicitly).
So let's define two classes: the familiar User class, as well as an Address class:
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street, city, state, zip):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip
And then a Mapper that will define a relationship of the User and Address classes to each other as well as to their table metadata. We will add an additional mapper keyword argument, properties, which is a dictionary relating the names of class attributes to database relationships, in this case a relation object against a newly defined mapper for the Address class:
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address)
  }
)
Let's do some operations with these classes and see what happens:
engine = create_engine('sqlite:///mydb.db')

# create tables
metadata.create_all(engine)

session = create_session(bind=engine)

u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

session.save(u)
session.flush()
A lot just happened there! The Mapper figured out how to relate rows in the addresses table to the users table, and upon flush it also had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have their new primary and foreign key attributes populated.
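The "proper order" problem can be sketched very roughly: a row carrying a foreign key must be inserted after the row it references, so the parent's new primary key exists to be copied in. The function below is a hypothetical illustration of that dependency sort, not SQLAlchemy's algorithm:

```python
# A toy dependency-respecting insert ordering: every parent is
# emitted before any of its children. All names here are hypothetical.

def order_inserts(pending, parent_of):
    """Return pending objects with each parent before its children.

    parent_of maps id(obj) -> parent object (or is absent for roots).
    """
    ordered, seen = [], set()
    def visit(obj):
        if id(obj) in seen:
            return
        seen.add(id(obj))
        if parent_of.get(id(obj)) is not None:
            visit(parent_of[id(obj)])    # insert the parent first
        ordered.append(obj)
    for obj in pending:
        visit(obj)
    return ordered

user = 'User(jane)'
addr1 = 'Address(123 anywhere street)'
addr2 = 'Address(1 Park Place)'

# the address rows reference the user row via users.user_id
parents = {id(addr1): user, id(addr2): user}

print(order_inserts([addr1, addr2, user], parents))
# -> ['User(jane)', 'Address(123 anywhere street)', 'Address(1 Park Place)']
```

Deletes run through the same reasoning in reverse: children first, then parents.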
Also notice that when we created a Mapper on the User class which defined an addresses relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a Python property which will return an instance of sqlalchemy.orm.attributes.InstrumentedList. This is a generic collection-bearing object which can represent lists, sets, dictionaries, or any user-defined collection class. By default it represents a list:
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))
session.flush()
Note that when creating a relation with the relation() function, the target can either be a class, in which case the primary mapper for that class is used as the target, or a Mapper instance itself, as returned by the mapper() function.
In the previous example, a single address was removed from the addresses attribute of a User object, resulting in the corresponding database row being updated to have a user_id of None. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the cascade parameter of relation:
session.clear()           # clear session
clear_mappers()           # clear mappers from the previous example

mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, cascade="all, delete-orphan")
  }
)

# reload the user
u = session.query(User).get(u.user_id)

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))
session.flush()
In this case, with the delete-orphan cascade rule set, the element that was removed from the addresses list was also removed from the database. Specifying cascade="all, delete-orphan" means that every persistence operation performed on the parent object will be cascaded to the child object or objects handled by the relation, and additionally that each child object cannot exist without being attached to a parent. Such a relationship indicates that the lifecycle of the Address objects is bound to that of their parent User object.
Cascading is described fully in Cascade rules.
By creating relations with the backref keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, independently of database operations. Below, the User mapper is created with an addresses property, and the corresponding Address mapper receives a "backreference" to the User object via the property name user:
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, backref='user')
  }
)

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property (i.e. a second relation()) is placed on the child mapper. The default arguments to this property can be overridden using the backref() function:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user':relation(User, backref=backref('addresses', cascade="all, delete-orphan"))
})
The backref() function is often used to set up a bi-directional one-to-one relationship. This is because the relation() function by default creates a "one-to-many" relationship when presented with a primary key/foreign key relationship, but the backref() function can redefine the uselist property to make it a scalar:
mapper(User, users_table)
mapper(Address, addresses_table, properties={
    'user' : relation(User, backref=backref('address', uselist=False))
})
When using mappers that have relationships to other mappers, the need arises to specify query criteria across multiple tables. SQLAlchemy provides several core techniques which offer this functionality.
One way is just to build up the join criterion yourself. This is easy to do using filter():
l = session.query(User).filter(users_table.c.user_id==addresses_table.c.user_id).\
    filter(addresses_table.c.street=='123 Green Street').all()
Above, we specified selection criteria that included columns from both the users and the addresses table. Note that in this case, we had to specify not just the matching condition to the street column on addresses, but also the join condition between the users and addresses tables. If we didn't do that, we'd get a cartesian product of both tables. The Query object never "guesses" what kind of join you'd like to use, but makes it easy using the join() method, which we'll get to in a moment.
A way to specify joins very explicitly, using the SQL join() construct, is possible via the select_from() method on Query:
l = session.query(User).select_from(users_table.join(addresses_table)).\
    filter(addresses_table.c.street=='123 Green Street').all()
But the easiest way to join is automatically, using the join() method on Query. Just give this method the path from A to B, using the name of a mapped relationship directly:
l = session.query(User).join('addresses').\
    filter(addresses_table.c.street=='123 Green Street').all()
Each time join() is called on Query, the joinpoint of the query is moved to the endpoint of the join. As above, when we joined from users_table to addresses_table, all subsequent criteria used by filter_by() are against the addresses table. If we want to filter back on the starting table again, we can use the reset_joinpoint() function:
l = session.query(User).join('addresses').\
    filter_by(street='123 Green Street').\
    reset_joinpoint().filter_by(user_name='ed').all()
With reset_joinpoint(), we can also issue new join()s which will start back from the root table.
In all cases, we can get the User and the matching Address objects back at the same time, by telling the session we want both. This returns the results as a list of tuples:
result = session.query(User, Address).join('addresses').\
    filter(addresses_table.c.street=='123 Green Street').all()

for r in result:
    print "User:", r[0]
    print "Address:", r[1]
The above syntax is shorthand for using the add_entity() method:
session.query(User).add_entity(Address).join('addresses').all()
To join across multiple relationships, specify them in a list. Below, we load a ShoppingCart, limiting its cartitems collection to the single item which has a price object whose amount column is 47.95:
cart = session.query(ShoppingCart).join(['cartitems', 'price']).filter_by(amount=47.95).one()
filter_by() can also generate joins in some cases, such as when comparing to an object instance:
# get an instance of Address.  assume its primary key identity
# is 12.
someaddress = session.query(Address).filter_by(street='123 Green Street').one()

# look for User instances which have the
# "someaddress" instance in their "addresses" collection
l = session.query(User).filter_by(addresses=someaddress).all()
You can also create joins in "reverse", that is, find an object with a certain parent. This is accomplished using with_parent():
# load a user
someuser = session.query(User).get(2)

# load an address with that user as a parent
# and email address foo@bar.com
someaddresses = session.query(Address).with_parent(someuser).\
    filter_by(email_address="foo@bar.com").all()
Sometimes it's necessary to create repeated joins that are independent of each other, even though they reference the same tables. Using our one-to-many setup, an example is to locate users who have two particular email addresses. We can do this using table aliases:
ad1 = addresses_table.alias('ad1')
ad2 = addresses_table.alias('ad2')

result = session.query(User).filter(and_(
    ad1.c.user_id==users_table.c.user_id,
    ad1.c.email_address=='foo@bar.com',
    ad2.c.user_id==users_table.c.user_id,
    ad2.c.email_address=='lala@yahoo.com'
)).all()
Version 0.4 of SQLAlchemy will include better ability to issue queries like the above with less verbosity.
We've seen how the relation specifier affects the saving of an object and its child items, and also how it allows us to build joins. How do we get the actual related items loaded? By default, the relation() function indicates that the related property should be attached a lazy loader when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
# define a user mapper
mapper(User, users_table, properties = {
    'addresses' : relation(Address)
})

# define an address mapper
mapper(Address, addresses_table)

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = session.query(User).filter(User.c.user_name=='jane')[0]

# iterate through the User object's addresses.  this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)
Eager Loading is another way for relationships to be loaded. It describes the loading of parent and child objects across a relation using a single query. The purpose of eager loading is strictly one of performance enhancement; eager loading has no impact on the results of a query, except that when traversing child objects within the results, lazy loaders will not need to issue separate queries to load those child objects.
With just a single parameter, lazy=False, specified to the relation object, the parent and child SQL queries can be joined together.
mapper(Address, addresses_table)
mapper(User, users_table, properties = {
    'addresses' : relation(Address, lazy=False)
  }
)

users = session.query(User).filter(User.c.user_name=='Jane').all()

for u in users:
    print repr(u)
    for a in u.addresses:
        print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
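The de-duplication step can be sketched in plain Python. The row layout and variable names below are hypothetical; the point is only that an identity map keyed on the parent primary key collapses the repeated parent columns of a joined result:

```python
# A toy illustration of collapsing redundant joined rows: each row
# repeats the parent's columns once per child, and an identity map
# keyed on the parent primary key yields one parent per key.
# The row layout and names are hypothetical.

rows = [
    # (user_id, user_name, address_street)
    (7, 'jane', '123 anywhere street'),
    (7, 'jane', '1 Park Place'),
    (8, 'ed',   '27 New Place'),
]

identity_map = {}   # user_id -> user dict
for user_id, user_name, street in rows:
    if user_id not in identity_map:
        identity_map[user_id] = {'user_name': user_name, 'addresses': []}
    identity_map[user_id]['addresses'].append(street)

users = list(identity_map.values())
print(len(users))             # -> 2 distinct users from 3 joined rows
print(users[0]['addresses'])  # -> ['123 anywhere street', '1 Park Place']
```

Three joined rows thus produce two User instances, each with its own addresses collection fully populated.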
Recall that eager loading has no impact on the results of the query. What if our query included our own join criterion? The eager loading query accommodates this using aliases, and is immune to the effects of additional joins being specified in the original query. Joining against the "addresses" table to locate users with a certain street results in this behavior:
users = session.query(User).join('addresses').filter_by(street='123 Green Street').all()
The join resulting from join('addresses') is separate from the join produced by the eager load, which is "aliasized" to prevent conflicts.
The options() method on the Query object allows modifications to the underlying querying methodology. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload(), lazyload() and noload():
# user mapper with lazy addresses
mapper(User, users_table, properties = {
    'addresses' : relation(mapper(Address, addresses_table))
  }
)

# query object
query = session.query(User)

# make an eager loading query
eagerquery = query.options(eagerload('addresses'))
u = eagerquery.all()

# make another query that won't load the addresses at all
plainquery = query.options(noload('addresses'))

# multiple options can be specified
myquery = oldquery.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
myquery = oldquery.options(eagerload('orders.items'))
Previously, we've discussed how to set up a one-to-many relationship. This section will go over the remaining major types of relationships that can be configured. More detail on relationships as well as more advanced patterns can be found in Advanced Data Mapping.
The above examples focused on the "one-to-many" relationship. Creating other forms of relationship is easy, as the relation() function can usually figure out what you want:
metadata = MetaData()

# a table to store a user's preferences for a site
prefs_table = Table('user_prefs', metadata,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table with a 'preference_id' column
users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("user_prefs.pref_id"))
)

# engine and some test data
engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)
engine.execute(prefs_table.insert(), dict(pref_id=1, stylename='green', save_password=1, timezone='EST'))
engine.execute(users_table.insert(), dict(user_name='fred', password='45nfss', preference_id=1))

# classes
class User(object):
    def __init__(self, user_name, password):
        self.user_name = user_name
        self.password = password

class UserPrefs(object):
    pass

mapper(UserPrefs, prefs_table)
mapper(User, users_table, properties = {
    'preferences':relation(UserPrefs, lazy=False, cascade="all, delete-orphan"),
})

# select
session = create_session(bind=engine)
user = session.query(User).filter_by(user_name='fred').one()
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'

# flush
session.flush()
The relation() function handles a basic many-to-many relationship when you specify an association table using the secondary argument:
metadata = MetaData()

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

engine = create_engine('sqlite:///')
metadata.create_all(engine)

# class definitions
class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

mapper(Keyword, keywords_table)

# define a mapper that does many-to-many on the 'itemkeywords' association
# table
mapper(Article, articles_table, properties = {
    'keywords':relation(Keyword, secondary=itemkeywords_table, lazy=False)
  }
)

session = create_session(bind=engine)

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))
session.save(article)
session.flush()

# select articles based on a keyword.
articles = session.query(Article).join('keywords').filter_by(keyword_name='politics').all()
a = articles[0]

# clear out keywords with a new list
a.keywords = []
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# flush
session.flush()
Many-to-many can also be done with an association object, which adds additional information about how two items are related. In this pattern, the "secondary" option to relation() is no longer used; instead, the association object becomes a mapped entity itself, mapped to the association table. If the association table has no explicit primary key columns defined, you also have to tell the mapper what columns will compose its "primary key", which are typically the two (or more) columns involved in the association. Also, the relation between the parent and association mapping is typically set up with a cascade of all, delete-orphan. This is to ensure that when an association object is removed from its parent collection, it is deleted (otherwise, the unit of work tries to null out one of the foreign key columns, which raises an error condition, since that column is also part of its "primary key").
from sqlalchemy import *

metadata = MetaData()

users_table = Table('users', metadata,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
)

articles_table = Table('articles', metadata,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords_table = Table('keywords', metadata,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

# add "attached_by" column which will reference the user who attached this keyword
itemkeywords_table = Table('article_keywords', metadata,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")),
    Column('attached_by', Integer, ForeignKey("users.user_id"))
)

engine = create_engine('sqlite:///', echo=True)
metadata.create_all(engine)

# class definitions
class User(object):
    pass

class Keyword(object):
    def __init__(self, name):
        self.keyword_name = name

class Article(object):
    pass

class KeywordAssociation(object):
    pass

# Article mapper, relates to Keyword via KeywordAssociation
mapper(Article, articles_table, properties={
    'keywords':relation(KeywordAssociation, lazy=False, cascade="all, delete-orphan")
  }
)

# mapper for KeywordAssociation
# specify "primary key" columns manually
mapper(KeywordAssociation, itemkeywords_table,
    primary_key=[itemkeywords_table.c.article_id, itemkeywords_table.c.keyword_id],
    properties={
        'keyword' : relation(Keyword, lazy=False),
        'user' : relation(User, lazy=False)
    }
)

# user mapper
mapper(User, users_table)

# keyword mapper
mapper(Keyword, keywords_table)

session = create_session(bind=engine)

# select by keyword
alist = session.query(Article).join(['keywords', 'keyword']).filter_by(keyword_name='jacks_stories').all()

# user is available
for a in alist:
    for k in a.keywords:
        if k.keyword.keyword_name == 'jacks_stories':
            print k.user.user_name
Keep in mind that the association object works a little differently from a plain many-to-many relationship. Members have to be added to the list via instances of the association object, which in turn point to the associated object:
user = User()
user.user_name = 'some user'

article = Article()

assoc = KeywordAssociation()
assoc.keyword = Keyword('blue')
assoc.user = user

assoc2 = KeywordAssociation()
assoc2.keyword = Keyword('green')
assoc2.user = user

article.keywords.append(assoc)
article.keywords.append(assoc2)

session.save(article)
session.flush()
SQLAlchemy includes an extension module which can be used in some cases to decrease the explicitness of the association object pattern; this extension is described in associationproxy.
Note that you should not combine the usage of a secondary relationship with an association object pattern against the same association table. This is because SQLAlchemy's unit of work will regard rows in the table tracked by the secondary argument as distinct from entities mapped into the table by the association mapper, causing unexpected behaviors when rows are changed by one mapping and not the other.