The concept behind Unit of Work is to track modifications to a field of objects, and then be able to flush those changes to the database in a single operation. Theres a lot of advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.
SQLAlchemy's unit of work includes these functions:
Session
object. Transactional capability, which rides on top of the transactions provided by Engine
objects, is provided by the SessionTransaction
object.
When dealing with mapped instances with regards to Sessions, an instance may be attached or unattached to a particular Session. An instance also may or may not correspond to an actual row in the database. The product of these two binary conditions yields us four general states a particular instance can have within the perspective of the Session:
Transient - a transient instance exists within memory only and is not associated with any Session. It also has no database identity and does not have a corresponding record in the database. When a new instance of a class is constructed, and no default session context exists with which to automatically attach the new instance, it is a transient instance. The instance can then be saved to a particular session in which case it becomes a pending instance. If a default session context exists, new instances are added to that Session by default and therefore become pending instances immediately.
Pending - a pending instance is a Session-attached object that has not yet been assigned a database identity. When the Session is flushed (i.e. changes are persisted to the database), a pending instance becomes persistent.
Persistent - a persistent instance has a database identity and a corresponding record in the database, and is also associated with a particular Session. By "database identity" we mean the object is associated with a table or relational concept in the database combined with a particular primary key in that table. Objects that are loaded by SQLAlchemy in the context of a particular session are automatically considered persistent, as are formerly pending instances which have been subject to a session flush()
.
Detached - a detached instance is an instance which has a database identity and corresponding row in the database, but is not attached to any Session. This occurs when an instance has been removed from a Session, either because the session itself was cleared or closed, or the instance was explicitly removed from the Session. The object can be re-attached to a session in which case it becomes Persistent again; any un-persisted changes that exist on the instance, whether they occurred during its previous persistent state or during its detached state will be detected and maintained by the new session. Detached instances are useful when an application needs to represent a long-running operation across multiple Sessions, needs to store an object in a serialized state and then restore it later (such as within an HTTP "session" object), or in some cases where code needs to load instances locally which will later be associated with some other Session.
A new Session object is constructed via the create_session()
function:
session = create_session()
A common option used with create_session()
is to specify a specific Engine
or Connection
to be used for all operations performed by this Session:
# create an engine e = create_engine('postgres://some/url') # create a Session that will use this engine for all operations. # it will open and close Connections as needed. session = create_session(bind=e) # open a Connection conn = e.connect() # create a Session that will use this specific Connection for all operations session = create_session(bind=conn)
The session to which an object is attached can be acquired via the object_session()
function, which returns the appropriate Session
if the object is pending or persistent, or None
if the object is transient or detached:
session = object_session(obj)
Session Facts:
We will now cover some of the key concepts used by Sessions and its underlying Unit of Work.
back to section topA primary concept of the Session's underlying Unit of Work is that it is keeps track of all persistent instances; recall that a persistent instance has a database identity and is attached to a Session. In particular, the Unit of Work must ensure that only one copy of a particular persistent instance exists within the Session at any given time. The UOW accomplishes this task using a dictionary known as an Identity Map.
When a Query
is used to issue select
or get
requests to the database, it will in nearly all cases result in an actual SQL execution to the database, and a corresponding traversal of rows received from that execution. However, when the underlying mapper actually creates objects corresponding to the result set rows it receives, it will check the session's identity map first before instantating a new object, and return the same instance already present in the identity map if it already exists, essentially ignoring the object state represented by that row. There are several ways to override this behavior and truly refresh an already-loaded instance which are described later, but the main idea is that once your instance is loaded into a particular Session, it will never change its state without your explicit approval, regardless of what the database says about it.
For example; below, two separate calls to load an instance with database identity "15" are issued, and the results assigned to two separate variables. However, since the same Session
was used, the two instances are the same instance:
mymapper = mapper(MyClass, mytable) session = create_session() obj1 = session.query(MyClass).filter(mytable.c.id==15).first() obj2 = session.query(MyClass).filter(mytable.c.id==15).first() >>> obj1 is obj2 True
The Identity Map is an instance of dict
by default. As an option, you can specify the flag weak_identity_map=True
to the create_session
function so that it will use a weakref.WeakValueDictionary
, so that when an in-memory object falls out of scope, it will be removed automatically, thereby providing some automatic management of memory. However, this may not be instant if there are circular references upon the object. To guarantee that an instance is removed from the identity map before removing references to it, use the expunge()
method, described later, to remove it. Additionally, note that an object that has changes marked on it (i.e. "dirty") can still fall out of scope when using weak_identity_map
.
The Session supports an iterator interface in order to see all objects in the identity map:
for obj in session: print obj
As well as __contains__()
:
if obj in session: print "Object is present"
The identity map itself is accessible via the identity_map
accessor:
>>> session.identity_map.values() [<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]
The identity of each object instance is available via the _instance_key
property attached to each object instance, and is a tuple consisting of the object's class and an additional tuple of primary key values, in the order that they appear within the table definition:
>>> obj._instance_key (<class 'test.tables.User'>, (7,))
At the moment that an object is assigned this key within a flush()
operation, it is also added to the session's identity map.
The get()
method on Query
, which retrieves an object based on primary key identity, also checks in the Session's identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get()
method is used as well, so scalar-based lazy loads may in some cases not query the database; this is particularly important for backreference relationships as it can save a lot of queries.
The next concept is that in addition to the Session
storing a record of all objects loaded or saved, it also stores lists of all newly created (i.e. pending) objects and lists of all persistent objects that have been marked as deleted. These lists are used when a flush()
call is issued to save all changes. During a flush operation, it also scans its list of persistent instances for changes which are marked as dirty.
These records are all tracked by collection functions that are also viewable off the Session
as properties:
# pending objects recently added to the Session session.new # persistent objects which currently have changes detected # (this collection is now created on the fly each time the property is called) session.dirty # persistent objects that have been marked as deleted via session.delete(obj) session.deleted
Note that if a session is created with the weak_identity_map
flag, an item which is marked as "dirty" will be silently removed from the session if the item falls out of scope in the user application. This is because the unit of work does not look for "dirty" changes except for within a flush operation (or any time the session.dirty collection is accessed).
As for objects inside of new
and deleted
, if you abandon all references to new or modified objects within a session, they are still present in either of those two lists, and will be saved on the next flush operation, unless they are removed from the Session explicitly (more on that later).
The query()
function takes one or more classes and/or mappers, along with an optional entity_name
parameter, and returns a new Query
object which will issue mapper queries within the context of this Session. For each mapper is passed, the Query uses that mapper. For each class, the Query will locate the primary mapper for the class using class_mapper()
.
# query from a class session.query(User).filter_by(name='ed').all() # query with multiple classes, returns tuples session.query(User, Address).join('addresses').filter_by(name='ed').all() # query from a mapper query = session.query(usermapper) x = query.get(1) # query from a class mapped with entity name 'alt_users' q = session.query(User, entity_name='alt_users') y = q.options(eagerload('orders')).all()
entity_name
is an optional keyword argument sent with a class object, in order to further qualify which primary mapper to be used; this only applies if there was a Mapper
created with that particular class/entity name combination, else an exception is raised. All of the methods on Session which take a class or mapper argument also take the entity_name
argument, so that a given class can be properly matched to the desired primary mapper.
All instances retrieved by the returned Query
object will be stored as persistent instances within the originating Session
.
Given a class or mapper, a scalar or tuple-based identity, and an optional entity_name
keyword argument, creates a Query
corresponding to the given mapper or class/entity_name combination, and calls the get()
method with the given identity value. If the object already exists within this Session, it is simply returned, else it is queried from the database. If the instance is not found, the method returns None
.
# get Employer primary key 5 employer = session.get(Employer, 5) # get Report composite primary key 7,12, using mapper 'report_mapper_b' report = session.get(Report, (7,12), entity_name='report_mapper_b')
load() is similar to get() except it will raise an exception if the instance does not exist in the database. It will also load the object's data from the database in all cases, and overwrite all changes on the object if it already exists in the session with the latest data from the database.
# load Employer primary key 5 employer = session.load(Employer, 5) # load Report composite primary key 7,12, using mapper 'report_mapper_b' report = session.load(Report, (7,12), entity_name='report_mapper_b')
save() is called with a single transient (unsaved, unattached) instance as an argument, which is then added to the Session and becomes pending. When the session is next flush
ed, the instance will be saved to the database uponwhich it becomes persistent (saved, attached). If the given instance is not transient, meaning it is either attached to an existing Session or it has a database identity, an exception is raised.
user1 = User(name='user1') user2 = User(name='user2') session.save(user1) session.save(user2) session.flush() # write changes to the database
save() is called automatically for new instances by the classes' associated mapper, if a default Session context is in effect (such as a thread-local session), which means that newly created instances automatically become pending. If there is no default session available, then the instance remains transient (unattached) until it is explicitly added to a Session via the save() method.
A transient instance also can be automatically save
ed if it is associated with a parent object which specifies save-update
within its cascade
rules, and that parent is already attached or becomes attached to a Session. For more information on cascade
, see the next section.
The save_or_update()
method, covered later, is a convenience method which will call the save()
or update()
methods appropriately dependening on whether or not the instance has a database identity (but the instance still must be unattached).
This is the main gateway to what the Unit of Work does best, which is save everything ! It should be clear by now what a flush looks like:
session.flush()
It also can be called with a list of objects; in this form, the flush operation will be limited only to the objects specified in the list, as well as any child objects within private
relationships for a delete operation:
# saves only user1 and address2. all other modified # objects remain present in the session. session.flush([user1, address2])
This second form of flush should be used carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.
A common misconception about the flush()
operation is that once performed, the newly persisted instances will automatically have related objects attached to them, based on the values of primary key identities that have been assigned to the instances before they were persisted. An example would be, you create a new Address
object, set address.user_id
to 5, and then flush()
the session. The erroneous assumption would be that there is now a User
object of identity "5" attached to the Address
object, but in fact this is not the case. If you were to refresh()
the Address
, invalidating its current state and re-loading, then it would have the appropriate User
object present.
This misunderstanding is related to the observed behavior of backreferences (Backreferences), which automatically associates an instance "A" with another instance "B", in response to the manual association of instance "B" to instance "A" by the user. The backreference operation occurs completely externally to the flush()
operation, and is pretty much the only example of a SQLAlchemy feature that manipulates the relationships of persistent objects.
The primary guideline for dealing with flush()
is, the developer is responsible for maintaining in-memory objects and their relationships to each other, the unit of work is responsible for maintaining the database representation of the in-memory objects. The typical pattern is that the manipulation of objects is the way that changes get communicated to the unit of work, so that when the flush occurs, the objects are already in their correct in-memory representation and problems dont arise. The manipulation of identifier attributes like integer key values as well as deletes in particular are a frequent source of confusion.
This method first calls clear()
, removing all objects from this Session
, and then ensures that any transactional resources are closed.
The delete
method places an instance into the Unit of Work's list of objects to be marked as deleted:
# mark two objects to be deleted session.delete(obj1) session.delete(obj2) # flush session.flush()
The delete operation will have an effect on instances that are attached to the deleted instance according to the cascade
style of the relationship; cascade rules are described further in the following section. By default, associated instances may need to be updated in the database to reflect that they no longer are associated with the parent object, before the parent is deleted. If the relationship specifies cascade="delete"
, then the associated instance will also be deleted upon flush, assuming it is still attached to the parent. If the relationship additionally includes the delete-orphan
cascade style, the associated instance will be deleted if it is still attached to the parent, or is unattached to any other parent.
The delete()
operation has no relationship to the in-memory status of the instance, including usage of the del
Python statement. An instance marked as deleted and flushed will still exist within memory until references to it are freed; similarly, removing an instance from memory via the del
statement will have no effect, since the persistent instance will still be referenced by its Session. Obviously, if the instance is removed from the Session and then totally dereferenced, it will no longer exist in memory, but also won't exist in any Session and is therefore not deleted from the database.
Note that the "in-memory status" of an instance also refers to its presence in any other collection. SQLAlchemy does not track the collections to which an instance is a member, and will not remove an instance from its parent collections that were not directly involved in a deletion operation. The operational and memory overhead implied by this would be too great (such as, if an object belonged to hundreds of collections). This means if an object A
is attached to both an object B
and an object C
, if you delete()
A
and flush, A
still remains attached to both B
and C
in a deleted state and must be removed by the application. Similarly, if a delete on B
cascades to A
, this does not affect A
still being present on C
- again it must be manually removed.
This method detaches all instances from the Session, sending them to the detached or transient state as applicable, and replaces the underlying UnitOfWork with a new one.
session.clear()
The clear()
method is particularly useful with a "default context" session such as a thread-local session, which can stay attached to the current thread to handle a new field of objects without having to re-attach a new Session.
To assist with the Unit of Work's "sticky" behavior, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired" which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized, eager relationships will be repopulated. Any changes marked on the object are discarded:
# immediately re-load attributes on obj1, obj2 session.refresh(obj1) session.refresh(obj2) # expire objects obj1, obj2, attributes will be reloaded # on the next access: session.expire(obj1) session.expire(obj2)
Expunge removes an object from the Session, sending persistent instances to the detached state, and pending instances to the transient state:
session.expunge(obj1)
Use expunge
when youd like to remove an object altogether from memory, such as before calling del
on it, which will prevent any "ghost" operations occuring when the session is flushed.
Both of these methods receive two arguments; in the case of bind_mapper()
, it is a Mapper
and an Engine
or Connection
instance; in the case of bind_table()
, it is a Table
instance or other Selectable
(such as an Alias
, Select
, etc.), and an Engine
or Connection
instance.
engine1 = create_engine('sqlite:///file1.db') engine2 = create_engine('mysql://localhost') sqlite_conneciton = engine1.connect() sess = create_session() sess.bind_mapper(mymapper, sqlite_connection) # bind mymapper operations to a single SQLite connection sess.bind_table(email_addresses_table, engine2) # bind operations with the email_addresses_table to mysql
Normally, when a Session
is created via create_session()
with no arguments, the Session has no awareness of individual Engines
, and when mappers use the Session
to retrieve connections, the underlying MetaData
each Table
is associated with is expected to be "bound" to an Engine
, else no engine can be located and an exception is raised. A second form of create_session()
takes the argument bind=engine_or_connection
, where all SQL operations performed by this Session
use the single Engine
or Connection
(collectively known as a Connectable
) passed to the constructor. With bind_mapper()
and bind_table()
, the operations of individual mapper and/or tables are bound to distinct engines or connections, thereby overriding not only the engine which may be "bound" to the underlying MetaData
, but also the Engine
or Connection
which may have been passed to the create_session()
function. Configurations which interact with multiple explicit database connections at one time must use either or both of these methods in order to associate Session
operations with the appropriate connection resource.
Binding a Mapper
to a resource takes precedence over a Table
bind, meaning if mapper A is associated with table B, and the Session binds mapper A to connection X and table B to connection Y, an operation with mapper A will use connection X, not connection Y.
The update() method is used only with detached instances. A detached instance only exists if its Session
was cleared or closed, or the instance was expunge()
d from its session. update()
will re-attach the detached instance with this Session, bringing it back to the persistent state, and allowing any changes on the instance to be saved when the Session
is next flush
ed. If the instance is already attached to an existing Session
, an exception is raised.
A detached instance also can be automatically update
ed if it is associated with a parent object which specifies save-update
within its cascade
rules, and that parent is already attached or becomes attached to a Session. For more information on cascade
, see the next section.
The save_or_update()
method is a convenience method which will call the save()
or update()
methods appropriately dependening on whether or not the instance has a database identity (but the instance still must be unattached).
This method is a combination of the save()
and update()
methods, which will examine the given instance for a database identity (i.e. if it is transient or detached), and will call the implementation of save()
or update()
as appropriate. Use save_or_update()
to add unattached instances to a session when you're not sure if they were newly created or not. Like save()
and update()
, save_or_update()
cascades along the save-update
cascade indicator, described in the cascade
section below.
merge()
is used to return the persistent version of an instance that is not attached to this Session. When passed an instance, if an instance with its database identity already exists within this Session, it is returned. If the instance does not exist in this Session, it is loaded from the database and then returned.
A future version of merge()
will also update the Session's instance with the state of the given instance (hence the name "merge").
This method is useful for bringing in objects which may have been restored from a serialization, such as those stored in an HTTP session:
# deserialize an object myobj = pickle.loads(mystring) # "merge" it. if the session already had this object in the # identity map, then you get back the one from the current session. myobj = session.merge(myobj)
Note that merge()
does not associate the given instance with the Session; it remains detached (or attached to whatever Session it was already attached to).
Mappers support the concept of configurable cascade behavior on relation()
s. This behavior controls how the Session should treat the instances that have a parent-child relationship with another instance that is operated upon by the Session. Cascade is indicated as a comma-separated list of string keywords, with the possible values all
, delete
, save-update
, refresh-expire
, merge
, expunge
, and delete-orphan
.
Cascading is configured by setting the cascade
keyword argument on a relation()
:
mapper(Order, order_table, properties={ 'items' : relation(Item, items_table, cascade="all, delete-orphan"), 'customer' : relation(User, users_table, user_orders_table, cascade="save-update"), })
The above mapper specifies two relations, items
and customer
. The items
relationship specifies "all, delete-orphan" as its cascade
value, indicating that all save
, update
, merge
, expunge
, refresh
delete
and expire
operations performed on a parent Order
instance should also be performed on the child Item
instances attached to it (save
and update
are cascaded using the save_or_update()
method, so that the database identity of the instance doesn't matter). The delete-orphan
cascade value additionally indicates that if an Item
instance is no longer associated with an Order
, it should also be deleted. The "all, delete-orphan" cascade argument allows a so-called lifecycle relationship between an Order
and an Item
object.
The customer
relationship specifies only the "save-update" cascade value, indicating most operations will not be cascaded from a parent Order
instance to a child User
instance, except for if the Order
is attached with a particular session, either via the save()
, update()
, or save-update()
method.
Additionally, when a child item is attached to a parent item that specifies the "save-update" cascade value on the relationship, the child is automatically passed to save_or_update()
(and the operation is further cascaded to the child item).
Note that cascading doesn't do anything that isn't possible by manually calling Session methods on individual instances within a hierarchy, it merely automates common operations on a group of associated instances.
The default value for cascade
on relation()
s is save-update
, and the private=True
keyword argument is a synonym for cascade="all, delete-orphan"
.
SessionTransaction is a multi-engine transaction manager, which aggregates one or more Engine/Connection pairs and keeps track of a Transaction object for each one. As the Session receives requests to execute SQL statements, it uses the Connection that is referenced by the SessionTransaction. At commit time, the underyling Session is flushed, and each Transaction is the committed.
Example usage is as follows:
sess = create_session() trans = sess.create_transaction() try: item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo' except: trans.rollback() raise trans.commit()
The SessionTransaction object supports Python 2.5's with statement so that the example above can be written as:
sess = create_session() with sess.create_transaction(): item1 = sess.query(Item).get(1) item2 = sess.query(Item).get(2) item1.foo = 'bar' item2.bar = 'foo'
The create_transaction()
method creates a new SessionTransaction object but does not declare any connection/transaction resources. At the point of the first get()
call, a connection resource is opened off the engine that corresponds to the Item classes' mapper and is stored within the SessionTransaction
with an open Transaction
. When trans.commit()
is called, the flush()
method is called on the Session
and the corresponding update statements are issued to the database within the scope of the transaction already opened; afterwards, the underying Transaction is committed, and connection resources are freed.
SessionTransaction
, like the Transaction
off of Connection
also supports "nested" behavior, and is safe to pass to other functions which then issue their own begin()
/commit()
pair; only the outermost begin()
/commit()
pair actually affects the transaction, and any call to rollback()
within a particular call stack will issue a rollback.
Note that while SessionTransaction is capable of tracking multiple transactions across multiple databases, it currently is in no way a fully functioning two-phase commit engine; generally, when dealing with multiple databases simultaneously, there is the distinct possibility that a transaction can succeed on the first database and fail on the second, which for some applications may be an invalid state. If this is an issue, its best to either refrain from spanning transactions across databases, or to look into some of the available technologies in this area, such as Zope which offers a two-phase commit engine; some users have already created their own SQLAlchemy/Zope hybrid implementations to deal with scenarios like these.
SessionTransaction Facts:
The SessionTransaction can interact with direct SQL queries in two general ways. Either specific Connection
objects can be associated with the SessionTransaction
, which are then useable both for direct SQL as well as within flush()
operations performed by the SessionTransaction
, or via accessing the Connection
object automatically referenced within the SessionTransaction
.
To associate a specific Connection
with the SessionTransaction
, use the add()
method:
connection = engine.connect() trans = session.create_transaction() try: trans.add(connection) connection.execute(mytable.update(), {'col1':4, 'col2':17}) session.flush() # flush() operation will use the same connection except: trans.rollback() raise trans.commit()
The add()
method will key the Connection
's underlying Engine
to this SessionTransaction
. When mapper operations are performed against this Engine
, the Connection
explicitly added will be used. This overrides any other Connection
objects that the underlying Session was associated with, corresponding to the underlying Engine
of that Connection
. However, if the SessionTransaction
itself is already associated with a Connection
, then an exception is thrown.
The other way is just to use the Connection
referenced by the SessionTransaction
. This is performed via the connection()
method, and requires passing in a class or Mapper
which indicates which underlying Connection
should be returned (recall that different Mappers
may use different underlying Engines
). If the class_or_mapper
argument is None
, then the Session
must be globally bound to a specific Engine
when it was constructed, else the method returns None
.
trans = session.create_transaction() try: connection = trans.connection(UserClass) # get the Connection used by the UserClass' Mapper connection.execute(mytable.update(), {'col1':4, 'col2':17}) except: trans.rollback() raise trans.commit()
The connection()
method also exists on the Session
object itself, and can be called regardless of whether or not a SessionTransaction
is in progress. If a SessionTransaction
is in progress, it will return the connection referenced by the transaction. If an Engine
is being used with threadlocal
strategy, the Connection
returned will correspond to the connection resources that are bound to the current thread, if any (i.e. it is obtained by calling contextual_connect()
).
The transactions issued by SessionTransaction
as well as internally by the Session
's flush()
operation use the same Transaction
object off of Connection
that is publically available. Recall that this object supports "nestable" behavior, meaning any number of actors can call begin()
off a particular Connection
object, and they will all be managed within the scope of a single transaction. Therefore, the flush()
operation can similarly take place within the scope of a regular Transaction
:
connection = engine.connect() # Connection session = create_session(bind=connection) # Session bound to the Connection trans = connection.begin() # start transaction try: stuff = session.query(MyClass).select() # Session operation uses connection stuff[2].foo = 'bar' connection.execute(mytable.insert(), dict(id=12, value="bar")) # use connection explicitly session.flush() # Session flushes with "connection", using transaction "trans" except: trans.rollback() # or rollback raise trans.commit() # commit