SQLAlchemy 0.3 Documentation

Version: 0.3.10 Last Updated: 07/20/07 17:20:04

Describing Databases with MetaData

The core of SQLAlchemy's query and object mapping operations is supported by database metadata, which is composed of Python objects that describe tables and other schema-level objects. These objects can be created by explicitly naming the various components and their properties, using the Table, Column, ForeignKey, Index, and Sequence objects imported from sqlalchemy.schema. There is also support for reflection of some entities, which means you specify only the name of an entity and its definition is loaded from the database automatically.

A collection of metadata entities is stored in an object aptly named MetaData:

from sqlalchemy import *

metadata = MetaData()

To represent a Table, use the Table class:

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

The specific datatypes for each Column, such as Integer, String, etc. are described in The Types System, and exist within the module sqlalchemy.types as well as the global sqlalchemy namespace.

Foreign keys are most easily specified by the ForeignKey object within a Column object. For a composite foreign key, i.e. a foreign key whose multiple columns reference the columns of a composite primary key, an explicit syntax is provided which allows the correct table CREATE statements to be generated:

# a table with a composite primary key
invoices = Table('invoices', metadata, 
    Column('invoice_id', Integer, primary_key=True),
    Column('ref_num', Integer, primary_key=True),
    Column('description', String(60), nullable=False)
)

# a table with a composite foreign key referencing the parent table
invoice_items = Table('invoice_items', metadata, 
    Column('item_id', Integer, primary_key=True),
    Column('item_name', String(60), nullable=False),
    Column('invoice_id', Integer, nullable=False),
    Column('ref_num', Integer, nullable=False),
    ForeignKeyConstraint(['invoice_id', 'ref_num'], ['invoices.invoice_id', 'invoices.ref_num'])
)

Above, the invoice_items table will have ForeignKey objects automatically added to the invoice_id and ref_num Column objects as a result of the additional ForeignKeyConstraint object.
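
To get a feel for the kind of DDL a composite foreign key corresponds to, the statements can be tried against SQLite using Python's standard sqlite3 module. This is only a sketch that bypasses SQLAlchemy entirely: the CREATE TABLE text below is hand-written for illustration and is an assumption, not the literal output SQLAlchemy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hand-written DDL approximating the two tables above (an assumption,
# not SQLAlchemy's literal output).
conn.execute("""
    CREATE TABLE invoices (
        invoice_id INTEGER NOT NULL,
        ref_num INTEGER NOT NULL,
        description VARCHAR(60) NOT NULL,
        PRIMARY KEY (invoice_id, ref_num)
    )
""")
conn.execute("""
    CREATE TABLE invoice_items (
        item_id INTEGER PRIMARY KEY,
        item_name VARCHAR(60) NOT NULL,
        invoice_id INTEGER NOT NULL,
        ref_num INTEGER NOT NULL,
        FOREIGN KEY (invoice_id, ref_num)
            REFERENCES invoices (invoice_id, ref_num)
    )
""")

# PRAGMA foreign_key_list yields one row per column pair:
# (id, seq, table, from, to, on_update, on_delete, match)
rows = conn.execute("PRAGMA foreign_key_list(invoice_items)").fetchall()
rows.sort(key=lambda r: (r[0], r[1]))

# Pair each local column with the remote column it references.
fk_pairs = [(r[3], r[2] + "." + r[4]) for r in rows]
```

Each entry in fk_pairs pairs a local column with the remote column it points to, which mirrors the per-column ForeignKey objects described above.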

The MetaData object supports some handy methods, such as getting a list of Tables in the order (or reverse) of their dependency:

>>> for t in metadata.table_iterator(reverse=False):
...    print t.name
users
user_prefs
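
The idea of dependency ordering can be sketched in plain Python. The function below is a hypothetical illustration, not SQLAlchemy's implementation: it takes a mapping of table names to the tables their foreign keys reference and returns the names so that referenced tables come first.

```python
def sort_tables(dependencies):
    """Return table names so that each table follows the tables it references.

    `dependencies` maps a table name to the set of table names its
    foreign keys point at (a hypothetical stand-in for real metadata).
    """
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError("circular dependency at %r" % name)
        visiting.add(name)
        for parent in sorted(dependencies.get(name, ())):
            visit(parent)
        visiting.discard(name)
        ordered.append(name)

    for name in sorted(dependencies):
        visit(name)
    return ordered


deps = {"user_prefs": {"users"}, "users": set()}
creation_order = sort_tables(deps)             # referenced tables first
drop_order = list(reversed(creation_order))    # dependent tables first
```

Creation order places users before user_prefs; the reversed order is the one that is safe for dropping.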

And Table provides an interface to the table's properties as well as that of its columns:

employees = Table('employees', metadata, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    pass  # ...

# get the table's primary key columns
for primary_key in employees.primary_key:
    pass  # ...

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    pass  # ...

# access the table's MetaData:
employees.metadata

# access the table's bound Engine or Connection, if its MetaData is bound:
employees.bind

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can 
# be any user-defined string:
employees.c.name.key

# access a column's table; this expression evaluates to True:
employees.c.employee_id.table is employees

# get the table related by a foreign key
ftable = employees.c.employee_dept.foreign_key.column.table

Binding MetaData to an Engine or Connection

A MetaData object can be associated with an Engine or an individual Connection; this process is called binding. An object that is either an engine or a connection is often referred to as a connectable. Binding allows the MetaData and the elements which it contains to perform operations against the database directly, using the connection resources to which it's bound. Common operations which are made more convenient through binding include being able to generate SQL constructs which know how to execute themselves, creating Table objects which query the database for their column and constraint information, and issuing CREATE or DROP statements.

To bind MetaData to an Engine, assign the engine to the bind attribute:

engine = create_engine('sqlite://', **kwargs)

# create MetaData 
meta = MetaData()

# bind to an engine
meta.bind = engine

Once this is done, the MetaData and its contained Table objects can access the database directly:

meta.create_all()  # issue CREATE statements for all tables

# describe a table called 'users', query the database for its columns
users_table = Table('users', meta, autoload=True)

# generate a SELECT statement and execute
result = users_table.select().execute()

Note that binding engines is completely optional. All of the operations which take advantage of "bound" MetaData can also be given an Engine or Connection explicitly with which to perform the operation. The "non-bound" equivalent of the above would be:

meta.create_all(engine)  # issue CREATE statements for all tables

# describe a table called 'users',  query the database for its columns
users_table = Table('users', meta, autoload=True, autoload_with=engine)

# generate a SELECT statement and execute
result = engine.execute(users_table.select())

Reflecting Tables

A Table object can be created without specifying any of its contained attributes, using the argument autoload=True in conjunction with the table's name and possibly its schema (if not the database's "default" schema). This will issue the appropriate queries to the database in order to locate all properties of the table required for SQLAlchemy to use it effectively, including its column names and datatypes, foreign and primary key constraints, and in some cases its default-value generating attributes. To use autoload=True, the table's MetaData object needs to be bound to an Engine or Connection, or alternatively the autoload_with=<some connectable> argument can be passed. Below we illustrate autoloading a table and then iterating through the names of its columns:

>>> messages = Table('messages', meta, autoload=True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']
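
As a rough illustration of the kind of catalog query reflection relies on, the sketch below asks SQLite directly for a table's columns using Python's standard sqlite3 module. It does not use SQLAlchemy, the messages table definition is made up to match the column names above, and the queries SQLAlchemy actually issues vary by database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical definition matching the column names shown above.
conn.execute("""
    CREATE TABLE messages (
        message_id INTEGER PRIMARY KEY,
        message_name VARCHAR(40),
        date TIMESTAMP
    )
""")

# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
info = conn.execute("PRAGMA table_info(messages)").fetchall()

column_names = [row[1] for row in info]
primary_key = [row[1] for row in info if row[5]]
```

The name, declared type, nullability, default, and primary key flag returned here are the same categories of information that autoload=True fills in on a Table.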

Note that if a reflected table has a foreign key referencing another table, the related Table object will be automatically created within the MetaData object if it does not exist already. Below, suppose table shopping_cart_items references a table shopping_carts. After reflecting, the shopping carts table is present:

>>> shopping_cart_items = Table('shopping_cart_items', meta, autoload=True)
>>> 'shopping_carts' in meta.tables
True

To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. Table uses a special constructor that will return the already created Table instance if it's already present:

shopping_carts = Table('shopping_carts', meta)

Of course, it's a good idea to use autoload=True with the above table regardless, so that the table is loaded if it hadn't been already. The autoload operation only occurs for the table if it hasn't already been loaded; once loaded, new calls to Table will not re-issue any reflection queries.
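
The "return the existing instance" behavior can be pictured with a small registry pattern. The classes below are hypothetical stand-ins written for this illustration; they are not SQLAlchemy's classes and do not reflect how Table is actually implemented.

```python
class SketchMetaData:
    """Hypothetical stand-in for MetaData: a name -> table registry."""

    def __init__(self):
        self.tables = {}


class SketchTable:
    """Hypothetical stand-in for Table with a registry-aware constructor."""

    def __new__(cls, name, metadata, *columns):
        existing = metadata.tables.get(name)
        if existing is not None:
            # Already defined or reflected: hand back the same object.
            return existing
        table = super().__new__(cls)
        table.name = name
        table.columns = list(columns)
        metadata.tables[name] = table
        return table


meta = SketchMetaData()
first = SketchTable("shopping_carts", meta, "cart_id", "owner")
second = SketchTable("shopping_carts", meta)

same_object = first is second
```

Because the second call finds the name already registered, it returns the first object unchanged, columns included.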

Overriding Reflected Columns

Individual columns can be overridden with explicit values when reflecting tables; this is handy for specifying custom datatypes, constraints such as primary keys that may not be configured within the database, etc.

>>> mytable = Table('mytable', meta,
... Column('id', Integer, primary_key=True),   # override reflected 'id' to have primary key
... Column('mydata', Unicode(50)),    # override reflected 'mydata' to be Unicode
... autoload=True)

Specifying the Schema Name

Some databases support the concept of multiple schemas. A Table can reference this by specifying the schema keyword argument:

financial_info = Table('financial_info', meta,
    Column('id', Integer, primary_key=True),
    Column('value', String(100), nullable=False),
    schema='remote_banks'
)

Within the MetaData collection, this table will be identified by the combination of financial_info and remote_banks. If another table called financial_info is referenced without the remote_banks schema, it will refer to a different Table. ForeignKey objects can reference columns in this table using the form remote_banks.financial_info.id.


ON UPDATE and ON DELETE

ON UPDATE and ON DELETE clauses for a table's CREATE statement are specified within the ForeignKeyConstraint object, using the onupdate and ondelete keyword arguments:

foobar = Table('foobar', meta,
    Column('id', Integer, primary_key=True),
    Column('lala', String(40)),
    ForeignKeyConstraint(['lala'],['hoho.lala'], onupdate="CASCADE", ondelete="CASCADE"))

Note that these clauses are not supported on SQLite, and require InnoDB tables when used with MySQL. They may also not be supported on other databases.


Other Options

Tables may support database-specific options, such as MySQL's engine option that can specify "MyISAM", "InnoDB", and other backends for the table:

addresses = Table('engine_email_addresses', meta,
    Column('address_id', Integer, primary_key = True),
    Column('remote_user_id', Integer, ForeignKey(users.c.user_id)),
    Column('email_address', String(20)),
    mysql_engine='InnoDB'
)

Creating and Dropping Database Tables

Creating and dropping individual tables can be done via the create() and drop() methods of Table; these methods take an optional bind parameter which references an Engine or a Connection. If not supplied, the Engine bound to the MetaData will be used, else an error is raised:

meta = MetaData()
meta.bind = 'sqlite:///:memory:'

employees = Table('employees', meta, 
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)
employees.create()

The drop() method works the same way; here e stands for an Engine or Connection passed explicitly:

employees.drop(bind=e)

The create() and drop() methods also support an optional keyword argument checkfirst which will issue the database's appropriate pragma statements to check if the table exists before creating or dropping:

employees.create(bind=e, checkfirst=True)
employees.drop(checkfirst=False)
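
The idea behind checkfirst can be sketched with Python's standard sqlite3 module: look the table up in the database catalog, and only issue the CREATE if it is absent. The helper names below are hypothetical, and the catalog query shown is specific to SQLite; it is not necessarily the query SQLAlchemy issues.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table called `name` exists in this SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def create_checkfirst(conn, name, ddl):
    """Run `ddl` only when the table is absent; report whether it ran."""
    if table_exists(conn, name):
        return False
    conn.execute(ddl)
    return True


conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY)"

created_first = create_checkfirst(conn, "employees", ddl)
created_again = create_checkfirst(conn, "employees", ddl)
```

The second call is a no-op rather than an error, which is the practical benefit of checking first.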

Entire groups of Tables can be created and dropped directly from the MetaData object with create_all() and drop_all(). These methods always check for the existence of each table before creating or dropping. Each method takes an optional bind keyword argument which can reference an Engine or a Connection. If no engine is specified, the underlying bound Engine, if any, is used:

engine = create_engine('sqlite:///:memory:')

metadata = MetaData()

users = Table('users', metadata, 
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', metadata, 
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)

metadata.create_all(bind=engine)

Column Defaults and OnUpdates

SQLAlchemy includes flexible constructs for creating default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL is executed, a SQL expression or function to be pre-executed before the SQL is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default function triggered by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row has a primary key by which it can be located.

Pre-Executed Insert Defaults

A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:

# a function to create primary key ids
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", meta, 
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)
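
What "pre-executed" means can be sketched without SQLAlchemy: before the INSERT is sent, any column missing from the parameters is filled in from its default, calling the default if it is a function. The defaults mapping and the apply_defaults helper below are hypothetical names invented for this illustration, and the database access uses Python's standard sqlite3 module.

```python
import itertools
import sqlite3

_ids = itertools.count(1)

# Hypothetical column -> default mapping; a default is either a
# plain value or a zero-argument callable.
defaults = {
    "id": lambda: next(_ids),
    "key": "default",
}


def apply_defaults(params, defaults):
    """Return a copy of params with missing columns filled from defaults."""
    filled = dict(params)
    for column, default in defaults.items():
        if column not in filled:
            filled[column] = default() if callable(default) else default
    return filled


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER PRIMARY KEY, "key" VARCHAR(10))')

# The first row supplies nothing; the second supplies its own 'key'.
for params in ({}, {"key": "explicit"}):
    row = apply_defaults(params, defaults)
    conn.execute('INSERT INTO mytable (id, "key") VALUES (:id, :key)', row)

rows = conn.execute('SELECT id, "key" FROM mytable ORDER BY id').fetchall()
```

Values supplied explicitly win over defaults, and the callable runs once per row that needs it.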

The "default" keyword can also take SQL expressions, including select statements or direct function calls:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'create_date' to default to now()
    Column('create_date', DateTime, default=func.now()),

    # define 'key' to pull its default from the 'keyvalues' table
    Column('key', String(20), default=keyvalues.select(keyvalues.c.type=='type1', limit=1))
    )

The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. The explicit syntax is optional here, but is required for other types of defaults, further described below:

Column('mycolumn', String(30), ColumnDefault(func.get_data()))

Pre-Executed OnUpdate Defaults

Similar to an on-insert default is an on-update default, which is most easily specified by the "onupdate" keyword to Column, which also can be a constant, plain Python function or SQL expression:

t = Table("mytable", meta, 
    Column('id', Integer, primary_key=True),

    # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now())
    Column('last_updated', DateTime, onupdate=func.current_timestamp()),
)

To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:

Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))

Inline Default Execution: PassiveDefault

A PassiveDefault indicates a column default that is executed upon INSERT by the database. This construct is used to specify a SQL function that will be specified as "DEFAULT" when creating tables.

t = Table('test', meta, 
    Column('mycolumn', DateTime, PassiveDefault("sysdate"))
)

A create call for the above table will produce:

CREATE TABLE test (
    mycolumn datetime default sysdate
)

PassiveDefault also sends a message to the Engine that data is available after an insert. The object-relational mapper system uses this information to post-fetch rows after the insert, so that instances can be refreshed with the new data. Below is a simplified version:

# table with passive defaults
mytable = Table('mytable', meta, 
    Column('my_id', Integer, primary_key=True),
    Column('name', String(30)),

    # an on-insert database-side default
    Column('data1', Integer, PassiveDefault("d1_func()")),
)
# insert a row
r = mytable.insert().execute(name='fred')

# check the result: were there defaults fired off on that row ?
if r.lastrow_has_defaults():
    # postfetch the row based on primary key.
    # this only works for a table with primary key columns defined
    primary_key = r.last_inserted_ids()
    row = mytable.select(mytable.c.my_id == primary_key[0]).execute().fetchone()

When Tables are reflected from the database using autoload=True, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.
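
The post-fetch pattern itself is independent of SQLAlchemy and can be sketched with Python's standard sqlite3 module. In this illustration the database-side default is a constant (42) rather than a function, and the table definition is hand-written; both are assumptions made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 'data1' is never supplied by the application; the database fills it in.
conn.execute("""
    CREATE TABLE mytable (
        my_id INTEGER PRIMARY KEY,
        name VARCHAR(30),
        data1 INTEGER DEFAULT 42
    )
""")

cursor = conn.execute("INSERT INTO mytable (name) VALUES (?)", ("fred",))

# The new primary key is the handle used to fetch the row back.
new_id = cursor.lastrowid
row = conn.execute(
    "SELECT my_id, name, data1 FROM mytable WHERE my_id = ?", (new_id,)
).fetchone()
```

Without a primary key value to select on, there would be no reliable way to find the row again, which is why post-fetching requires one.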

The Catch: Postgres Primary Key Defaults always Pre-Execute

Current Postgres support does not rely upon OIDs to determine the identity of a row. This is because the usage of OIDs has been deprecated in Postgres and they are disabled by default for table creates as of PG version 8. Psycopg2's "cursor.lastrowid" attribute only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed since SQLAlchemy would otherwise have no way of retrieving the row just inserted.


Defining Sequences

A table with a sequence looks like:

table = Table("cartitems", meta, 
    Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)

The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created in the database by SQLAlchemy, the database sequence object is also created. Similarly, the database sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablename_columnname_seq" in order to retrieve new primary key values, if they were not otherwise explicitly stated. Oracle, which has no "auto-increment" keyword, requires that a Sequence be specified for a table if automatic primary key generation is desired.

A Sequence object can be defined on a Table that is then also used with a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.

A sequence can also be specified with optional=True which indicates the Sequence should only be used on a database that requires an explicit sequence, and not those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".


Defining Constraints and Indexes

UNIQUE Constraint

Unique constraints can be created anonymously on a single column using the unique keyword on Column. Explicitly named unique constraints and/or those with multiple columns are created via the UniqueConstraint table-level construct.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column anonymous unique constraint
    Column('col1', Integer, unique=True),

    Column('col2', Integer),
    Column('col3', Integer),

    # explicit/composite unique constraint.  'name' is optional.
    UniqueConstraint('col2', 'col3', name='uix_1')
    )
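
The effect of the two kinds of unique constraint can be observed directly in SQLite using Python's standard sqlite3 module. The CREATE TABLE text below is hand-written to approximate the table above; it is an assumption, not SQLAlchemy's literal output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        col1 INTEGER UNIQUE,
        col2 INTEGER,
        col3 INTEGER,
        CONSTRAINT uix_1 UNIQUE (col2, col3)
    )
""")

conn.execute("INSERT INTO mytable (col1, col2, col3) VALUES (1, 10, 20)")

# Same col2 but a different col3: the (col2, col3) pair is still unique.
conn.execute("INSERT INTO mytable (col1, col2, col3) VALUES (2, 10, 21)")

# Repeating an existing (col2, col3) pair violates the composite constraint.
try:
    conn.execute("INSERT INTO mytable (col1, col2, col3) VALUES (3, 10, 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
```

The composite constraint applies to the combination of values, so individual columns within it may still repeat.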

CHECK Constraint

Check constraints can be named or unnamed and can be created at the Column or Table level, using the CheckConstraint construct. The text of the check constraint is passed directly through to the database, so there is limited "database independent" behavior. Column level check constraints generally should only refer to the column on which they are placed, while table level constraints can refer to any columns in the table.

Note that some databases, such as MySQL and SQLite, do not actively support check constraints.

meta = MetaData()
mytable = Table('mytable', meta,

    # per-column CHECK constraint
    Column('col1', Integer, CheckConstraint('col1>5')),

    Column('col2', Integer),
    Column('col3', Integer),

    # table level CHECK constraint.  'name' is optional.
    CheckConstraint('col2 > col3 + 5', name='check1')
    )

Indexes

Indexes can be created anonymously (using an auto-generated name "ix_") for a single column using the inline index keyword on Column, which also modifies the usage of unique to apply the uniqueness to the index itself, instead of adding a separate UNIQUE constraint. For indexes with specific names or which encompass more than one column, use the Index construct, which requires a name.

Note that the Index construct is created externally to the table to which it corresponds, using Column objects and not strings.

meta = MetaData()
mytable = Table('mytable', meta,
    # an indexed column, with index "ix_mytable_col1"
    Column('col1', Integer, index=True),

    # a uniquely indexed column with index "ix_mytable_col2"
    Column('col2', Integer, index=True, unique=True),

    Column('col3', Integer),
    Column('col4', Integer),

    Column('col5', Integer),
    Column('col6', Integer),
    )

# place an index on col3, col4
Index('idx_col34', mytable.c.col3, mytable.c.col4)

# place a unique index on col5, col6
Index('myindex', mytable.c.col5, mytable.c.col6, unique=True)

The Index objects will be created along with the CREATE statements for the table itself. An index can also be created on its own independently of the table:

# create a table
sometable.create()

# define an index
i = Index('someindex', sometable.c.col5)

# create the index, will use the table's bound connectable if the `bind` keyword argument is not specified
i.create()
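
At the database level, a plain and a unique multi-column index look like the following. This sketch uses Python's standard sqlite3 module with hand-written statements that approximate the two Index constructs shown earlier; the exact DDL SQLAlchemy emits is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "col3 INTEGER, col4 INTEGER, col5 INTEGER, col6 INTEGER)"
)

# An ordinary index on (col3, col4) and a unique index on (col5, col6).
conn.execute("CREATE INDEX idx_col34 ON mytable (col3, col4)")
conn.execute("CREATE UNIQUE INDEX myindex ON mytable (col5, col6)")

# PRAGMA index_list rows begin with (seq, name, unique, ...).
indexes = {
    row[1]: bool(row[2])
    for row in conn.execute("PRAGMA index_list(mytable)")
}
```

The resulting mapping records, for each index name, whether the index enforces uniqueness.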

Adapting Tables to Alternate Metadata

A Table object created against a specific MetaData object can be re-created against a new MetaData using the tometadata method:

# create two metadata
meta1 = MetaData('sqlite:///querytest.db')
meta2 = MetaData()

# load 'users' from the sqlite engine
users_table = Table('users', meta1, autoload=True)

# create the same Table object for the plain metadata
users_table_2 = users_table.tometadata(meta2)