Slony-I REL_1_1_0 Documentation | ||||
---|---|---|---|---|
Prev | Fast Backward | Fast Forward | Next |
One of the usual merits the use, by PostgreSQL, of Multi-Version Concurrency Control (MVCC) is that this eliminates a whole host of reasons to need to lock database objects. On some other database systems, you need to acquire a table lock in order to insert data into the table; that can severely hinder performance. On other systems, read locks can impede writes; with MVCC, PostgreSQL eliminates that whole class of locks in that "old reads" can access "old tuples." Most of the time, this allows the gentle user of PostgreSQL to not need to worry very much about locks.
Unfortunately, there are several sorts of Slony-I events that do require exclusive locks on PostgreSQL tables, with the result that modifying Slony-I configuration can bring back some of those "locking irritations." In particular:
A momentary table lock must be acquired on the "origin" node in order to add the trigger that collects updates for that table.
When a set origin is shifted from one node to another, locks must be acquired on the tables on both the old origin and the new origin in order to change the triggers on the tables.
This operation expressly requests locks on the tables in a given replication set on the origin node.
This operation runs a set of SQL queries; in order for it to work, the Slony-I triggers must be removed, followed by the query (which potentially updates the data) running, followed by triggers being restored. The operation therefore must acquire table locks on all replicated tables on each node.
During the COPY_SET event on a new subscriber
In a sense, this is the least provocative scenario, since, before the replication set has been populated, it is pretty reasonable to say that the node is "unusable" and that Slony-I could reasonably expect exclusive access to the node.
Each of these actions requires, at some point, modifying each of the tables in the affected replication set, which requires acquiring an exclusive lock on the table. Some users that have tried running these operations on Slony-I nodes that were actively servicing applications have experienced difficulties with deadlocks and/or with the operations hanging up.
The obvious question: "What to do about such deadlocks?"
Several possibilities admit themselves:
Announce an application outage to avoid deadlocks
If you can temporarily block applications from using the database, that will provide a window of time during which there is nothing running against the database other than administrative processes under your control.
Try the operation, hoping for things to work
Since nothing prevents applications from leaving access locks in your way, you may find yourself deadlocked. But if the number of remaining locks are small, you may be able to negotiate with users to "get in edgewise."
Use pgpool
If you can use this or some similar "connection broker", you may be able to tell the connection manager to stop using the database for a little while, thereby letting it "block" the applications for you. What would be ideal would be for the connection manager to hold up user queries for a little while so that the brief database outage looks, to them, like a period where things were running slowly.
Rapid Outage Management
The following procedure may minimize the period of the outage:
Modify pg_hba.conf so that only the slony user will have access to the database.
Issue a kill -SIGHUP to the PostgreSQL postmaster.
This will not kill off existing possibly-long-running queries, but will prevent new ones from coming in. There is an application impact in that incoming queries will be rejected until the end of the process.
If "all looks good," then it should be safe to proceed with the Slony-I operation.
If some old query is lingering around, you may need to kill -SIGQUIT one of the PostgreSQL processes. This will restart the backend and kill off any lingering queries. You probably need to restart the slon processes that attach to the node.
At that point, it will be safe to proceed with the Slony-I operation.
Reset pg_hba.conf to allow other users in, and kill -SIGHUP the postmaster to make it reload the security configuration.
The section on DDL Changes suggests some additional techniques that may be useful, such as moving tables between replication sets in such a way that you minimize the set of tables that need to be locked.
Regrettably, there is no perfect answer to this. If it is necessary to submit a MOVE SET request, then it is presumably necessary to accept the brief application outage. As Slony-I/ pgpool linkages improve, that may become a better way to handle this.