It is common for managers to have a desire to operate systems using some available, documented set of “best practices.” Documenting that sort of thing is essential to ISO 9000, ISO 9001, and other sorts of organizational certifications.
It is worthwhile to preface a discussion of “best practices” by mentioning that each organization that uses Slony-I is unique, and there may be a need for local policies to reflect unique local operating characteristics. It is for that reason that Slony-I does not impose its own policies for such things as failover; those will need to be determined based on the overall shape of your network, of your set of database servers, and of your usage patterns for those servers.
There are, however, a number of things that early adopters of Slony-I have discovered which can at least help to suggest the sorts of policies you might want to consider.
Slony-I is a complex multi-client, multi-server system, with the result that there is an almost innumerable set of places where problems can arise.
As a natural result, maintaining a clean environment is really valuable, as any sort of environmental “messiness” can either cause unexpected problems or mask the real problem.
Numerous users have reported problems resulting from mismatches between Slony-I versions, local libraries, and PostgreSQL libraries. Details count; you need to be clear on what hosts are running what versions of what software.
Principle: Use an unambiguous, stable time zone such as UTC or GMT.
Users have run into problems when their system uses a time zone that PostgreSQL is unable to recognize, such as CUT0 or WST. It is necessary to use a time zone that PostgreSQL can recognize correctly.
It is furthermore preferable to use a time zone where times do not shift around due to Daylight Savings Time.
The “geographically unbiased” choice seems to be TZ=UTC or TZ=GMT.
See also Section 3.3, “ Time Synchronization”.
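One hedged way to pin this down is to set TZ in the script that starts each slon, so that the daemon's environment is unambiguous regardless of the operator's login settings; the installation path, cluster name, and connection string below are hypothetical.

    #!/bin/sh
    # Hypothetical wrapper that starts a slon under a fixed, DST-free time zone.
    # The installation path, cluster name, and conninfo are examples only.
    TZ=UTC
    export TZ
    /usr/local/slony1/bin/slon mycluster \
        "dbname=mydb host=db1 user=slony" &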
Principle: Long running transactions are Evil
The FAQ has an entry on growth of pg_listener which discusses this in a fair bit of detail; the long and short is that long running transactions have numerous ill effects. They are particularly troublesome on an “origin” node, holding onto locks, preventing vacuums from taking effect, and the like.
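One way to keep an eye on this is a periodic check against pg_stat_activity; the sketch below assumes a PostgreSQL version that exposes xact_start and uses the pre-9.2 column names procpid and current_query, and the 30-minute threshold is arbitrary.

    #!/bin/sh
    # Hedged sketch: report transactions that have been open for more than
    # 30 minutes on the origin database.  Adjust column names for your
    # PostgreSQL version (procpid/current_query became pid/query in 9.2).
    psql -d mydb -c "
        SELECT procpid, usename, xact_start, current_query
          FROM pg_stat_activity
         WHERE xact_start < now() - interval '30 minutes'
         ORDER BY xact_start;"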
Failover policies should be planned for ahead of time.
This may simply involve thinking through priority lists of which node should fail over to which, as opposed to trying to automate it. But knowing what to do ahead of time cuts down on the number of mistakes made.
At Afilias, some internal [The 3AM Unhappy DBA's Guide to...] guides have been created to provide checklists of what to do when certain “unhappy” events take place; this sort of material is highly specific to the applications running, so you would need to generate your own such documents.
MOVE SET should be used for preventative maintenance, so that problems do not become serious enough to require failover.
VACUUM policy needs to be carefully defined. As mentioned above, “long running transactions are Evil.” VACUUMs are no exception: a VACUUM on a huge table will open a long-running transaction with all the known ill effects.
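One way to keep routine vacuuming from becoming a single long-running transaction is to vacuum tables individually rather than the whole database in one pass; the database and table names below are purely illustrative.

    #!/bin/sh
    # Hedged sketch: vacuum tables one at a time, so that each vacuum is a
    # separate, short transaction.  Database and table names are hypothetical.
    DB=mydb
    for TABLE in public.accounts public.orders pg_catalog.pg_listener
    do
        vacuumdb --analyze --table "$TABLE" --dbname "$DB"
    done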
Running all of the slon daemons on a central server for each network has proven preferable.
Each slon should run on a host on the same local network as the node that it is servicing, as it does a lot of communications with its database.
In theory, the “best” speed would come from running the slon on the database server that it is servicing.
In practice, having the slon processes strewn across a dozen servers turns out to be really inconvenient to manage, as making changes to their configuration requires logging onto a whole bunch of servers. In environments where users must go through sudo to switch to application users, this becomes seriously inconvenient. It turns out to be much easier to group the slon processes on one server per local network, so that one script can start, monitor, terminate, and otherwise maintain all of the nearby nodes.
That also has the implication that configuration data and configuration scripts only need to be maintained in one place, eliminating duplication of configuration efforts.
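For instance, a single wrapper on the administrative host might start one slon per nearby node from a directory of per-node configuration files; the directory layout and naming convention shown here are assumptions, not anything Slony-I requires.

    #!/bin/sh
    # Hedged sketch: start every slon for this local network from one host.
    # The configuration directory and file naming are hypothetical.
    CONFDIR=/etc/slon/mycluster
    LOGDIR=/var/log/slony
    for CONF in "$CONFDIR"/node*.conf
    do
        NODE=`basename "$CONF" .conf`
        slon -f "$CONF" >> "$LOGDIR/$NODE.log" 2>&1 &
    done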
The Database Schema Changes section outlines some practices that have been found useful for handling changes to database schemas.
Handling of Primary Keys
As discussed in the section on Replication Sets, it is ideal if each replicated table has a true primary key constraint; failing that, it is acceptable to use a “candidate primary key.”
It is not recommended that a Slony-I-defined key be used to introduce a candidate primary key, as the added unique index means that updates to the table can now fail against it; in effect, Slony-I would be introducing a new failure mode for your application.
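Where a table lacks a primary key but already has a set of unique, not-null columns, it is generally better to promote those columns to a true primary key before adding the table to a set; the table and column names here are hypothetical.

    #!/bin/sh
    # Hedged sketch: give the table a real primary key before replicating it.
    # Table and column names are examples only.
    psql -d mydb -c "ALTER TABLE public.orders
                       ADD CONSTRAINT orders_pkey PRIMARY KEY (order_id);"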
The section on grouping tables into sets suggests strategies for determining how to group tables and sequences into replication sets.
It should be obvious that actions that can delete a lot of data should be taken with great care; the section on Dropping things from Slony-I Replication discusses the different sorts of “deletion” that Slony-I supports.
Certain Slony-I operations, notably set add table, move set, lock set, and execute script, require acquiring exclusive locks on the tables being replicated.
Depending on the kind of activity on the databases, this may or may not have the effect of requiring a (hopefully brief) database outage.
It has proven useful to define a slony user for use by Slony-I, as distinct from a generic postgres or pgsql user.

If all sorts of automatic “maintenance” activities, such as vacuuming and performing backups, are performed under the “ownership” of a single PostgreSQL user, it turns out to be pretty easy to run into deadlock problems. For instance, a series of vacuums that unexpectedly run against a database that has a large SUBSCRIBE_SET event under way may run into a deadlock which would roll back several hours worth of data copying work.

If, instead, different maintenance roles are performed by different users, you may, during vital operations such as SUBSCRIBE_SET, lock out other users at the pg_hba.conf level, only allowing the slony user in, which substantially reduces the risk of problems while the subscription is in progress.
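A hedged sketch of what that lockout might look like, assuming a database named mydb and the first-match-wins ordering of pg_hba.conf; the entries and the data directory path are examples only.

    #!/bin/sh
    # Hedged sketch: while SUBSCRIBE_SET runs against "mydb", place entries
    # like these AHEAD of the usual rules in pg_hba.conf (first match wins):
    #
    #   host  mydb  slony  10.0.0.0/8   md5
    #   host  mydb  all    0.0.0.0/0    reject
    #
    # then have the postmaster reread it (the data directory is an assumption):
    pg_ctl reload -D /var/lib/pgsql/data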
Path configuration
The section on Path Communications discusses the issues surrounding what network connections need to be in place in order for Slony-I to function.
The section on listen paths discusses the issues surrounding the table sl_listen.
As of Slony-I 1.1, its contents are computed automatically based on the communications information available to Slony-I, which should alleviate the problems found in earlier versions, where this had to be configured by hand. Many seemingly inexplicable communications failures, where nodes failed to talk to one another even though they technically could, were the result of incorrect listen path configuration.
Configuring slon
As of version 1.1, slon configuration may be drawn either from the command line or from configuration files. “Best” practices have yet to emerge from the two options:
Configuration via command line options
This approach has the merit that all the options that are active are visible in the process environment. (And if there are a lot of them, they may be a nuisance to read.)
Unfortunately, if you invoke slon from the command line, you could forget to include log shipping configuration and thereby destroy the sequence of logs for a log shipping node.
Configuration via a configuration file
Unlike when command line options are used, the active options are not visible; they can only be inferred from the name and/or contents of the slon configuration file, and will not reflect subsequent changes to that file.
On the other hand, by putting the options in a file you won't forget to include any of them, so this is safer for log shipping.
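A minimal configuration file along those lines might look like the following; the file path, cluster name, connection string, and archive directory are placeholders, and archive_dir only matters if this node feeds a log shipping target.

    #!/bin/sh
    # Hedged sketch: keep each node's slon options in a configuration file.
    # File path, cluster name, and connection info are hypothetical.
    cat > /etc/slon/mycluster-node2.conf <<'EOF'
    cluster_name='mycluster'
    conn_info='dbname=mydb host=db2 user=slony'
    # log shipping settings live here too, so they cannot be forgotten
    archive_dir='/var/spool/slony/archive'
    EOF
    slon -f /etc/slon/mycluster-node2.conf &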
Things to do when subscribing nodes
When a new node is running the COPY_SET event for a large replication set (e.g. one which takes several hours to subscribe), it has been found to be desirable to lock all users other than the slony user out of the new subscriber because:
Applications will run into partially-copied, half-baked data that is not totally consistent.
It is possible for applications (and maintenance scripts) to submit combinations of queries that will get the system into a deadlock situation, thereby terminating the COPY_SET event and requiring the subscription to start over again.
It may be worth considering turning the PostgreSQL fsync functionality off during the copying of data, as this will improve performance, and if the database “falls over” during the COPY_SET event, you will be restarting the copy of the whole replication set anyway.
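A hedged sketch of how that toggle might be handled on the subscriber; the data directory path is an assumption, and fsync should be turned back on as soon as the copy completes.

    #!/bin/sh
    # Hedged sketch of the sequence; the data directory path is hypothetical.
    # 1. In postgresql.conf on the subscriber, set:
    #        fsync = off
    # 2. Tell the postmaster to pick the change up (a restart may be needed
    #    on some PostgreSQL versions):
    pg_ctl reload -D /var/lib/pgsql/data
    # 3. Run the subscription; once COPY_SET has finished, set fsync = on
    #    and reload again.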
Managing use of slonik
The notes on Using Slonik describe some of the lessons learned from managing large numbers of slonik scripts.
Handling Very Large Replication Sets
Some users have set up replication on replication sets that are tens to hundreds of gigabytes in size, which puts some added “strain” on the system, in particular where it may take several days for the COPY_SET event to complete.
Here are some principles that have been observed for dealing with these sorts of situations.
Drop all indices other than the primary key index while the COPY_SET event is run.
When data is copied into a table that has indices on it, PostgreSQL builds the indices incrementally, on the fly. This is much slower than simply copying the data into the table, and then recreating each index “ex nihilo”, as the latter can take substantial advantage of sort memory.
In a future release, it is hoped that indices will be dropped and recreated automatically, which would eliminate this step.
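A hedged sketch of that drop-and-recreate dance, with hypothetical table and index names; in practice you would capture the existing index definitions (for example from pg_indexes) before dropping them.

    #!/bin/sh
    # Hedged sketch: drop secondary indexes on the subscriber before the
    # subscription, then rebuild them after COPY_SET completes.
    # Database, table, and index names are examples only.
    DB=mydb
    # Before SUBSCRIBE SET:
    psql -d "$DB" -c "DROP INDEX public.orders_customer_idx;"
    # ... run the slonik SUBSCRIBE SET and wait for COPY_SET to finish ...
    # Afterwards, recreate the index from the saved definition:
    psql -d "$DB" -c "CREATE INDEX orders_customer_idx
                        ON public.orders (customer_id);"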
If there are large numbers of updates taking place as the large set is being copied, this can lead to the subscriber being behind by some enormous number of SYNC events.

If a SYNC is generated about once per second, that leads to the subscriber “falling behind” by around 90,000 SYNCs per day, possibly for several days.

There will correspondingly be an enormous growth of sl_log_1 and sl_seqlog. Unfortunately, once the COPY_SET completes, users have found that the queries against these tables wind up reverting to Seq Scans, so that even though a particular SYNC processing event is only processing a small number of those 90,000 SYNC events, it still reads through the entire table. In such a case, you may never see replication catch up.
Several things can be done that will help, involving careful selection of slon parameters; a combined sketch follows this list:
Ensure that there exists, on the “master” node, an index on sl_log_1(log_xid). If it doesn't exist, because the Slony-I instance was set up before version 1.1.1, see slony1_base.sql for the exact form that the index setup should take.
On the subscriber's slon, increase the number of SYNC events processed together, with the slon_conf_sync_group_maxsize parameter, to some value that allows it to process a significant portion of the outstanding SYNC events.
On the subscriber's slon, set desired_sync_time to 0, as the adaptive SYNC grouping system will start with small groupings that will, under these circumstances, perform poorly.
Increase the slon_conf_sync_interval on the origin's slon so that SYNC events are generated less frequently. If a SYNC is only generated once per minute instead of once per second, that will cut down the number of events by a factor of 60.
It is likely to be worthwhile to use slon_conf_vac_frequency to deactivate slon-initiated vacuuming in favor of running your own vacuum scripts, as there will be a buildup of unpurgeable data while the data is copied and the subscriber starts to catch up.
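Pulling those together, here is a hedged sketch. The index statement should be checked against slony1_base.sql for your Slony-I version (the real definition may use a different index name or operator class), and the slon configuration values, written in the configuration-file spellings sync_group_maxsize, desired_sync_time, vac_frequency, and sync_interval, are illustrative starting points rather than recommendations.

    #!/bin/sh
    # Hedged sketch only; verify the index definition against slony1_base.sql
    # for your Slony-I version, and treat parameter values as starting points.

    # On the origin, if the index on sl_log_1(log_xid) is missing
    # (cluster name, hence the "_mycluster" schema, is hypothetical):
    psql -d mydb -c "CREATE INDEX sl_log_1_log_xid_idx
                       ON _mycluster.sl_log_1 (log_xid);"

    # Subscriber slon configuration file additions:
    #   sync_group_maxsize = 100
    #   desired_sync_time  = 0
    #   vac_frequency      = 0

    # Origin slon configuration file addition (value is in milliseconds):
    #   sync_interval = 60000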