Slony-I REL_1_1_0 Documentation
The programs that actually perform Slony-I replication are the slon daemons.
You need to run one slon instance for each node in a Slony-I cluster, whether you consider that node a "master" or a "slave". Since a MOVE SET or FAILOVER can switch the roles of nodes, slon needs to be able to function for both providers and subscribers. It is not essential that these daemons run on any particular host, but there are some principles worth considering:
Each slon needs to be able to communicate quickly with the database whose "node controller" it is. Therefore, if a Slony-I cluster runs across some form of Wide Area Network, each slon process should run on or near the database it is controlling. If you break this rule, no particular disaster should ensue, but the added latency in monitoring events on the slon's "own node" will cause it to replicate somewhat less promptly.
The very fastest results would be achieved by having each slon run on the database server that it is servicing. If it runs somewhere within a fast local network, performance will not be noticeably degraded.
It is an attractive idea to run many of the slon processes for a cluster on one machine, as this makes it easy to monitor them, both log files and process tables, from one location. It also eliminates the need to log in to several hosts in order to look at log files or to restart slon instances; a minimal example of launching several slons from one host appears below.
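As an illustration (not taken from the Slony-I tools), here is a minimal sketch of launching one slon per node from a single administrative host; the cluster name, databases, hosts, user, and log paths are hypothetical, and slon is simply given the cluster name followed by a libpq conninfo string for the node it controls:

```
# Hypothetical example: one slon per node, all started from one admin host.
# Each conninfo points at the database of the node that slon controls.
slon mycluster "dbname=mydb host=node1.example.com user=slony" \
    >> /var/log/slony/node1.log 2>&1 &
slon mycluster "dbname=mydb host=node2.example.com user=slony" \
    >> /var/log/slony/node2.log 2>&1 &
```

Running the slons remotely like this trades a little event-monitoring latency for the convenience of one-host administration, as described above.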
There are two "watchdog" scripts currently available:
tools/altperl/slon_watchdog - an "early" version that basically wraps a loop around the invocation of slon, restarting it any time it falls over
tools/altperl/slon_watchdog2 - a somewhat more intelligent version that periodically polls the database, checking to see if a SYNC has taken place recently. We have had VPN connections that occasionally fall over without signalling the application, so that the slon stops working, but doesn't actually die; this polling addresses that issue.
The slon_watchdog2 script is usually the preferable one to run. At one point it was unwise to run it while subscribing a very large replication set, where the initial COPY SET is expected to take many hours: the script would conclude that, since no SYNC had occurred in two hours, something was broken and slon needed restarting, thereby restarting the COPY SET event. More recently, the script has been changed to detect a COPY SET in progress.
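For illustration, here is a minimal shell sketch of the polling idea (it is not the slon_watchdog2 code itself): the outer loop restarts slon whenever it exits, and the inner loop kills it when no SYNC event has been recorded recently. The cluster name, conninfo, two-hour threshold, and the query against the _mycluster.sl_event table are all assumptions chosen for the sketch; a real watchdog would also need to allow for a COPY SET in progress, as noted above.

```
#!/bin/sh
# Hypothetical watchdog sketch: restart slon whenever it exits, and kill it
# if no SYNC event has been recorded within the last two hours.
CLUSTER=mycluster
CONNINFO="dbname=mydb host=node1.example.com user=slony"

while true; do
    slon "$CLUSTER" "$CONNINFO" >> /var/log/slony/node1.log 2>&1 &
    SLON_PID=$!
    # Poll while slon is still running; if the newest SYNC is older than
    # two hours, assume the connection died silently and force a restart.
    while kill -0 "$SLON_PID" 2>/dev/null; do
        sleep 600
        RECENT=$(psql -At -h node1.example.com -U slony -d mydb -c \
          "select count(*) from \"_${CLUSTER}\".sl_event
            where ev_type = 'SYNC'
              and ev_timestamp > now() - interval '2 hours';")
        if [ "$RECENT" = "0" ]; then
            kill "$SLON_PID"
        fi
    done
    sleep 10
done
```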