Hi,

Here are the notes on putting together an 8.2-based warm standby demonstration system. I hope people find them useful.

As I mentioned in the presentation, the basic idea behind warm standby is to employ a shell script which periodically polls the WAL archive directory, executing the archive recovery procedure whenever it detects that a new WAL segment has arrived. The standby server thus processes archived segments as and when they are sent from the master. The shell script is specified in the 'restore_command' parameter in recovery.conf. The problem of yet-to-be-archived transactions being lost in the event of a failure on the master is addressed by the 'archive_timeout' parameter in postgresql.conf, which forces a WAL segment to be archived every <archive_timeout> seconds.

The procedure outlined below is just intended to produce a demonstration system - a production HA setup would be very different.

1) Obtain the 8.2 beta source distribution from

http://www.postgresql.org/ftp/source/v8.2beta1/

2) Create a directory tree into which you can install the 'master' and 'slave' postgres servers, along with a WAL archive directory:

mkdir -p ~/pg82demo/master
mkdir -p ~/pg82demo/slave
mkdir -p ~/pg82demo/wal_archive

3) Extract the postgres source archive. From the top level of the source tree, build and install the master:

./configure --prefix=/home/test/pg82demo/master
make
make install

And then install the slave:

./configure --prefix=/home/test/pg82demo/slave
make
make install

(Replace 'test' in the above with the appropriate string for your configuration. Throughout the rest of the notes, I'll assume the user 'test'.)

4) Go to the 'master' postgres installation, and initialise a database cluster:

cd ~/pg82demo/master
bin/initdb data/

Make the following entries in data/postgresql.conf:

archive_command = 'rsync -arv %p /home/test/pg82demo/wal_archive/%f </dev/null'
archive_timeout = 60

The reason for using 'rsync' instead of 'cp' will be explained later.

5) Start the master with:

bin/pg_ctl -D data/ start -l <logfile>

6) Connect to the master with psql, and initiate a base backup with:

select pg_start_backup('test1');

Put psql in the background with ctrl-z, and take a backup of the data directory:

tar cvzf base.tgz data

Bring psql back with 'fg', and finalise the base backup:

select pg_stop_backup();

Check that postgres has written the backup history file to the specified archive directory. These operations have to be performed as the database superuser.

7) Go to the 'slave' installation. Copy the base backup archive over from the master, and extract it:

cd ../slave
cp ../master/base.tgz .
tar xvzf base.tgz
rm -f base.tgz

8) Modify the slave's postgresql.conf as follows:

port = 5433

The 'port' entry is modified so that the slave won't try to bind to the same port that the master has already bound to. Also, comment out the 'archive_command' and 'archive_timeout' entries in the slave's postgresql.conf.

9) Delete the 'postmaster.pid' file from the slave's data directory:

rm -f data/postmaster.pid

10) Copy the recovery.conf.sample file from the 'share/postgresql/' directory into the slave's data directory as 'recovery.conf'.

11) Save the attached shell script 'restore.sh' to ~/pg82demo/, and make sure it is set executable.

12) Edit the shell script, and make sure the directory path for TRIGGER_FILE is correct. Don't worry yet about exactly what the 'trigger file' is.
The TRIGGER_FILE assignment in the shell script should look like:

TRIGGER_FILE="/home/<username>/pg82demo/trigger"

with <username> replaced appropriately for your configuration, of course.

13) Add the following to data/recovery.conf on the slave:

restore_command = '/home/test/pg82demo/restore.sh /home/test/pg82demo/wal_archive/%f "%p"'

14) Start the slave server with:

bin/pg_ctl -D data/ start -l <logfile>

15) Connect to the master, and trigger some WAL archive activity by creating a large table:

create table foo as (select a,b from pg_class as a, pg_class as b);

The master's log should show output like:

LOG: archived transaction log file "000000010000000000000000"
LOG: archived transaction log file "000000010000000000000000.00399A30.backup"

And the slave's log should contain entries like:

`/home/test/pg82demo/wal_archive/000000010000000000000000.00399A30.backup' -> `pg_xlog/RECOVERYHISTORY'
LOG: restored log file "000000010000000000000000.00399A30.backup" from archive
`/home/test/pg82demo/wal_archive/000000010000000000000000' -> `pg_xlog/RECOVERYXLOG'
LOG: restored log file "000000010000000000000000" from archive
LOG: checkpoint record is at 0/399A30
LOG: redo record is at 0/399A30; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 0/595; next OID: 10818
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: automatic recovery in progress
LOG: redo starts at 0/399A78

As each new segment is archived by the master, it will log a message like:

LOG: archived transaction log file "000000010000000000000001"

As soon as the segment is archived, there should be a corresponding message in the slave's log which looks like:

`/home/test/pg82demo/wal_archive/000000010000000000000001' -> `pg_xlog/RECOVERYXLOG'
LOG: restored log file "000000010000000000000001" from archive

16) To initiate a failover from the master to the slave, create the 'trigger file':

touch ~/pg82demo/trigger

This should immediately cause the slave to finish processing archived segments, exit recovery mode, and come up ready for use.

To repeat the test, it is not necessary to repeat the entire procedure above. Stop the slave, delete its data directory, and remove the trigger file left over from the failover. Then continue the procedure from step (7) - that is, start the recovery procedure on the slave, beginning with extracting the base backup. The slave will process all the WAL segments in the archive directory, and after the last one it will wait for more segments to arrive, as described above, until the failover is triggered.

The reason rsync is used in the archive_command is that rsync copies atomically - that is, the in-progress destination file is written as a temporary file, and then renamed when the copy is complete. In the situation above, where segments are archived straight to the directory that the slave reads from, 'cp' can cause an error whereby the slave attempts to process a partially-copied WAL segment. If this happens, postgres will emit an error like:

PANIC: archive file "000000010000000000000031" has wrong size: 1810432 instead of 16777216
LOG: startup process (PID 11356) was terminated by signal 6
LOG: aborting startup due to startup process failure

Another wrinkle in the whole setup is the fact that, when in recovery mode, postgres will ask for files to be recovered that do not, and will not, exist. This would seem to conflict with having the restore_command script poll the archive directory for the file it is passed as a parameter, since the script will hang indefinitely if that file never appears.
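Before getting to the workarounds, here is a minimal sketch of a polling restore script along these lines, just to make the behaviour concrete. It is an illustration of the approach only - not necessarily identical to the attached restore.sh - and the paths and the 5-second poll interval are simply the values assumed in these notes:

#!/bin/sh
# Sketch of a polling restore_command script (illustration only).
# Invoked from recovery.conf as:
#   restore_command = '/home/test/pg82demo/restore.sh /home/test/pg82demo/wal_archive/%f "%p"'
# $1 = requested file in the WAL archive, $2 = path postgres wants it copied to.

TRIGGER_FILE="/home/test/pg82demo/trigger"

# History files may legitimately never exist, so don't poll for them -
# attempt the copy once and hand cp's exit code straight back to postgres.
case "$1" in
    *history*)
        cp "$1" "$2"
        exit $?
        ;;
esac

# Otherwise, poll until the requested segment appears in the archive, or
# until the trigger file is created to signal a failover.
while true
do
    if [ -f "$1" ]; then
        cp "$1" "$2"
        exit $?
    fi
    if [ -f "$TRIGGER_FILE" ]; then
        # Failover requested: return non-zero so that recovery ends.
        exit 1
    fi
    sleep 5
done

The special handling of 'history' files is explained next.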
First off, it seems postgres will always ask for the file '00000001.history'. As a workaround, the supplied shell script looks for the string 'history' in the filename parameter. If it is detected, the script immediately attempts the copy, returning cp's exit code to postgres. I had thought this might be a bug, but see the post:

http://archives.postgresql.org/pgsql-bugs/2006-06/msg00072.php

The documentation doesn't quite make clear whether the shell script is supposed to determine the expected WAL archive filename by looking at the sequence of previously archived files, or simply from the passed parameter. This is an issue that needs a bit more investigation and experimentation.

Thanks,

Charles Duffy
Attachment: restore.sh
Description: Bourne shell script
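P.S. One more aside on the atomic-copy point above: if rsync is not available, roughly the same effect can probably be achieved by copying the segment to a temporary name and then renaming it into place, since a rename within one filesystem is atomic. A sketch, using the same paths as in these notes and meant only to illustrate the idea rather than as a tested replacement:

archive_command = 'cp %p /home/test/pg82demo/wal_archive/%f.tmp && mv /home/test/pg82demo/wal_archive/%f.tmp /home/test/pg82demo/wal_archive/%f'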