[postgres@db1 .ssh]$ ssh-copy-id -i id_rsa.pub db2
The authenticity of host 'db2 (192.168.100.38)' can't be established.
RSA key fingerprint is 70:02:66:0a:f5:3a:62:52:55:a2:98:b1:1c:d7:6c:73.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'db2,192.168.100.38' (RSA) to the list of known hosts.
postgres@db2's password:
Now try logging into the machine, with "ssh 'db2'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
Verify:
[postgres@db1 .ssh]$ ssh db2 date
Thu Dec 19 01:25:48 EST 2013
[Of course, you can also use the walmgr3 --ssh-keygen command mentioned in the documentation to set up passwordless SSH authentication.]
[root@db1 ~]# cd /opt/pg93/
[root@db1 pg93]# mkdir data
[root@db1 pg93]# chown postgres:postgres data/
[root@db1 pg93]# su - postgres
[postgres@db1 ~]$ initdb -D /opt/pg93/data/
[postgres@db1 data]$ vi postgresql.conf
listen_addresses = '*'
port = 5432
logging_collector = on
log_filename = 'postgresql.log'
log_connections = on
log_disconnections = on
[postgres@db1 data]$ pg_ctl start
Master-side configuration file:
[postgres@db1 pg93]$ cat wal-master.ini
[walmgr]
job_name = wal-master
logfile = /opt/pg93/data/log/%(job_name)s.log
pidfile = /opt/pg93/data/pid/%(job_name)s.pid
use_skylog = 0
master_db = port=5432 host=db1 dbname=template1
master_data = /opt/pg93/data
master_config = /opt/pg93/data/postgresql.conf
master_bin = /opt/pg93/bin
# set this only if you can afford database restarts during setup and stop.
# master_restart_cmd = pg_ctlcluster 9.1 main restart
slave = db2
slave_config = /opt/pg93/wal-slave.ini
walmgr_data = db2:/opt/pg93/slave_walmanager/backup
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
full_backup = %(walmgr_data)s/data.master
config_backup = %(walmgr_data)s/config.backup
# syncdaemon update frequency
loop_delay = 10.0
# use record based shipping available since 8.2
use_xlog_functions = 0
# pass -z to rsync, useful on low bandwidth links
compression = 0
# keep symlinks for pg_xlog and pg_log
keep_symlinks = 1
# tell walmgr to set wal_level to hot_standby during setup
hot_standby = 1
# periodic sync
#command_interval = 600
#periodic_command = /var/lib/postgresql/walshipping/periodic.sh
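walmgr3 is a Python tool, and the `%(name)s` placeholders in these ini files follow Python's configparser interpolation rules: `%(key)s` expands to the value of another key in the same section. A minimal sketch using a trimmed-down fragment of the file above:

```python
from configparser import ConfigParser

# Trimmed fragment of wal-master.ini; %(key)s is expanded when the
# value is read, using other keys from the same section.
ini_text = """
[walmgr]
job_name = wal-master
logfile = /opt/pg93/data/log/%(job_name)s.log
walmgr_data = db2:/opt/pg93/slave_walmanager/backup
completed_wals = %(walmgr_data)s/logs.complete
"""

cp = ConfigParser()
cp.read_string(ini_text)
print(cp.get("walmgr", "logfile"))          # /opt/pg93/data/log/wal-master.log
print(cp.get("walmgr", "completed_wals"))   # db2:/opt/pg93/slave_walmanager/backup/logs.complete
```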
Standby-side configuration file:
[postgres@db2 pg93]$ cat wal-slave.ini
[walmgr]
job_name = wal-standby
logfile = /opt/pg93/data/log/%(job_name)s.log
use_skylog = 0
slave_data = /opt/pg93/data
slave_bin = /opt/pg93/bin
slave_stop_cmd = /opt/pg93/bin/pg_ctl -D /opt/pg93/data stop
slave_start_cmd = /opt/pg93/bin/pg_ctl -D /opt/pg93/data start
#slave_config_dir = /tmp/test_slave
slave_config_dir = /opt/pg93/data
walmgr_data = /opt/pg93/slave_walmanager/backup
completed_wals = %(walmgr_data)s/logs.complete
partial_wals = %(walmgr_data)s/logs.partial
full_backup = %(walmgr_data)s/data.master
config_backup = %(walmgr_data)s/config.backup
backup_datadir = no
keep_backups = 0
# archive_command =
# primary database connect string for hot standby -- enabling
# this will cause the slave to be started in hot standby mode.
primary_conninfo = host=db1 user=postgres port=5432
Configure replication:
[postgres@db1 pg93]$ walmgr3 wal-master.ini setup
2013-12-20 02:05:19,289 32538 INFO Configuring WAL archiving
2013-12-20 02:05:19,294 32538 INFO Sending SIGHUP to postmaster
LOG: received SIGHUP, reloading configuration files
LOG: parameter "archive_command" changed to "/opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive %p %f"
2013-12-20 02:05:19,491 32538 INFO Done
At this point, the following parameters in postgresql.conf have changed:
wal_level = 'hot_standby'
archive_mode = 'on'
archive_command = '/opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive %p %f'
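For reference, PostgreSQL expands `%p` in archive_command to the WAL segment's path relative to the data directory and `%f` to the file name alone, then runs the result through the shell. The Python snippet below only mimics that substitution for illustration; it is not PostgreSQL code:

```python
# %p = WAL file path relative to the data directory, %f = file name only.
# Toy re-enactment of the substitution PostgreSQL performs before running
# archive_command; the segment name is a sample 9.3-style WAL file name.
cmd_template = "/opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive %p %f"
wal_name = "00000003000000000000002B"

cmd = cmd_template.replace("%p", "pg_xlog/" + wal_name).replace("%f", wal_name)
print(cmd)
# /opt/skytools/bin/walmgr3 /opt/pg93/wal-master.ini xarchive pg_xlog/00000003000000000000002B 00000003000000000000002B
```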
In addition, the backup directories have been created on db2:
[postgres@db2 pg93]$ ll slave_walmanager/backup/
total 16
drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 config.backup
drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 data.master
drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 logs.complete
drwxrwxr-x 2 postgres postgres 4096 Dec 19 22:04 logs.partial
Modify the max_wal_senders parameter:
max_wal_senders = 3
Add the following two entries to pg_hba.conf:
[postgres@db1 data]$ vi pg_hba.conf
host postgres postgres 192.168.100.0/24 trust
host replication postgres 192.168.100.0/24 trust
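The second line is what lets the standby's walreceiver connect: in pg_hba.conf, `replication` is a keyword matched only by replication connections, not an ordinary database name. A toy illustration of how one such "host" rule is matched (the helper below is hypothetical; real matching happens inside PostgreSQL, first matching rule wins, top to bottom):

```python
import ipaddress

# Hypothetical matcher for the second pg_hba.conf rule above.
rule = {"db": "replication", "user": "postgres",
        "net": ipaddress.ip_network("192.168.100.0/24"), "method": "trust"}

def matches(rule, db, user, addr):
    return (rule["db"] == db and rule["user"] == user
            and ipaddress.ip_address(addr) in rule["net"])

# The walreceiver on db2 (192.168.100.38) connecting for replication:
assert matches(rule, "replication", "postgres", "192.168.100.38")
# A host outside the 192.168.100.0/24 subnet is not matched by this rule:
assert not matches(rule, "replication", "postgres", "10.0.0.1")
```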
Perform the base backup:
[postgres@db1 pg93]$ walmgr3 wal-master.ini backup
2013-12-20 02:05:58,339 28752 INFO Backup lock obtained.
2013-12-20 02:05:58,347 32547 INFO Execute SQL: select pg_start_backup('FullBackup'); [port=5432 host=db1 dbname=template1]
2013-12-20 02:05:59,015 32550 INFO {count: 1, duration: 0.518224000931}
2013-12-20 02:06:00,098 32547 INFO Checking tablespaces
2013-12-20 02:06:00,099 32547 INFO pg_log does not exist, skipping
2013-12-20 02:06:00,274 32547 INFO Backup conf files from /opt/pg93/data
2013-12-20 02:06:00,648 28826 INFO First useful WAL file is: 00000003000000000000002B
2013-12-20 02:06:00,660 32547 INFO Execute SQL: select pg_stop_backup(); [port=5432 host=db1 dbname=template1]
2013-12-20 02:06:01,313 32564 INFO {count: 1, duration: 0.562467098236}
2013-12-20 02:06:01,681 32568 INFO {count: 1, duration: 0.282044887543}
2013-12-20 02:06:02,876 28878 INFO Backup lock released.
2013-12-20 02:06:02,884 32547 INFO Full backup successful
The backup step generates the base backup data under walmgr_data, as shown below:
[postgres@db2 pg93]$ ls slave_walmanager/backup/
config.backup/ data.master/ logs.complete/ logs.partial/
[postgres@db2 pg93]$ ls slave_walmanager/backup/data.master/
PG_VERSION backup_label.old global pg_clog pg_notify pg_snapshots pg_stat_tmp pg_twophase pid
backup_label base log pg_multixact pg_serial pg_stat pg_subtrans pg_xlog recovery.conf
Run the restore on the standby:
[postgres@db2 pg93]$ walmgr3 wal-slave.ini restore
2013-12-20 02:09:38,190 28909 WARNING backup_datadir is disabled, deleting old data dir
2013-12-20 02:09:38,191 28909 INFO Move /opt/pg93/slave_walmanager/backup/data.master to /opt/pg93/data
2013-12-20 02:09:38,197 28909 INFO Write /opt/pg93/data/recovery.conf
2013-12-20 02:09:38,198 28909 INFO Restoring configuration files
2013-12-20 02:09:38,200 28909 INFO Starting postmaster: /opt/pg93/bin/pg_ctl -D /opt/pg93/data start
server starting
[postgres@db2 pg93]$ LOG: database system was interrupted; last known up at 2013-12-20 02:05:58 EST
LOG: entering standby mode
2013-12-20 02:09:38,494 28916 INFO 00000003.history: not found (ignored)
2013-12-20 02:09:38,576 28917 INFO 00000003000000000000002B: Found
2013-12-20 02:09:38,603 28917 INFO {count: 1}
LOG: restored log file "00000003000000000000002B" from archive
LOG: redo starts at 0/2B000028
LOG: consistent recovery state reached at 0/2B0000F0
2013-12-20 02:09:38,776 28921 INFO 00000003000000000000002C: not found (ignored)
LOG: fetching timeline history file for timeline 3 from primary server
LOG: started streaming WAL from primary at 0/2C000000 on timeline 3
The recovery.conf generated on the standby contains the following:
[postgres@db2 pg93]$ cat data/recovery.conf
restore_command = '/opt/skytools/bin/walmgr3 /opt/pg93/wal-slave.ini xrestore %f "%p" %r'
standby_mode = 'on'
trigger_file = '/opt/pg93/slave_walmanager/backup/logs.complete/STOP'
primary_conninfo = 'host=db1 user=postgres port=5432'
archive_cleanup_command = '/opt/pg93/bin/pg_archivecleanup /opt/pg93/slave_walmanager/backup/logs.complete %r'
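Note the trigger_file line: while in standby mode, the startup process checks for this path between restore attempts, and creating the file ends recovery and promotes the standby. A minimal sketch of that polling logic (the paths here are temporary stand-ins, not the real ones):

```python
import os
import tempfile

# Stand-in for trigger_file; on the real standby it would be
# /opt/pg93/slave_walmanager/backup/logs.complete/STOP.
trigger = os.path.join(tempfile.mkdtemp(), "STOP")

def should_promote(path: str) -> bool:
    # The standby simply checks whether the trigger file exists.
    return os.path.exists(path)

assert not should_promote(trigger)   # normal standby operation
open(trigger, "w").close()           # equivalent of: touch <trigger_file>
assert should_promote(trigger)       # recovery ends, the standby promotes
```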
[Streaming replication is now in place. Swap the roles of db1 and db2 and repeat the steps above to allow the two hosts to switch roles with each other.]
On the master:
[postgres@db1 pg93]$ ps -ef | grep post
postgres 605 1 0 02:48 pts/1 00:00:00 /opt/pg93/bin/postgres
postgres 607 605 0 02:48 ? 00:00:00 postgres: checkpointer process
postgres 608 605 0 02:48 ? 00:00:00 postgres: writer process
postgres 609 605 0 02:48 ? 00:00:00 postgres: wal writer process
postgres 610 605 0 02:48 ? 00:00:00 postgres: autovacuum launcher process
postgres 611 605 0 02:48 ? 00:00:00 postgres: archiver process last was 000000030000000000000030.00000028.backup
postgres 612 605 0 02:48 ? 00:00:00 postgres: stats collector process
postgres 660 605 0 02:50 ? 00:00:00 postgres: wal sender process postgres 192.168.100.38(56960) streaming 0/310000C8
[postgres@db1 pg93]$ createdb pgbench
[postgres@db1 pg93]$ pgbench -i -s 1 pgbench
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.20 s, remaining 0.00 s).
vacuum...
set primary keys...
done.
[postgres@db1 pg93]$ psql
psql (9.3.2)
Type "help" for help.
postgres=# select pg_current_xlog_location();
pg_current_xlog_location
--------------------------
0/31CB94B0
(1 row)
On the standby:
[postgres@db2 pg93]$ ps -ef | grep post
root 24384 24063 0 Dec19 pts/1 00:00:00 su - postgres
postgres 24385 24384 0 Dec19 pts/1 00:00:00 -bash
postgres 29539 1 0 02:50 pts/1 00:00:00 /opt/pg93/bin/postgres -D /opt/pg93/data
postgres 29540 29539 0 02:50 ? 00:00:00 postgres: startup process recovering 000000030000000000000031
postgres 29544 29539 0 02:50 ? 00:00:00 postgres: checkpointer process
postgres 29545 29539 0 02:50 ? 00:00:00 postgres: writer process
postgres 29546 29539 0 02:50 ? 00:00:00 postgres: stats collector process
postgres 29548 29539 0 02:50 ? 00:00:00 postgres: wal receiver process streaming 0/310000C8
[postgres@db2 pg93]$ psql pgbench
psql (9.3.2)
Type "help" for help.
pgbench=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+---------+-------------
public | pgbench_accounts | table | postgres | 13 MB |
public | pgbench_branches | table | postgres | 16 kB |
public | pgbench_history | table | postgres | 0 bytes |
public | pgbench_tellers | table | postgres | 16 kB |
(4 rows)
pgbench=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/31CB94B0
(1 row)
pgbench=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
0/31CB94B0
(1 row)
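The three functions above return LSNs of the form hi/lo, where each half is a 32-bit hexadecimal number. Converting them to absolute byte positions makes replication lag easy to compute; a small sketch:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '0/31CB94B0' to an absolute byte position:
    the part before the slash is the high 32 bits, the part after the low 32."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

# Values from the session above: the master's current position and the
# standby's receive/replay positions are identical, so lag is zero bytes.
lag = lsn_to_bytes("0/31CB94B0") - lsn_to_bytes("0/31CB94B0")
print(lag)   # 0 -> the standby is fully caught up
```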
[The following experiments continue from the environment set up in section 2.2]
[Changing a node's provider]
The current topology is as follows:
[postgres@londiste1 londiste3]$ londiste3 db1.ini status
Queue: replika Local node: node1
node1 (root)
| Tables: 1/0/0
| Lag: 15s, Tick: 439
+--: node2 (branch)
| | Tables: 1/0/0
| | Lag: 15s, Tick: 439
| +--: node4 (branch)
| Tables: 1/0/0
| Lag: 15s, Tick: 439
+--: node3 (branch)
| Tables: 1/0/0
| Lag: 15s, Tick: 439
+--: node5 (branch)
Tables: 1/0/0
Lag: 15s, Tick: 439
Change node4's provider to node3, as follows:
[postgres@londiste1 londiste3]$ londiste3 db4.ini change-provider --provider=node3
2013-09-21 10:14:40,353 25458 INFO [node4] Consumer londiste_db4 tagged as paused
2013-09-21 10:14:40,366 25458 INFO Waiting for worker to accept
2013-09-21 10:14:41,371 25458 INFO Consumer 'londiste_db4' on node 'node4' paused
2013-09-21 10:14:41,386 25458 INFO [node3] Location registered
2013-09-21 10:14:41,409 25458 INFO [node3] Subscriber registered: node4
2013-09-21 10:14:41,416 25458 INFO [node4] Consumer provider node set to : node3
2013-09-21 10:14:41,424 25458 INFO [node4] Consumer londiste_db4 tagged as resumed
2013-09-21 10:14:41,431 25458 INFO Waiting for worker to accept
2013-09-21 10:14:42,436 25458 INFO Consumer 'londiste_db4' on node 'node4' resumed
2013-09-21 10:14:42,448 25458 INFO [node2] Subscriber unregistered: node4
[postgres@londiste1 londiste3]$ londiste3 db1.ini status
Queue: replika Local node: node1
node1 (root)
| Tables: 1/0/0
| Lag: 1s, Tick: 448
+--: node2 (branch)
|     Tables: 1/0/0
|     Lag: 1s, Tick: 448
+--: node3 (branch)
| |   Tables: 1/0/0
| |   Lag: 1s, Tick: 448
| +--: node4 (branch)
|      Tables: 1/0/0
|      Lag: 1s, Tick: 448
+--: node5 (branch)
      Tables: 1/0/0
      Lag: 1s, Tick: 448
{node3 has successfully taken over as node4's provider}
Make node5 a child of node4, as follows:
[postgres@londiste1 londiste3]$ londiste3 db5.ini change-provider --provider=node4
2013-09-21 10:16:11,661 25488 INFO [node5] Consumer londiste_db5 tagged as paused
2013-09-21 10:16:11,672 25488 INFO Waiting for worker to accept
2013-09-21 10:16:12,677 25488 INFO Consumer 'londiste_db5' on node 'node5' paused
2013-09-21 10:16:12,693 25488 INFO [node4] Location registered
2013-09-21 10:16:12,713 25488 INFO [node4] Subscriber registered: node5
2013-09-21 10:16:12,720 25488 INFO [node5] Consumer provider node set to : node4
2013-09-21 10:16:12,731 25488 INFO [node5] Consumer londiste_db5 tagged as resumed
2013-09-21 10:16:12,738 25488 INFO Waiting for worker to accept
2013-09-21 10:16:13,743 25488 INFO Consumer 'londiste_db5' on node 'node5' resumed
2013-09-21 10:16:13,756 25488 INFO [node3] Subscriber unregistered: node5
[postgres@londiste1 londiste3]$ londiste3 db1.ini status
Queue: replika Local node: node1
node1 (root)
| Tables: 1/0/0
| Lag: 3s, Tick: 451
+--: node2 (branch)
|     Tables: 1/0/0
|     Lag: 3s, Tick: 451
+--: node3 (branch)
  |   Tables: 1/0/0
  |   Lag: 3s, Tick: 451
  +--: node4 (branch)
    |   Tables: 1/0/0
    |   Lag: 3s, Tick: 451
    +--: node5 (branch)
          Tables: 1/0/0
          Lag: 3s, Tick: 451
{node5 is now a child of node4}
Make node4 a child of node2, as follows:
[postgres@londiste1 londiste3]$ londiste3 db4.ini change-provider --provider=node2
2013-09-21 10:17:20,633 25517 INFO [node4] Consumer londiste_db4 tagged as paused
2013-09-21 10:17:20,643 25517 INFO Waiting for worker to accept
2013-09-21 10:17:21,647 25517 INFO Consumer 'londiste_db4' on node 'node4' paused
2013-09-21 10:17:21,660 25517 INFO [node2] Location registered
2013-09-21 10:17:21,683 25517 INFO [node2] Subscriber registered: node4
2013-09-21 10:17:21,691 25517 INFO [node4] Consumer provider node set to : node2
2013-09-21 10:17:21,701 25517 INFO [node4] Consumer londiste_db4 tagged as resumed
2013-09-21 10:17:21,709 25517 INFO Waiting for worker to accept
2013-09-21 10:17:22,713 25517 INFO Consumer 'londiste_db4' on node 'node4' resumed
2013-09-21 10:17:22,725 25517 INFO [node3] Subscriber unregistered: node4
[postgres@londiste1 londiste3]$ londiste3 db1.ini status
Queue: replika Local node: node1
node1 (root)
| Tables: 1/0/0
| Lag: 45s, Tick: 452
+--: node3 (branch)
|     Tables: 1/0/0
|     Lag: 45s, Tick: 452
+--: node2 (branch)
  |   Tables: 1/0/0
  |   Lag: 45s, Tick: 452
  +--: node4 (branch)
    |   Tables: 1/0/0
    |   Lag: 45s, Tick: 452
    +--: node5 (branch)
          Tables: 1/0/0
          Lag: 45s, Tick: 452
{note that node4 takes its child node along with it when node2 takes it over}
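This behavior follows from how the provider graph is stored: change-provider rewrites only the moved node's own provider edge, so subscribers registered below it keep their provider and travel with the subtree. A toy model of that bookkeeping (not londiste code):

```python
# Each node records only its own provider; node5's entry points at node4
# and is never touched when node4 itself is re-parented.
providers = {"node2": "node1", "node3": "node1", "node4": "node3", "node5": "node4"}

def change_provider(node: str, new_provider: str) -> None:
    providers[node] = new_provider   # rewrites a single edge

change_provider("node4", "node2")
assert providers["node4"] == "node2"   # node4 now subscribes to node2
assert providers["node5"] == "node4"   # node5 still under node4 -> moved with it
```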
[Having node A directly take over node B, continuing from the previous subsection]
Have node3 take over node5, as follows:
[postgres@londiste1 londiste3]$ londiste3 db3.ini takeover node5
2013-09-21 10:19:58,929 25566 INFO old: node5
2013-09-21 10:19:58,998 25566 INFO [node5] Consumer londiste_db5 tagged as paused
2013-09-21 10:19:59,008 25566 INFO Waiting for worker to accept
2013-09-21 10:20:00,013 25566 INFO Consumer 'londiste_db5' on node 'node5' paused
2013-09-21 10:20:00,024 25566 INFO [node3] Location registered
2013-09-21 10:20:00,043 25566 INFO [node3] Subscriber registered: node5
2013-09-21 10:20:00,055 25566 INFO [node5] Consumer provider node set to : node3
2013-09-21 10:20:00,066 25566 INFO [node5] Consumer londiste_db5 tagged as resumed
2013-09-21 10:20:00,073 25566 INFO Waiting for worker to accept
2013-09-21 10:20:01,077 25566 INFO Consumer 'londiste_db5' on node 'node5' resumed
2013-09-21 10:20:01,091 25566 INFO [node4] Subscriber unregistered: node5
[postgres@londiste1 londiste3]$ londiste3 db1.ini status
Queue: replika Local node: node1
node1 (root)
| Tables: 1/0/0
| Lag: 2s, Tick: 459
+--: node2 (branch)
| | Tables: 1/0/0
| | Lag: 2s, Tick: 459
| +--: node4 (branch)
| Tables: 1/0/0
| Lag: 2s, Tick: 459
+--: node3 (branch)
  |   Tables: 1/0/0
  |   Lag: 2s, Tick: 459
  +--: node5 (branch)
        Tables: 1/0/0
        Lag: 2s, Tick: 459
{takeover completed successfully}