PostgreSQL: monitoring streaming replication lag

PostgreSQL 10

Some names changed in 10: xlog => wal, location => lsn.

9.6 and earlier                  10 and later
pg_xlog_location_diff            pg_wal_lsn_diff
pg_current_xlog_insert_location  pg_current_wal_insert_lsn
pg_current_xlog_location         pg_current_wal_lsn
pg_current_xlog_flush_location   pg_current_wal_flush_lsn
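The renaming rule above (xlog => wal, location => lsn) can be applied mechanically when one monitoring script has to run against both 9.6 and 10+ servers. Below is a minimal bash sketch, assuming the caller already knows the server's server_version_num (for example from `psql -Atqc 'show server_version_num'`). adapt_sql is a hypothetical helper name, and the plain substring substitution is only safe for queries like the ones in this article, where "xlog" and "location" appear only inside function and column names.

```bash
#!/usr/bin/env bash
# adapt_sql VERSION_NUM SQL  (hypothetical helper)
# Rewrites 9.6-style names to PostgreSQL 10+ names when VERSION_NUM >= 100000,
# using the rule xlog => wal, location => lsn; otherwise returns SQL unchanged.
adapt_sql() {
    local version_num=$1 sql=$2
    if (( version_num >= 100000 )); then
        sql=${sql//xlog/wal}        # pg_current_xlog_location -> pg_current_wal_location
        sql=${sql//location/lsn}    # pg_current_wal_location  -> pg_current_wal_lsn
    fi
    printf '%s\n' "$sql"
}

# The same 9.6-style query adapted for a 9.6 server and a 10.2 server
q='select pg_xlog_location_diff(pg_current_xlog_location(), sent_location) from pg_stat_replication'
adapt_sql 90600  "$q"
adapt_sql 100002 "$q"
```

With 90600 the query comes back unchanged; with 100002 it becomes select pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) from pg_stat_replication.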

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 17836
usesysid         | 16674
usename          | replicator
application_name | walreceiver
client_addr      | 192.168.56.101
client_hostname  | 
client_port      | 12955
backend_start    | 2018-03-01 17:03:29.129844+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/4CCFB4B8
write_lsn        | 0/4CCFB4B8
flush_lsn        | 0/4CCFB4B8
replay_lsn       | 0/4CCFB4B8
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async

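On the primary, check how many bytes each standby is behind; the figure to watch is stream_replay_delay.
Connect to the postgres database on the primary as the postgres superuser.
pg_current_wal_insert_lsn(): current WAL insert position (written into the WAL buffers)
pg_current_wal_lsn(): current WAL write position (written out of the buffers to the WAL files, not necessarily flushed to disk)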

select client_addr, 
       pg_wal_lsn_diff(pg_current_wal_insert_lsn(), pg_current_wal_lsn() ) as local_noflush_delay,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as local_sent_delay,
       pg_wal_lsn_diff(sent_lsn, write_lsn) as stream_write_delay,
       pg_wal_lsn_diff(sent_lsn, flush_lsn) as stream_flush_delay,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) as stream_replay_delay 
from pg_stat_replication
;

Running it:

postgres=# select client_addr, 
        pg_wal_lsn_diff(pg_current_wal_insert_lsn(), pg_current_wal_lsn() ) as local_noflush_delay,
        pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as local_sent_delay,
        pg_wal_lsn_diff(sent_lsn, write_lsn) as stream_write_delay,
        pg_wal_lsn_diff(sent_lsn, flush_lsn) as stream_flush_delay,
        pg_wal_lsn_diff(sent_lsn, replay_lsn) as stream_replay_delay 
 from pg_stat_replication;
 
-[ RECORD 1 ]-------+------------
client_addr         | 192.168.56.101
local_noflush_delay | 0
local_sent_delay    | 0
stream_write_delay  | 0
stream_flush_delay  | 0
stream_replay_delay | 0
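To turn the query above into an alert, one option is to run it with `psql -Atq` (unaligned, tuples only, so each standby becomes one `|`-separated line) and compare the last column, stream_replay_delay, with a limit. A minimal sketch, assuming that column order; check_replay_delay and the 16 MB limit in the example are hypothetical, and a NULL replay position (psql prints it as an empty field) is reported as unknown.

```bash
#!/usr/bin/env bash
# check_replay_delay MAX_BYTES  (hypothetical helper)
# Reads `psql -Atq` output of the query above, one standby per line:
#   client_addr|local_noflush_delay|local_sent_delay|stream_write_delay|stream_flush_delay|stream_replay_delay
# Prints each standby whose stream_replay_delay is unknown (NULL) or above MAX_BYTES.
# Returns 1 if any standby was reported, 0 otherwise.
check_replay_delay() {
    local max_bytes=$1 status=0 line addr replay
    while IFS= read -r line; do
        [[ -z $line ]] && continue          # ignore blank lines
        addr=${line%%|*}                    # first field
        replay=${line##*|}                  # last field
        if [[ -z $replay ]]; then           # replay position not reported yet
            echo "UNKNOWN ${addr:-local} stream_replay_delay is NULL"
            status=1
        elif (( ${replay%.*} > max_bytes )); then   # strip any fractional part
            echo "LAG ${addr:-local} stream_replay_delay=${replay} bytes"
            status=1
        fi
    done
    return "$status"
}

# Example: two made-up standbys checked against a 16 MB limit
printf '%s\n' '192.168.56.101|0|0|0|0|0' '192.168.56.102|0|0|0|0|52428800' \
    | check_replay_delay $((16 * 1024 * 1024)) || echo "at least one standby is lagging"
```

Only the second standby is reported here, since 52428800 bytes (50 MB) exceeds the 16 MB limit.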

PostgreSQL 9.6

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
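pid              | 8467 # PID of the walsender process
usesysid         | 44673 # OID of the replication user
usename          | replica # name of the replication user
application_name | walreceiver
client_addr      | 10.12.12.12 # address of the replication client (the standby)
client_hostname  |
client_port      | 55804 # port of the replication client
backend_start    | 2015-05-12 07:31:16.972157+08 # when this walsender started, i.e. when the standby connected
backend_xmin     |
state            | streaming # startup: connecting; catchup: catching up; streaming: streaming normally
sent_location    | 3/CF123560 # last WAL position sent by the primary
write_location   | 3/CF123560 # last WAL position written to disk by the standby
flush_location   | 3/CF123560 # last WAL position flushed (fsynced) to disk by the standby
replay_location  | 3/CF123560 # last WAL position replayed (applied) on the standby
sync_priority    | 0 # priority of this standby for synchronous replication
                   #   0: asynchronous; 1 and up: synchronous (lower number = higher priority)
sync_state       | async # one of three values:
                   #   async: asynchronous
                   #   sync: synchronous
                   #   potential: asynchronous now, but may be promoted to synchronous if a current synchronous standby fails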

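-- On the primary, check how many bytes each standby is behind; the figure to watch is stream_replay_delay.
-- Connect to the postgres database on the primary as the postgres superuser.
-- pg_xlog_location_diff is a built-in function (since 9.2), so it can be called from any database, not only postgres.
-- pg_xlog_location_diff returns a value in bytes.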

select client_addr,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), pg_current_xlog_location() ) as insert_local_delay,
       pg_xlog_location_diff(pg_current_xlog_location(), pg_current_xlog_flush_location() ) as local_flush_delay,
       pg_xlog_location_diff(pg_current_xlog_insert_location(), sent_location) as insert_sent_delay,
       pg_xlog_location_diff(pg_current_xlog_flush_location(), sent_location) as flush_sent_delay,
       pg_xlog_location_diff(sent_location, write_location) as stream_write_delay,
       pg_xlog_location_diff(sent_location, flush_location) as stream_flush_delay,
       pg_xlog_location_diff(sent_location, replay_location) as stream_replay_delay 
from pg_stat_replication;
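pg_xlog_location_diff (pg_wal_lsn_diff in 10) is simply the distance between two WAL positions in bytes. From 9.3 on, an LSN printed as X/Y is a 64-bit byte position whose upper 32 bits are X and lower 32 bits are Y, both hexadecimal, so the same arithmetic can be done outside the database, e.g. when comparing positions collected from different servers. A minimal bash sketch under that assumption; lsn_to_bytes and lsn_diff are hypothetical names, and bash arithmetic is signed 64-bit, which is enough for real-world LSNs.

```bash
#!/usr/bin/env bash
# lsn_to_bytes LSN: convert "X/Y" (hex high/low 32-bit halves) to a byte position
lsn_to_bytes() {
    local hi=${1%/*} lo=${1#*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# lsn_diff A B: bytes from B to A, mirroring pg_xlog_location_diff(A, B)
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

lsn_diff 3/CF123560 3/CF000000   # prints 1193312
lsn_diff 1/0 0/FFFFFFFF          # prints 1: the high half simply carries over
```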

-- Pause, check, and resume WAL replay (only valid on a standby, i.e. while in recovery)

select pg_xlog_replay_pause() as replay_pause;
select pg_is_xlog_replay_paused() as is_replay_paused;
select pg_xlog_replay_resume() as replay_resume;

-- On a hot standby, the following functions show how far the standby has received and replayed WAL:
pg_last_xlog_receive_location()
pg_last_xlog_replay_location()
pg_last_xact_replay_timestamp()
For example:

select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location(),
       pg_last_xact_replay_timestamp(), 
       clock_timestamp() ,
       clock_timestamp() - pg_last_xact_replay_timestamp() AS replication_delay;
 
-[ RECORD 1 ]-----------------+------------------------------
pg_last_xlog_receive_location | BAA/393C71D0
pg_last_xlog_replay_location  | BAA/393C71D0
pg_last_xact_replay_timestamp | 2017-07-24 09:12:43.701454+08
replication_delay             | -00:00:05.58923
(1 row)
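Two caveats apply to replication_delay. pg_last_xact_replay_timestamp() is the time the commit record was generated on the primary, so clock skew between the hosts leaks into the result (the negative value above suggests the standby's clock is behind the primary's); and when the primary has no write activity, the value keeps growing even though nothing is pending. One commonly used workaround treats the lag as zero when everything received has been replayed. A bash sketch of that rule, assuming the two positions and the seconds since the last replayed transaction have already been fetched with psql; effective_lag is a hypothetical name.

```bash
#!/usr/bin/env bash
# effective_lag RECEIVE_LSN REPLAY_LSN SECONDS_SINCE_LAST_REPLAY  (hypothetical helper)
# RECEIVE_LSN / REPLAY_LSN: pg_last_xlog_receive_location() / pg_last_xlog_replay_location()
# SECONDS_SINCE_LAST_REPLAY: extract(epoch from clock_timestamp() - pg_last_xact_replay_timestamp())
# Prints 0 when everything received has been replayed; otherwise the time-based
# estimate, with negative values (clock skew) clamped to 0.
effective_lag() {
    local receive=$1 replay=$2 seconds=$3
    if [[ $receive == "$replay" ]]; then
        echo 0                  # nothing pending: an idle primary, not real lag
    elif [[ $seconds == -* ]]; then
        echo 0                  # standby clock behind the primary's
    else
        echo "$seconds"
    fi
}

effective_lag BAA/393C71D0 BAA/393C71D0 -5.58923   # prints 0
effective_lag BAA/393C71D0 BAA/39000000 12.5       # prints 12.5
```

Clamping only hides clock skew; keeping both hosts' clocks synchronized (e.g. with NTP) is what makes the time-based figure trustworthy.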
 

Alternatively, query pg_stat_wal_receiver on the standby (available since 9.6):

select * from pg_stat_wal_receiver;

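A rough estimate of standby lag, run on the standby. last_msg_receipt_time is when the last message of any kind (WAL data or keepalive) arrived from the primary, so this measures how recently the standby heard from the primary rather than true replay lag.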

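#!/bin/bash
# Run on the standby (e.g. from cron). Appends "YYYYMMDDHHMMSS seconds" to the result file,
# where seconds is the time since the WAL receiver last heard from the primary
# (0 when no WAL receiver is running, e.g. on the primary).

PGSQLHOST=127.0.0.1
PGSQLPORT=5432
PGSQLDATABASE=postgres
PGSQLUSER=postgres
PGSQLRESULTFILE=/tmp/tempCheckStreamReplicationDelay.txt

sql="select coalesce(extract(epoch from (clock_timestamp() - (select last_msg_receipt_time from pg_stat_wal_receiver))), 0::int4)"

# -A unaligned, -t tuples only, -q quiet: psql prints just the number
result=$(psql -h "$PGSQLHOST" -p "$PGSQLPORT" -U "$PGSQLUSER" -d "$PGSQLDATABASE" -Atqc "$sql")

echo "$(date +%Y%m%d%H%M%S) $result" >> "$PGSQLRESULTFILE"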
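The script only appends one "timestamp seconds" line per run. To act on it, for example from the same cron job, one option is to compare the latest value with a limit. A minimal sketch, assuming the file format written by the script; check_last_delay and the 30-second limit in the example are hypothetical, and awk does the comparison because the delay is fractional.

```bash
#!/usr/bin/env bash
# check_last_delay FILE MAX_SECONDS  (hypothetical helper)
# Reads the most recent "YYYYMMDDHHMMSS seconds" line from FILE and prints a
# warning (exit status 1) if the delay exceeds MAX_SECONDS; 2 if FILE is unreadable.
check_last_delay() {
    local file=$1 max=$2 line ts delay
    line=$(tail -n 1 "$file") || return 2
    read -r ts delay <<< "$line"
    # awk compares the two values numerically; exit 0 means "over the limit"
    if awk -v d="$delay" -v m="$max" 'BEGIN { exit !(d > m) }'; then
        echo "WARN $ts delay=${delay}s > ${max}s"
        return 1
    fi
    return 0
}

# Example with made-up readings
tmp=$(mktemp)
printf '%s\n' '20180301170000 0.8' '20180301170100 42.3' > "$tmp"
check_last_delay "$tmp" 30 || echo "standby lag threshold exceeded"
rm -f "$tmp"
```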

References
https://www.postgresql.org/docs/10/static/functions-admin.html
