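In a Db2 HADR system with ROS (Reads on Standby) enabled, queries can run on the standby, which offloads work from the primary.
However, certain operations on the primary, such as CREATE TABLE or REORG, cause the standby to enter a "replay-only window". During this window, all connections on the standby are terminated and new connection attempts are blocked. Connections are allowed again only after the replay of those operations has finished.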
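This behavior (or rather, limitation) clearly makes for a poor user experience. Starting with Db2 V11.5, the registry variable DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW was introduced, with a default value of ON, to avoid the replay-only window wherever possible. With it enabled, operations such as CREATE TABLE and REORG on the primary no longer disconnect the connections on the standby. However, if there is a lock conflict (for example, a standby connection holds a lock on a table while the primary runs ALTER TABLE on that same table), that connection is still forced off.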
Prepare a Db2 HADR environment with a database named HADR.
Test point: with DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW turned off, running DDL on the primary forces off the connections on the standby.
Because the test environment is Db2 V11.5, where DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW is on by default, it has to be turned off on the standby in order to reproduce the replay-only window:
db2set DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW=OFF
Then verify the setting:
[db2inst1@limb1 ~]$ db2set -all
[i] DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW=OFF
[i] DB2_STANDBY_ISO=UR
[i] DB2_HADR_ROS=ON
[i] DB2_ATS_ENABLE=YES
[i] DB2COMM=TCPIP
[g] DB2SYSTEM=limb1.fyre.ibm.com
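If the test is scripted, the check above can be automated by extracting a single variable from the db2set -all output. The following is a minimal sketch: get_registry_value is a hypothetical helper (not a Db2 command), and it assumes the "[scope] NAME=VALUE" line format shown above.

```shell
# get_registry_value NAME
#   Hypothetical helper. Reads `db2set -all` output on stdin and prints the
#   value of registry variable NAME (first match only). Prints nothing if
#   NAME is not listed. Scope markers such as [i] or [g] are ignored.
get_registry_value() {
    awk -v name="$1" '
        {
            # Strip the leading "[x] " scope marker, if present.
            sub(/^\[[a-z]\] */, "")
            # Split on the first "=" only; the name must match exactly,
            # so DB2_HADR_ROS does not match DB2_HADR_ROS_AVOID_... .
            eq = index($0, "=")
            if (eq > 0 && substr($0, 1, eq - 1) == name) {
                print substr($0, eq + 1)
                exit
            }
        }'
}
```

For example, db2set -all | get_registry_value DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW should print OFF on the standby at this point.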
Note: after changing the DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW setting, the standby database must be reactivated:
[db2inst1@limb1 ~]$ db2 deactivate db hadr
DB20000I The DEACTIVATE DATABASE command completed successfully.
Note: if there are connections to the database, the deactivate fails. Disconnect all of them first, then deactivate.
[db2inst1@limb1 ~]$ db2 activate db hadr
DB20000I The ACTIVATE DATABASE command completed successfully.
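Because DEACTIVATE fails while connections exist, a script that applies the registry change should stop at that point rather than go on to ACTIVATE. A minimal sketch, assuming a POSIX shell: reactivate_db and the DB2_CMD override are hypothetical, the latter only there so the function can be exercised without a Db2 instance.

```shell
# reactivate_db DBNAME
#   Hypothetical helper. Deactivates and then activates DBNAME so that a
#   changed registry setting takes effect. If DEACTIVATE fails (for example
#   because applications are still connected), it prints a hint and
#   returns 1 without attempting ACTIVATE.
#   DB2_CMD (default "db2") is a hypothetical hook for testing only.
reactivate_db() {
    db=$1
    cmd=${DB2_CMD:-db2}
    if ! "$cmd" deactivate db "$db"; then
        echo "deactivate of $db failed; disconnect all applications first" >&2
        return 1
    fi
    "$cmd" activate db "$db"
}
```

On the standby this would be called as reactivate_db hadr.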
Next, connect to the database on both the primary and the standby, and confirm that table T1 is accessible. On the primary, for example:
[db2inst1@myrmidon1 ~]$ db2 connect to hadr
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2INST1
Local database alias = HADR
[db2inst1@myrmidon1 ~]$ db2 "select * from t1"
C1 C2
----------- -----------
1 111
2 222
2 record(s) selected.
The standby returns the same result.
Now run a DDL statement on the primary:
[db2inst1@myrmidon1 ~]$ db2 "alter table t1 add column c3 int"
DB20000I The SQL command completed successfully.
At this point, the connection on the standby has been forced off:
[db2inst1@limb1 ~]$ db2 "select * from t1"
SQL1224N The database manager is not able to accept new requests, has
terminated all requests in progress, or has terminated the specified request
because of an error or a forced interrupt. SQLSTATE=55032
The DDL runs very quickly, so the replay-only window is short and you can connect to the standby again soon afterwards.
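Since the window is short, a script that reads from the standby can treat SQL1224N as transient: reconnect and retry a few times. This is a minimal sketch under stated assumptions: query_with_reconnect, DB2_CMD and RETRY_SLEEP are hypothetical names, and only the SQL1224N message is taken from the output above. New connections are also blocked while the window lasts, so a failed CONNECT simply leads to another attempt.

```shell
# query_with_reconnect DBNAME SQL [MAX_TRIES]
#   Hypothetical helper. Runs SQL through the CLP. If the output contains
#   SQL1224N (connection forced off, as in the replay-only window above),
#   reconnects to DBNAME and retries, up to MAX_TRIES attempts in total
#   (default 5). Prints the output of the last attempt.
#   Returns 1 if every attempt was forced off, otherwise 0 (other SQL
#   errors are not retried and are simply printed).
#   DB2_CMD (default "db2") and RETRY_SLEEP (default 1 second) are
#   hypothetical hooks so the logic can be tested without Db2.
query_with_reconnect() {
    db=$1
    sql=$2
    tries=${3:-5}
    cmd=${DB2_CMD:-db2}
    out_file=$(mktemp) || return 1
    n=1
    while :; do
        # Capture output in a temp file; "|| true" keeps set -e scripts alive.
        "$cmd" "$sql" > "$out_file" 2>&1 || true
        if ! grep -q 'SQL1224N' "$out_file"; then
            cat "$out_file"
            rm -f "$out_file"
            return 0
        fi
        if [ "$n" -ge "$tries" ]; then
            cat "$out_file"
            rm -f "$out_file"
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_SLEEP:-1}"
        # Refused while the window is still active; the next loop retries.
        "$cmd" connect to "$db" > /dev/null 2>&1 || true
    done
}
```

For example, query_with_reconnect hadr "select * from t1" 10 on the standby would keep retrying for roughly ten seconds before giving up.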
Test point: with DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW turned on, running DDL on the primary does not force off the connections on the standby.
On the standby, turn DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW back on:
[db2inst1@limb1 ~]$ db2set DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW=ON
[db2inst1@limb1 ~]$ db2set -all
[i] DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW=ON
[i] DB2_STANDBY_ISO=UR
[i] DB2_HADR_ROS=ON
[i] DB2_ATS_ENABLE=YES
[i] DB2COMM=TCPIP
[g] DB2SYSTEM=limb1.fyre.ibm.com
[db2inst1@limb1 ~]$ db2 deactivate db hadr
DB20000I The DEACTIVATE DATABASE command completed successfully.
[db2inst1@limb1 ~]$ db2 activate db hadr
DB20000I The ACTIVATE DATABASE command completed successfully.
Next, connect on both the primary and the standby, and confirm that table T1 is accessible. On the primary, for example:
[db2inst1@myrmidon1 ~]$ db2 connect to hadr
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2INST1
Local database alias = HADR
[db2inst1@myrmidon1 ~]$ db2 "select * from t1"
C1 C2 C3
----------- ----------- -----------
1 111 -
2 222 -
2 record(s) selected.
The standby returns the same result.
Now run a DDL statement on the primary:
[db2inst1@myrmidon1 ~]$ db2 "alter table t1 add column c4 int"
DB20000I The SQL command completed successfully.
This time, the connection on the standby is still there:
[db2inst1@limb1 ~]$ db2 "select * from t1"
C1 C2 C3 C4
----------- ----------- ----------- -----------
1 111 - -
2 222 - -
2 record(s) selected.
As shown, DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW effectively prevents the standby from entering the replay-only window.
Test point: even with DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW turned on, a connection with a lock conflict is still forced off.
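On the standby, query table T1 without committing (+c turns off autocommit), so that the locks are kept until the transaction ends: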
[db2inst1@limb1 ~]$ db2 +c "select * from t1 with rr"
C1 C2 C3 C4
----------- ----------- ----------- -----------
1 111 - -
2 222 - -
2 record(s) selected.
Then run DDL on the primary:
[db2inst1@myrmidon1 ~]$ db2 "alter table t1 add column c5 int"
DB20000I The SQL command completed successfully.
At this point, the connection on the standby is forced off:
[db2inst1@limb1 ~]$ db2 "select * from t1"
SQL1224N The database manager is not able to accept new requests, has
terminated all requests in progress, or has terminated the specified request
because of an error or a forced interrupt. SQLSTATE=55032
This is because the standby hit a lock conflict while replaying the log, so the conflicting connection was forced off.
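To make this lock-conflict test repeatable, the standby side can be wrapped in a function that holds the locks for a fixed time and then commits. This is a sketch under stated assumptions: hold_rr_lock_on_t1 and DB2_CMD are hypothetical names, and the query is the same "+c ... with rr" statement used above.

```shell
# hold_rr_lock_on_t1 [SECONDS]
#   Hypothetical helper for the standby side of the lock-conflict test.
#   Runs the query on T1 with autocommit off (+c), so the CLP session keeps
#   its locks, waits SECONDS (default 60), then issues COMMIT to release
#   them. Returns 1 without waiting if the query itself fails.
#   DB2_CMD (default "db2") is a hypothetical hook for testing only.
hold_rr_lock_on_t1() {
    secs=${1:-60}
    cmd=${DB2_CMD:-db2}
    "$cmd" +c "select * from t1 with rr" || return 1
    sleep "$secs"
    "$cmd" commit
}
```

Run hold_rr_lock_on_t1 60 in the standby session and, within that minute, run the ALTER TABLE on the primary. The COMMIT will likely fail as well, since the connection has already been forced off by then.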
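Many DDL statements and operations can cause the standby to enter a replay-only window, or, if DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW is turned on, force off connections that have lock conflicts, for example: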
Note: DDL operations on WORKLOAD objects still cause a replay-only window, even with DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW turned on.
The relevant monitor elements are:
STANDBY_REPLAY_ONLY_WINDOW_ACTIVE
STANDBY_REPLAY_ONLY_WINDOW_START
STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT
db2pd can be run on either the primary or the standby:
[db2inst1@myrmidon1 ~]$ db2pd -hadr -db hadr
Database Member 0 -- Database HADR -- Active -- Up 0 days 01:50:39 -- Date 2023-05-25-02.22.38.394267
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = ASYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = myrmidon1.fyre.ibm.com
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = limb1.fyre.ibm.com
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 05/25/2023 02:00:30.703818 (1685005230)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 213
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 8
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000005
LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.001
LOG_HADR_WAIT_COUNT = 254
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 46080
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 131072
PRIMARY_LOG_FILE,PAGE,POS = S0040380.LOG, 225, 168584769991
STANDBY_LOG_FILE,PAGE,POS = S0040380.LOG, 225, 168584769991
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0040380.LOG, 225, 168584769991
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 05/25/2023 02:11:02.000000 (1685005862)
STANDBY_LOG_TIME = 05/25/2023 02:11:02.000000 (1685005862)
STANDBY_REPLAY_LOG_TIME = 05/25/2023 02:11:02.000000 (1685005862)
STANDBY_RECV_BUF_SIZE(pages) = 4300
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 25600
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = Y
STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = N
Because the standby is not currently in a replay-only window, only STANDBY_REPLAY_ONLY_WINDOW_ACTIVE is displayed.
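For scripted monitoring, the replay-only window fields can be pulled out of the db2pd output. A minimal sketch, assuming the "NAME = VALUE" layout shown above (db2pd normally right-aligns the names with leading spaces, which the pattern tolerates); replay_only_window_fields is a hypothetical helper name.

```shell
# replay_only_window_fields
#   Hypothetical helper. Reads `db2pd -hadr -db <db>` output on stdin and
#   prints every STANDBY_REPLAY_ONLY_WINDOW_* field as NAME=VALUE, one per
#   line. Only fields that db2pd actually printed are emitted, so outside a
#   replay-only window this is just the _ACTIVE flag.
replay_only_window_fields() {
    awk '
        /^ *STANDBY_REPLAY_ONLY_WINDOW_[A-Z_]* *=/ {
            # Split on the first "=" (values such as timestamps contain none).
            eq = index($0, "=")
            name = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            # Trim surrounding spaces from both parts.
            sub(/^ +/, "", name); sub(/ +$/, "", name)
            sub(/^ +/, "", val);  sub(/ +$/, "", val)
            print name "=" val
        }'
}
```

With the output above, db2pd -hadr -db hadr | replay_only_window_fields would print the single line STANDBY_REPLAY_ONLY_WINDOW_ACTIVE=N.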
The following MON_GET_HADR query can only be run on the primary:
[db2inst1@myrmidon1 ~]$ db2 "select STANDBY_ID, STANDBY_REPLAY_ONLY_WINDOW_ACTIVE, STANDBY_REPLAY_ONLY_WINDOW_START, STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT from table (mon_get_hadr(NULL))"
STANDBY_ID STANDBY_REPLAY_ONLY_WINDOW_ACTIVE STANDBY_REPLAY_ONLY_WINDOW_START STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT
---------- --------------------------------- -------------------------------- -------------------------------------
1 N - -
1 record(s) selected.
Likewise, because the standby is not currently in a replay-only window, only STANDBY_REPLAY_ONLY_WINDOW_ACTIVE has a value.
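Before running further DDL, a script on the primary could poll this query until no standby reports an active window. A minimal sketch under stated assumptions: the helper names, DB2_CMD and POLL_SECS are hypothetical; it uses the CLP -x option to suppress column headings, and it assumes the usual CLP convention that a return code of 4 or higher means an error.

```shell
# hadr_window_active_ids
#   Hypothetical helper. Reads rows of
#   "STANDBY_ID STANDBY_REPLAY_ONLY_WINDOW_ACTIVE" (no headings) on stdin
#   and prints each STANDBY_ID whose flag is Y.
hadr_window_active_ids() {
    awk 'NF >= 2 && $2 == "Y" { print $1 }'
}

# wait_for_replay_only_window_end [MAX_POLLS]
#   Hypothetical helper, meant to run on the primary. Polls MON_GET_HADR
#   every POLL_SECS seconds (default 5) until no standby reports an active
#   replay-only window.
#   Returns 0 when none is active, 1 if one is still active after
#   MAX_POLLS polls (default 60), 2 if the CLP reports an error (rc >= 4).
#   DB2_CMD (default "db2") and POLL_SECS are hypothetical hooks.
wait_for_replay_only_window_end() {
    max=${1:-60}
    cmd=${DB2_CMD:-db2}
    sql="select STANDBY_ID, STANDBY_REPLAY_ONLY_WINDOW_ACTIVE from table(mon_get_hadr(NULL))"
    rows_file=$(mktemp) || return 2
    i=1
    while :; do
        rc=0
        "$cmd" -x "$sql" > "$rows_file" 2>&1 || rc=$?
        if [ "$rc" -ge 4 ]; then
            rm -f "$rows_file"
            return 2
        fi
        if [ -z "$(hadr_window_active_ids < "$rows_file")" ]; then
            rm -f "$rows_file"
            return 0
        fi
        if [ "$i" -ge "$max" ]; then
            rm -f "$rows_file"
            return 1
        fi
        i=$((i + 1))
        sleep "${POLL_SECS:-5}"
    done
}
```

After db2 connect to hadr on the primary, wait_for_replay_only_window_end 12 would wait up to about a minute with the default interval.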
For recommendations on minimizing the impact of replay-only windows caused by DDL and similar operations, see:
https://www.ibm.com/docs/en/db2/11.5?topic=standby-replay-only-window-hadr-database