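Background
As PXC is gradually rolled out, replication on our production databases is moving from the former STATEMENT format to ROW format. This change of replication format has caused some replication problems.
Purpose
To solve, at least partially, master-slave replication problems under ROW format, and to serve as a runbook for manual repair if the PXC cluster goes down in the future.
Test environment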
masterold02:7301
masterold03:7302
skavetest178:7303
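On the masters
Add the following line to my.cnf (vim my.cnf):
binlog_format=ROW    # write the binlog in ROW format
On each master, grant privileges to the replication user that the slave connects as: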
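To confirm the setting without eyeballing the file, a small script can read my.cnf. Below is a minimal Python sketch, assuming a single option file whose [mysqld] section holds the setting and which has no !include directives; the function name binlog_format_is_row is hypothetical. MySQL treats '-' and '_' in option names as equivalent, so binlog-format is accepted too. After mysqld restarts, running SELECT @@global.binlog_format; on the server remains the authoritative check.

```python
import configparser


def binlog_format_is_row(cnf_text: str) -> bool:
    """Return True if the [mysqld] section of cnf_text sets binlog_format=ROW."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # my.cnf allows bare flags such as skip-name-resolve
        strict=False,                    # tolerate repeated options; the last one wins
        inline_comment_prefixes=("#",),  # allow "binlog_format=ROW  # comment"
        interpolation=None,              # '%' in values must not be treated specially
    )
    # MySQL treats '-' and '_' in option names as equivalent
    parser.optionxform = lambda name: name.strip().lower().replace("-", "_")
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    value = parser.get("mysqld", "binlog_format", fallback=None)
    # the value is case-insensitive and may be quoted
    return value is not None and value.strip().strip("'\"").upper() == "ROW"
```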
grant all on *.* to okooo_rep@'192.168.%.%' identified by 'Bjfcmlc@Mhxzkhl';
flush privileges;
Tests
Basic replication test
- Make test178 a slave of old02:
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',
MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;
start slave;
- Check the replication status: test178 quickly catches up with old02.
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.72
Master_User: okooo_rep
Master_Port: 7301
Connect_Retry: 60
Master_Log_File: logbin.000006
Read_Master_Log_Pos: 332
Relay_Log_File: relay.000007
Relay_Log_Pos: 475
Relay_Master_Log_File: logbin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
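With more than one slave it is convenient to check these fields with a script. A minimal Python sketch, assuming the vertical output of show slave status\G has already been captured as text (for example with mysql -e); parse_slave_status and replication_running are hypothetical names. Replication is healthy only when both Slave_IO_Running and Slave_SQL_Running are Yes.

```python
def parse_slave_status(text: str) -> dict:
    """Turn the vertical output of SHOW SLAVE STATUS\\G into a dict."""
    status = {}
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines, the "*** 1. row ***" banner and the mysql> prompt
        if not line or line.startswith("*") or line.startswith("mysql>"):
            continue
        # split on the first colon only: values such as Last_Error contain colons
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def replication_running(status: dict) -> bool:
    """Both the I/O thread and the SQL thread must be running."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")
```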
- Point test178 at old03 instead; test178 also quickly catches up with old03.
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;
start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.73
Master_User: okooo_rep
Master_Port: 7302
Connect_Retry: 60
Master_Log_File: logbin.000005
Read_Master_Log_Pos: 332
Relay_Log_File: relay.000006
Relay_Log_Pos: 475
Relay_Master_Log_File: logbin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
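Summary: the basic replication test passed. When the databases are freshly set up and hold identical data, test178 can replicate normally from either master, old02 or old03.
Insert test
- Create a new database and table on both old02 and old03:
create database row_slave;
use row_slave;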
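Each switch between masters in these tests uses the same three statements with a different host, port, binlog file and position. A hypothetical Python helper that builds them can keep a port from being paired with the wrong host when editing by hand. This is a sketch that only produces text and does not connect to MySQL; the quoting assumes the NO_BACKSLASH_ESCAPES SQL mode is off.

```python
def sql_quote(value: str) -> str:
    """Quote a MySQL string literal (assumes NO_BACKSLASH_ESCAPES is off)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host: str, user: str, password: str, port: int,
                      log_file: str, log_pos: int) -> str:
    """Build a CHANGE MASTER TO statement in the same shape as the ones used here."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_quote(host)},"
        f"MASTER_USER={sql_quote(user)},"
        f"MASTER_PASSWORD={sql_quote(password)},"
        f"MASTER_PORT={int(port)},"
        f"MASTER_LOG_FILE={sql_quote(log_file)},"
        f"MASTER_LOG_POS={int(log_pos)};"
    )


def switch_master_script(**kwargs) -> list:
    """The three statements used for every switch: stop, repoint, start."""
    return ["stop slave;", change_master_sql(**kwargs), "start slave;"]
```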
CREATE TABLE `row_test` (
`id` int(10) unsigned NOT NULL,
`hostname` varchar(20) NOT NULL default '',
`create_time` datetime NOT NULL default '0000-00-00 00:00:00',
`update_time` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 ;
- Insert data on old02:
insert into row_test values(1,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');
insert into row_test values(2,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');
insert into row_test values(3,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');
insert into row_test values(4,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');
- Query old02, old03 and test178: all three return the rows.
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
+----+----------+---------------------+---------------------+
- Insert data on old03 (at this point old03 is the master and test178 the slave):
insert into row_test values(5,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');
insert into row_test values(6,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');
- Query old03 and test178: both return the new rows. test178 and old02 are now inconsistent, since the slave has two rows (id=5 and id=6) that old02 lacks:
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
+----+----------+---------------------+---------------------+
- Insert data on old02. test178 is still replicating from old03, so these rows do not reach it:
insert into row_test values(7,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');
insert into row_test values(8,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');
- Read old02's binlog to find the positions of the inserts of id=7 and id=8:
cd /home/okooo/apps/tmp_slave01/logs
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007
# at 1399
#131211 11:36:42 server id 1287301 end_log_pos 1472 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1386733002/*!*/;
BEGIN
/*!*/;
# at 1472
# at 1529
#131211 11:36:42 server id 1287301 end_log_pos 1529 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 11:36:42 server id 1287301 end_log_pos 1585 Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO row_slave.row_test
### SET
### @1=7 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 1585
#131211 11:36:42 server id 1287301 end_log_pos 1612 Xid = 40
COMMIT/*!*/;
# at 1612
#131211 11:36:43 server id 1287301 end_log_pos 1685 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1386733003/*!*/;
BEGIN
/*!*/;
# at 1685
# at 1742
#131211 11:36:43 server id 1287301 end_log_pos 1742 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 11:36:43 server id 1287301 end_log_pos 1798 Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO row_slave.row_test
### SET
### @1=8 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 1798
#131211 11:36:43 server id 1287301 end_log_pos 1825 Xid = 41
COMMIT/*!*/;
DELIMITER ;
# End of log file
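Finding the position by eye works for two inserts; in a long binlog it can be scripted. A minimal Python sketch, assuming the input is the text printed by mysqlbinlog --base64-output=decode-rows -v -v and that the first column (@1) is the primary key, as it is for row_test; find_txn_start is a hypothetical name. It returns the # at offset of the BEGIN that opens the first transaction touching the given key, which is the value used as MASTER_LOG_POS in the next step (1399 for id=7).

```python
import re

AT_RE = re.compile(r"^# at (\d+)$")
FIRST_COL_RE = re.compile(r"^###\s+@1=(-?\d+)\b")


def find_txn_start(binlog_text: str, pk: int):
    """Return the '# at' offset of the BEGIN that opens the first transaction
    containing a row event whose first column (@1) equals pk, else None."""
    last_at = None      # offset of the most recent event header
    txn_start = None    # offset of the BEGIN of the current transaction
    for raw in binlog_text.splitlines():
        line = raw.strip()
        m = AT_RE.match(line)
        if m:
            last_at = int(m.group(1))
        elif line == "BEGIN":
            # BEGIN is logged in a Query event that starts at the last '# at'
            txn_start = last_at
        elif line.startswith("COMMIT"):
            txn_start = None
        else:
            m = FIRST_COL_RE.match(line)
            if m and txn_start is not None and int(m.group(1)) == pk:
                return txn_start
    return None
```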
- Repoint test178 at old02, starting at the transaction that inserted id=7 (# at 1399):
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=1399;
start slave;
show slave status\G
- Once test178 replicates from old02, it picks up old02's new rows and now holds all the data: ids 1 to 4 exist on all three servers, ids 5 and 6 only on old03, and ids 7 and 8 only on old02.
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 7 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
+----+----------+---------------------+---------------------+
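Summary: a slave table whose contents differ from the master's does not prevent new rows from replicating.
Update test
- Update a row that exists on both old02 and test178 (test178 is replicating from old02); replication stays in sync: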
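The insert result can be summed up with a toy model of how the slave applies a Write_rows event. This Python sketch is an illustration, not MySQL's code: the slave table is a dict keyed by primary key, and DuplicateKey is a hypothetical stand-in for MySQL error 1062. Rows that the slave lacks or has in excess make no difference; only an inserted primary key that already exists on the slave stops the event.

```python
class DuplicateKey(Exception):
    """Hypothetical stand-in for MySQL error 1062 (duplicate entry)."""


def apply_write_rows(table: dict, rows: list) -> None:
    """Toy model of a slave applying one Write_rows event.

    table maps primary key -> row tuple; rows are tuples whose first element
    is the primary key. The event is applied all-or-nothing, like a
    transaction that rolls back on error.
    """
    # Other rows in the slave table are never consulted: only a clash on
    # the primary key stops the event.
    for row in rows:
        if row[0] in table:
            raise DuplicateKey(f"Duplicate entry '{row[0]}' for key 'PRIMARY'")
    for row in rows:
        table[row[0]] = row
```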
update row_test set update_time =now() ,hostname ='old021' where id=7;
- Update a row that exists on both old03 and test178. test178 is replicating from old02, not old03, so this change does not reach it yet; it prepares the next step:
update row_test set update_time =now() ,hostname ='old031' where id=5;
- Read old03's binlog to find the position to replicate from:
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000006
# at 1825
#131211 15:20:16 server id 1807302 end_log_pos 1906 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1386746416/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
BEGIN
/*!*/;
# at 1906
# at 1963
#131211 15:20:16 server id 1807302 end_log_pos 1963 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 15:20:16 server id 1807302 end_log_pos 2048 Update_rows: table id 33 flags: STMT_END_F
### UPDATE row_slave.row_test
### WHERE
### @1=5 /* INT meta=0 nullable=0 is_null=0 */
### @2='old03' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### SET
### @1=5 /* INT meta=0 nullable=0 is_null=0 */
### @2='old031' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 15:20:16 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 2048
#131211 15:20:16 server id 1807302 end_log_pos 2075 Xid = 32
COMMIT/*!*/;
DELIMITER ;
# End of log file
- Repoint test178 at old03 from that position:
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000006',MASTER_LOG_POS=1825;
start slave;
show slave status\G
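- Query test178: the update was applied. When different rows are modified, replicating from several masters in turn does not cause them to interfere with each other. Going further, replication apparently does not check whether table or row contents on the slave match the master's; the next test checks this.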
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old031 | 2013-12-11 02:00:00 | 2013-12-11 15:20:16 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 7 | old021 | 2013-12-11 03:00:00 | 2013-12-11 15:15:34 |
| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
+----+----------+---------------------+---------------------+
- Modify a row that exists on all three servers, first on old03 (id=1):
update row_test set update_time =now() ,hostname ='old032' where id=1;
- After replication (test178 is replicating from old03), test178 shows:
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old032 | 2013-12-11 00:00:00 | 2013-12-11 15:49:53 |
+----+----------+---------------------+---------------------+
- Modify the same row on old02:
update row_test set update_time =now() ,hostname ='old022' where id=1;
- Read old02's binlog:
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007
# at 2075
#131211 15:51:15 server id 1287301 end_log_pos 2156 Query thread_id=9 exec_time=0 error_code=0
SET TIMESTAMP=1386748275/*!*/;
BEGIN
/*!*/;
# at 2156
# at 2213
#131211 15:51:15 server id 1287301 end_log_pos 2213 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 15:51:15 server id 1287301 end_log_pos 2298 Update_rows: table id 33 flags: STMT_END_F
### UPDATE row_slave.row_test
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### SET
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='old022' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 15:51:15 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 2298
#131211 15:51:15 server id 1287301 end_log_pos 2325 Xid = 73
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
- Repoint test178 at old02 from that position (test178 now replicates from old02):
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=2075;
start slave;
show slave status\G
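- The change replicates and old02's values overwrite old03's, even though the before image in the event (hostname='old02') no longer matches the row on test178 (hostname='old032'). As the first binlog we examined already showed, ROW-format replication writes the whole row: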
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old022 | 2013-12-11 00:00:00 | 2013-12-11 15:51:15 |
+----+----------+---------------------+---------------------+
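Summary: replicating from several masters in turn does not cause them to interfere with each other. Replication does not check whether table or row contents match: for this table, the slave located the row by its primary key alone and ignored the mismatching columns of the before image. A ROW-format update writes the whole row and is applied blindly, without regard to the row's current contents.
Delete test
- Delete the row with id=1 on test178: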
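The update results fit a simple model: the slave finds the row by its primary key and writes the full after image, without comparing the rest of the before image. The Python sketch below models only that observed behavior for a table with a primary key (the first column); KeyNotFound is a hypothetical stand-in for error 1032. For a table without a primary key the slave has to search for a row matching the before image, so a mismatch there may itself produce error 1032; the model does not cover that case.

```python
class KeyNotFound(Exception):
    """Hypothetical stand-in for MySQL error 1032 (HA_ERR_KEY_NOT_FOUND)."""


def apply_update_rows(table: dict, pairs: list) -> None:
    """Toy model of a slave applying one Update_rows event to a table whose
    first column is the primary key.

    Each (before, after) pair is located by the primary key in the before
    image only; the other before-image columns are not compared with the
    stored row. The full after image then replaces the stored row.
    """
    for before, _after in pairs:
        if before[0] not in table:
            raise KeyNotFound(f"Can't find record (key {before[0]})")
    for before, after in pairs:
        del table[before[0]]
        table[after[0]] = after
```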
delete from row_test where id=1;
- Update the row with id=1 on old02 (test178 is replicating from old02):
update row_test set update_time =now() ,hostname ='old023' where id=1;
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |
+----+----------+---------------------+---------------------+
- Check the replication status on test178:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.72
Master_User: okooo_rep
Master_Port: 7301
Connect_Retry: 60
Master_Log_File: logbin.000007
Read_Master_Log_Pos: 3078
Relay_Log_File: relay.000002
Relay_Log_Pos: 500
Relay_Master_Log_File: logbin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549
Skip_Counter: 0
Exec_Master_Log_Pos: 2325
Relay_Log_Space: 1399
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1287301
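Explanation: the error is caused by the table data differing between master and slave. Of the experiments above, only the one that deleted a row on the slave triggered it: the update event from old02 refers to row id=1, which test178 no longer has.
We have now reproduced the replication failure seen on the backup database after the schedule PXC cluster went down.
Summary: when a row does not exist on the slave, the master's update of that row cannot be applied.
Test summary: when a table on the slave is inconsistent with the master, inserts still replicate, provided the primary key does not already exist on the slave. An update overwrites the slave's row with the row as it stands after the master's most recent update. If the row an update or delete refers to does not exist on the slave, the event fails (error 1032) and replication stops.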
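Before repairing, it helps to locate the failing transaction. Two values in the show slave status output above are enough. Exec_Master_Log_Pos (2325) is where the SQL thread stopped, that is, the start of the failing transaction in the master's binlog named by Relay_Master_Log_File. The end_log_pos in Last_SQL_Error (2549) is where the failing row event ends. Together they can be passed to mysqlbinlog's --start-position and --stop-position to inspect the event. A minimal Python sketch that extracts them, assuming the error text has the format shown above; failing_event_range is a hypothetical name.

```python
import re

# Tail of Last_SQL_Error as printed above, e.g.
# "... Error_code: 1032; ... the event's master log logbin.000007, end_log_pos 2549"
ERROR_RE = re.compile(
    r"Error_code: (?P<errno>\d+);.*?"
    r"the event's master log (?P<log_file>[^,\s]+), end_log_pos (?P<end_pos>\d+)"
)


def failing_event_range(last_sql_error: str, exec_master_log_pos: str):
    """Return where the failing transaction starts and where the failing row
    event ends in the master's binlog, or None if the message does not match."""
    m = ERROR_RE.search(last_sql_error)
    if m is None:
        return None
    return {
        "errno": int(m.group("errno")),
        "log_file": m.group("log_file"),
        # the SQL thread stopped after the last transaction it completed,
        # i.e. at the start of the failing one
        "start_pos": int(exec_master_log_pos),
        "end_pos": int(m.group("end_pos")),
    }
```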
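Repair methods
1. The brute-force method: skip the failing event. Use it with care when the data matters, because the skipped change is never applied on the slave.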
stop slave;
SET GLOBAL sql_slave_skip_counter=1;    -- skip the next event group (one transaction) from the master
start slave;
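What method 1 does to the data can be shown with a toy model: the skip counter discards whole event groups (for InnoDB tables, one transaction each) before the SQL thread resumes. In this Python sketch the event groups are plain strings and start_slave_with_skip is a hypothetical name; the INSERT id=9 group in the example is made up. In the case above, the skipped group is old02's update of id=1 to old023, so test178 keeps lacking that row while old02 has it.

```python
def start_slave_with_skip(event_groups: list, skip_counter: int, apply) -> list:
    """Toy model of SET GLOBAL sql_slave_skip_counter=N; START SLAVE;

    The next N event groups are thrown away without running; the remaining
    ones are passed to apply(). Returns the skipped groups, i.e. the changes
    the slave will never have.
    """
    if skip_counter < 0:
        raise ValueError("skip_counter must not be negative")
    for group in event_groups[skip_counter:]:
        apply(group)
    return event_groups[:skip_counter]


applied = []
skipped = start_slave_with_skip(
    ["UPDATE id=1 -> old023", "INSERT id=9"],  # hypothetical relay log contents
    1,
    applied.append,
)
```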
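2. A better method when only a few rows are affected: fix the slave's data by hand. As shown above, ROW format does not check data consistency; it only overwrites the row. So we only need to put back the missing row.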
insert into row_test values(1,'new_row',now(),now());
mysql> select * from row_test where id=1;    -- the made-up row we inserted, hostname='new_row'
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | new_row | 2013-12-11 08:49:37 | 2013-12-11 08:49:37 |
+----+----------+---------------------+---------------------+
stop slave;
start slave;
mysql> select * from row_test where id=1;    -- the row now holds the replicated values
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |
+----+----------+---------------------+---------------------+
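Method 2 works because, as the update tests showed, the slave finds the row by primary key and overwrites it with the full after image. A Python sketch of that retry, under the same toy assumptions: a dict keyed by the first column, a hypothetical KeyNotFound standing in for error 1032, and full row images as in the binlogs above. The placeholder row only needs the right primary key; its other values are discarded by the retried update.

```python
class KeyNotFound(Exception):
    """Hypothetical stand-in for MySQL error 1032."""


def apply_update(table: dict, before: tuple, after: tuple) -> None:
    # Locate the row by primary key (first column) only, then overwrite it
    if before[0] not in table:
        raise KeyNotFound(before[0])
    table[before[0]] = after


def retry_with_placeholder(table: dict, before: tuple, after: tuple,
                           placeholder: tuple) -> None:
    """If the update fails because the row is missing, insert any row with the
    same primary key and retry (as stop slave; start slave; does)."""
    try:
        apply_update(table, before, after)
    except KeyNotFound:
        table[placeholder[0]] = placeholder
        apply_update(table, before, after)
```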
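3. The safest method, and the one for large amounts of data: find the point in the master's binlog where the row id=1 was written and replicate again from there. This takes longer; it is essentially point-in-time incremental recovery.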
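Method 3 can be carried out with mysqlbinlog piped into the mysql client on the slave. Two details matter: --start-position applies only to the first binlog file named on the command line, and the output must not use --base64-output=decode-rows, because that option turns row events into comments that cannot be replayed. Replaying rows the slave already has also makes Write_rows events fail with duplicate-key error 1062, so the start point has to be chosen with care. A Python sketch (3.8+ for shlex.join) that only builds the command line; replay_pipeline is a hypothetical name, the start position in the example is illustrative, and connection options for test178 go in mysql_cmd.

```python
import shlex


def replay_pipeline(binlog_files: list, start_position: int,
                    mysql_cmd: tuple = ("mysql",)) -> str:
    """Build a shell pipeline that re-applies master binlogs to the slave,
    starting at start_position in the first file listed."""
    dump = ["mysqlbinlog", "--no-defaults",
            f"--start-position={int(start_position)}", *binlog_files]
    return shlex.join(dump) + " | " + shlex.join(list(mysql_cmd))
```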