1. Versions
1) Operating system
cat /etc/issue
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Kernel \r on an \m
cat /proc/version
Linux version 2.6.32-504.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Oct 15 04:27:16 UTC 2014
2) MySQL version
mysql --version
mysql Ver 14.14 Distrib 5.6.26, for linux-glibc2.5 (x86_64) using EditLine wrapper
2. Problem description
2.1 I found something odd in the slave's error log: every day at 4 a.m., the following entries appear:
2015-12-12 04:00:01 3553 [Note] Error reading relay log event: slave SQL thread was killed
2015-12-12 04:03:34 3553 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2015-12-12 04:03:34 3553 [Note] Slave SQL thread initialized, starting replication in log '3306-bin.000010' at position 835780077, relay log './3306-relay-bin.000021' position: 835780239
2015-12-13 04:00:01 3553 [Note] Error reading relay log event: slave SQL thread was killed
2015-12-13 04:04:33 3553 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0
2015-12-13 04:04:33 3553 [Note] Slave SQL thread initialized, starting replication in log '3306-bin.000016' at position 32289757, relay log './3306-relay-bin.000039' position: 32289919
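To confirm that these SQL-thread stops really recur at a fixed time rather than at random, it helps to pull every "slave SQL thread was killed" entry out of the error log and count them per hour. The functions below are a minimal sketch, assuming the MySQL 5.6 error-log layout shown above (date and time in the first two fields); the function names and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: find when the slave SQL thread was stopped, from a MySQL 5.6 error log.
# Assumes each entry starts with "YYYY-MM-DD HH:MM:SS" as in the excerpt above.

# sql_thread_kill_times LOGFILE
#   Prints "YYYY-MM-DD HH:MM:SS" for every "slave SQL thread was killed" note.
sql_thread_kill_times() {
    grep 'slave SQL thread was killed' "$1" | awk '{ print $1, $2 }'
}

# kill_hour_histogram LOGFILE
#   Prints "<hour> <count>" so a daily job (e.g. always hour 04) stands out.
kill_hour_histogram() {
    sql_thread_kill_times "$1" |
        awk '{ split($2, t, ":"); n[t[1]]++ } END { for (h in n) print h, n[h] }' |
        sort
}
```

Usage (hypothetical path): `kill_hour_histogram /data/mysql/3306/error.log`. A single line such as `04 2` means every stop in the log happened during the 4 a.m. hour.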
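3. Analysis
3.1 However, the slave's status looked normal (the IO thread and the SQL thread were both Yes, and there was no replication lag).
3.2 I checked the error logs of both the master and the slave and found nothing else abnormal.
3.3 Using mysqlbinlog, I inspected both servers' binlogs around 4 a.m. and found nothing unusual there either.
3.4 The slave's general log, however, contains the following entries at 4 a.m. every day: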
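The check in 3.1 boils down to three fields of SHOW SLAVE STATUS: Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. The function below is a minimal sketch that reads the vertical (\G) output on stdin, so the parsing can be tried without a server; the function name is hypothetical, and it assumes the usual "Field: value" layout of \G output.

```shell
#!/bin/sh
# Sketch: summarize replication health from SHOW SLAVE STATUS\G output on stdin.

# check_slave_status
#   Prints "io=<Slave_IO_Running> sql=<Slave_SQL_Running> lag=<Seconds_Behind_Master>"
#   and returns 0 only if both replication threads report "Yes".
check_slave_status() {
    awk -F': ' '
        # \G output right-aligns field names with leading spaces; strip them.
        { sub(/^ +/, "", $1) }
        $1 == "Slave_IO_Running"      { io  = $2 }
        $1 == "Slave_SQL_Running"     { sql = $2 }
        $1 == "Seconds_Behind_Master" { lag = $2 }
        END {
            printf "io=%s sql=%s lag=%s\n", io, sql, lag
            exit !(io == "Yes" && sql == "Yes")
        }'
}
```

Against a live slave it can be fed directly from the client, e.g. `mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | check_slave_status`. Because a check like this only samples the state at one moment, it will report a healthy slave outside the backup window even though the SQL thread was stopped earlier, which matches what 3.1 observed.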
151213 4:00:01 554362 Connect root@localhost on
               554362 Query /*!40100 SET @@SQL_MODE='' */
               554362 Query /*!40103 SET TIME_ZONE='+00:00' */
               554362 Query SHOW SLAVE STATUS
               554362 Query STOP SLAVE SQL_THREAD   >> at the start of the backup, the SQL thread is stopped first
               554362 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
               554362 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
               554362 Query SHOW VARIABLES LIKE 'gtid\_mode'
               554362 Query SHOW SLAVE STATUS
               554362 Query UNLOCK TABLES
               554362 Query SELECT LOGFILE_GROUP_NAME, FILE_NAME, TOTAL_EXTENTS, INITIAL_SIZE, ENGINE, EXTRA FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'UNDO LOG' AND FILE_NAME IS NOT NULL GROUP BY LOGFILE_GROUP_NAME, FILE_NAME, ENGINE ORDER BY LOGFILE_GROUP_NAME
               554362 Query SELECT DISTINCT TABLESPACE_NAME, FILE_NAME, LOGFILE_GROUP_NAME, EXTENT_SIZE, INITIAL_SIZE, ENGINE FROM INFORMATION_SCHEMA.FILES WHERE FILE_TYPE = 'DATAFILE' ORDER BY TABLESPACE_NAME, LOGFILE_GROUP_NAME
               554362 Query SHOW DATABASES
               554362 Query SHOW VARIABLES LIKE 'ndbinfo\_version'
               554362 Init DB b2b2c_recomm
               554362 Query SHOW CREATE DATABASE IF NOT EXISTS `b2b2c_recomm`
               554362 Query SAVEPOINT sp
               554362 Query show tables
               554362 Query show table status like 't\_recommend\_goods\_0001'
               554362 Query SET SQL_QUOTE_SHOW_CREATE=1
               554362 Query SET SESSION character_set_results = 'binary'
               554362 Query show create table `t_recommend_goods_0001`
               554362 Query SET SESSION character_set_results = 'utf8'
               554362 Query show fields from `t_recommend_goods_0001`
               554362 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `t_recommend_goods_0001`
               554362 Query SET SESSION character_set_results = 'binary'
               554362 Query use `b2b2c_recomm`
               554362 Query select @@collation_database
               554362 Query SHOW TRIGGERS LIKE 't\_recommend\_goods\_0001'
               554362 Query SET SESSION character_set_results = 'utf8'
               554362 Query ROLLBACK TO SAVEPOINT sp
......................................
<pre name="code" class="html" style="color: rgb(51, 51, 51); font-size: 14px; line-height: 26px;"> 554362 Query SHOW SLAVE STATUS
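 554362 Query START SLAVE   >> the slave is started again when the backup finishes
## Many more entries follow; they are omitted here. At 4 a.m. the user root@localhost logged in to the database and executed STOP SLAVE SQL_THREAD, which is why the messages quoted in 2.1 appear in the error log at 4 a.m. every day. So what is the database doing at 4 a.m., and why does it stop the slave? (Because our backup options include --dump-slave.)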
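Scrolling through a large general log by hand is tedious, so the STOP SLAVE / START SLAVE pair can be extracted together with the time it was issued. The function below is a minimal sketch assuming the MySQL 5.6 general-log file layout, in which the "YYMMDD H:MM:SS" time is printed only on the first entry of a new second and following entries in the same second omit it; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list STOP SLAVE / START SLAVE statements from a MySQL 5.6-style
# general query log, with the most recent timestamp seen before each one.

# slave_stop_start_events GENERAL_LOG
#   Prints "<YYMMDD> <H:MM:SS> <connection id> <statement>" per match.
slave_stop_start_events() {
    awk '
        {
            off = 0
            # A line starting with "YYMMDD H:MM:SS" sets a new timestamp;
            # lines without one reuse the last timestamp seen.
            if ($1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/ && $2 ~ /^[0-9]+:[0-9][0-9]:[0-9][0-9]$/) {
                ts = $1 " " $2
                off = 2
            }
            id = $(off + 1)
            cmd = $(off + 2)
            if (cmd == "Query" && $0 ~ /(STOP|START) SLAVE/) {
                # Rebuild the statement text from the remaining fields.
                stmt = $(off + 3)
                for (i = off + 4; i <= NF; i++)
                    stmt = stmt " " $i
                print ts, id, stmt
            }
        }' "$1"
}
```

The gap between the STOP SLAVE SQL_THREAD and START SLAVE lines of one connection approximates how long the backup kept the SQL thread stopped; the connection id ties both statements to the same mysqldump session.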
3.5 Running crontab -l showed that this slave has a scheduled backup job that runs every day at 4 a.m.
The backup script looks roughly like this:
OPTIONS="-h$HOSTNAME -u$USER -p$PASSWORD -P$PORT --single-transaction -A --triggers -R --events --default-character-set=utf8 --dump-slave=2"
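## In the general log from 3.4 we see the statement "START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */", which mysqldump issues before it starts dumping when --single-transaction is used to back up InnoDB tables. So the error-log entries described in 2.1 are caused by the daily scheduled backup of the slave.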
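For reference, the OPTIONS line above can be wrapped in a small backup function. This is a minimal sketch, not the author's actual script: DB_HOST, DB_USER, DB_PASSWORD, DB_PORT, BACKUP_DIR and the output file name are hypothetical. They deliberately avoid reusing $HOSTNAME and $USER, which bash and the login environment may already set to the machine's hostname and login name. MYSQLDUMP can be overridden (for example with `echo`) to print the command instead of running it.

```shell
#!/bin/sh
# Sketch of a nightly slave backup built around the OPTIONS shown above.
# All variable names and defaults here are hypothetical placeholders.

run_slave_backup() {
    : "${DB_HOST:=127.0.0.1}" "${DB_PORT:=3306}" "${DB_USER:=root}"
    : "${BACKUP_DIR:=/tmp}" "${MYSQLDUMP:=mysqldump}"
    out="$BACKUP_DIR/full_$(date +%Y%m%d).sql"
    # --single-transaction: consistent InnoDB snapshot (START TRANSACTION ...
    #   WITH CONSISTENT SNAPSHOT in the general log).
    # --dump-slave=2: stop the slave SQL thread for the dump, restart it after,
    #   and write the master coordinates as a commented CHANGE MASTER TO.
    "$MYSQLDUMP" -h"$DB_HOST" -u"$DB_USER" -p"$DB_PASSWORD" -P"$DB_PORT" \
        --single-transaction -A --triggers -R --events \
        --default-character-set=utf8 --dump-slave=2 > "$out"
    echo "$out"
}
```

A crontab entry such as `0 4 * * * /path/to/slave_backup.sh` (hypothetical path, for a wrapper script that calls run_slave_backup) reproduces the 4 a.m. schedule found with crontab -l.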
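3.6 Verifying the hypothesis from 3.5
3.6.1 Run the backup script manually on the slave.
3.6.2 Check the error log: at the time the backup started, the entries from 2.1 appeared again.
3.6.3 While the backup is running, SHOW SLAVE STATUS\G on the slave shows that the SQL thread (Slave_SQL_Running) is No.
3.6.4 Check the general log: at the time the backup started, the entries shown in 3.4 appeared again.
## So we can conclude that the error-log entries in 2.1 are caused by the slave's scheduled backup job (when backing up the slave, mysqldump first runs STOP SLAVE SQL_THREAD and then START TRANSACTION).
3.6.5 Finally, here is how the official documentation describes the mysqldump option --dump-slave: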
--dump-slave[=value] This option is similar to --master-data except that it is used to dump a replication slave server to produce a dump file that can be used to set up another server as a slave that has the same master as the dumped server. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped slave's master. These are the master server coordinates from which the slave should start replicating. --dump-slave causes the coordinates from the master to be used rather than those of the dumped server, as is done by the --master-data option. In addition, specfiying this option causes the --master-data option to be overridden, if used, and effectively ignored. The option value is handled the same way as for --master-data (setting no value or 1 causes a CHANGE MASTER TO statement to be written to the dump, setting 2 causes the statement to be written but encased in SQL comments) and has the same effect as --master-data in terms of enabling or disabling other options and in how locking is handled.
This option causes mysqldump to stop the slave SQL thread before the dump and restart it again after.   >> when backing up with --dump-slave, mysqldump first stops the SQL thread and starts it again once the backup has finished
In conjunction with --dump-slave, the --apply-slave-statements and --include-master-host-port options can also be used. This option was added in MySQL 5.5.3.
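Since the backup uses --dump-slave=2, each dump carries the master's coordinates as a commented-out CHANGE MASTER TO statement. These are exactly what you need when using the dump to build another slave of the same master. The function below is a minimal sketch that reads them back, assuming the comment line has the form `-- CHANGE MASTER TO ... MASTER_LOG_FILE='<file>', MASTER_LOG_POS=<pos>;`. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read the master coordinates that --dump-slave=2 writes, as an SQL
# comment, near the top of a mysqldump file.

# dump_master_coords DUMPFILE
#   Prints "<master log file> <position>" from the first commented
#   CHANGE MASTER TO line; prints nothing if there is none.
dump_master_coords() {
    sed -n "s/^-- CHANGE MASTER TO .*MASTER_LOG_FILE='\([^']*\)', MASTER_LOG_POS=\([0-9]*\).*/\1 \2/p" "$1" |
        head -n 1
}
```

The file and position printed here belong to the slave's master (for example a 3306-bin.* file), not to the slave being dumped. That is the difference from --master-data that the documentation above describes.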