MySQL Source-Replica Replication, Part 2

Monitoring Replication

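1.1 Monitoring methods

	a. Make a change on the source and check whether the replica applies it.
	b. Monitor with built-in commands   √ (the approach used here)
	c. Monitor with third-party tools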

1.2 Monitoring with built-in commands

a. On the source:

mysql> show processlist;
mysql> show slave hosts;
	+-----------+----------------+------+-----------+--------------------------------------+
	| Server_id | Host           | Port | Master_id | Slave_UUID                           |
	+-----------+----------------+------+-----------+--------------------------------------+
	|         9 | 10.0.0.51:3309 | 3309 |         7 | dc20d2d8-ba7b-11ea-9f57-000c295bb94f |
	|         8 | 10.0.0.51:3308 | 3308 |         7 | d8e965c5-ba7b-11ea-9d1e-000c295bb94f |
	+-----------+----------------+------+-----------+--------------------------------------+
	2 rows in set (0.00 sec)

b. On the replica:

mysql> show slave status \G

1. Source connection information (from master_info)
	Master_Host: 10.0.0.51
	Master_User: repl
	Master_Port: 3307
	Connect_Retry: 10
	Master_Log_File: mysql-bin.000008
	Read_Master_Log_Pos: 444

2. Relay log information on the replica (from relay_info)
	Relay_Log_File: db01-relay-bin.000004
	Relay_Log_Pos: 320

3. Mapping between the relay log and the source binlog (the source binlog position the SQL thread has executed up to)
	Relay_Master_Log_File: mysql-bin.000008
	Exec_Master_Log_Pos: 444

4. Thread status information
	Slave_IO_Running: Yes
	Slave_SQL_Running: Yes
	Last_IO_Errno: 0
	Last_IO_Error: 
	Last_SQL_Errno: 0
	Last_SQL_Error: 

5. Replication filter settings
	Replicate_Do_DB: 
	Replicate_Ignore_DB: 
	Replicate_Do_Table: 
	Replicate_Ignore_Table: 
	Replicate_Wild_Do_Table: 
	Replicate_Wild_Ignore_Table: 

6. Replication lag, in seconds
	Seconds_Behind_Master: 0

7. Delayed replica status
	SQL_Delay: 0
	SQL_Remaining_Delay: NULL
	  	  
8. GTID replication information
	Retrieved_Gtid_Set: 
	Executed_Gtid_Set: 		  
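
As a minimal sketch of how a script might consume these fields, the Python snippet below parses the vertical output of show slave status \G into a dictionary. The function name parse_slave_status and the SAMPLE text are assumptions made for illustration; the sample reuses values shown above.

```python
def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict of strings."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator line.
        if not line or line.startswith("*"):
            continue
        # Split at the first colon only, so error messages that
        # themselves contain colons stay intact in the value.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


# Illustrative sample, built from the field values listed in these notes.
SAMPLE = """
*************************** 1. row ***************************
                  Master_Host: 10.0.0.51
                  Master_User: repl
                  Master_Port: 3307
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 444
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                Last_IO_Errno: 0
                Last_IO_Error:
        Seconds_Behind_Master: 0
"""

if __name__ == "__main__":
    for field, value in parse_slave_status(SAMPLE).items():
        print(f"{field} = {value!r}")
```

All values are kept as strings; a caller that needs numbers (for example Seconds_Behind_Master) would have to convert them and handle NULL itself.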

2. Common replication failures: analysis and troubleshooting

2.1 What to monitor
	Slave_IO_Running: Yes
	Slave_SQL_Running: Yes
	Last_IO_Errno: 0
	Last_IO_Error: 
	Last_SQL_Errno: 0
	Last_SQL_Error: 
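
The check on these six fields can be sketched in Python as follows. This is only an illustration: the function name replication_problems is hypothetical, and it assumes the status has already been loaded into a dict keyed by field name.

```python
def replication_problems(status):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    for thread in ("IO", "SQL"):
        state = status.get(f"Slave_{thread}_Running", "")
        # Anything other than "Yes" (for example "No" or "Connecting")
        # means the thread is not replicating normally.
        if state != "Yes":
            errno = status.get(f"Last_{thread}_Errno", "0")
            error = status.get(f"Last_{thread}_Error", "")
            problems.append(
                f"{thread} thread is {state or 'unknown'} (errno {errno}): {error}"
            )
    return problems


if __name__ == "__main__":
    healthy = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes"}
    broken = {
        "Slave_IO_Running": "Connecting",
        "Slave_SQL_Running": "Yes",
        "Last_IO_Errno": "1045",
        "Last_IO_Error": "error connecting to master",
    }
    print(replication_problems(healthy))
    print(replication_problems(broken))
```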

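2.2 IO thread failures
#1. Establishing the connection (state: Connecting)
External causes: network unreachable, firewall
Internal causes:
	wrong user name or password
	wrong port or IP address
	the source has reached its connection limit or run out of resources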
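
One way to connect these causes to what the replica reports is a small lookup from Last_IO_Errno to a likely cause. The sketch below is an assumption-laden illustration, not an exhaustive list: the table IO_ERROR_HINTS and the function explain_io_error are hypothetical names, and the codes are the standard MySQL error numbers as far as I know them (1045 access denied, 2003 cannot connect, 2005 unknown host, 1040 too many connections, 1593 fatal replication I/O error). Last_IO_Error should always be read for the actual message.

```python
# Hypothetical lookup table; the hints paraphrase the causes listed above.
IO_ERROR_HINTS = {
    1045: "access denied: check the replication user name and password",
    2003: "cannot connect to the source: check network, firewall, IP and port",
    2005: "unknown host: check Master_Host",
    1040: "too many connections on the source",
    1593: "fatal I/O thread error, for example identical server_id on source and replica",
}


def explain_io_error(errno):
    """Map a Last_IO_Errno value (int or numeric string) to a likely cause."""
    return IO_ERROR_HINTS.get(int(errno), "unlisted error: read Last_IO_Error")


if __name__ == "__main__":
    for code in ("1045", 2003, 9999):
        print(code, "->", explain_io_error(code))
```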

3. Reproducing a failure: wrong replication password

3.1 Change the password of repl on the source
	mysql> alter user repl@'10.0.0.%' identified by '123456';
	Query OK, 0 rows affected (0.00 sec)
3.2 Restart the replication threads on the replica
	stop slave;
	start slave;

	Slave_IO_Running: Connecting
	Last_IO_Errno: 1045
	Last_IO_Error: error connecting to master 'repl@10.0.0.51:3307' - retry-time: 10  retries: 1

4. Reproducing a failure: duplicate server_id

4.1 Set the source's server_id to the same value as the replica's.
	[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=8"
	[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "select @@server_id"
	+-------------+
	| @@server_id |
	+-------------+
	|           8 |
	+-------------+
	[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "select @@server_id"
	+-------------+
	| @@server_id |
	+-------------+
	|           8 |
	+-------------+
	
4.2 Restart the replication threads on the replica
	[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave;"
	[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "start slave;"

4.3 Fix: restore a unique server_id on the source, then restart the replica threads
	[root@db01 data]# mysql -S /tmp/mysql3307.sock -e "set global server_id=7"
	[root@db01 data]# mysql -S /tmp/mysql3308.sock -e "stop slave ; start slave;"
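
A conflict like this is easy to catch before it breaks replication. The Python sketch below assumes the server_id of every instance has already been collected (for example with select @@server_id on each socket) into a dict; the function name duplicate_server_ids and the instance labels are hypothetical.

```python
from collections import defaultdict


def duplicate_server_ids(server_ids):
    """Return {server_id: [instances]} for every id used by more than one instance."""
    by_id = defaultdict(list)
    for instance, server_id in server_ids.items():
        by_id[server_id].append(instance)
    return {sid: sorted(names) for sid, names in by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative input: instances labelled by port, as in the failure above.
    conflicting = {"3307": 8, "3308": 8, "3309": 9}
    fixed = {"3307": 7, "3308": 8, "3309": 9}
    print(duplicate_server_ids(conflicting))
    print(duplicate_server_ids(fixed))
```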

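Source-Replica Replication, Part 2: Advanced Topics

1. Delayed replica

1.1 Configuring a delayed replica
The delay applies to the SQL thread: events have already been written to the relay log, and the SQL thread simply waits before applying them.
A delay of 3-6 hours is a common recommendation; the right value depends on how quickly the operations team can detect and react to a failure. MASTER_DELAY is given in seconds, so the 300 used below is 5 minutes.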

mysql> stop slave;
mysql> CHANGE MASTER TO MASTER_DELAY = 300;
mysql> start slave;
mysql> show slave status \G
SQL_Delay: 300
SQL_Remaining_Delay: NULL
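
Because MASTER_DELAY takes seconds while the recommendation is usually stated in hours, a small conversion helper can avoid mistakes. This is a minimal sketch; the function name delay_statements is hypothetical, and it only builds the statement text without connecting to any server.

```python
def delay_statements(hours):
    """Build the statements that set a replica's SQL thread delay, given hours."""
    seconds = round(hours * 3600)
    if seconds < 0:
        raise ValueError("delay must not be negative")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_DELAY = {seconds};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # A 3-hour delay, the lower end of the range suggested above.
    for statement in delay_statements(3):
        print(statement)
```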

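2. Using a delayed replica *****

2.1 Recovery approach

Scenario: one source and one replica; the replica is delayed by 5 minutes, and a database is dropped by mistake on the source.
1. Detect the accidental drop within 5 minutes, before the replica applies it.
2. Stop the SQL thread on the replica.
3. Extract the relevant range from the relay log:
	Start: the last relay log position applied when the SQL thread was stopped
		Relay_Log_File: db01-relay-bin.000002
		Relay_Log_Pos: 320
	End: the position (or GTID) just before the accidental drop
4. Apply the extracted events to the replica.
5. Remove the replication configuration from the replica and let it take over from the source.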

2.2 Simulating the failure and recovering

2.2.1 Operations on the source
	create database relay charset utf8;
	use relay
	create table t1 (id int);
	insert into t1 values(1);
	drop database relay;
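2.2.2 Stop the SQL thread on the replica
	stop slave sql_thread;
2.2.3 Find the start and end points in the relay log
	Start (from show slave status \G):
		Relay_Log_File: db01-relay-bin.000002
		Relay_Log_Pos: 485
	End (the Pos of the drop database event):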
		mysql> show relaylog events in 'db01-relay-bin.000002';
		| db01-relay-bin.000002 | 1145 | Query          |         7 |        1074 | drop database relay       
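
Locating the end point by eye works for a short relay log; for a longer one, the search can be sketched in Python. The snippet assumes the output of show relaylog events has been captured as text with the usual columns (Log_name, Pos, Event_type, Server_id, End_log_pos, Info). The function name find_event_pos is hypothetical, and the sample contains only the row shown above plus a header.

```python
def find_event_pos(events_output, needle):
    """Return the Pos of the first event whose Info contains needle, else None."""
    for line in events_output.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Border lines and the header row are skipped because they do not
        # have a numeric Pos in the second column.
        if len(cells) < 6 or not cells[1].isdigit():
            continue
        info = " | ".join(cells[5:])
        if needle.lower() in info.lower():
            return int(cells[1])
    return None


SAMPLE_EVENTS = """
| Log_name              | Pos  | Event_type | Server_id | End_log_pos | Info                |
| db01-relay-bin.000002 | 1145 | Query      |         7 |        1074 | drop database relay |
"""

if __name__ == "__main__":
    print(find_event_pos(SAMPLE_EVENTS, "drop database relay"))
```

The Pos column is the position inside the relay log, which is what mysqlbinlog needs when reading a relay log file; End_log_pos refers to the source's binlog.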

2.2.4 Extract the relay log range and replay it on the replica
	mysqlbinlog --start-position=485 --stop-position=1145 /data/3308/data/db01-relay-bin.000002 > /tmp/relay.sql
Replay the extracted events on the replica:
	source /tmp/relay.sql
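
When the start and end positions come from a script, the mysqlbinlog command line for step 2.2.4 can be assembled rather than typed by hand. The sketch below only builds the command string and does not run it; the function name mysqlbinlog_command is hypothetical, while --start-position and --stop-position are the same options used above.

```python
import shlex


def mysqlbinlog_command(relay_log, start_pos, stop_pos, out_file):
    """Build a shell command that extracts [start_pos, stop_pos) from a relay log."""
    if not start_pos < stop_pos:
        raise ValueError("start position must be smaller than stop position")
    argv = [
        "mysqlbinlog",
        f"--start-position={start_pos}",
        f"--stop-position={stop_pos}",
        relay_log,
    ]
    # shlex quotes any argument containing shell metacharacters.
    return f"{shlex.join(argv)} > {shlex.quote(out_file)}"


if __name__ == "__main__":
    print(
        mysqlbinlog_command(
            "/data/3308/data/db01-relay-bin.000002", 485, 1145, "/tmp/relay.sql"
        )
    )
```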
2.2.5 Remove the replication configuration from the replica
	db01 [relay]>stop slave;
	db01 [relay]>reset slave all;
