Environment:

master: 23.247.76.253

[root@nagios_client1 tool]# mysql -V

mysql  Ver 14.14 Distrib 5.6.32, for linux-glibc2.5 (x86_64) using  EditLine wrapper

[root@nagios_client1 tool]# cat /etc/redhat-release

CentOS release 6.7 (Final)


slave: 23.247.78.254

[root@nagios_client2 ~]# mysql -V

mysql  Ver 14.14 Distrib 5.6.32, for linux-glibc2.5 (x86_64) using  EditLine wrapper

[root@nagios_client2 ~]# cat /etc/redhat-release

CentOS release 6.9 (Final)


1. Configure the master

Edit the configuration file:

vim /usr/local/mysql_2/my.cnf


In the [mysqld] section, check that the following lines are present, and add them if they are not:

server-id=1

log-bin=mysql-bin


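Besides these two required lines, there are two optional parameters:

binlog-do-db=test3

binlog-ignore-db=databasename1


binlog-do-db names a database whose changes are written to the binary log and therefore replicated; binlog-ignore-db names a database to leave out. Each option line takes exactly one database name. To cover several databases, repeat the option on its own line for each one; a comma-separated list is treated as the name of a single database. In practice one of the two options is enough. Here test3 is the database I want to replicate.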
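
MySQL reads the value of binlog-do-db (and of the other database filter options) as one database name, so a list of databases has to be expanded into one option line per name. A minimal sketch of that expansion follows; emit_db_filters is a hypothetical helper name, not part of MySQL, and its output is meant to be pasted into the [mysqld] section by hand:

```shell
#!/bin/sh
# emit_db_filters OPTION DB [DB...]
# Hypothetical helper: prints one "OPTION=DB" line per database name,
# because a comma-separated value would be read as a single database name.
emit_db_filters() {
    option="$1"
    shift
    for db in "$@"; do
        printf '%s=%s\n' "$option" "$db"
    done
}

# Example: two databases become two separate option lines.
emit_db_filters binlog-do-db test3 databasename2
```

The same helper works for replicate-do-db and replicate-ignore-db on the slave, since those options follow the same one-name-per-line rule.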


Restart MySQL after the change:

/etc/init.d/mysqld restart

Shutting down MySQL.. SUCCESS! 

Starting MySQL. SUCCESS! 


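Log in to MySQL on the master, create the replication account, and lock the tables:

mysql> grant replication slave on *.* to 'repl'@'23.247.78.254' identified by '123456';

// repl is the account the slave uses to connect to the master and read its binary log; the password is 123456, and 23.247.78.254 is the slave's IP.

mysql> flush tables with read lock;  // lock all tables so no data can change; keep this session open, because the lock is released when the client disconnects

Query OK, 0 rows affected (0.00 sec)

mysql> show master status;  // note the File and Position values; they are needed on the slave shortly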

+------------------+----------+--------------+------------------+-------------------+

| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000002 |      330 | test3        |                  |                   |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)
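
The File and Position values can also be captured without copying them by hand. A minimal sketch, assuming the mysql client is run in batch mode without column names (-N -B), which prints the row as tab-separated fields; parse_master_status is a hypothetical helper name and the credentials in the comment are placeholders:

```shell
#!/bin/sh
# parse_master_status: reads one tab-separated row of SHOW MASTER STATUS
# from stdin and prints "FILE POSITION".
parse_master_status() {
    awk -F '\t' 'NR == 1 { print $1, $2 }'
}

# Intended use on the master (not run here):
#   mysql -uroot -p -N -B -e "show master status" | parse_master_status

# Demonstration with the row shown above:
printf 'mysql-bin.000002\t330\ttest3\t\t\n' | parse_master_status
```

Run from a separate connection, this does not release the read lock held by the first session.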


2. Configure the slave

First edit my.cnf on the slave:

vim /etc/my.cnf


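Find the line “server-id = 1” and change it to “server-id = 2”. The id must differ from the master's, otherwise replication fails with an error. Do not simply delete the line: in MySQL 5.6 an unset server-id defaults to 0, and a slave with server-id 0 refuses to connect to a master. On the slave you can also optionally add the following two lines, the counterparts of the two optional lines on the master. Each line takes a single database name, so repeat the line for every additional database:

replicate-do-db=databasename1

replicate-ignore-db=databasename2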


After the change, restart the slave:

service mysqld restart



Then set up replication on the slave:

mysql> stop slave;

Query OK, 0 rows affected, 1 warning (0.00 sec)


mysql> change master to master_host='23.247.76.253', master_port=3306,

    -> master_user='repl', master_password='123456',

    -> master_log_file='mysql-bin.000002', master_log_pos=330;

Query OK, 0 rows affected, 2 warnings (0.02 sec)


mysql> start slave; 


// master_log_file and master_log_pos are the File and Position values returned by show master status above. Once this step is done, release the lock on the master:

mysql> unlock tables;

Query OK, 0 rows affected (0.00 sec)
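
The change master to statement used in this step can be assembled from the recorded values rather than typed by hand. A minimal sketch; build_change_master is a hypothetical helper name, and it assumes none of the values contain a single quote, since it does no escaping:

```shell
#!/bin/sh
# build_change_master HOST PORT USER PASSWORD LOG_FILE LOG_POS
# Hypothetical helper: prints the CHANGE MASTER TO statement for the given values.
# Assumption: no value contains a single quote (nothing is escaped here).
build_change_master() {
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Values taken from this walkthrough.
build_change_master 23.247.76.253 3306 repl 123456 mysql-bin.000002 330
```

The printed statement can then be pasted into the mysql prompt on the slave, or piped to the mysql client.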


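Then check the slave status (\G already terminates the statement, so no trailing semicolon is needed):

mysql> show slave status\G


The following error appeared: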

             Slave_IO_Running: No

            Slave_SQL_Running: Yes

                Last_IO_Errno: 1593

                Last_IO_Error: Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).


Fix:

mysql> show variables like 'server_id';    // check the server id; it can be changed with: set global server_id=XX

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 1     |

+---------------+-------+

1 row in set (0.00 sec)


mysql> set global server_id=2;

Query OK, 0 rows affected (0.00 sec)


mysql> show variables like 'server_id';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 2     |

+---------------+-------+

1 row in set (0.00 sec)
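
Error 1593 can be caught before starting replication by comparing the two ids up front. A minimal sketch; check_server_ids is a hypothetical helper name, and the two values are assumed to have been read from each server beforehand (for example with select @@server_id):

```shell
#!/bin/sh
# check_server_ids MASTER_ID SLAVE_ID
# Hypothetical helper: returns 0 when the pair is usable for replication.
check_server_ids() {
    master_id="$1"
    slave_id="$2"
    # An id of 0 disables replication connections on either side.
    if [ "$master_id" -eq 0 ] || [ "$slave_id" -eq 0 ]; then
        echo "server_id 0 is not usable for replication"
        return 1
    fi
    # Equal ids produce Last_IO_Errno 1593 on the slave.
    if [ "$master_id" -eq "$slave_id" ]; then
        echo "master and slave share server_id $master_id"
        return 1
    fi
    echo "server ids differ: master=$master_id slave=$slave_id"
    return 0
}

# Each value would come from the respective server, e.g.:
#   mysql -uroot -p -N -B -e "select @@server_id"
check_server_ids 1 2
```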


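A value changed with set global only lasts until mysqld restarts, so also correct server-id in the my.cnf that the slave actually reads. Then restart the replication threads (stop slave; start slave;) and confirm that both of the following show Yes: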

Slave_IO_Running: Yes

Slave_SQL_Running: Yes


Test replication:

On the master:

mysql> select count(*) from t1

    -> ;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.00 sec)


mysql> drop table t1;      // drop table t1

Query OK, 0 rows affected (0.01 sec)


mysql> select count(*) from t1;

ERROR 1146 (42S02): Table 'test3.t1' doesn't exist


On the slave (first query before the drop on the master, second query after it):

mysql> select count(*) from t1

    -> ;

+----------+

| count(*) |

+----------+

|        0 |

+----------+

1 row in set (0.00 sec)


mysql> select count(*) from t1;

ERROR 1146 (42S02): Table 'test3.t1' doesn't exist


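After table t1 is dropped on the master, querying t1 on the slave reports that the table does not exist, which shows that the slave has replicated the master's operation.

Master-slave replication is simple to set up, but the mechanism is also fragile: if data is accidentally written on the slave, the two servers diverge and replication can break. Also, before restarting the master, stop the slave first: run stop slave on the slave, then restart the MySQL service on the master; otherwise replication is quite likely to be interrupted. Once the master is back up, run start slave on the slave again.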


Monitoring MySQL replication status with Nagios

Query on the slave:

[root@nagios_client2 ~]# mysql -uroot -p"123456" -e "show slave status\G"|grep "Running:"

Warning: Using a password on the command line interface can be insecure.

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

When the output shows both “Slave_IO_Running: Yes” and “Slave_SQL_Running: Yes”, replication between master and slave is working.

Commands:

[root@nagios_client2 ~]# slave_status=($(mysql -uroot -p"123456" -e "show slave status\G"|grep Running |awk '{print $2}'))

Warning: Using a password on the command line interface can be insecure.

[root@nagios_client2 ~]# echo ${slave_status[0]}

Yes

[root@nagios_client2 ~]# echo ${slave_status[1]}

Yes
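
The warning printed above comes from passing the password with -p on the command line. One way to avoid it, sketched here under assumptions, is to keep the credentials in a client option file and pass it with --defaults-extra-file, which has to be the first option on the mysql command line. write_client_cnf is a hypothetical helper name, the file path in the comment is a placeholder, and the password is assumed to contain no characters that need quoting in an option file:

```shell
#!/bin/sh
# write_client_cnf PATH USER PASSWORD
# Hypothetical helper: writes a [client] option file readable only by its owner.
write_client_cnf() {
    (
        umask 077    # new files are created without group/other permissions
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"   # also tighten a file that already existed
}

# Intended use (not run here; the path is a placeholder):
#   write_client_cnf /usr/local/nagios/etc/mysql_check.cnf root 123456
#   mysql --defaults-extra-file=/usr/local/nagios/etc/mysql_check.cnf -e "show slave status\G"
```

The file has to be owned by, or at least readable by, the account that runs the check, which under NRPE is usually not root.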

The check script:

cat /usr/local/nagios/libexec/check_mysql_slave


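#!/bin/bash
# Nagios check: both replication threads on the slave must report "Yes".
# Arrays are a bash feature, so the interpreter is bash rather than sh.
# grep "Running:" skips the Slave_SQL_Running_State line.
slave_status=($(mysql -uroot -p"123456" -e "show slave status\G" | grep "Running:" | awk '{print $2}'))
if [ "${slave_status[0]}" = "Yes" ] && [ "${slave_status[1]}" = "Yes" ]
then
     echo "OK nagios_client2-slave is running"
     exit 0    # Nagios OK
else
     echo "Critical nagios_client2-slave is error"
     exit 2    # Nagios CRITICAL
fi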
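
The same check can be written as a function that reads the status text on stdin, which makes it testable without a database and lets it report UNKNOWN (plugin exit code 3) when no status comes back, for instance when the mysql client fails to connect. A minimal sketch; evaluate_slave_status is a hypothetical name, not an existing Nagios plugin:

```shell
#!/bin/sh
# evaluate_slave_status: reads "show slave status\G" output on stdin,
# prints a status line and returns the matching Nagios plugin exit code
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
evaluate_slave_status() {
    status="$(cat)"
    # Match the field names exactly so Slave_SQL_Running_State is ignored.
    io="$(printf '%s\n' "$status" | awk '$1 == "Slave_IO_Running:" { print $2 }')"
    sql="$(printf '%s\n' "$status" | awk '$1 == "Slave_SQL_Running:" { print $2 }')"
    if [ -z "$io" ] || [ -z "$sql" ]; then
        echo "UNKNOWN no slave status returned"
        return 3
    fi
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK slave is running (IO=$io SQL=$sql)"
        return 0
    fi
    echo "CRITICAL slave is not running (IO=$io SQL=$sql)"
    return 2
}

# Intended use in the plugin (not run here):
#   mysql -uroot -p"123456" -e "show slave status\G" | evaluate_slave_status; exit $?

# Demonstration with a healthy status:
printf '  Slave_IO_Running: Yes\n  Slave_SQL_Running: Yes\n' | evaluate_slave_status
```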

Add the following to nrpe.cfg:

vi /usr/local/nagios/etc/nrpe.cfg

command[check_mysql_slave]=/usr/local/nagios/libexec/check_mysql_slave


Restart nrpe:

pkill nrpe

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d


Configuration on the Nagios server:

[root@nagios_server objects]# /usr/local/nagios/libexec/check_nrpe -H 23.247.78.254 -c check_mysql_slave

OK nagios_client2-slave is running


Edit the MySQL service monitoring configuration file:

vi /usr/local/nagios/etc/services/mysql.cfg

Add:

define service {

use                   generic-service

host_name             nagios_client2

service_description   check_mysql_replication_status

check_command         check_nrpe!check_mysql_slave

max_check_attempts    2

normal_check_interval 2

retry_check_interval  2

check_period          24x7

notification_interval 10

notification_period   24x7

notification_options  w,u,c,r

contact_groups        admins

process_perf_data     1

}

Add it to the mysql service group: vi servicegroups.cfg

        members                 nagios_client1,port_3306,nagios_client2,port_3306,nagios_client2,check_mysql_replication_status


/etc/init.d/nagios checkconfig   #check the configuration files

[root@nagios_server services]# /etc/init.d/nagios reload        #reload the configuration files

