Contents
I. System requirements
II. How it works
III. Reported fields
IV. Monitoring and alert settings
V. MySQL metric collection scripts
1. 60_port_collector.sh
2. 30_monitordata_collector.sh
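I. System requirements
The collection script calls /usr/bin/mysql and /usr/bin/mysqladmin, so copy the client binaries from the MySQL installation directory into /usr/bin (as root):
# cp /data/mysql-5.6.16/bin/mysql /usr/bin/
# cp /data/mysql-5.6.16/bin/mysqladmin /usr/bin/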
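Both client binaries need to be executable under /usr/bin before the plugin runs. The following is a minimal pre-flight sketch; the function name `check_mysql_clients` and its optional directory argument are illustrative and not part of the plugin.

```shell
# Hypothetical helper: verify that mysql and mysqladmin exist and are
# executable in the given directory (defaults to /usr/bin).
check_mysql_clients() {
    local dir="${1:-/usr/bin}" missing=0 bin
    for bin in mysql mysqladmin; do
        if [ ! -x "$dir/$bin" ]; then
            echo "missing: $dir/$bin" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_mysql_clients /usr/bin || echo "copy the MySQL client binaries into /usr/bin first"
```

The function returns a non-zero status when either binary is absent, so it can gate the rest of a deployment script.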
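II. How it works
Base_Mysql_plugin/60_port_collector.sh discovers the ports of all MySQL instances on the current machine; the metrics of each instance are then collected and pushed to Open-Falcon.
Place the script 30_monitordata_collector.sh in falcon-agent's plugin/Base_Mysql_plugin directory and, in the portal, bind the plugin directory to the relevant host group. falcon-agent runs the script through its own scheduler, parses the script's standard output, and pushes the resulting monitoring items to falcon-judge for alert-threshold evaluation.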
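A monitoring item is a small JSON object. The sketch below prints one item as a JSON array on standard output, using the same fields that the collector script posts to http://127.0.0.1:1988/v1/push with curl. The function name `emit_metric` and its argument order are assumptions made for illustration.

```shell
# Sketch: print one monitoring item as a JSON array on stdout.
# Arguments: metric value counter_type port [timestamp] [step]
# The timestamp defaults to now and the step to 60 seconds.
emit_metric() {
    local metric="$1" value="$2" ctype="$3" port="$4"
    local ts="${5:-$(date +%s)}" step="${6:-60}"
    printf '[{"metric":"%s","endpoint":"%s","timestamp":%s,"step":%s,"value":%s,"counterType":"%s","tags":"port=%s"}]\n' \
        "$metric" "${HOSTNAME:-$(hostname)}" "$ts" "$step" "$value" "$ctype" "$port"
}

emit_metric mysqld_alive 1 GAUGE 3306
```

The value is written unquoted, so this sketch only suits numeric values.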
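III. Reported fields
module | prefix | metric | attribute | tag | type | note |
---|---|---|---|---|---|---|
Test_slave_status | mysqld_ | slavestatus | compute | port=[mysql_port] | GAUGE | Replication status; the normal value is 2 |
Test_app_alive | mysqld_ | alive | compute | port=[mysql_port] | GAUGE | Database liveness; the normal value is 1 |
Test_max_connection | mysqld_ | Threads_connected | undefined | port=[mysql_port] | GAUGE | When the server rejects the connection with a "Too many connections" error, the script stops the current collection run (exits immediately) |
global_status | mysqld_ | Aborted_clients | compute | port=[mysql_port] | COUNTER | Connections aborted because the client died without closing the connection properly |
global_status | mysqld_ | Aborted_connects | compute | port=[mysql_port] | COUNTER | Failed attempts to connect to the MySQL server |
global_status | mysqld_ | Bytes_received | compute | port=[mysql_port] | COUNTER | Bytes received from all clients |
global_status | mysqld_ | Bytes_sent | compute | port=[mysql_port] | COUNTER | Bytes sent to all clients |
global_status | mysqld_ | Com_lock_tables | compute | port=[mysql_port] | COUNTER | The Com_xxx statement counters record how many times each xxx statement has been executed; there is one status variable per statement type. For example, Com_select and Com_insert count SELECT (QPS) and INSERT executions respectively |
global_status | mysqld_ | Com_rollback | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_delete | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_insert | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_insert_select | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_load | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_replace | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_select | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Com_update | compute | port=[mysql_port] | COUNTER | As above |
global_status | mysqld_ | Qcache_hits | compute | port=[mysql_port] | COUNTER | Number of query cache hits |
global_status | mysqld_ | Slow_queries | compute | port=[mysql_port] | COUNTER | Number of queries that took longer than long_query_time seconds |
global_status | mysqld_ | Threads_connected | undefined | port=[mysql_port] | GAUGE | Number of currently open connections |
global_status | mysqld_ | Threads_running | undefined | port=[mysql_port] | GAUGE | Number of active (non-sleeping) threads |
global_status | mysqld_ | Uptime | undefined | port=[mysql_port] | GAUGE | Time the server has been running, in seconds; used to detect a database restart |
slave_status | mysqld_ | Seconds_Behind_Master | undefined | port=[mysql_port] | GAUGE | A timestamp difference: the slave's current timestamp minus the timestamp at which the master recorded the event |
global_variables | mysqld_ | auto_increment_increment | undefined | port=[mysql_port] | GAUGE | Auto-increment step |
global_variables | mysqld_ | auto_increment_offset | undefined | port=[mysql_port] | GAUGE | Auto-increment starting value (offset) |
global_variables | mysqld_ | autocommit | undefined | port=[mysql_port] | GAUGE | Autocommit mode |
global_variables | mysqld_ | binlog_format | undefined | port=[mysql_port] | GAUGE | Binary log format; row is recommended |
global_variables | mysqld_ | general_log | undefined | port=[mysql_port] | GAUGE | General query log switch |
global_variables | mysqld_ | gtid_mode | undefined | port=[mysql_port] | GAUGE | Global variable that turns GTID mode on or off |
global_variables | mysqld_ | query_cache_size | undefined | port=[mysql_port] | GAUGE | Query cache size |
global_variables | mysqld_ | query_cache_type | undefined | port=[mysql_port] | GAUGE | Cache type; determines which queries are cached |
global_variables | mysqld_ | read_only | undefined | port=[mysql_port] | GAUGE | When set to ON on a replication slave, the slave rejects updates except those from slave threads or from users with the SUPER privilege, which ensures that the slave does not accept update commands from clients |
global_variables | mysqld_ | report_host | undefined | port=[mysql_port] | GAUGE | The report options are set on the slave: report-host, report-port, report-user and report-password. When report-host is set in my.cnf, running start slave on the slave sends report-host and report-port (default 3306) to the master, which records them in the global hash structure slave_list |
global_variables | mysqld_ | report_port | undefined | port=[mysql_port] | GAUGE | As above |
global_variables | mysqld_ | server_id | undefined | port=[mysql_port] | GAUGE | Server ID |
global_variables | mysqld_ | server_uuid | undefined | port=[mysql_port] | GAUGE | Server UUID |
global_variables | mysqld_ | skip_name_resolve | undefined | port=[mysql_port] | GAUGE | Skip reverse DNS resolution |
global_variables | mysqld_ | slave_skip_errors | undefined | port=[mysql_port] | GAUGE | Error numbers to skip |
global_variables | mysqld_ | slow_query_log | undefined | port=[mysql_port] | GAUGE | Slow query log switch |
global_variables | mysqld_ | sql_mode | undefined | port=[mysql_port] | GAUGE | Current server SQL mode; can be set dynamically |
global_variables | mysqld_ | time_zone | undefined | port=[mysql_port] | GAUGE | Current time zone |
global_variables | mysqld_ | tx_isolation | undefined | port=[mysql_port] | GAUGE | MySQL default isolation level |
global_variables | mysqld_ | version | undefined | port=[mysql_port] | GAUGE | MySQL version |
global_variables | mysqld_ | max_connections | undefined | port=[mysql_port] | GAUGE | Maximum number of database connections |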
The metrics marked in red above can typically be added to an Open-Falcon Screen for further observation and analysis.
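For the global_status, slave_status and global_variables modules, the collector script derives the counter type from the attribute column: `compute` becomes COUNTER and anything else becomes GAUGE. The sketch below restates that rule as two small functions; the names `counter_type` and `metric_name` are illustrative.

```shell
# Map a "name:attribute" entry to the counter type, mirroring the rule
# used in Curl_falcon: "compute" -> COUNTER, otherwise GAUGE.
counter_type() {
    case "$1" in
        *:compute) echo COUNTER ;;
        *)         echo GAUGE ;;
    esac
}

# Strip the ":attribute" suffix and add the mysqld_ prefix.
metric_name() {
    echo "mysqld_${1%%:*}"
}

for entry in Com_select:compute Threads_running:undefined; do
    echo "$(metric_name "$entry") $(counter_type "$entry")"
done
```

Cumulative counters such as Com_select are reported as COUNTER so that the monitoring side works with their change over time, while point-in-time readings such as Threads_running stay GAUGE.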
IV. Monitoring and alert settings
Note: the alert conditions below cover basic monitoring only. Adjust the trigger conditions to match your actual deployment and usage.
Metric | Trigger condition | Note |
---|---|---|
net.port.listen/port=[mysql_port] | all(#3)==0 | Database port is down |
mysqld_alive/port=[mysql_port] | all(#2)==0 | Database liveness check failed |
mysqld_Seconds_Behind_Master/port=[mysql_port] | all(#5)>=120 | MySQL replication lag exceeds 2 minutes |
mysqld_Uptime/port=[mysql_port] | diff(#2)<=0 | Database instance restarted |
mysqld_Threads_connected/port=[mysql_port] | all(#3)>=6000 | Too many database connections |
mysqld_max_connections/port=[mysql_port] | all(#2)<=5000 | max_connections is set too low |
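A condition such as all(#5)>=120 fires only when each of the five most recent points satisfies the comparison. The sketch below emulates that check on a list of integer samples so that a threshold can be tried out locally. The helper `falcon_all` is hypothetical and is not an Open-Falcon tool; diff(#n) is not covered.

```shell
# Hypothetical helper: emulate all(#n) <op> <threshold> on integer samples.
# Usage: falcon_all N OP THRESHOLD v1 v2 ...
# Samples are given newest first; OP is a test(1) operator (-eq, -ge, -le, ...).
falcon_all() {
    local n="$1" op="$2" threshold="$3" v count=0
    shift 3
    for v in "$@"; do
        [ "$count" -ge "$n" ] && break
        [ "$v" "$op" "$threshold" ] || return 1
        count=$((count + 1))
    done
    # Succeed only if n samples were available and all of them matched.
    [ "$count" -eq "$n" ]
}

if falcon_all 5 -ge 120 130 150 121 300 120; then
    echo "replication lag alert would fire"
fi
```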
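V. MySQL metric collection scripts
1. 60_port_collector.sh
Purpose: collect the ports of all MySQL instances on the current machine; the output is the input for the 30_monitordata_collector.sh script.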
#!/bin/bash
service="mysqld"
dirname=$(cd $(dirname $0);pwd|awk -F\/ '$0=$NF')
path="/home/falcon2/falcon_monitor/$dirname"
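Get_portstatus(){
# Make sure the working directory exists.
mkdir -p $path
# Column of the ss output that holds "Local Address:Port": the third field of the
# first line of /etc/issue (6 for "CentOS release 6.5 (Final)") minus 1.
port_field=$(($(cat /etc/issue|awk -F'[ .]' 'NR==1&&$0=$3')-1))
# For every listener owned by $service, write "<port> <service>" to the port file.
# The three-argument match() requires gawk.
ss -tunlp|awk '$NF~/'$service'/{match($'$port_field',/:([0-9]+)$/,a);print a[1]" '$service'"}' > $path/${service}_portstatus
}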
Get_portstatus
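The port extraction can also be written without gawk's three-argument match() and without reading /etc/issue. The sketch below is one such variant, not the plugin's own code: the function name `extract_ports` is illustrative, and the column number is passed in as an argument (assumed to be 5 by default).

```shell
# Sketch: read ss-style lines on stdin and print "<port> <service>" for
# every listener whose last column mentions the given service.
# Arguments: service [column holding Local Address:Port, default 5]
extract_ports() {
    local svc="$1" field="${2:-5}"
    awk -v svc="$svc" -v f="$field" '
        $NF ~ svc {
            n = split($f, part, ":")        # the port follows the last colon
            if (part[n] ~ /^[0-9]+$/) print part[n], svc
        }'
}

# Typical use (needs enough privileges for ss to show process names):
#   ss -tunlp | extract_ports mysqld 5
```

Splitting on the last colon also handles IPv6 listeners such as `:::3307`.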
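2. 30_monitordata_collector.sh
Purpose: for every port listed by 60_port_collector.sh, collect the MySQL metrics and push them to falcon-agent.
#!/bin/bash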
service=mysqld
#step=$(echo $0|grep -Po '\d+(?=_)')
step=60
dirname=$(cd $(dirname $0);pwd|awk -F\/ '$0=$NF')
path="/home/falcon2/falcon_monitor/$dirname"
mondata_file="$path/falcon_${service}_monitor_"
tmp_mondata_file="$path/tmp_falcon_${service}_monitor_"
binpath="/home/falcon2/agent/nagios/libexec"
mysqld_max_con=13684
user="wufeimonitor"
pass="wufei@show1024"
host="127.0.0.1"
metric_arrays=(metric_global_status metric_slave_status metric_global_variables)
metric_global_status=(Aborted_clients:compute Aborted_connects:compute Bytes_received:compute Bytes_sent:compute Com_lock_tables:compute Com_rollback:compute Com_delete:compute Com_insert:compute Com_insert_select:compute Com_load:compute Com_replace:compute Com_select:compute Com_update:compute Qcache_hits:compute Slow_queries:compute Threads_connected:undefined Threads_running:undefined Uptime:undefined)
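# The name must match the field printed by "show slave status\G", because
# Get_current_value creates the variable mysqld_Seconds_Behind_Master.
metric_slave_status=(Seconds_Behind_Master:undefined)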
metric_global_variables=(auto_increment_increment:undefined auto_increment_offset:undefined autocommit:undefined binlog_format:undefined general_log:undefined gtid_mode:undefined query_cache_size:undefined query_cache_type:undefined read_only:undefined report_host:undefined report_port:undefined server_id:undefined server_uuid:undefined skip_name_resolve:undefined slave_skip_errors:undefined slow_query_log:undefined sql_mode:undefined time_zone:undefined tx_isolation:undefined version:undefined max_connections:undefined)
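# Run the query selected by $1 and turn every row of the result into a
# shell variable named mysqld_<name>.
Get_current_value(){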
flag=$1
case $flag in
global_status)
sql="show global status"
eval $(mysql -u$user -p$pass -h$host -P$port -e "$sql" 2>/dev/null|awk '{printf("mysqld_%s=\"%s\"\n",$1,$2)}')
;;
slave_status)
sql="show slave status\G"
eval $(mysql -u$user -p$pass -h$host -P$port -e "$sql" 2>/dev/null|awk -F'[: ]+' 'NR>1&&$0="mysqld_"$2"="$3')
;;
global_variables)
sql="show global variables"
eval $(mysql -u$user -p$pass -h$host -P$port -e "$sql" 2>/dev/null|awk '{printf("mysqld_%s=\"%s\"\n",$1,$2)}')
;;
esac
}
Get_last_value(){
eval $(cat $mondata_file$port|awk -F\| '{printf("%s_last=\"%s\"\n",$1,$2)}')
}
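# Push every metric listed in metric_arrays to the local falcon-agent;
# entries tagged ":compute" are sent as COUNTER, all others as GAUGE.
Curl_falcon(){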
for metric_array in ${metric_arrays[@]};do
{
for pre_metric in $(eval echo \${$metric_array[@]});do
{
[[ "$pre_metric" =~ ':compute' ]] \
&& countertype="COUNTER" \
|| countertype="GAUGE"
metric="mysqld_${pre_metric%%:*}"
value=$(eval echo \$$metric)
echo $metric $value $countertype
curl -s -X POST -d '[{"metric":"'$metric'","endpoint":"'$HOSTNAME'","timestamp":'$(date +%s)',"step":'$step',"value":'$value',"counterType":"'$countertype'","tags":"port='$port'"}]' http://127.0.0.1:1988/v1/push &>/dev/null
} &
done
} &
done
}
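# If the server answers "Too many connections", report mysqld_max_con as
# mysqld_Threads_connected and stop this collection run.
Test_max_connection(){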
/usr/bin/mysql -u$user -p$pass -h$host -P$port -e 'quit' 2>&1 |grep -qi 'Too many connections' \
&& curl -s -X POST -d '[{"metric":"mysqld_Threads_connected","endpoint":"'$HOSTNAME'","timestamp":'$(date +%s)',"step":'$step',"value":'$mysqld_max_con',"counterType":"GAUGE","tags":"port='$port'"}]' http://127.0.0.1:1988/v1/push &>/dev/null \
&& exit
}
Test_app_alive(){
app_alive_status=$(/usr/bin/mysqladmin -u$user -p$pass -h$host -P$port ping 2>/dev/null |grep -i "mysqld is alive"|wc -l)
curl -s -X POST -d '[{"metric":"mysqld_alive","endpoint":"'$HOSTNAME'","timestamp":'$(date +%s)',"step":'$step',"value":'$app_alive_status',"counterType":"GAUGE","tags":"port='$port'"}]' http://127.0.0.1:1988/v1/push &>/dev/null
}
Test_port_status(){
port_status=$($binpath/check_tcp -H 127.0.0.1 -p $port|awk -F'[ :]' '{print $2=="OK"?0:1}')
curl -s -X POST -d '[{"metric":"mysqld_port","endpoint":"'$HOSTNAME'","timestamp":'$(date +%s)',"step":'$step',"value":'$port_status',"counterType":"GAUGE","tags":"port='$port'"}]' http://127.0.0.1:1988/v1/push &>/dev/null
}
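# Report 2 when the instance is not a slave; otherwise report how many of
# Slave_IO_Running and Slave_SQL_Running are "Yes" (2 means healthy).
Test_slave_status(){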
slave_flag=$(/usr/bin/mysql -u$user -p$pass -h$host -P$port -e "show slave status\G" |grep -i Master_Host |wc -l)
[[ "$slave_flag" == "1" ]] \
&& slave_status=$(/usr/bin/mysql -u$user -p$pass -h$host -P$port -e "show slave status\G" 2>/dev/null |egrep -i "Slave_IO_Running|Slave_SQL_Running"|grep -i "yes"|grep -v "grep"|wc -l) \
|| slave_status=2
curl -s -X POST -d '[{"metric":"mysqld_slavestatus","endpoint":"'$HOSTNAME'","timestamp":'$(date +%s)',"step":'$step',"value":'$slave_status',"counterType":"GAUGE","tags":"port='$port'"}]' http://127.0.0.1:1988/v1/push &>/dev/null
}
# Process every "<port> <service>" line of the port file in parallel.
Main(){
while read port service;do
{
Test_slave_status
#Test_port_status
Test_app_alive
Test_max_connection
Get_current_value global_status
Get_current_value slave_status
Get_current_value global_variables
Curl_falcon
} &
done< $path/${service}_portstatus
wait
}
Main
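The slavestatus value can be checked offline against captured output of `show slave status\G`. The sketch below mirrors the rule in Test_slave_status: instances with no Master_Host line report 2, and slaves report the number of replication threads that are running. The function names are illustrative, and the field matching is stricter than the script's grep pipeline.

```shell
# Sketch: count how many of Slave_IO_Running and Slave_SQL_Running are "Yes"
# in "show slave status\G" output read from stdin.
slave_threads_running() {
    awk -F': *' '
        $1 ~ /^ *Slave_(IO|SQL)_Running$/ && $2 == "Yes" { n++ }
        END { print n + 0 }'
}

# Instances that are not slaves (no Master_Host line) are reported as 2,
# the same healthy value the collector script uses.
slave_status_value() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'Master_Host'; then
        printf '%s\n' "$out" | slave_threads_running
    else
        echo 2
    fi
}

# Typical use:
#   mysql -u... -p... -e 'show slave status\G' | slave_status_value
```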