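I ran into a problem in production: one database node had a somewhat high connection count even though the actual workload was light. The processlist showed a large number of connections waiting in the state "Waiting in connection_control plugin".
There were 338 connections in this state.
This looked like the Connection-Control Plugins at work, so I reproduced it in a test environment.
The plugins are not enabled by default and have to be installed manually: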
mysql> INSTALL PLUGIN CONNECTION_CONTROL SONAME 'connection_control.so';
Query OK, 0 rows affected (0.40 sec)
mysql> INSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS SONAME 'connection_control.so';
Query OK, 0 rows affected (0.01 sec)
Run either of the following statements to confirm that the plugins are active:
mysql> show plugins;
or
mysql> select PLUGIN_NAME, PLUGIN_STATUS from INFORMATION_SCHEMA.PLUGINS where PLUGIN_NAME like 'connection%';
+------------------------------------------+---------------+
| PLUGIN_NAME                              | PLUGIN_STATUS |
+------------------------------------------+---------------+
| CONNECTION_CONTROL                       | ACTIVE        |
| CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS | ACTIVE        |
+------------------------------------------+---------------+
2 rows in set (0.00 sec)
mysql> show variables like "connection_control%";
+-------------------------------------------------+------------+
| Variable_name                                   | Value      |
+-------------------------------------------------+------------+
| connection_control_failed_connections_threshold | 3          |
| connection_control_max_connection_delay         | 2147483647 |
| connection_control_min_connection_delay         | 1000       |
+-------------------------------------------------+------------+
3 rows in set (0.00 sec)
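Parameter meanings:
connection_control_failed_connections_threshold: the number of consecutive failed login attempts (caused by a wrong password) allowed per account before delays start; default 3
connection_control_max_connection_delay: the maximum delay before a further login attempt once the threshold has been exceeded, in ms; default 2147483647
connection_control_min_connection_delay: the minimum delay before a further login attempt once the threshold has been exceeded; default 1 second (1000 ms)
All three parameters can be changed online with set global.
Example: to prevent the plugins from being uninstalled while MySQL is running, configure my.cnf as follows: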
[mysqld]
plugin-load-add=connection_control.so
connection-control=FORCE_PLUS_PERMANENT
connection-control-failed-login-attempts=FORCE_PLUS_PERMANENT
connection_control_failed_connections_threshold=5
connection_control_max_connection_delay=2147483647
connection_control_min_connection_delay=1500
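After 3 attempts with a wrong password, the 4th login is delayed by 1 second (set by connection_control_min_connection_delay), and the Connection_control_delay_generated counter is incremented by 1. If the password keeps being wrong, both the delay and the counter keep growing until a login succeeds; at that point the delay is reset to zero, but the counter is not. Note that even when a later attempt uses the correct password, it still has to wait out the delay before the server starts verifying the account.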
mysql> show global status like "%conn%control%";
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| Connection_control_delay_generated | 1     |
+------------------------------------+-------+
1 row in set (0.00 sec)
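The growth of the delay can be sketched in a few lines of shell. This is only an illustration of the rule as I read it in the MySQL manual (an unadjusted delay of 1000 ms per failed attempt beyond the threshold, then clamped to the min/max variables), not the plugin's actual code. The function name delay_ms and its argument order are made up for this example.

```shell
# Sketch of the delay rule; an assumption based on the manual, not plugin source.
# delay_ms ATTEMPT [THRESHOLD] [MIN_MS] [MAX_MS]
#   ATTEMPT is the 1-based number of the consecutive failed attempt.
delay_ms() {
  attempt=$1
  threshold=${2:-3}
  min_ms=${3:-1000}
  max_ms=${4:-2147483647}
  # No delay until the threshold has been exceeded.
  if [ "$attempt" -le "$threshold" ]; then
    echo 0
    return 0
  fi
  # Unadjusted delay: 1000 ms for every attempt beyond the threshold.
  d=$(( ($attempt - $threshold) * 1000 ))
  # Clamp into [min_ms, max_ms].
  if [ "$d" -lt "$min_ms" ]; then
    d=$min_ms
  fi
  if [ "$d" -gt "$max_ms" ]; then
    d=$max_ms
  fi
  echo "$d"
}

for n in 3 4 5 6; do
  echo "attempt $n: $(delay_ms "$n") ms"
done
```

With the default max of 2147483647 ms there is effectively no ceiling, which is why a client that keeps retrying with a wrong password ends up waiting longer and longer.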
The failed login attempts per account can also be queried from the following INFORMATION_SCHEMA table:
mysql> select * from information_schema.CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS;
+-------------------+-----------------+
| USERHOST          | FAILED_ATTEMPTS |
+-------------------+-----------------+
| 'select_user'@'%' |               1 |
+-------------------+-----------------+
1 row in set (0.01 sec)
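In other words, this table serves as the list of restricted accounts, showing the source user, host/IP and number of failed logins:
select * from information_schema.connection_control_failed_login_attempts;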
Open several connections and keep failing the login; the process list now shows:
mysql> select * from information_schema.PROCESSLIST where USER='select_user';
+----+-------------+-----------+------+---------+------+--------------------------------------+------+---------+-----------+---------------+
| ID | USER        | HOST      | DB   | COMMAND | TIME | STATE                                | INFO | TIME_MS | ROWS_SENT | ROWS_EXAMINED |
+----+-------------+-----------+------+---------+------+--------------------------------------+------+---------+-----------+---------------+
| 54 | select_user | localhost | NULL | Connect |    2 | Waiting in connection_control plugin | NULL |    2485 |         0 |             0 |
| 52 | select_user | localhost | NULL | Connect |    7 | Waiting in connection_control plugin | NULL |    7038 |         0 |             0 |
| 53 | select_user | localhost | NULL | Connect |    4 | Waiting in connection_control plugin | NULL |    4591 |         0 |             0 |
+----+-------------+-----------+------+---------+------+--------------------------------------+------+---------+-----------+---------------+
3 rows in set (0.00 sec)
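This brings us back to the production problem described at the beginning.
With so many stuck connections, killing them one by one is clearly impractical, so I used pt-kill: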
pt-kill --user=dba --ask-pass --socket=/tmp/mysql.sock --no-version-check --match-command Connect --match-state "Waiting in connection_control plugin" --victims all --interval 10 --print --kill
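If installing the pt tools (Percona Toolkit) is not convenient, the connection IDs can be retrieved from the following table: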
mysql> select ID from information_schema.PROCESSLIST where Command='Connect' and STATE='Waiting in connection_control plugin';
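The query returns the list of connection IDs.
Then, in a text editor, use find-and-replace:
(1) replace " |" with ";"
(2) replace "|" with "kill"
After executing the resulting kill statements, the problem disappeared for the time being.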
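The find-and-replace steps above can also be scripted. The following is a minimal sketch; ids_to_kill is a hypothetical helper name. It accepts either bare IDs (one per line) or the boxed table that the mysql client prints, and ignores header and border lines.

```shell
# Turn a list of connection IDs into KILL statements.
# Input: one ID per line, either bare ("54") or boxed ("| 54 |").
ids_to_kill() {
  # Keep only lines that consist of a single number, optionally wrapped
  # in pipes and spaces; header and border lines do not match.
  sed -n 's/^[| ]*\([0-9][0-9]*\)[| ]*$/kill \1;/p'
}

# Example with a pasted result set:
printf '%s\n' '+----+' '| ID |' '+----+' '| 54 |' '| 52 |' '+----+' | ids_to_kill
```

In practice the input would come from the SELECT shown earlier, for example through mysql -N -B -e "...", with connection options that fit your environment; review the generated statements before running them.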
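Further investigation confirmed that the cause was a misconfigured password for the zabbix monitoring account. Because zabbix probes once a minute, the delay kept growing.
Fix the password in the zabbix configuration file /etc/zabbix/zabbix_agentd.d/userparameter_mysql.conf (if it calls any scripts, update those as well):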
UserParameter=mysql.size[*],bash -c 'echo "select sum($(case "$3" in both|"") echo "data_length+index_length";; data|index) echo "$3_length";; free) echo "data_free";; esac)) from information_schema.tables$([[ "$1" = "all" || ! "$1" ]] || echo " where table_schema=\"$1\"")$([[ "$2" = "all" || ! "$2" ]] || echo "and table_name=\"$2\"");" | HOME=/etc/zabbix /usr/bin/mysql -N'
UserParameter=mysql.ping,/usr/bin/mysqladmin -uzabbix -p'XXXXXX' ping | grep -c alive
UserParameter=mysql.version,/usr/bin/mysql -V
UserParameter=mysql_connection,/usr/bin/mysql -uzabbix -p'XXXXXX' -e "show status like '%Threads_connected%';" 2>/dev/null | grep -v '|' | awk '{print $2}'| tail -n 1
UserParameter=mysqlnodenumber[*],/bin/bash /etc/zabbix/scripts/mysql.sh $1
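With that, the problem was solved.
Connections in the Connect state also count toward the connection total. If many of them pile up, they quickly reach the max_connections limit and eventually cause a "too many connections" failure.
Workaround:
Run pt-kill in the background to kill them periodically: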
pt-kill --victims=all --kill --match-command Connect --match-state "Waiting in connection_control plugin" --interval=120 u=dba,S=/tmp/mysql.sock --ask-pass --daemonize
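Alongside periodic killing, it may help to watch how much of max_connections the waiting connections are taking up. The sketch below only does the counting; waiting_pct is a hypothetical helper name, and the sample input and the limit of 4 are invented for the example.

```shell
# waiting_pct MAX_CONNECTIONS
# Reads one STATE value per line on stdin and prints the share (in percent,
# rounded down) of MAX_CONNECTIONS held by connections that are waiting in
# the connection_control plugin.
waiting_pct() {
  max=$1
  # grep -c exits non-zero when nothing matches, so guard it for `set -e`.
  n=$(grep -c 'Waiting in connection_control plugin' || true)
  echo $(( $n * 100 / $max ))
}

# Example: 2 of 4 allowed connections are stuck in the plugin.
printf '%s\n' 'Waiting in connection_control plugin' '' \
  'Waiting in connection_control plugin' 'executing' | waiting_pct 4
```

The STATE column can be fed in from information_schema.PROCESSLIST, for instance via mysql -N -B -e "select STATE from information_schema.PROCESSLIST", and the limit from show variables like 'max_connections'.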
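How can we keep the root account from running into a long delay after too many wrong passwords, so that it can no longer log in right away?
Mitigations:
1. Restrict the root account to local login when granting privileges. This largely prevents remote malicious login attempts from quickly driving up the connection_control delay and throttling an account such as root@'%'.
2. Do not use the root account in scripts that run on a schedule.
3. As for fat-fingered typists who simply cannot get the password right, there is not much to say.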
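To uninstall the plugins:
mysql> UNINSTALL PLUGIN CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS;
mysql> UNINSTALL PLUGIN CONNECTION_CONTROL;
Then remove the related settings from my.cnf so that the server does not report errors at startup.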