一步一步从Linux线程到MySQL慢sql定位

中午了,正在吃着黄焖鸡外卖,突然手机短信声音响了,一看一台mysql数据库服务器的cpu,IO使用率查过告警阀值了,都快到100%。赶紧放下黄焖鸡,快速登录到有问题的mysql数据库服务器。

下面模拟一下当时排查的步骤,使用的mysql5.7.26版本
用iostat看一下IO情况

[mysql@localhost ~]$ iostat -mxt 1
Linux 3.10.0-1062.9.1.el7.x86_64 (localhost.localdomain)        08/23/2020      _x86_64_        (1 CPU)

08/23/2020 03:32:41 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.49    0.00   78.02    6.59    0.00    9.89

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00 3460.44    0.00    54.07     0.00    32.00     0.89    0.26    0.26    0.00   0.26  89.45
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00 3460.44    0.00    54.07     0.00    32.00     0.90    0.26    0.26    0.00   0.26  89.56
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

用top看看一下cpu使用情况,看看什么进程导致cpu高

top - 15:37:21 up  8:50,  5 users,  load average: 0.45, 0.19, 0.12
Tasks: 144 total,   1 running, 143 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2175376 total,   663640 free,   599256 used,   912480 buff/cache
KiB Swap:  2097148 total,  2096884 free,      264 used.  1417460 avail Mem 

   PID USER      PR   NI    VIRT    RES        SHR S      %CPU %MEM     TIME+ COMMAND                                                                              
 40474 mysql     20   0 1266400 324264   9496 S    96.7 14.9   3:42.00 mysqld                                                                               
 40916 root      20   0  204492  12108   4304 S       2.7  0.6   1:22.59 iotop                                                                                
     1   root      20   0  125520   3716   2344 S  0.0  0.2   0:05.60 systemd                                                                              
     2   root      20   0       0      0      0 S  0.0  0.0   0:00.01 kthreadd                                                                             
     4   root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H                                                                         
     6  root      20   0       0      0      0 S  0.0  0.0   0:01.45 ksoftirqd/0                       

果然是mysql进程导致cpu使用高,知道了是mysql数据库导致的,可是还是不知道什么原因,如果是新手,这个时候是不是有点慌,无从下手的感觉,如果是老手,肯定知道大概率是慢sql的问题,其它的原因概率比较低,例如mysql的bug,业务qps暴增等等。

于是用show看看mysql线程情况

mysql> show full processlist;
+----+------+-----------+--------+---------+------+----------+-----------------------+
| Id | User | Host      | db     | Command | Time | State    | Info                  |
+----+------+-----------+--------+---------+------+----------+-----------------------+
|  8 | root | localhost | sbtest | Query   |    0 | starting | show full processlist |
| 10 | root | localhost | NULL   | Sleep   |   28 |          | NULL                  |
+----+------+-----------+--------+---------+------+----------+-----------------------+
2 rows in set (0.00 sec)

在实际的生产上,结果可能有好几十,上百个结果,甚至有上千,要从这么多的线程里排查出导致cpu高的那个,简直是折磨人,这个不仅仅要经验足,还得对跑的业务非常熟悉,知道那些业务,那些大表可能会导致cpu高,你想想公司里有多少人满足这个条件,估计屈指可数。

那么有没有更直接,更直观的工具进行精确定位呢,在mysql5.7以后的版本,performance_schema.threads这个性能视图中,就有linxu系统的的线程ID,有了这个ID,就把mysql和操作系统精密的关联在一起了,下面就带大家,一步一步从Linux线程到MySQL慢sql精确定位

1.找到mysql服务的进程PID
[mysql@localhost ~]$ ps -ef|grep -i mysqld|grep -v grep
mysql     39463  39441  0 14:21 pts/0    00:00:00 /bin/sh /u02/mysql/bin/mysqld_safe --defaults-file=/u02/conf/my3308.cnf
mysql     40474  39463  7 14:21 pts/0    00:07:57 /u02/mysql/bin/mysqld --defaults-file=/u02/conf/my3308.cnf --basedir=/u02/mysql --datadir=/u02/data/3308 --plugin-dir=/u02/mysql/lib/plugin --log-error=/u02/log/3308/error.log --open-files-limit=65535 --pid-file=/u02/run/3308/mysqld.pid --socket=/u02/run/3308/mysql.sock --port=3308

在这里mysql的PID为:40474

2.用top找到cpu消耗最大的线程ID
[mysql@localhost ~]$ top -H -p 40474
Threads:  33 total,   1 running,  32 sleeping,   0 stopped,   0 zombie
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2175376 total,   670468 free,   592304 used,   912604 buff/cache
KiB Swap:  2097148 total,  2096884 free,      264 used.  1424364 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                              
 40624 mysql     20   0 1266400 324264   9496 R 99.3 14.9   5:14.97 mysqld                                                                               
 40474 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:05.29 mysqld                                                                               
 40475 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld                                                                               
 40476 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld                                                                               
 40477 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld 

消耗cpu资源最高的线程PID为:40624

3.根据系统的线程PID,精确定位慢sql

连接mysql数据库,使用以下sql语句,进行慢sql的精确定位

mysql> select PROCESSLIST_ID,THREAD_OS_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_DB,PROCESSLIST_COMMAND,PROCESSLIST_INFO from performance_schema.threads where THREAD_OS_ID=40624;
+----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
| PROCESSLIST_ID | THREAD_OS_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_INFO                          |
+----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
|              8 |        40624 | root             | localhost        | sbtest         | Query               | select count(*) from sbtest1 a ,sbtest1 b |
+----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
1 row in set (0.00 sec)

“select count(*) from sbtest1 a ,sbtest1 b”这条sql语句就是导致cpu暴增的元凶,找到之后,该怎么办,当然是先快速恢复业务,将这个select语句的连接kill掉。

[mysql@localhost ~]$ /u02/mysql/bin/mysql -uroot -proot --socket=/u02/run/3308/mysql.sock
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.26-log Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>kill 8;

到这里,整个cpu使用率100%导致业务响应速度变慢,定位到慢sql,恢复业务的全过程,这个技能你get了吗

你可能感兴趣的:(一步一步从Linux线程到MySQL慢sql定位)