导 读
作者:徐晨亮,MySQL课程学员
微信号:xuchenliang573,欢迎交流拍砖
知数堂只分享干货,各精品课程讲授的都是职场实用技能
今天线上发生CPU使用率超过95%的报警, 登上RDS以后发现一堆的sending data状态的SQL,大致有3个问题SQL,因为这3个SQL导致了其他原本很快的SQL也被“拖慢了”,以下是其中的一个SQL,拿出来记录一下
建表DDL:
mysql> show create table task_log\G
*************************** 1. row ***************************
Table: task_log
Create Table: CREATE TABLE `task_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`created_at` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) COMMENT '创建时间',
`task_type` int(1) NOT NULL DEFAULT '0' COMMENT '任务类型 1:每日任务 2:成长任务 3:签到任务',
....
`user_id` varchar(24) NOT NULL DEFAULT '' COMMENT '用户ID',
`tenant_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '县编码',
`task_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '任务id',
....
PRIMARY KEY (`id`),
KEY `idx_create_at` (`created_at`) USING BTREE,
KEY `idx_user_id_create_at` (`user_id`,`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5549 DEFAULT CHARSET=utf8mb4 COMMENT='任务明细表'
1 row in set (0.00 sec)
mysql> show create table account\G
*************************** 1. row ***************************
Table: account
Create Table: CREATE TABLE `account` (
`id` varchar(24) NOT NULL DEFAULT '' COMMENT '代码生成',
`auto_pk` bigint(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`nick_name` varchar(45) NOT NULL DEFAULT '' COMMENT '昵称',
....
`forbidden` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否禁言(0:否,1:是)',
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
....
`tenant_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '县编码',
`portrait_update_flag` tinyint(1) NOT NULL DEFAULT '0' COMMENT '用户修改头像标记:0否 1是',
PRIMARY KEY (`auto_pk`),
UNIQUE KEY `uk_id` (`id`),
KEY `idx_name_anonymous` (`nick_name`,`anonymous`),
KEY `idx_nav_version` (`nav_version`),
KEY `idx_unid_authType` (`union_id`(191),`auth_type`),
KEY `idx_refCode_anonymous` (`ref_code`,`anonymous`),
KEY `idx_updated_at` (`updated_at`),
KEY `idx_authUid_authType` (`auth_uid`,`auth_type`),
KEY `idx_snda_id` (`snda_id`)
) ENGINE=InnoDB AUTO_INCREMENT=16668 DEFAULT CHARSET=utf8mb4 COMMENT='客户端用户表'
1 row in set (0.00 sec)
首先,看下执行计划:
mysql> desc
SELECT t1.id AS taskLogId,
t4.maxCreateTimeByDay,
t1.task_type,
t1.event_name AS eventName,
t1.completed AS completed,
t1.finish_times AS finishTimes,
t1.frequency,
t1.experience,
t1.integral,
t1.status,
t1.reason,
t1.user_id AS userId,
t2.auth_type AS authType,
t2.mobile AS mobile,
t2.nick_name AS nickName,
t1.task_id AS taskId,
t1.tenant_id AS tenantId,
t4.date
FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;
从执行计划里可以看到,如下展示的部分没有用到索引,并且在与表task log关联时也没有走索引,这里有点特殊的情况就是t4跟t1关联是3个字段关联。
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
FROM task_log
GROUP BY user_id, task_id, date)
建立以下索引:
mysql> create index idx_tenantid on task_log(tenant_id);
Query OK, 0 rows affected (0.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table task_log add index idx_userid_taskid_create(user_id,task_id,created_at);
Query OK, 0 rows affected (0.10 sec)
Records: 0 Duplicates: 0 Warnings: 0
再来看执行计划:
从执行计划看,问题就只剩下子查询的部分,如果这个部分能用到索引,那么这个SQL基本上就能达到要求了
先看下运行时间:
194 rows in set (0.09 sec)
总共运行时间为90ms
再来分析下子查询的部分:
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
FROM task_log
GROUP BY user_id, task_id, date)
这个部分是根据user_id,task_id,date分组,而date又是把created_at字段格式化后取日期的部分,这个部分是没有办法直接用到索引的
查询了下生产环境的RDS版本为5.7.18,可以用5.7的虚拟列完美解决这个问题呀!
先来看下MySQL官方文档关于虚拟列咋写的:
虚拟列分为VIRTUAL和STORED两种,两者的区别是VIRTUAL只是在读行的时候计算结果,但在物理上是不存储的,并且InnoDB引擎支持在虚拟列上建二级索引;
而STORED则是当行数据进行插入或更新时计算并存储的,是需要占用物理空间的,并且也可以添加为索引列。
但虚拟列在实际使用过程中也需要注意如下:
1、虚拟列支持使用内置函数和运算符,但不能使用 CONNECTION_ID(), CURRENT_USER(), NOW()
2、子查询、参数、变量、 存储函数和自定义函数都无法使用虚拟列可以参照其他的虚拟列,但是参照的虚拟列必须定义在前面,假如参照的列为非虚拟列,那么该列可以定义在虚拟列的前后
3、自增属性无法使用在虚拟列并且自增列不能作为基础列
4、MySQL5.7.10版本,如果表达式计算造成截断或将错误数据传入函数,CREATE TABLE语句会造成错误并且DDL操作会被拒绝
OK,大致了解了虚拟列,我们看下DDL语法
col_name data_type [GENERATED ALWAYS] AS (expression)
[VIRTUAL | STORED] [NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY]
[COMMENT 'string']
回到正题上来,首先,增加虚拟列:
alter table task_log add date date generated always as (DATE_FORMAT(created_at, '%Y-%m-%d')) stored after created_at;
对虚拟列建立索引:
create index idx_userid_taskid_crtime on task_log(user_id,task_id,date);
再看下执行计划:
mysql> desc
SELECT t1.id AS taskLogId,
t4.maxCreateTimeByDay,
t1.task_type,
t1.event_name AS eventName,
t1.completed AS completed,
t1.finish_times AS finishTimes,
t1.frequency,
t1.experience,
t1.integral,
t1.status,
t1.reason,
t1.user_id AS userId,
t2.auth_type AS authType,
t2.mobile AS mobile,
t2.nick_name AS nickName,
t1.task_id AS taskId,
t1.tenant_id AS tenantId,
t4.date
FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;
从执行计划上看,子查询的部分也已经用上了刚刚对虚拟列建立的索引,但是可以注意到有两个warnings。
mysql> show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1052
Message: Column 'date' in group statement is ambiguous
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `xucl`.`t1`.`id` AS `taskLogId`,`t4`.`maxCreateTimeByDay` AS `maxCreateTimeByDay`,`xucl`.`t1`.`task_type` AS `task_type`,`xucl`.`t1`.`event_name` AS `eventName`,`xucl`.`t1`.`completed` AS `completed`,`xucl`.`t1`.`finish_times` AS `finishTimes`,`xucl`.`t1`.`frequency` AS `frequency`,`xucl`.`t1`.`experience` AS `experience`,`xucl`.`t1`.`integral` AS `integral`,`xucl`.`t1`.`status` AS `status`,`xucl`.`t1`.`reason` AS `reason`,`xucl`.`t1`.`user_id` AS `userId`,`xucl`.`t2`.`auth_type` AS `authType`,`xucl`.`t2`.`mobile` AS `mobile`,`xucl`.`t2`.`nick_name` AS `nickName`,`xucl`.`t1`.`task_id` AS `taskId`,`xucl`.`t1`.`tenant_id` AS `tenantId`,`t4`.`date` AS `date` from (/* select#2 */ select `xucl`.`task_log`.`user_id` AS `user_id`,`xucl`.`task_log`.`task_id` AS `task_id`,max(`xucl`.`task_log`.`created_at`) AS `maxCreateTimeByDay`,`xucl`.`task_log`.`tenant_id` AS `tenant_id`,date_format(`xucl`.`task_log`.`created_at`,'%Y-%m-%d') AS `date` from `xucl`.`task_log` group by `xucl`.`task_log`.`user_id`,`xucl`.`task_log`.`task_id`,`xucl`.`task_log`.`date`) `t4` join `xucl`.`task_log` `t1` join `xucl`.`account` `t2` where ((`t4`.`maxCreateTimeByDay` = `xucl`.`t1`.`created_at`) and (`t4`.`task_id` = `xucl`.`t1`.`task_id`) and (`t4`.`tenant_id` = `xucl`.`t2`.`tenant_id`) and (`xucl`.`t2`.`id` = `xucl`.`t1`.`user_id`) and (`t4`.`user_id` = `xucl`.`t1`.`user_id`) and (`xucl`.`t1`.`tenant_id` = 2))
2 rows in set (0.00 sec)
可以看到的是新增的date跟原先DATE_FORMAT(created_at, '%Y-%m-%d')是重名的,这个地方也是本次优化一个需要注意的地方,虽然这里报了warning,但是我们可以巧妙地解决无需修改线上代码就能达到优化的结果
看下执行结果:
mysql> show profiles;
+----------+------------+-----------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+-----------------------------------------------------------------+
| 1 | 0.00081025 | desc SELECT t1.id AS taskLogId, t4.maxCreateTimeByDay... |
| 2 | 0.02203450 | SELECT t1.id AS taskLogId, t4.maxCreateTimeByDay.... |
+----------+------------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> show profile for query 2;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000245 |
| checking permissions | 0.000007 |
| checking permissions | 0.000003 |
| checking permissions | 0.000006 |
| Opening tables | 0.000041 |
| init | 0.000093 |
| System lock | 0.000011 |
| optimizing | 0.000005 |
| optimizing | 0.000005 |
| statistics | 0.000020 |
| preparing | 0.000021 |
| Sorting result | 0.000016 |
| statistics | 0.000113 |
| preparing | 0.000014 |
| executing | 0.000011 |
| Sending data | 0.000105 |
| executing | 0.000003 |
| Sending data | 0.020828 |
| end | 0.000023 |
| query end | 0.000017 |
| closing tables | 0.000006 |
| removing tmp table | 0.000371 |
| closing tables | 0.000017 |
| freeing items | 0.000038 |
| cleaning up | 0.000017 |
+----------------------+----------+
25 rows in set, 1 warning (0.00 sec)
可见,整个执行效果在21ms左右,性能提升4.3倍。
但是有点小小强迫症,让开发尽快修改代码,SQL修改如下:
mysql> desc
SELECT t1.id AS taskLogId,
t4.maxCreateTimeByDay,
t1.task_type,
t1.event_name AS eventName,
t1.completed AS completed,
t1.finish_times AS finishTimes,
t1.frequency,
t1.experience,
t1.integral,
t1.status,
t1.reason,
t1.user_id AS userId,
t2.auth_type AS authType,
t2.mobile AS mobile,
t2.nick_name AS nickName,
t1.task_id AS taskId,
t1.tenant_id AS tenantId,
t4.date
FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, date
FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;
最后生产环境上线以后,由原先的378ms提升到36ms,提升了10.5倍,效果还是很明显的。
加入知数堂
挑战40万+年薪!
知数堂
叶金荣与吴炳锡联合打造
领跑IT精英培训
行业资深专家强强联合,倾心定制
MySQL实战/MySQL优化/MongoDB/
Python/ SQL优化/Hadoop+ELK
数门精品课程
“阅读原文”可获更多正课试听视频
密码:hg3h
紧随技术发展趋势,定期优化培训教案
融入大量生产案例,贴合企业一线需求
社群陪伴学习,一次报名,可学1年
DBA、开发工程师必修课
上千位学员已华丽转身,薪资翻番,职位提升
改变已悄然发生,你还在等什么?
高性能MySQL,SQL优化-ZST
(QQ群号:579036588)