MySQL5.7利用虚拟列优化

导 读

作者:徐晨亮,MySQL课程学员

微信号:xuchenliang573,欢迎交流拍砖

知数堂只分享干货,各精品课程讲授的都是职场实用技能smiley_4.png

今天线上发生CPU使用率超过95%的报警, 登上RDS以后发现一堆的sending data状态的SQL,大致有3个问题SQL,因为这3个SQL导致了其他原本很快的SQL也被“拖慢了”,以下是其中的一个SQL,拿出来记录一下

建表DDL:

mysql> show create table task_log\G
*************************** 1. row ***************************
       Table: task_log
Create Table: CREATE TABLE `task_log` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `created_at` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) COMMENT '创建时间',
  `task_type` int(1) NOT NULL DEFAULT '0' COMMENT '任务类型 1:每日任务 2:成长任务 3:签到任务',
....
  `user_id` varchar(24) NOT NULL DEFAULT '' COMMENT '用户ID',
  `tenant_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '县编码',
  `task_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '任务id',
....
  PRIMARY KEY (`id`),
  KEY `idx_create_at` (`created_at`) USING BTREE,
  KEY `idx_user_id_create_at` (`user_id`,`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5549 DEFAULT CHARSET=utf8mb4 COMMENT='任务明细表'
1 row in set (0.00 sec)
mysql> show create table account\G
*************************** 1. row ***************************
       Table: account
Create Table: CREATE TABLE `account` (
  `id` varchar(24) NOT NULL DEFAULT '' COMMENT '代码生成',
  `auto_pk` bigint(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
  `nick_name` varchar(45) NOT NULL DEFAULT '' COMMENT '昵称',
   ....
  `forbidden` tinyint(1) NOT NULL DEFAULT '0' COMMENT '是否禁言(0:否,1:是)',
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
   ....
  `tenant_id` bigint(20) NOT NULL DEFAULT '0' COMMENT '县编码',
  `portrait_update_flag` tinyint(1) NOT NULL DEFAULT '0' COMMENT '用户修改头像标记:0否 1是',
  PRIMARY KEY (`auto_pk`),
  UNIQUE KEY `uk_id` (`id`),
  KEY `idx_name_anonymous` (`nick_name`,`anonymous`),
  KEY `idx_nav_version` (`nav_version`),
  KEY `idx_unid_authType` (`union_id`(191),`auth_type`),
  KEY `idx_refCode_anonymous` (`ref_code`,`anonymous`),
  KEY `idx_updated_at` (`updated_at`),
  KEY `idx_authUid_authType` (`auth_uid`,`auth_type`),
  KEY `idx_snda_id` (`snda_id`)
) ENGINE=InnoDB AUTO_INCREMENT=16668 DEFAULT CHARSET=utf8mb4 COMMENT='客户端用户表'
1 row in set (0.00 sec)

首先,看下执行计划:

mysql> desc
SELECT t1.id AS taskLogId,
       t4.maxCreateTimeByDay,
       t1.task_type,
       t1.event_name AS eventName,
t1.completed AS completed,
       t1.finish_times AS finishTimes,
       t1.frequency,
       t1.experience,
       t1.integral,
       t1.status,
       t1.reason,
       t1.user_id AS userId,
       t2.auth_type AS authType,
       t2.mobile AS mobile,
       t2.nick_name AS nickName,
       t1.task_id AS taskId,
       t1.tenant_id AS tenantId,
       t4.date
  FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
  FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
  INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;

640?wx_fmt=png

从执行计划里可以看到,如下展示的部分没有用到索引,并且在与表task log关联时也没有走索引,这里有点特殊的情况就是t4跟t1关联是3个字段关联。

SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
  FROM task_log
GROUP BY user_id, task_id, date)

建立以下索引:

mysql> create index idx_tenantid on task_log(tenant_id);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> alter table task_log add index idx_userid_taskid_create(user_id,task_id,created_at);
Query OK, 0 rows affected (0.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

再来看执行计划:

640?wx_fmt=png

从执行计划看,问题就只剩下子查询的部分,如果这个部分能用到索引,那么这个SQL基本上就能达到要求了

先看下运行时间:

194 rows in set (0.09 sec)

总共运行时间为90ms

再来分析下子查询的部分:

SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
  FROM task_log
GROUP BY user_id, task_id, date)

这个部分是根据user_id,task_id,date分组,而date又是把created_at字段格式化后取日期的部分,这个部分是没有办法直接用到索引的

查询了下生产环境的RDS版本为5.7.18,可以用5.7的虚拟列完美解决这个问题呀!

先来看下MySQL官方文档关于虚拟列咋写的:

MySQL5.7利用虚拟列优化_第1张图片

  • 虚拟列分为VIRTUAL和STORED两种,两者的区别是VIRTUAL只是在读行的时候计算结果,但在物理上是不存储的,并且InnoDB引擎支持在虚拟列上建二级索引;

  • 而STORED则是当行数据进行插入或更新时计算并存储的,是需要占用物理空间的,并且也可以添加为索引列。

但虚拟列在实际使用过程中也需要注意如下:

1、虚拟列支持使用内置函数和运算符,但不能使用  CONNECTION_ID(), CURRENT_USER(), NOW()

2、子查询、参数、变量、 存储函数和自定义函数都无法使用虚拟列可以参照其他的虚拟列,但是参照的虚拟列必须定义在前面,假如参照的列为非虚拟列,那么该列可以定义在虚拟列的前后

3、自增属性无法使用在虚拟列并且自增列不能作为基础列

4、MySQL5.7.10版本,如果表达式计算造成截断或将错误数据传入函数,CREATE TABLE语句会造成错误并且DDL操作会被拒绝

OK,大致了解了虚拟列,我们看下DDL语法

col_name data_type [GENERATED ALWAYS] AS (expression) 
    [VIRTUAL | STORED] [NOT NULL | NULL] 
    [UNIQUE [KEY]] [[PRIMARY] KEY] 
    [COMMENT 'string']

回到正题上来,首先,增加虚拟列:

alter table task_log add date date generated always as (DATE_FORMAT(created_at, '%Y-%m-%d')) stored after created_at;

对虚拟列建立索引:

create index idx_userid_taskid_crtime on task_log(user_id,task_id,date);

再看下执行计划:

mysql> desc
SELECT t1.id AS taskLogId,
       t4.maxCreateTimeByDay,
       t1.task_type,
       t1.event_name AS eventName,
       t1.completed AS completed,
       t1.finish_times AS finishTimes,
       t1.frequency,
       t1.experience,
       t1.integral,
       t1.status,
       t1.reason,
       t1.user_id AS userId,
       t2.auth_type AS authType,
       t2.mobile AS mobile,
       t2.nick_name AS nickName,
       t1.task_id AS taskId,
       t1.tenant_id AS tenantId,
       t4.date
  FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, DATE_FORMAT(created_at, '%Y-%m-%d') "date"
  FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
  INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;

640?wx_fmt=png


从执行计划上看,子查询的部分也已经用上了刚刚对虚拟列建立的索引,但是可以注意到有两个warnings。

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1052
Message: Column 'date' in group statement is ambiguous
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `xucl`.`t1`.`id` AS `taskLogId`,`t4`.`maxCreateTimeByDay` AS `maxCreateTimeByDay`,`xucl`.`t1`.`task_type` AS `task_type`,`xucl`.`t1`.`event_name` AS `eventName`,`xucl`.`t1`.`completed` AS `completed`,`xucl`.`t1`.`finish_times` AS `finishTimes`,`xucl`.`t1`.`frequency` AS `frequency`,`xucl`.`t1`.`experience` AS `experience`,`xucl`.`t1`.`integral` AS `integral`,`xucl`.`t1`.`status` AS `status`,`xucl`.`t1`.`reason` AS `reason`,`xucl`.`t1`.`user_id` AS `userId`,`xucl`.`t2`.`auth_type` AS `authType`,`xucl`.`t2`.`mobile` AS `mobile`,`xucl`.`t2`.`nick_name` AS `nickName`,`xucl`.`t1`.`task_id` AS `taskId`,`xucl`.`t1`.`tenant_id` AS `tenantId`,`t4`.`date` AS `date` from (/* select#2 */ select `xucl`.`task_log`.`user_id` AS `user_id`,`xucl`.`task_log`.`task_id` AS `task_id`,max(`xucl`.`task_log`.`created_at`) AS `maxCreateTimeByDay`,`xucl`.`task_log`.`tenant_id` AS `tenant_id`,date_format(`xucl`.`task_log`.`created_at`,'%Y-%m-%d') AS `date` from `xucl`.`task_log` group by `xucl`.`task_log`.`user_id`,`xucl`.`task_log`.`task_id`,`xucl`.`task_log`.`date`) `t4` join `xucl`.`task_log` `t1` join `xucl`.`account` `t2` where ((`t4`.`maxCreateTimeByDay` = `xucl`.`t1`.`created_at`) and (`t4`.`task_id` = `xucl`.`t1`.`task_id`) and (`t4`.`tenant_id` = `xucl`.`t2`.`tenant_id`) and (`xucl`.`t2`.`id` = `xucl`.`t1`.`user_id`) and (`t4`.`user_id` = `xucl`.`t1`.`user_id`) and (`xucl`.`t1`.`tenant_id` = 2))
2 rows in set (0.00 sec)

可以看到的是新增的date跟原先DATE_FORMAT(created_at, '%Y-%m-%d')是重名的,这个地方也是本次优化一个需要注意的地方,虽然这里报了warning,但是我们可以巧妙地解决无需修改线上代码就能达到优化的结果


看下执行结果:

mysql> show profiles;                                                                                                      
+----------+------------+-----------------------------------------------------------------+
| Query_ID | Duration   | Query                                                           |
+----------+------------+-----------------------------------------------------------------+
|        1 | 0.00081025 | desc SELECT t1.id AS taskLogId,        t4.maxCreateTimeByDay... |
|        2 | 0.02203450 | SELECT t1.id AS taskLogId,        t4.maxCreateTimeByDay....     |
+----------+------------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> show profile for query 2;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000245 |
| checking permissions | 0.000007 |
| checking permissions | 0.000003 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000041 |
| init                 | 0.000093 |
| System lock          | 0.000011 |
| optimizing           | 0.000005 |
| optimizing           | 0.000005 |
| statistics           | 0.000020 |
| preparing            | 0.000021 |
| Sorting result       | 0.000016 |
| statistics           | 0.000113 |
| preparing            | 0.000014 |
| executing            | 0.000011 |
| Sending data         | 0.000105 |
| executing            | 0.000003 |
| Sending data         | 0.020828 |
| end                  | 0.000023 |
| query end            | 0.000017 |
| closing tables       | 0.000006 |
| removing tmp table   | 0.000371 |
| closing tables       | 0.000017 |
| freeing items        | 0.000038 |
| cleaning up          | 0.000017 |
+----------------------+----------+
25 rows in set, 1 warning (0.00 sec)

可见,整个执行效果在21ms左右,性能提升4.3倍。


但是有点小小强迫症,让开发尽快修改代码,SQL修改如下:

mysql> desc
SELECT t1.id AS taskLogId,
       t4.maxCreateTimeByDay,
       t1.task_type,
       t1.event_name AS eventName,
t1.completed AS completed,
       t1.finish_times AS finishTimes,
       t1.frequency,
       t1.experience,
       t1.integral,
       t1.status,
       t1.reason,
       t1.user_id AS userId,
       t2.auth_type AS authType,
       t2.mobile AS mobile,
       t2.nick_name AS nickName,
       t1.task_id AS taskId,
       t1.tenant_id AS tenantId,
       t4.date
  FROM(
SELECT user_id, task_id, MAX(created_at) "maxCreateTimeByDay", tenant_id, date
  FROM task_log
GROUP BY user_id, task_id, date) t4 JOIN task_log t1 ON t4.user_id = t1.user_id AND t4.task_id=t1.task_id and t4.maxCreateTimeByDay= t1.created_at
  INNER JOIN account t2 ON t4.user_id= t2.ID AND t4.tenant_id= t2.tenant_id
WHERE t1.tenant_id= 2;

最后生产环境上线以后,由原先的378ms提升到36ms,提升了10.5倍,效果还是很明显的。

640?wx_fmt=png

640?wx_fmt=png


640?


加入知数堂

挑战40万+年薪!



640?640?640?640?


知数堂

叶金荣与吴炳锡联合打造

领跑IT精英培训

行业资深专家强强联合,倾心定制

MySQL实战/MySQL优化/MongoDB/

Python/ SQL优化/Hadoop+ELK

数门精品课程

“阅读原文”可获更多正课试听视频

密码:hg3h

紧随技术发展趋势,定期优化培训教案

融入大量生产案例,贴合企业一线需求

社群陪伴学习,一次报名,可学1年

DBA、开发工程师必修课

上千位学员已华丽转身,薪资翻番,职位提升

改变已悄然发生,你还在等什么?

640?



640?wx_fmt=gif


扫码加入QQ技术交流群

高性能MySQL,SQL优化-ZST

(QQ群号:579036588)    


MySQL5.7利用虚拟列优化_第2张图片


你可能感兴趣的:(MySQL5.7利用虚拟列优化)