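In this article, we will learn how to speed up SQL query response times by indexing table columns. We will cover installing MySQL, creating a stored procedure, analyzing queries, and understanding the impact of indexes.
I used MySQL version 8 on Ubuntu, with DBeaver as the MySQL client for connecting to the MySQL server. So, let's learn together.
I used MySQL for the demonstration; however, the concept applies to other relational databases as well.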
1. We can install MySQL and access it as the root user as follows. This MySQL instance is for testing only, so I used a simple password.
$ sudo apt install mysql-server
$ sudo systemctl start mysql.service
$ sudo mysql
mysql> SET GLOBAL validate_password.policy = 0;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
mysql> exit
$ mysql -uroot -ppassword
2. Create a database and switch to it.
mysql> create database testdb;
mysql> show databases;
mysql> use testdb;
3. Create two tables, employee1 and employee2. employee1 has no primary key, while employee2 has one.
mysql> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.02 sec)
mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| employee1 |
| employee2 |
+------------------+
2 rows in set (0.00 sec)
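4. Now, if we check the indexes on each table, we find that employee2 already has an index on the id column, because id is its primary key. (The "ERROR: No query specified" message after each result is harmless: \G already terminates the statement, so the trailing semicolon is sent as an empty query.)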
mysql> SHOW INDEXES FROM employee1 \G;
Empty set (0.00 sec)
ERROR:
No query specified
mysql> SHOW INDEXES FROM employee2 \G;
*************************** 1. row ***************************
Table: employee2
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.00 sec)
ERROR:
No query specified
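The same check can be scripted. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database standing in for the MySQL server. That substitution is an assumption made only so the example runs without a server; index names and catalog commands differ from MySQL. The helper index_names is a hypothetical name introduced for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
# The table definitions mirror employee1 and employee2 from step 3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee1 (id int, LastName varchar(255), "
    "FirstName varchar(255), Address varchar(255), profile varchar(255))"
)
conn.execute(
    "CREATE TABLE employee2 (id int primary key, LastName varchar(255), "
    "FirstName varchar(255), Address varchar(255), profile varchar(255))"
)


def index_names(connection, table):
    """Return the names of all indexes defined on `table` (hypothetical helper)."""
    # PRAGMA index_list yields one row per index; column 1 holds the index name.
    return [row[1] for row in connection.execute(f"PRAGMA index_list({table})")]


employee1_indexes = index_names(conn, "employee1")
employee2_indexes = index_names(conn, "employee2")
print("employee1 indexes:", employee1_indexes)
print("employee2 indexes:", employee2_indexes)
```

As in the MySQL output, the table without a primary key reports no index, while declaring a primary key creates one automatically.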
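5. Now create a stored procedure that inserts a large amount of data into both tables. We insert 20,000 records into each table, then invoke the stored procedure with the CALL procedure-name command.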
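-- In the mysql command-line client, change the statement delimiter first
-- so that the semicolons inside the procedure body do not end it early.
mysql> DELIMITER //
mysql> CREATE PROCEDURE testdb.BulkInsert()
BEGIN
-- Loop counter, also used as the id of each inserted row
DECLARE i INT DEFAULT 1;
-- Empty both tables so the procedure can be re-run safely
truncate table employee1;
truncate table employee2;
WHILE (i <= 20000) DO
INSERT INTO testdb.employee1 (id, FirstName, Address) VALUES(i, CONCAT('user','-',i), CONCAT('address','-',i));
INSERT INTO testdb.employee2 (id, FirstName, Address) VALUES(i, CONCAT('user','-',i), CONCAT('address','-',i));
SET i = i+1;
END WHILE;
END //
mysql> DELIMITER ;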
mysql> CALL testdb.BulkInsert() ;
mysql> SELECT COUNT(*) from employee1 e ;
COUNT(*)|
--------+
20000|
mysql> SELECT COUNT(*) from employee2 e ;
COUNT(*)|
--------+
20000|
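The same data set can also be generated from client code instead of a stored procedure. Below is a minimal sketch, again assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL (an assumption for runnability, not the setup used in this article). The names bulk_insert and count_rows are hypothetical helpers.

```python
import sqlite3

ROW_COUNT = 20000  # same number of rows as the stored procedure inserts

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
for ddl_id in ("id int", "id int primary key"):
    table = "employee1" if ddl_id == "id int" else "employee2"
    conn.execute(
        f"CREATE TABLE {table} ({ddl_id}, LastName varchar(255), "
        "FirstName varchar(255), Address varchar(255), profile varchar(255))"
    )


def bulk_insert(connection, table, row_count=ROW_COUNT):
    """Empty `table`, then insert rows user-1..user-N in one transaction."""
    rows = [(i, f"user-{i}", f"address-{i}") for i in range(1, row_count + 1)]
    with connection:  # commits on success, rolls back on error
        connection.execute(f"DELETE FROM {table}")
        connection.executemany(
            f"INSERT INTO {table} (id, FirstName, Address) VALUES (?, ?, ?)",
            rows,
        )


def count_rows(connection, table):
    """Return the number of rows currently stored in `table`."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


for name in ("employee1", "employee2"):
    bulk_insert(conn, name)
    print(name, count_rows(conn, name))
```

Sending all rows in a single transaction is a design choice of this sketch; the stored procedure above inserts row by row.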
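6. Now, if we select records by arbitrary id values, we find that employee1 responds more slowly (0.03 to 0.04 sec versus 0.00 sec for employee2 in the runs below) because it has no index on id.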
mysql> select * from employee2 where id = 15433;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL | user-15433 | address-15433 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)
mysql> select * from employee1 where id = 15433;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL | user-15433 | address-15433 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)
mysql> select * from employee1 where id = 19728;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL | user-19728 | address-19728 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)
mysql> select * from employee2 where id = 19728;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL | user-19728 | address-19728 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)
mysql> select * from employee1 where id = 3456;
+------+----------+-----------+--------------+---------+
| id | LastName | FirstName | Address | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL | user-3456 | address-3456 | NULL |
+------+----------+-----------+--------------+---------+
1 row in set (0.04 sec)
mysql> select * from employee2 where id = 3456;
+------+----------+-----------+--------------+---------+
| id | LastName | FirstName | Address | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL | user-3456 | address-3456 | NULL |
+------+----------+-----------+--------------+---------+
1 row in set (0.00 sec)
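7. Now check the output of the EXPLAIN ANALYZE command. This command plans the query, instruments it, and actually executes it, counting rows and measuring the time spent at each point in the execution plan.
Here we find that a table scan was performed for employee1, meaning the entire table was read to produce the result. This is also called a full table scan.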
mysql> explain analyze select * from employee1 where id = 3456;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.id = 3456) (cost=1989 rows=1965) (actual time=5.24..29.3 rows=1 loops=1)
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.0504..27.3 rows=20000 loops=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
# Here is a detailed explanation of this plan (adapted from ChatGPT).
Filter: (employee1.id = 3456): A filter is applied to the rows read from the "employee1" table, and only rows whose "id" column equals 3456 are returned.
(cost=1989 rows=1965) (actual time=5.24..29.3 rows=1 loops=1): The first pair of parentheses holds the optimizer's estimates, and the second holds measurements from the actual run:
cost=1989: The estimated cost of this step, including the table scan beneath it. Cost is a relative, unitless measure of the effort required to execute the step.
rows=1965: The estimated number of rows this step will return. It is only a guess; the actual count was 1.
actual time=5.24..29.3: The measured time in milliseconds to return the first row (5.24) and to return all rows (29.3).
rows=1 loops=1: The step actually returned 1 row and was executed once.
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.0504..27.3 rows=20000 loops=1): This part shows that a table scan is performed on the "employee1" table:
Table scan: The database reads every row of "employee1" so that the filter can test each one.
cost=1989: The cost estimate for this table scan operation.
rows=19651: The estimated number of rows in the "employee1" table. It comes from approximate table statistics, which is why it differs from 20000.
actual time=0.0504..27.3: The measured time in milliseconds to read the first row (0.0504) and to read all rows (27.3).
rows=20000 loops=1: The scan actually read 20000 rows and was executed once.
Overall, this query plan shows that the database reads all 20,000 rows of "employee1" and discards every row except the one whose "id" column equals 3456.
The table scan has an estimated cost of 1989 units and accounts for 27.3 of the 29.3 milliseconds the query took.
With no index on "id", the database cannot stop early, because it cannot know that only one row matches until it has examined every row.
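To compare estimates with measurements across many plan lines, the fields can be pulled out programmatically. The following is a minimal sketch in Python that parses one line in the format shown above; the regular expression and the name parse_plan_line are assumptions written for this article's sample output, not part of MySQL, and they do not handle lines whose cost is a range such as cost=0..0.

```python
import re

# One line copied from the EXPLAIN ANALYZE output above.
PLAN_LINE = (
    "-> Table scan on employee1 (cost=1989 rows=19651) "
    "(actual time=0.0504..27.3 rows=20000 loops=1)"
)

# A plain or exponent-style number such as 1989, 0.0504 or 110e-6.
NUM = r"\d+(?:\.\d+)?(?:e[+-]?\d+)?"

PLAN_PATTERN = re.compile(
    rf"\(cost=(?P<cost>{NUM}) rows=(?P<est_rows>\d+)\) "
    rf"\(actual time=(?P<first_row_ms>{NUM})\.\.(?P<all_rows_ms>{NUM}) "
    r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\)"
)


def parse_plan_line(line):
    """Extract estimates and measurements from one plan line (hypothetical helper)."""
    match = PLAN_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized plan line: {line!r}")
    return {
        "cost": float(match["cost"]),                  # optimizer estimate, unitless
        "est_rows": int(match["est_rows"]),            # optimizer estimate
        "first_row_ms": float(match["first_row_ms"]),  # measured, milliseconds
        "all_rows_ms": float(match["all_rows_ms"]),    # measured, milliseconds
        "actual_rows": int(match["actual_rows"]),      # measured
        "loops": int(match["loops"]),                  # measured
    }


fields = parse_plan_line(PLAN_LINE)
print(fields)
```

Keeping estimated and measured values in separate keys makes it easy to spot steps where the optimizer's row estimate is far from the actual count.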
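8. For the employee2 table, we find that only one row was looked up before the result was returned. So on a table with a large number of records, we will see a significant reduction in SQL query response time.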
mysql> explain analyze select * from employee2 where id = 3456;
+---------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------+
| -> Rows fetched before execution (cost=0..0 rows=1) (actual time=110e-6..190e-6 rows=1 loops=1)
|
+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
# Explanation of this query plan (adapted from ChatGPT):
Rows fetched before execution: The optimizer resolved the lookup while planning the query. Because "id" is the primary key, the condition "id = 3456" can match at most one row, so that row was read before the execution phase began.
(cost=0..0 rows=1): The cost estimate for this operation is 0 units, and it expects to return only one row.
(actual time=110e-6..190e-6 rows=1 loops=1): This provides the measurements from the actual run:
actual time=110e-6..190e-6: The measured time in milliseconds to return the first row and all rows. 110e-6 ms is about 110 nanoseconds, the time needed to hand over the row that was already fetched.
rows=1: The number of rows actually returned.
loops=1: The number of times this step was executed.
Overall, this query plan shows that the primary key index lets the database locate the single matching row directly, with no table scan.
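9. Now let's make it more interesting and analyze the query plans when we search both tables on the non-indexed column FirstName. The output shows that a table scan was performed in both cases, which takes considerable time to fetch the data (about 24 ms for employee2 and 35 ms for employee1 here).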
mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456') (cost=2036 rows=2012) (actual time=15.7..24 rows=1 loops=1)
-> Table scan on employee2 (cost=2036 rows=20115) (actual time=0.0733..17.8 rows=20000 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.FirstName = 'user-13456') (cost=1989 rows=1965) (actual time=23.7..35.2 rows=1 loops=1)
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.0439..28.9 rows=20000 loops=1)
|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
10. Now let's create an index on the FirstName column of the employee1 table.
mysql> CREATE INDEX index1 ON employee1 (FirstName);
Query OK, 0 rows affected (0.13 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show indexes from employee1 \G;
*************************** 1. row ***************************
Table: employee1
Non_unique: 1
Key_name: index1
Seq_in_index: 1
Column_name: FirstName
Collation: A
Cardinality: 19651
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.01 sec)
ERROR:
No query specified
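11. Now let's check the query plans of both tables again when searching for a single record by FirstName. We find that employee1 responds quickly: using the index on FirstName, it performs an index lookup that reads only 1 row (about 0.07 ms). employee2, which has no index on FirstName, still takes longer (about 23.5 ms) because all 20000 rows must be scanned.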
mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+-------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on employee1 using index1 (FirstName='user-13456') (cost=0.35 rows=1) (actual time=0.0594..0.0669 rows=1 loops=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456') (cost=2036 rows=2012) (actual time=15.7..23.5 rows=1 loops=1)
-> Table scan on employee2 (cost=2036 rows=20115) (actual time=0.075..17.5 rows=20000 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
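The switch from a table scan to an index lookup can be reproduced in a few lines. Below is a minimal sketch, assuming Python's sqlite3 module with an in-memory SQLite database in place of MySQL. This is an assumption made so the example runs anywhere: SQLite's EXPLAIN QUERY PLAN is a different command from MySQL's EXPLAIN ANALYZE and its wording differs, but the underlying idea is the same. The helper name plan is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee1 (id int, LastName varchar(255), "
    "FirstName varchar(255), Address varchar(255), profile varchar(255))"
)
conn.executemany(
    "INSERT INTO employee1 (id, FirstName, Address) VALUES (?, ?, ?)",
    [(i, f"user-{i}", f"address-{i}") for i in range(1, 20001)],
)

QUERY = "SELECT * FROM employee1 WHERE FirstName = 'user-13456'"


def plan(connection, sql):
    """Return SQLite's plan description for `sql` as one string."""
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


plan_before = plan(conn, QUERY)  # no index on FirstName yet
conn.execute("CREATE INDEX index1 ON employee1 (FirstName)")
plan_after = plan(conn, QUERY)   # the planner can now use index1

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the whole table, and the second reports a search that uses index1, mirroring the difference between employee2 and employee1 in the MySQL output above.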
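This article should help us understand the impact of indexes on a table and how to analyze queries with the EXPLAIN ANALYZE command. Along the way, we also learned how to set up MySQL and how to write a stored procedure for bulk inserts.
Author: Chandra Shekhar Pandey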