索引加速功能真能提升10倍吗?--TiDB V6.1.0-V7.1.0建索引速度对比

作者: 我是咖啡哥 原文来源: https://tidb.net/blog/a93d7c03

记得在某个6的版本发布宣传里面看到,新版本建索引速度提升了10倍,一直没抽时间亲自测试验证下,前段时间V7.1发布,趁着这次机会,把我的一个V6.1.0的环境做下升级,顺便测试了下各版本创建索引速度对比下。

不测不知道,一测吓一跳,满满的惊喜。一个3kw的表新索引速度从v6.1.0的21min,到v7.1.0只要1min8s。提升了20倍!

以下来自官方文档相关版本关于索引加速功能的Release Notes:

TiDB v6.3.0 支持开启添加索引加速功能,提升了创建索引回填过程的速度。开启该功能后,TiDB 添加索引的性能提升约为原来的 3 倍。 从 v6.5.0 起,添加索引加速功能默认开启。

1、环境说明

环境信息

单节点TiDB环境,物理机。配置:40c512G,7T SSD TiDB初始版本:v6.1.0,逐步升级到v6.5.0和v7.1.0

表信息:3kw

直接使用的tpcc里面的customer表,3000w,大小约18G


[[email protected]][tpcc][05:17:48]> show create table customer\G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `c_id` int(11) NOT NULL,
  `c_d_id` int(11) NOT NULL,
  `c_w_id` int(11) NOT NULL,
  `c_first` varchar(16) DEFAULT NULL,
  `c_middle` char(2) DEFAULT NULL,
  `c_last` varchar(16) DEFAULT NULL,
  `c_street_1` varchar(20) DEFAULT NULL,
  `c_street_2` varchar(20) DEFAULT NULL,
  `c_city` varchar(20) DEFAULT NULL,
  `c_state` char(2) DEFAULT NULL,
  `c_zip` char(9) DEFAULT NULL,
  `c_phone` char(16) DEFAULT NULL,
  `c_since` datetime DEFAULT NULL,
  `c_credit` char(2) DEFAULT NULL,
  `c_credit_lim` decimal(12,2) DEFAULT NULL,
  `c_discount` decimal(4,4) DEFAULT NULL,
  `c_balance` decimal(12,2) DEFAULT NULL,
  `c_ytd_payment` decimal(12,2) DEFAULT NULL,
  `c_payment_cnt` int(11) DEFAULT NULL,
  `c_delivery_cnt` int(11) DEFAULT NULL,
  `c_data` varchar(500) DEFAULT NULL,
  PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_customer` (`c_w_id`,`c_d_id`,`c_last`,`c_first`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

[[email protected]][tpcc][05:21:48]> show variables like 'tidb_ddl%';
+----------------------------+--------------+
| Variable_name              | Value        |
+----------------------------+--------------+
| tidb_ddl_error_count_limit | 512          |
| tidb_ddl_reorg_batch_size  | 256          |
| tidb_ddl_reorg_priority    | PRIORITY_LOW |
| tidb_ddl_reorg_worker_cnt  | 4            |
+----------------------------+--------------+
4 rows in set (0.00 sec)

2、v6.1.0测试3kw的表加索引

测试加3个不同类型列的索引

分别测试三个索引,一个是varchar列、一个是char列,一个组合索引

alter table tpcc.customer add index idx_01(c_city);
alter table tpcc.customer add index idx_02(c_phone);
alter table tpcc.customer add index idx_03(c_phone,c_since);

修改tidb_ddl_reorg_worker_cnt,观察建索引的速度。

经测试,时间差不多,后面就主要以组合索引idx_03为主来做对比。

--默认4时

[[email protected]][tpcc][05:25:40]> alter table tpcc.customer add index idx_01(c_city);
Query OK, 0 rows affected (21 min 4.48 sec)

[[email protected]][tpcc][05:46:48]> alter table tpcc.customer add index idx_02(c_phone);
Query OK, 0 rows affected (21 min 8.08 sec)
[[email protected]][tpcc][06:08:50]> set global tidb_ddl_reorg_worker_cnt=8;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][06:08:58]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (10 min 44.26 sec)

[[email protected]][tpcc][06:22:43]>  set global tidb_ddl_reorg_worker_cnt=16;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][06:23:01]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (9 min 6.01 sec)

[[email protected]][tpcc][08:38:03]> alter table tpcc.customer drop index  idx_03;
Query OK, 0 rows affected (0.33 sec)

[[email protected]][tpcc][08:38:18]>  set global tidb_ddl_reorg_worker_cnt=32;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][08:38:29]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (8 min 35.48 sec)


加索引时间

--平均每秒6000行,4个worker
select 30000000/(6000*4)/60;
--平均每秒1600行,32个worker
select 30000000/(1600*32)/60;

tidb日志中可以看到:当tidb_ddl_reorg_worker_cnt=4时,创建索引的读取数据的速度。单线程每秒5k-7k行左右。 当tidb_ddl_reorg_worker_cnt=32时,创建索引的读取数据的速度。单线程每秒只有1600行左右。

[2023/06/19 17:35:06.521 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=1] [addedCount=90624] [scanCount=90624] [nextHandle=12317094.next] [speed(rows/s)=6611.8594994214745]
[2023/06/19 17:35:06.521 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=3] [addedCount=90624] [scanCount=90624] [nextHandle=12609699.next] [speed(rows/s)=6681.175296206864]
[2023/06/19 17:35:06.689 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=2] [addedCount=90624] [scanCount=90624] [nextHandle=12463112.next] [speed(rows/s)=6661.681631740738]
[2023/06/19 17:35:06.904 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=0] [addedCount=90624] [scanCount=90624] [nextHandle=12171054.next] [speed(rows/s)=6364.5596641745815]

....

[2023/06/20 08:45:14.810 +08:00] [INFO] [backfilling.go:271] ["[ddl] backfill worker back fill index"] [workerID=28] [addedCount=30208] [scanCount=30208] [nextHandle=27312608.next] [speed(rows/s)=1657.5438087595428]

3、v6.5.0测试3kw的表加索引

确认TiDB版本和表信息

[[email protected]][tpcc][09:14:33]> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v6.5.0 |
+--------------------+
1 row in set (0.00 sec)

[[email protected]][tpcc][09:14:47]> select count(0) from tpcc.customer;
+----------+
| count(0) |
+----------+
| 30000000 |
+----------+
1 row in set (0.02 sec)

测试加索引速度

alter table tpcc.customer add index idx_01(c_city);
alter table tpcc.customer add index idx_02(c_phone);
alter table tpcc.customer add index idx_03(c_phone,c_since);

默认:tidb_ddl_reorg_worker_cnt=4

[[email protected]][tpcc][09:19:36]> show variables like 'tidb_ddl_reorg_worker_cnt';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| tidb_ddl_reorg_worker_cnt | 4     |
+---------------------------+-------+
1 row in set (0.00 sec)

[[email protected]][tpcc][09:19:40]> alter table tpcc.customer add index idx_01(c_city);
Query OK, 0 rows affected (1 min 19.36 sec)

[[email protected]][tpcc][09:21:03]> alter table tpcc.customer add index idx_02(c_phone);
Query OK, 0 rows affected (1 min 19.49 sec)

[[email protected]][tpcc][09:23:51]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (1 min 16.22 sec)



分别测试tidb_ddl_reorg_worker_cnt=8、16、32

set global tidb_ddl_reorg_worker_cnt=8;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

set global tidb_ddl_reorg_worker_cnt=16;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

set global tidb_ddl_reorg_worker_cnt=32;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

操作日志:

[[email protected]][tpcc][09:39:18]> set global tidb_ddl_reorg_worker_cnt=8;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][09:40:07]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.73 sec)

[[email protected]][tpcc][09:40:08]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (53.23 sec)

[[email protected]][tpcc][09:41:03]> set global tidb_ddl_reorg_worker_cnt=16;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][09:41:41]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.49 sec)

[[email protected]][tpcc][09:41:41]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (43.99 sec)

[[email protected]][tpcc][09:42:27]> set global tidb_ddl_reorg_worker_cnt=32;
Query OK, 0 rows affected (0.24 sec)

[[email protected]][tpcc][09:43:24]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.68 sec)

[[email protected]][tpcc][09:43:25]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (38.04 sec)

索引加速功能

执行 ADMIN SHOW DDL JOBS 语句查看 JOB_TYPE 一列中是否含有 ingest 字样

[[email protected]][tpcc][09:17:35]> admin show ddl jobs;
+--------+---------+------------+------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE               | SCHEMA_STATE         | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME         | START_TIME          | END_TIME            | STATE   |
+--------+---------+------------+------------------------+----------------------+-----------+----------+-----------+---------------------+---------------------+---------------------+---------+
|  23464 | tpcc    | customer   | add index /* ingest */ | write reorganization |     23386 |    23447 |  25853098 | 2023-06-21 09:22:31 | 2023-06-21 09:22:32 | NULL                | running |
|  23463 | tpcc    | customer   | add index /* ingest */ | public               |     23386 |    23447 |  30000000 | 2023-06-21 09:19:44 | 2023-06-21 09:19:44 | 2023-06-21 09:21:03 | synced  |
|  23462 | tpcc    | customer   | drop index             | none                 |     23386 |    23447 |         0 | 2023-06-21 09:19:34 | 2023-06-21 09:19:34 | 2023-06-21 09:19:34 | synced  |
|  23461 | tpcc    | customer   | add index /* ingest */ | public               |     23386 |    23447 |  30000000 | 2023-06-21 09:16:40 | 2023-06-21 09:16:40 | 2023-06-21 09:17:44 | synced  |

相关参数:

tidb_ddl_enable_fast_reorg
tidb_ddl_disk_quota

https://docs.pingcap.com/zh/tidb/stable/system-variables#tidb_ddl_enable_fast_reorg-span-classversion-mark%E4%BB%8E-v630-%E7%89%88%E6%9C%AC%E5%BC%80%E5%A7%8B%E5%BC%95%E5%85%A5span

在升级到 v6.5.0 及以上版本时,建议你检查 TiDB 的 temp-dir 路径是否正确挂载了 SSD 磁盘。该参数是 TiDB 的配置参数,设置后需要重启 TiDB 才能生效。因此,在升级前提前进行设置,可以避免再次重启。

使用快速索引功能,建索引的时候,数据会回填到temp-dir指定目录,过程中可以看到该目录空间在持续增长。


[tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/
0 /tmp/tidb/tmp_ddl-4000/
[tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/
88M /tmp/tidb/tmp_ddl-4000/
[tidb@host_130 tmp_ddl-4000]$ du -sh /tmp/tidb/tmp_ddl-4000/
193M  /tmp/tidb/tmp_ddl-4000/
[tidb@host_130 tmp_ddl-4000]$ tree /tmp/tidb/tmp_ddl-4000/
/tmp/tidb/tmp_ddl-4000/
└── 23464
    ├── d4cacd68-ac06-558d-8dca-9bd229f322d2
    │   ├── 000002.log
    │   ├── CURRENT
    │   ├── LOCK
    │   ├── MANIFEST-000001
    │   └── OPTIONS-000003
    └── d4cacd68-ac06-558d-8dca-9bd229f322d2.sst
        ├── 010ff049-964c-4f42-8880-47d69a44fede.sst
        ├── 078d2401-a319-4a62-928a-9af030faf742.sst
        ├── 2497f448-a7fe-4e6f-a9be-cea2bd4413c4.sst
        ├── 26ac2d83-2ca9-4962-80a3-2dcdf556e3f1.sst
        ├── 39549396-c9f2-4f06-8edf-7b325376e6a4.sst
        ├── 69be99c9-cc7c-42bf-97cc-f67547d72865.sst
        ├── 70b137b4-0fcf-4139-8c25-d03a4a1ef932.sst
        ├── 735fe13f-cf00-468c-91f9-2d9578064d60.sst
        ├── 7c50175e-f97f-49e5-8a5b-eb7ea4acbfb1.sst
        ├── 86e551b8-93fb-4862-82cb-a3a0c34ce280.sst
        ├── 913eef57-b90b-45df-b1d5-85ee52a95471.sst
        ├── 9d736cbb-fc5b-4df2-a203-aab03afdedea.sst
        ├── def3a75f-bee9-4e45-b1ae-a8085f7daa57.sst
        ├── e0151d63-ff16-4f09-946f-c18f35fb1e94.sst
        └── f32bbfe0-9672-4f25-a20f-af391e7c50d6.sst

3 directories, 20 files

4、v7.1.0测试3kw的表加索引

TiDB版本和表信息

[[email protected]][tpcc][08:40:23]> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v7.1.0 |
+--------------------+
1 row in set (0.00 sec)

[[email protected]][tpcc][08:40:42]> select count(0) from tpcc.customer;
+----------+
| count(0) |
+----------+
| 30000000 |
+----------+
1 row in set (4.00 sec)

不同worker下,加索引速度对比

默认:tidb_ddl_reorg_worker_cnt=4

[[email protected]][tpcc][08:40:55]> set global tidb_ddl_reorg_worker_cnt=4;
Query OK, 0 rows affected (0.04 sec)

[[email protected]][tpcc][08:41:20]> show variables like 'tidb_ddl_%';
+--------------------------------+--------------+
| Variable_name                  | Value        |
+--------------------------------+--------------+
| tidb_ddl_disk_quota            | 107374182400 |
| tidb_ddl_enable_fast_reorg     | ON           |
| tidb_ddl_error_count_limit     | 512          |
| tidb_ddl_flashback_concurrency | 64           |
| tidb_ddl_reorg_batch_size      | 256          |
| tidb_ddl_reorg_priority        | PRIORITY_LOW |
| tidb_ddl_reorg_worker_cnt      | 4            |
+--------------------------------+--------------+
7 rows in set (0.01 sec)


[[email protected]][tpcc][10:37:58]> alter table tpcc.customer add index idx_01(c_city);
Query OK, 0 rows affected (1 min 8.36 sec)

[[email protected]][tpcc][10:29:31]> alter table tpcc.customer add index idx_02(c_phone);
Query OK, 0 rows affected (1 min 7.93 sec)

[[email protected]][tpcc][10:36:00]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (1 min 7.64 sec)


分别测试tidb_ddl_reorg_worker_cnt=8、16、32

set global tidb_ddl_reorg_worker_cnt=8;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

set global tidb_ddl_reorg_worker_cnt=16;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

set global tidb_ddl_reorg_worker_cnt=32;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

执行记录:

[[email protected]][tpcc][10:43:58]> set global tidb_ddl_reorg_worker_cnt=8;
Query OK, 0 rows affected (0.03 sec)

[[email protected]][tpcc][10:44:00]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.47 sec)

[[email protected]][tpcc][10:44:01]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (44.50 sec)

[[email protected]][tpcc][10:44:47]> set global tidb_ddl_reorg_worker_cnt=16;
Query OK, 0 rows affected (0.01 sec)

[[email protected]][tpcc][10:46:04]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.47 sec)

[[email protected]][tpcc][10:46:05]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (36.20 sec)

[[email protected]][tpcc][10:46:42]> set global tidb_ddl_reorg_worker_cnt=32;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][10:46:46]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.41 sec)

[[email protected]][tpcc][10:46:46]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (31.45 sec)

不同tidb_ddl_reorg_batch_size下速度对比

tidb_ddl_reorg_batch_size 默认256.

set global tidb_ddl_reorg_worker_cnt=4;

set global tidb_ddl_reorg_batch_size=512;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

set global tidb_ddl_reorg_batch_size=1024;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);


set global tidb_ddl_reorg_batch_size=2048;
alter table tpcc.customer drop index idx_03;
alter table tpcc.customer add index idx_03(c_phone,c_since);

操作日志:

[[email protected]][tpcc][05:23:46]> set global tidb_ddl_reorg_batch_size=512;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][05:23:53]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.44 sec)

[[email protected]][tpcc][05:23:54]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (1 min 5.60 sec)

[[email protected]][tpcc][05:25:01]> show variables like 'tidb_ddl_%';
+--------------------------------+--------------+
| Variable_name                  | Value        |
+--------------------------------+--------------+
| tidb_ddl_disk_quota            | 107374182400 |
| tidb_ddl_enable_fast_reorg     | ON           |
| tidb_ddl_error_count_limit     | 512          |
| tidb_ddl_flashback_concurrency | 64           |
| tidb_ddl_reorg_batch_size      | 512          |
| tidb_ddl_reorg_priority        | PRIORITY_LOW |
| tidb_ddl_reorg_worker_cnt      | 4            |
+--------------------------------+--------------+
7 rows in set (0.00 sec)

[[email protected]][tpcc][05:25:26]> set global tidb_ddl_reorg_batch_size=1024;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][05:25:31]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.47 sec)

[[email protected]][tpcc][05:25:32]> alter table tpcc.customer add index idx_03(c_phone,c_since);

Query OK, 0 rows affected (58.42 sec)

[[email protected]][tpcc][05:26:30]> 
[[email protected]][tpcc][05:26:30]> set global tidb_ddl_reorg_batch_size=2048;
Query OK, 0 rows affected (0.02 sec)

[[email protected]][tpcc][05:26:57]> alter table tpcc.customer drop index idx_03;
Query OK, 0 rows affected (0.42 sec)

[[email protected]][tpcc][05:26:58]> alter table tpcc.customer add index idx_03(c_phone,c_since);
Query OK, 0 rows affected (56.39 sec)

5、小结

版本 表数据量 tidb_ddl_reorg_worker_cnt=4(默认) tidb_ddl_reorg_worker_cnt=8 tidb_ddl_reorg_worker_cnt=16 tidb_ddl_reorg_worker_cnt=32
v6.1.0 3000w 21min 10min44s 9min6s 8min35s
v6.5.0 3000w 1min19s 53s 43s 38s
v7.1.0 3000w 1min8s 44s 36s 31s

从上面的数据我们可以看到,在V6.5.0版本中,默认4个worker下,建索引速度对比v6.1.0提升了约20倍,随着worker个数增加速度会进一步提升,但是越到后面提升越少,可能跟数据量有关系哈,测试结果仅供参考。

V7.1.0,相比V6.5.0也有一定提升,在worker为8和16的情况下,大约提升了16%((53-44)/53=0.1698,(43-36)/43=0.1628)。

另外,在V7.1.0下,测试了不同tidb_ddl_reorg_batch_size下的影响,也是会有一定提升。

版本 表数据量 tidb_ddl_reorg_batch_size=512(默认) tidb_ddl_reorg_batch_size=1024 tidb_ddl_reorg_batch_size=2048
v7.1.0 3000w 1min5s 58s 56s

以上就是在同一台机器上测试不同TiDB版本建索引速度的对比。V6.3开始新增的索引加速功能确实提升非常大,还在用老版本的同学,快升级测试下吧。

你可能感兴趣的:(tidb,数据库,算法)