Phoenix query optimization: from 40 seconds down to 0.02 seconds

 

Environment: Phoenix 5.0

 

The data set is small, about 1 million rows (999,999 to be exact), which is usually enough for testing.

0: jdbc:phoenix:192.168.199.154> select count(1) from T_EXTENSION_ALL_DATAS_SHOW;
+-----------+
| COUNT(1)  |
+-----------+
| 999999    |
+-----------+
1 row selected (8.714 seconds)

 

The row key (primary key) is: CONSTRAINT PK PRIMARY KEY (SHOW_DATE, SEQ_ID, EMAIL)

0: jdbc:phoenix:192.168.199.154> select * from T_EXTENSION_ALL_DATAS_SHOW limit 2;
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
|  SHOW_DATE  | SEQ_ID  |   EMAIL    | TIME_SPEND  | CAM_SITE  | TOKEN_EARNED  | REVENUE  | TIPS_SENT  |  TOY  |     CREATED_DATE     |
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
| 2018-11-24  | 1       | [email protected]  | 65          | cam4      | 20.5          | 200.5    | 21         | ambi  | 2018-11-24 15:22:40  |
| 2018-11-24  | 2       | [email protected]  | 65          | cam4      | 20.5          | 200.5    | 21         | ambi  | 2018-11-24 15:22:40  |
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
2 rows selected (0.273 seconds)

 

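-- Filter on the date only and fetch the largest ID. This is fast: SHOW_DATE is the leading row key column, so the plan below is a reverse RANGE SCAN that stops after 1 row.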

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 999999  |
+---------+
1 row selected (0.032 seconds)


0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                      PLAN                                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 1 ROWS 715 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24']  | 715             | 1              | 0            |
|     SERVER FILTER BY FIRST KEY ONLY                                                                             | 715             | 1              | 0            |
|     SERVER 1 ROW LIMIT                                                                                          | 715             | 1              | 0            |
| CLIENT 1 ROW LIMIT                                                                                              | 715             | 1              | 0            |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.064 seconds)
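
Why a date-only lookup is cheap while a filter on a non-key column is not can be sketched outside Phoenix. The snippet below is a simplified Python model, not Phoenix code. It assumes rows are stored sorted by the composite key (SHOW_DATE, SEQ_ID, ...), and it uses made-up sample rows in which only the lower half of the IDs belongs to cam4. The names rows, keys, date_range, max_seq_id_for_date and max_seq_id_by_site are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Made-up sample data: (show_date, seq_id, cam_site), kept sorted by key.
rows = sorted(
    (date, seq, "cam4" if seq <= 500 else "other")
    for date in ("2018-11-23", "2018-11-24", "2018-11-25")
    for seq in range(1, 1001)
)
keys = [(date, seq) for date, seq, _ in rows]


def date_range(date):
    """Locate the contiguous slice of rows for one date (a range scan)."""
    lo = bisect_left(keys, (date,))
    hi = bisect_right(keys, (date, float("inf")))
    return lo, hi


def max_seq_id_for_date(date):
    """Reverse range scan with a 1-row limit: read only the last row."""
    lo, hi = date_range(date)
    return keys[hi - 1][1] if hi > lo else None


def max_seq_id_by_site(date, site):
    """Reverse range scan plus a filter: walk back until a row matches.

    Returns (seq_id, rows_examined).
    """
    lo, hi = date_range(date)
    examined = 0
    for i in range(hi - 1, lo - 1, -1):
        examined += 1
        if rows[i][2] == site:
            return rows[i][1], examined
    return None, examined


print(max_seq_id_for_date("2018-11-24"))
print(max_seq_id_by_site("2018-11-24", "cam4"))
```

In this model the date-only lookup touches a single row, while the filtered lookup has to walk past every non-matching row first. That is the same shape as the RANGE SCAN versus SERVER FILTER plans in this post.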

-- Add email to the filter. The query times out and fails.

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' order by seq_id desc limit 1;
Error: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=16, exceptions:
Sat Nov 24 15:43:47 CST 2018, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60121

 

-- Create a global secondary index on email, then run the query again. With the secondary index it is fast: 0.059 seconds.

0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ON T_EXTENSION_ALL_DATAS_SHOW(EMAIL);
999,999 rows affected (56.13 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (0.059 seconds)
0: jdbc:phoenix:192.168.199.154> 

 

-- Switch the filter to the site (cam_site): the query takes 57 seconds!

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 499999  |
+---------+
1 row selected (57.825 seconds)

0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                             PLAN                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 2-CHUNK 639317 ROWS 314572822 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24']  | 314572822       | 639317         | 1543044967313  |
|     SERVER FILTER BY CAM_SITE = 'cam4'                                                                                       | 314572822       | 639317         | 1543044967313  |
|     SERVER 1 ROW LIMIT                                                                                                       | 314572822       | 639317         | 1543044967313  |
| CLIENT 1 ROW LIMIT                                                                                                           | 314572822       | 639317         | 1543044967313  |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.069 seconds)

 

-- Same fix as before: create a secondary index on the site column. The query drops from 57 seconds to 0.024 seconds!

0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE ON T_EXTENSION_ALL_DATAS_SHOW(CAM_SITE);
999,999 rows affected (59.081 seconds)

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 499999  |
+---------+
1 row selected (0.024 seconds)

 

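-- This time, add a toy filter on top of the site filter. What happens?

-- The date goes through a RANGE SCAN, but site and toy are both applied as a FILTER, so the query is very slow. The single-column CAM_SITE index is not used, presumably because it does not contain TOY.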

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 499999  |
+---------+
1 row selected (56.94 seconds)

0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                             PLAN                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 2-CHUNK 639317 ROWS 314572822 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24']  | 314572822       | 639317         | 1543044967313  |
|     SERVER FILTER BY (CAM_SITE = 'cam4' AND TOY = 'ambi')                                                                    | 314572822       | 639317         | 1543044967313  |
|     SERVER 1 ROW LIMIT                                                                                                       | 314572822       | 639317         | 1543044967313  |
| CLIENT 1 ROW LIMIT                                                                                                           | 314572822       | 639317         | 1543044967313  |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.029 seconds)
0: jdbc:phoenix:192.168.199.154> 

 

-- Here we need a composite secondary index to serve the query. The time drops back to 0.039 seconds.

0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE_TOY ON T_EXTENSION_ALL_DATAS_SHOW(CAM_SITE,TOY);
999,999 rows affected (56.776 seconds)

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 499999  |
+---------+
1 row selected (0.039 seconds)


0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                                     PLAN                                                                      | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 1 ROWS 73 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE_TOY ['cam4','ambi','2018-11-24']  | 73              | 1              | 0            |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                                           | 73              | 1              | 0            |
|     SERVER 1 ROW LIMIT                                                                                                                        | 73              | 1              | 0            |
| CLIENT 1 ROW LIMIT                                                                                                                            | 73              | 1              | 0            |
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.033 seconds)

 

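At this point I asked myself: does every extra filter on the search page mean several more secondary indexes, just to keep queries fast?

For example, neither cam_site nor toy is a required field.

The user might

enter only cam_site, which needs 1 secondary index;

enter only toy, which needs 1 secondary index;

enter both, which needs 1 more secondary index.

Two simple conditions already need 3 indexes.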

 

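What if the page has 6 filters? How many secondary indexes would that take? With one index per combination, the count grows exponentially with the number of optional filters. That is not workable.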

A single table should not have more than 10 indexes, and the more index tables there are, the slower inserts become.
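
The counting argument above can be made concrete. The following is a small Python sketch of the naive scheme only (one index per non-empty combination of optional filters), plus a simplified write-cost model that assumes every global index adds at least one extra row write per upsert. The function names naive_index_count and min_writes_per_upsert are hypothetical.

```python
from itertools import combinations


def naive_index_count(optional_columns):
    """Count one index per non-empty combination of optional filter columns."""
    n = len(optional_columns)
    return sum(
        1
        for size in range(1, n + 1)
        for _ in combinations(optional_columns, size)
    )


def min_writes_per_upsert(global_index_count):
    """Simplified model: one write to the data table plus one per index table."""
    return 1 + global_index_count


print(naive_index_count(["cam_site", "toy"]))                  # two filters
print(naive_index_count(["f1", "f2", "f3", "f4", "f5", "f6"]))  # six filters
print(min_writes_per_upsert(10))
```

This is an upper bound for the naive approach. A composite index can often also serve queries that filter only on its leading column, so fewer indexes may be enough in practice.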

 

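At this point, the approach matters more than the technology.

1. Requirements: big-data queries are a poor fit for searches with many conditions. Drop every filter that adds no real value.

     Make one filter mandatory, for example the most common one: the date (yyyy-MM-dd).

2. Table design: the row key can be a composite primary key, which can reduce the number of secondary indexes. Here, show_date is the leading column of the primary key.

3. Logic: suppose I need the following filters. How many secondary indexes does that take?

Date, email, site, toy

In practice, 4 secondary indexes are enough.

The date is served by the row key, so there is nothing more to say about it.

What happens if the user enters date, email, and toy? That query is also slow: 41 seconds.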

0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (41.395 seconds)


0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]';
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (0.035 seconds)


0: jdbc:phoenix:192.168.199.154> select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (40.367 seconds)


0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                      PLAN                                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi','2018-11-24']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND "EMAIL" = '[email protected]'                                               | null            | null           | null         |
|     SERVER 1 ROW LIMIT                                                                                          | null            | null           | null         |
| CLIENT 1 ROW LIMIT                                                                                              | null            | null           | null         |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.04 seconds)

Judging from the execution plan, I thought the email secondary index needed to be reworked.

0: jdbc:phoenix:192.168.199.154> drop index IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL on T_EXTENSION_ALL_DATAS_SHOW;
No rows affected (2.275 seconds)


0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ON T_EXTENSION_ALL_DATAS_SHOW(EMAIL) INCLUDE(CAM_SITE,TOY);
999,999 rows affected (76.893 seconds)


0: jdbc:phoenix:192.168.199.154> select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (40.06 seconds)


0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                      PLAN                                                       | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi','2018-11-24']  | null            | null           | null         |
|     SERVER FILTER BY FIRST KEY ONLY AND "EMAIL" = '[email protected]'                                               | null            | null           | null         |
|     SERVER 1 ROW LIMIT                                                                                          | null            | null           | null         |
| CLIENT 1 ROW LIMIT                                                                                              | null            | null           | null         |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.034 seconds)


0: jdbc:phoenix:192.168.199.154> select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]';
+---------+-------+
| SEQ_ID  |  TOY  |
+---------+-------+
| 45555   | ambi  |
+---------+-------+
1 row selected (0.07 seconds)



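The result is still not good. The full query still takes 40 seconds, and the execution plan has not changed at all.

Yet the subquery on its own is very fast, so why does wrapping it in one more layer cost 40 seconds? The plan gives the answer: Phoenix still chooses IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY for the whole statement and applies the email condition only as a server-side filter.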

 

Nothing left but the last resort: a hint.

0: jdbc:phoenix:192.168.199.154> select /*+ INDEX(T_EXTENSION_ALL_DATAS_SHOW IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL) */ t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID  |
+---------+
| 45555   |
+---------+
1 row selected (0.017 seconds)


0: jdbc:phoenix:192.168.199.154> explain select /*+ INDEX(T_EXTENSION_ALL_DATAS_SHOW IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL) */ t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                                          PLAN                                                                           | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 1182604 ROWS 314572800 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ['[email protected]','2018-11-24']  | 314572800       | 1182604        | 1543050902166  |
|     SERVER FILTER BY "TOY" = 'ambi'                                                                                                                     | 314572800       | 1182604        | 1543050902166  |
|     SERVER 1 ROW LIMIT                                                                                                                                  | 314572800       | 1182604        | 1543050902166  |
| CLIENT 1 ROW LIMIT                                                                                                                                      | 314572800       | 1182604        | 1543050902166  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.048 seconds)

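Great, that works perfectly.

Phoenix usually picks the best secondary index automatically, but sometimes it is not that smart.

In that case we have to tell it which secondary index to use.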
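
If the queries are generated by application code, the hint can be added programmatically. Below is a minimal Python sketch that only builds the SQL string, using the same /*+ INDEX(table index) */ form as the query above. The helper name with_index_hint is hypothetical, and the sketch assumes the statement starts with SELECT.

```python
import re


def with_index_hint(sql, table, index):
    """Insert a Phoenix INDEX hint right after the leading SELECT keyword."""
    hint = f"/*+ INDEX({table} {index}) */"
    # A function is used as the replacement so the hint text is never
    # interpreted as a regex replacement template.
    new_sql, replaced = re.subn(
        r"^\s*select\b",
        lambda match: f"{match.group(0)} {hint}",
        sql,
        count=1,
        flags=re.IGNORECASE,
    )
    if replaced == 0:
        raise ValueError("expected a statement that starts with SELECT")
    return new_sql


print(
    with_index_hint(
        "select seq_id from T_EXTENSION_ALL_DATAS_SHOW where toy='ambi'",
        "T_EXTENSION_ALL_DATAS_SHOW",
        "IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL",
    )
)
```

It is worth running EXPLAIN on the generated statement to confirm that the hinted index is actually used.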

 

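The method is simple. Logically, email is the finest-grained (most selective) condition. If the filters include the finest-grained condition,

first use that field to query through its secondary index. (After filtering on the finest-grained condition, the result set is much, much smaller!)

Then filter that result set by toy name.

In other words, the data is narrowed in two steps, and the query is still fast.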
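
The two-step idea can be illustrated with a small Python model. This is a sketch of the logic only, not of how Phoenix executes the query: a dictionary stands in for the email secondary index, and the sample rows, email addresses, and toy values are made up. The names build_email_index and max_seq_id are hypothetical.

```python
from collections import defaultdict

# Made-up sample rows: (seq_id, email, toy)
rows = [
    (1, "a@example.com", "ambi"),
    (2, "b@example.com", "ambi"),
    (3, "a@example.com", "other"),
    (4, "a@example.com", "ambi"),
    (5, "b@example.com", "other"),
]


def build_email_index(rows):
    """Group rows by email, standing in for a secondary index on EMAIL."""
    index = defaultdict(list)
    for row in rows:
        index[row[1]].append(row)
    return index


def max_seq_id(email_index, email, toy):
    """Step 1: index lookup on the most selective field (email).
    Step 2: filter the small candidate set by toy and take the largest ID."""
    candidates = email_index.get(email, [])
    matches = [seq_id for seq_id, _, row_toy in candidates if row_toy == toy]
    return max(matches, default=None)


email_index = build_email_index(rows)
print(max_seq_id(email_index, "a@example.com", "ambi"))
```

The filter in step 2 only ever sees the rows for one email address, which is why it stays cheap even without an index on toy.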

 

 

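Note: going through a secondary index is not always faster than filtering. (When the result set is small, a filter has the advantage.)

Here is another approach, which takes a little over 1 second. In this version, email and date go through one secondary index and toy goes through another. It is like taking a long detour.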

0: jdbc:phoenix:192.168.199.154> select t1.seq_id from T_EXTENSION_ALL_DATAS_SHOW t1 inner join (select show_date,email,seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t2 on(t1.show_date=t2.show_date and t1.seq_id= t2.seq_id) where t1.toy='ambi' order by t1.seq_id desc limit 1;
+------------+
| T1.SEQ_ID  |
+------------+
| 45555      |
+------------+
1 row selected (1.171 seconds)


0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id from T_EXTENSION_ALL_DATAS_SHOW t1 inner join (select show_date,email,seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t2 on(t1.show_date=t2.show_date and t1.seq_id= t2.seq_id) where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
|                                                                                PLAN                                                                                 | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi']                                                                           | 314572800       | 1182604        | 1543050902166  |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                                                                 | 314572800       | 1182604        | 1543050902166  |
|     SERVER TOP 1 ROW SORTED BY ["T1.:SEQ_ID" DESC]                                                                                                                  | 314572800       | 1182604        | 1543050902166  |
| CLIENT MERGE SORT                                                                                                                                                   | 314572800       | 1182604        | 1543050902166  |
| CLIENT LIMIT 1                                                                                                                                                      | 314572800       | 1182604        | 1543050902166  |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)                                                                                                                        | 314572800       | 1182604        | 1543050902166  |
|         CLIENT 1-CHUNK 1182604 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ['[email protected]','2018-11-24']  | 314572800       | 1182604        | 1543050902166  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
7 rows selected (0.051 seconds)

 

That is all for Phoenix query optimization today. END