phoenix 5.0
数据量不多,就是100万,测试一般是够用的。
0: jdbc:phoenix:192.168.199.154> select count(1) from T_EXTENSION_ALL_DATAS_SHOW;
+-----------+
| COUNT(1) |
+-----------+
| 999999 |
+-----------+
1 row selected (8.714 seconds)
其中rowkey,主键是:CONSTRAINT PK PRIMARY KEY (SHOW_DATE, SEQ_ID, EMAIL)
0: jdbc:phoenix:192.168.199.154> select * from T_EXTENSION_ALL_DATAS_SHOW limit 2;
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
| SHOW_DATE | SEQ_ID | EMAIL | TIME_SPEND | CAM_SITE | TOKEN_EARNED | REVENUE | TIPS_SENT | TOY | CREATED_DATE |
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
| 2018-11-24 | 1 | [email protected] | 65 | cam4 | 20.5 | 200.5 | 21 | ambi | 2018-11-24 15:22:40 |
| 2018-11-24 | 2 | [email protected] | 65 | cam4 | 20.5 | 200.5 | 21 | ambi | 2018-11-24 15:22:40 |
+-------------+---------+------------+-------------+-----------+---------------+----------+------------+-------+----------------------+
2 rows selected (0.273 seconds)
-- 查询条件只有日期,查询最大ID 。可以看到速度还是很快的。
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 999999 |
+---------+
1 row selected (0.032 seconds)
0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 1 ROWS 715 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24'] | 715 | 1 | 0 |
| SERVER FILTER BY FIRST KEY ONLY | 715 | 1 | 0 |
| SERVER 1 ROW LIMIT | 715 | 1 | 0 |
| CLIENT 1 ROW LIMIT | 715 | 1 | 0 |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.064 seconds)
-- 查询条件,增加email。直接超时,报错。
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' order by seq_id desc limit 1;
Error: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=16, exceptions:
Sat Nov 24 15:43:47 CST 2018, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60121
-- 为email增加全局二级索引,然后查询。走二级索引,速度就是快。0.059秒
0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ON T_EXTENSION_ALL_DATAS_SHOW(EMAIL);
999,999 rows affected (56.13 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (0.059 seconds)
0: jdbc:phoenix:192.168.199.154>
-- 换成网站的查询条件,需要57秒!
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 499999 |
+---------+
1 row selected (57.825 seconds)
0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 2-CHUNK 639317 ROWS 314572822 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24'] | 314572822 | 639317 | 1543044967313 |
| SERVER FILTER BY CAM_SITE = 'cam4' | 314572822 | 639317 | 1543044967313 |
| SERVER 1 ROW LIMIT | 314572822 | 639317 | 1543044967313 |
| CLIENT 1 ROW LIMIT | 314572822 | 639317 | 1543044967313 |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.069 seconds)
-- 老样子,为网站字段建立。二级索引,从57秒 变为 0.024秒!
0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE ON T_EXTENSION_ALL_DATAS_SHOW(CAM_SITE);
999,999 rows affected (59.081 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 499999 |
+---------+
1 row selected (0.024 seconds)
-- 这次,我们在网站的基础上,增加一个玩具条件,会增么样呢?
-- 可以看到日期走了 RANGE SCAN,而网站和玩具都是FILTER,所以特别慢。
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 499999 |
+---------+
1 row selected (56.94 seconds)
0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 2-CHUNK 639317 ROWS 314572822 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24'] | 314572822 | 639317 | 1543044967313 |
| SERVER FILTER BY (CAM_SITE = 'cam4' AND TOY = 'ambi') | 314572822 | 639317 | 1543044967313 |
| SERVER 1 ROW LIMIT | 314572822 | 639317 | 1543044967313 |
| CLIENT 1 ROW LIMIT | 314572822 | 639317 | 1543044967313 |
+------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.029 seconds)
0: jdbc:phoenix:192.168.199.154>
-- 这时候我们需要 建立组合二级索引,才能满足查询需求。速度又提升到了0.039秒。哈哈
0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE_TOY ON T_EXTENSION_ALL_DATAS_SHOW(CAM_SITE,TOY);
999,999 rows affected (56.776 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 499999 |
+---------+
1 row selected (0.039 seconds)
0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and cam_site='cam4' and toy='ambi' order by seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 1 ROWS 73 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_CAM_SITE_TOY ['cam4','ambi','2018-11-24'] | 73 | 1 | 0 |
| SERVER FILTER BY FIRST KEY ONLY | 73 | 1 | 0 |
| SERVER 1 ROW LIMIT | 73 | 1 | 0 |
| CLIENT 1 ROW LIMIT | 73 | 1 | 0 |
+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.033 seconds)
这个时候我就问自己,功能页面多一个查询条件,就要多建立好几个二级索引。才能满足速度要求。
比如 cam_site 和 toy,都不是必填项。
用户
可能只输入cam_site,需要1个二级索引。
可能只输入toy,需要1个二级索引。
可能同时输入两个,需要1个二级索引。
简单的两个条件,就要3个索引。
比如页面有6个查询条件!!需要多少二级索引!!! 几何倍增长? 疯了疯了
同一张表,索引数量不得超过10,索引表越多,插入数据越慢!!
这时候,其实想法 比 技术要重要。
1、从需求方面 (大数据查询,不适合太多条件的查询)。无意义的查询条件,统统PK掉。
需要规定一个必填的查询字段,比如最通用的:时间(yyyy-MM-dd)..
2、从表设计方面,row key可以是联合主键。可以利用这点,减少二级索引数量。比如show_date 就是主键。
3、从逻辑方面,比如我需要这几个查询条件,需要几个二级索引??
时间、邮箱、网站、玩具
实际上,我只要4个二级索引,就够了。
时间走 rowkey这就不说了。
如果用户输入 时间、邮箱、玩具,那么怎么处理? 耗时也挺长的41秒。
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]' and toy='ambi' order by seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (41.395 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]';
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (0.035 seconds)
0: jdbc:phoenix:192.168.199.154> select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (40.367 seconds)
0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi','2018-11-24'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND "EMAIL" = '[email protected]' | null | null | null |
| SERVER 1 ROW LIMIT | null | null | null |
| CLIENT 1 ROW LIMIT | null | null | null |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.04 seconds)
从执行计划来看,我想需要改造一下,email二级索引。
0: jdbc:phoenix:192.168.199.154> drop index IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL on T_EXTENSION_ALL_DATAS_SHOW;
No rows affected (2.275 seconds)
0: jdbc:phoenix:192.168.199.154> CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ON T_EXTENSION_ALL_DATAS_SHOW(EMAIL) INCLUDE(CAM_SITE,TOY);
999,999 rows affected (76.893 seconds)
0: jdbc:phoenix:192.168.199.154> select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (40.06 seconds)
0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi','2018-11-24'] | null | null | null |
| SERVER FILTER BY FIRST KEY ONLY AND "EMAIL" = '[email protected]' | null | null | null |
| SERVER 1 ROW LIMIT | null | null | null |
| CLIENT 1 ROW LIMIT | null | null | null |
+-----------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
4 rows selected (0.034 seconds)
0: jdbc:phoenix:192.168.199.154> select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]';
+---------+-------+
| SEQ_ID | TOY |
+---------+-------+
| 45555 | ambi |
+---------+-------+
1 row selected (0.07 seconds)
执行的结果,还是不理想。完整的要40秒,连执行计划都没有任何变化。~~~~(>_<)~~~~
但是,子查询真的是非常快的,为啥包了一层就慢了40秒????
没办法出绝招了:Hint
0: jdbc:phoenix:192.168.199.154> select /*+ INDEX(T_EXTENSION_ALL_DATAS_SHOW IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL) */ t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------+
| SEQ_ID |
+---------+
| 45555 |
+---------+
1 row selected (0.017 seconds)
0: jdbc:phoenix:192.168.199.154> explain select /*+ INDEX(T_EXTENSION_ALL_DATAS_SHOW IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL) */ t1.seq_id as seq_id from (select seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t1 where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK 1182604 ROWS 314572800 BYTES PARALLEL 1-WAY REVERSE RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ['[email protected]','2018-11-24'] | 314572800 | 1182604 | 1543050902166 |
| SERVER FILTER BY "TOY" = 'ambi' | 314572800 | 1182604 | 1543050902166 |
| SERVER 1 ROW LIMIT | 314572800 | 1182604 | 1543050902166 |
| CLIENT 1 ROW LIMIT | 314572800 | 1182604 | 1543050902166 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
4 rows selected (0.048 seconds)
哈哈,太好了,非常完美。
phoenix ,通常是会自动选择最优的二级索引,但有时它并不是很聪明。
这个时候就需要我们告诉它,应该使用哪个二级索引!!
方法很简单,从逻辑角度 邮箱就是最细粒度。如果查询条件,有最细粒度。
先拿最细粒度字段走二级索引查询结果。(进过最细粒度条件的过滤,这个结果集就小了很多很多!)
再将结果集,使用玩具名称过滤。
意思就是查两次,速度一样很快。哈哈
注意:有时走二级索引,不一定会比过滤快?(结果集小的时候,filter有优势!)
另一种方式,也需要1秒多。这种方式,邮箱、时间走得二级索引、玩具也是走二级索引。就像是饶了一个大弯,哈哈。
0: jdbc:phoenix:192.168.199.154> select t1.seq_id from T_EXTENSION_ALL_DATAS_SHOW t1 inner join (select show_date,email,seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t2 on(t1.show_date=t2.show_date and t1.seq_id= t2.seq_id) where t1.toy='ambi' order by t1.seq_id desc limit 1;
+------------+
| T1.SEQ_ID |
+------------+
| 45555 |
+------------+
1 row selected (1.171 seconds)
0: jdbc:phoenix:192.168.199.154> explain select t1.seq_id from T_EXTENSION_ALL_DATAS_SHOW t1 inner join (select show_date,email,seq_id,toy from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' and email='[email protected]') t2 on(t1.show_date=t2.show_date and t1.seq_id= t2.seq_id) where t1.toy='ambi' order by t1.seq_id desc limit 1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_TOY ['ambi'] | 314572800 | 1182604 | 1543050902166 |
| SERVER FILTER BY FIRST KEY ONLY | 314572800 | 1182604 | 1543050902166 |
| SERVER TOP 1 ROW SORTED BY ["T1.:SEQ_ID" DESC] | 314572800 | 1182604 | 1543050902166 |
| CLIENT MERGE SORT | 314572800 | 1182604 | 1543050902166 |
| CLIENT LIMIT 1 | 314572800 | 1182604 | 1543050902166 |
| PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) | 314572800 | 1182604 | 1543050902166 |
| CLIENT 1-CHUNK 1182604 ROWS 314572800 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_SHOW_EMAIL ['[email protected]','2018-11-24'] | 314572800 | 1182604 | 1543050902166 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
7 rows selected (0.051 seconds)
关于phoenix查询的优化,今天就到这里。END