phoenix 分页limit offset

offset 这个是phoenix新版本提供的属性,但是存在性能问题,大数据的情况下是不推荐的。

 

我本来想试试offset,随着数量增加,耗时会如何增加。

但这两个不同的测试sql,让我有了新发现:

两条语句的执行计划:

0: jdbc:phoenix:192.168.199.154> explain select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN where login_date='2018-11-24' and country='China' order by created_date desc limit 5000 offset 20000;
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                                       PLAN                                                                        | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 25000 ROWS 1625000 BYTES SERIAL 1-WAY RANGE SCAN OVER IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED ['2018-11-24','China']  | 1625000         | 25000          | 0            |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                                               | 1625000         | 25000          | 0            |
|     SERVER OFFSET 20000                                                                                                                           | 1625000         | 25000          | 0            |
|     SERVER 25000 ROW LIMIT                                                                                                                        | 1625000         | 25000          | 0            |
| CLIENT 5000 ROW LIMIT                                                                                                                             | 1625000         | 25000          | 0            |
+---------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.038 seconds)



0: jdbc:phoenix:192.168.199.154> explain select seq_id from T_EXTENSION_ALL_DATAS_SHOW where show_date='2018-11-24' order by seq_id desc limit 20 offset 20000;
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                                           PLAN                                                           | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK 20020 ROWS 14314300 BYTES SERIAL 1-WAY REVERSE RANGE SCAN OVER T_EXTENSION_ALL_DATAS_SHOW ['2018-11-24']  | 14314300        | 20020          | 0            |
|     SERVER FILTER BY FIRST KEY ONLY                                                                                      | 14314300        | 20020          | 0            |
|     SERVER OFFSET 20000                                                                                                  | 14314300        | 20020          | 0            |
|     SERVER 20020 ROW LIMIT                                                                                               | 14314300        | 20020          | 0            |
| CLIENT 20 ROW LIMIT                                                                                                      | 14314300        | 20020          | 0            |
+--------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
5 rows selected (0.032 seconds)

 

你说 跳跃2W行,查询5000条的速度快?  还是 跳跃2W行,查询20条的速度快?

offset 2W与 20W的结果如下:

select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN 
where login_date='2018-11-24' and country='China' 
order by created_date desc 
limit 5000 offset 20000;
-- 5,000 rows selected (0.416 seconds)



select seq_id from T_EXTENSION_ALL_DATAS_SHOW 
where show_date='2018-11-24' 
order by seq_id desc 
limit 20 offset 20000;
-- 20 rows selected (2.071 seconds)



select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN 
where login_date='2018-11-24' and country='China' 
order by created_date desc 
limit 5000 offset 200000;
-- 5,000 rows selected (0.515 seconds)


select seq_id from T_EXTENSION_ALL_DATAS_SHOW 
where show_date='2018-11-24' 
order by seq_id desc 
limit 20 offset 200000;
-- 20 rows selected (20.508 seconds)

都是走RANGE SCAN 为什么差距,这么大呢?

 

原因在于:

IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED 这个二级索引,他比使用 T_EXTENSION_ALL_DATAS_SHOW 这种表row key索引,字段内容要少很多,所以快。

 

-- 创建排序索引语法

CREATE INDEX IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED ON T_EXTENSION_ALL_DATAS_LOGIN(LOGIN_DATE,COUNTRY,CREATED_DATE DESC);


0: jdbc:phoenix:192.168.199.154> select * from  IDX_T_EXTENSION_ALL_DATAS_LOGIN_DATE_COUNTRY_CREATED limit 1;
+--------------+------------+----------------------+-----------------+
| :LOGIN_DATE  | 0:COUNTRY  |    0:CREATED_DATE    |     :EMAIL      |
+--------------+------------+----------------------+-----------------+
| 2018-11-24   | China      | 2018-11-24 15:26:57  | [email protected]  |
+--------------+------------+----------------------+-----------------+

那么使用这种全局的排序索引,4个字段,offset,20W。只要0.5S。

由于这个条件只有50W的数据,只能这么试试了。接近50W,耗时1S左右。

select login_date,email from T_EXTENSION_ALL_DATAS_LOGIN where login_date='2018-11-24' and country='China' order by created_date desc limit 5000 offset 495000;

-- 5,000 rows selected (1.025 seconds)

 

结论:虽然分页这些不推荐使用offset。但是某些小数据场景下,还是可以使用的。

推荐不超过50W的小数据集,可以使用offset。

 

 

 

你可能感兴趣的:(Phoenix)