Leveraging New Features in Drill 1.2: ANSI SQL Analytic/Window Functions

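Today we are very excited to announce that the latest release of Apache Drill (1.2) is available as part of the MapR Distribution. The Drill 1.2 package for MapR can be downloaded from http://doc.mapr.com/display/MapR/Apache+Drill+on+MapR
You can try it out with the MapR Sandbox and the hands-on exercises in its tutorial, available at https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill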

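Since the initial beta release became available (Sep '14), Apache Drill has gained broad user adoption and community momentum. Many customers have deployed Drill in production and are finding it valuable for use cases such as data exploration, ad hoc queries/BI on Hadoop data lakes, and JSON data analytics.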

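Drill 1.0 reached general availability in May '15, and Drill 1.1 followed in early July '15. Each of these releases added significant new functionality to Drill's interactive, self-service data exploration and ad hoc SQL query capabilities, and made it enterprise-ready in terms of scale and manageability. Drill 1.2 builds on that foundation and raises the bar with advanced SQL support, deeper Hive integration, and performance enhancements. Drill 1.2 contains more than 250 bug fixes along with several new enhancements, including the following.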

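  • New ANSI SQL analytic/window functions: Lead/Lag, First_Value/Last_Value, NTile
  • Optimized reads of Hive tables
  • Support for multiple Hive versions
  • Metadata caching to improve query performance on large Parquet files
  • Improved row key pushdown for HBase/MapR-DB tables
  • Drill Web UI security
  • DROP TABLE command
  • Improvements in memory handling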

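In this blog post I want to give a quick overview of the new analytic capability added to Drill, namely ANSI SQL compliant analytic and window functions, and show how to get started with it. SQL window functions in Drill include support for the PARTITION BY and OVER clauses, the aggregate window functions Sum, Max, Min, Count, and Avg, and analytic functions such as First_Value, Last_Value, Lead, Lag, NTile, Row_Number, and Rank. Window functions are highly versatile: they let users cut down on the joins, subqueries, and explicit cursors they would otherwise have to write, so a wide range of use cases can be solved with very little code.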
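
To make the "fewer joins and subqueries" point concrete, here is a minimal sketch that computes a per-city average twice: once with a join against a grouped subquery, and once with a single window function. It runs against SQLite through Python's standard sqlite3 module rather than against Drill, so it only illustrates the SQL pattern; it assumes an SQLite build with window-function support (3.25 or later). The table name business and the handful of rows (taken from the query results later in this post) are stand-ins for the `business.json` file.

```python
import sqlite3

# Stand-in rows copied from the Ahwatukee/Anthem results shown later in this post.
# The table name `business` is an assumption; the post queries `business.json` in Drill.
rows = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("Q to U BBQ", "Anthem", 74),
    ("Dara Thai", "Anthem", 56),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

# Without window functions: join each row to a grouped subquery.
with_join = conn.execute("""
    SELECT b.name, b.city, b.review_count, a.city_avg
    FROM business b
    JOIN (SELECT city, AVG(review_count) AS city_avg
          FROM business
          GROUP BY city) a
      ON a.city = b.city
    ORDER BY b.city, b.name
""").fetchall()

# With a window function: one pass, no join, no subquery.
with_window = conn.execute("""
    SELECT name, city, review_count,
           AVG(review_count) OVER (PARTITION BY city) AS city_avg
    FROM business
    ORDER BY city, name
""").fetchall()

for row in with_window:
    print(row)
```

Both queries return the same rows; the window version simply states the partition once instead of repeating the table in a subquery.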

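In my previous posts, on turning raw data into real insights and on working with highly dynamic datasets, I used Yelp's sample business-review dataset to demonstrate various query capabilities in Drill. This post continues with the same dataset to showcase the analytic/window functions.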

First, let's start Drill in embedded mode (distributed mode works as well).

NRentachintala-MAC:bin nrentachintala$ ./drill-embedded
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Oct 19, 2015 9:20:03 AM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.2.0 
"a drill in the hand is better than two in the bush"

List the schemas available in Drill.

0: jdbc:drill:zk=local> show schemas;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| dfs.yelp            |
| sys                 |
+---------------------+

7 rows selected (1.755 seconds)

Switch to the workspace where the Yelp data is loaded.

0: jdbc:drill:zk=local> use dfs.yelp;
+-------+---------------------------------------+
|  ok   |                summary                |
+-------+---------------------------------------+
| true  | Default schema changed to [dfs.yelp]  |
+-------+---------------------------------------+

1 row selected (0.129 seconds)

Let's start by exploring one of the datasets in the Yelp data: the business information.

0: jdbc:drill:zk=local> select * from `business.json` limit 1;
+-------------+--------------+-------+------+------------+------+--------------+------+-----------+-------+-------+----------+------------+------+---------------+
| business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods |
+-------------+--------------+-------+------+------------+------+--------------+------+-----------+-------+-------+----------+------------+------+---------------+
| vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd
Ste 101
Phoenix, AZ 85018 | {"Tuesday":{"close":"17:00","open":"08:00"},"Friday":{"close":"17:00","open":"08:00"},"Monday":{"close":"17:00","open":"08:00"},"Wednesday":{"close":"17:00","open":"08:00"},"Thursday":{"close":"17:00","open":"08:00"},"Sunday":{},"Saturday":{}} | true | ["Doctors","Health & Medical"] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {"By Appointment Only":true,"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | business | [] |
+-------------+--------------+-------+------+------------+------+--------------+------+-----------+-------+-------+----------+------------+------+---------------+
1 row selected (0.514 seconds)

Now let's look at a few uses of Drill's window functions.

First, get the top Yelp businesses in each city by number of reviews, along with a row number.

0: jdbc:drill:zk=local> SELECT name, city, review_count,row_number()
. . . . . . . . . . . > OVER (PARTITION BY city ORDER BY review_count DESC) as rownum 
. . . . . . . . . . . > FROM `business.json` limit 15;  

+----------------------------------------+------------+---------------+---------+
|                  name                  |    city    | review_count  | rownum  |
+----------------------------------------+------------+---------------+---------+
| Cupz N' Crepes                         | Ahwatukee  | 124           | 1       |
| My Wine Cellar                         | Ahwatukee  | 98            | 2       |
| Kathy's Alterations                    | Ahwatukee  | 12            | 3       |
| McDonald's                             | Ahwatukee  | 7             | 4       |
| U-Haul                                 | Ahwatukee  | 5             | 5       |
| Hi-Health                              | Ahwatukee  | 4             | 6       |
| Healthy and Clean Living Environments  | Ahwatukee  | 4             | 7       |
| Active Kids Pediatrics                 | Ahwatukee  | 4             | 8       |
| Roberto's Authentic Mexican Food       | Anthem     | 117           | 1       |
| Q to U BBQ                             | Anthem     | 74            | 2       |
| Outlets At Anthem                      | Anthem     | 64            | 3       |
| Dara Thai                              | Anthem     | 56            | 4       |
| Cafe Provence                          | Anthem     | 53            | 5       |
| Shanghai Club                          | Anthem     | 50            | 6       |
| Two Brothers Kitchen                   | Anthem     | 43            | 7       |
+----------------------------------------+------------+---------------+---------+
15 rows selected (0.67 seconds)
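
A common next step with row_number() is to keep only the first N rows of each partition. The sketch below shows that pattern, again on SQLite through Python's sqlite3 module (assumed to be version 3.25 or later) instead of Drill. The business table and its six rows are stand-ins copied from the output above, and the cutoff of two rows per city is an arbitrary choice for illustration.

```python
import sqlite3

# Stand-in rows copied from the query output above; `business` is an assumed table name.
rows = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("Roberto's Authentic Mexican Food", "Anthem", 117),
    ("Q to U BBQ", "Anthem", 74),
    ("Outlets At Anthem", "Anthem", 64),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

# Number the rows inside each city, then filter on that number in an outer query.
# (A window function cannot be referenced directly in the WHERE clause of the same SELECT.)
top_two = conn.execute("""
    SELECT name, city, review_count, rownum
    FROM (
        SELECT name, city, review_count,
               ROW_NUMBER() OVER (PARTITION BY city
                                  ORDER BY review_count DESC) AS rownum
        FROM business
    )
    WHERE rownum <= 2
    ORDER BY city, rownum
""").fetchall()

for row in top_two:
    print(row)
```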

See how each business's review count compares with the average review count across all businesses in its city.

0: jdbc:drill:zk=local> SELECT name, city,review_count,
. . . . . . . . . . . > Avg(review_count) OVER (PARTITION BY City) AS city_reviews_avg
. . . . . . . . . . . > FROM `business.json` limit 15;
+----------------------------------------+------------+---------------+---------------------+
|                  name                  |    city    | review_count  |  city_reviews_avg   |
+----------------------------------------+------------+---------------+---------------------+
| Hi-Health                              | Ahwatukee  | 4             | 32.25               |
| My Wine Cellar                         | Ahwatukee  | 98            | 32.25               |
| U-Haul                                 | Ahwatukee  | 5             | 32.25               |
| Cupz N' Crepes                         | Ahwatukee  | 124           | 32.25               |
| McDonald's                             | Ahwatukee  | 7             | 32.25               |
| Kathy's Alterations                    | Ahwatukee  | 12            | 32.25               |
| Healthy and Clean Living Environments  | Ahwatukee  | 4             | 32.25               |
| Active Kids Pediatrics                 | Ahwatukee  | 4             | 32.25               |
| Anthem Community Center                | Anthem     | 4             | 14.492063492063492  |
| Scrapbooks To Remember                 | Anthem     | 4             | 14.492063492063492  |
| Hungry Howie's Pizza                   | Anthem     | 7             | 14.492063492063492  |
| Pinata Nueva                           | Anthem     | 3             | 14.492063492063492  |
| Starbucks Coffee Company               | Anthem     | 13            | 14.492063492063492  |
| Pizza Hut                              | Anthem     | 6             | 14.492063492063492  |
| Rays Pizza                             | Anthem     | 19            | 14.492063492063492  |
+----------------------------------------+------------+---------------+---------------------+
15 rows selected (0.395 seconds)

See how much each business's review count contributes to the total number of reviews for all businesses in its city.

0: jdbc:drill:zk=local> SELECT name, city,review_count,
. . . . . . . . . . . > Sum(review_count) OVER (PARTITION BY City) AS city_reviews_sum
. . . . . . . . . . . > FROM `business.json` limit 15;
+----------------------------------------+------------+---------------+-------------------+
|                  name                  |    city    | review_count  | city_reviews_sum  |
+----------------------------------------+------------+---------------+-------------------+
| Hi-Health                              | Ahwatukee  | 4             | 258               |
| My Wine Cellar                         | Ahwatukee  | 98            | 258               |
| U-Haul                                 | Ahwatukee  | 5             | 258               |
| Cupz N' Crepes                         | Ahwatukee  | 124           | 258               |
| McDonald's                             | Ahwatukee  | 7             | 258               |
| Kathy's Alterations                    | Ahwatukee  | 12            | 258               |
| Healthy and Clean Living Environments  | Ahwatukee  | 4             | 258               |
| Active Kids Pediatrics                 | Ahwatukee  | 4             | 258               |
| Anthem Community Center                | Anthem     | 4             | 913               |
| Scrapbooks To Remember                 | Anthem     | 4             | 913               |
| Hungry Howie's Pizza                   | Anthem     | 7             | 913               |
| Pinata Nueva                           | Anthem     | 3             | 913               |
| Starbucks Coffee Company               | Anthem     | 13            | 913               |
| Pizza Hut                              | Anthem     | 6             | 913               |
| Rays Pizza                             | Anthem     | 19            | 913               |
+----------------------------------------+------------+---------------+-------------------+
15 rows selected (0.543 seconds)
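
The query above returns the city total next to each row; turning that into an explicit share is one more expression. The following sketch computes each business's percentage of its city's reviews. It uses SQLite via Python's sqlite3 module (3.25 or later assumed) rather than Drill, the business table is a stand-in loaded with the eight Ahwatukee rows from the output above, and the pct_of_city column name is made up for this example.

```python
import sqlite3

# The eight Ahwatukee rows from the output above; `business` is an assumed table name.
rows = [
    ("Hi-Health", "Ahwatukee", 4),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("U-Haul", "Ahwatukee", 5),
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("McDonald's", "Ahwatukee", 7),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("Healthy and Clean Living Environments", "Ahwatukee", 4),
    ("Active Kids Pediatrics", "Ahwatukee", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

# Multiply by 100.0 (a float) so the division is not truncated to an integer.
shares = conn.execute("""
    SELECT name, review_count,
           SUM(review_count) OVER (PARTITION BY city) AS city_reviews_sum,
           ROUND(100.0 * review_count
                 / SUM(review_count) OVER (PARTITION BY city), 1) AS pct_of_city
    FROM business
    ORDER BY review_count DESC, name
""").fetchall()

for row in shares:
    print(row)
```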

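Now let's try a slightly more complex query: list the top 10 cities, together with their top-ranked business, by number of reviews. Drill window functions such as rank and dense_rank can be used in queries like this.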

0: jdbc:drill:zk=local> WITH X
. . . . . . . . . . . > AS
. . . . . . . . . . . > (SELECT name, city, review_count,
. . . . . . . . . . . > RANK()
. . . . . . . . . . . > OVER (PARTITION BY city
. . . . . . . . . . . > ORDER BY review_count DESC) AS review_rank
. . . . . . . . . . . > FROM `business.json`)
. . . . . . . . . . . > SELECT X.name, X.city, X.review_count
. . . . . . . . . . . > FROM X
. . . . . . . . . . . > WHERE X.review_rank =1 ORDER BY review_count DESC LIMIT 10;
+-------------------------------------------+-------------+---------------+
|                   name                    |    city     | review_count  |
+-------------------------------------------+-------------+---------------+
| Mon Ami Gabi                              | Las Vegas   | 4084          |
| Studio B                                  | Henderson   | 1336          |
| Phoenix Sky Harbor International Airport  | Phoenix     | 1325          |
| Four Peaks Brewing Co                     | Tempe       | 1110          |
| The Mission                               | Scottsdale  | 783           |
| Joe's Farm Grill                          | Gilbert     | 770           |
| The Old Fashioned                         | Madison     | 619           |
| Cornish Pasty Company                     | Mesa        | 578           |
| SanTan Brewing Company                    | Chandler    | 469           |
| Yard House                                | Glendale    | 321           |
+-------------------------------------------+-------------+---------------+
10 rows selected (0.49 seconds)
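
Rank, dense_rank, and row_number differ only in how they treat ties, which matters for a filter like review_rank = 1: if two businesses tie for the top spot in a city, rank() gives both of them 1 and both rows come back. The sketch below puts the three functions side by side. It runs on SQLite via Python's sqlite3 module (3.25 or later assumed), not on Drill; the business table is a stand-in holding the eight Ahwatukee rows seen earlier, and it sorts ascending purely so that the three-way tie at 4 reviews comes first and the difference is visible.

```python
import sqlite3

# The eight Ahwatukee rows seen in earlier outputs; `business` is an assumed table name.
rows = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("McDonald's", "Ahwatukee", 7),
    ("U-Haul", "Ahwatukee", 5),
    ("Hi-Health", "Ahwatukee", 4),
    ("Healthy and Clean Living Environments", "Ahwatukee", 4),
    ("Active Kids Pediatrics", "Ahwatukee", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

# RANK leaves a gap after ties, DENSE_RANK does not, ROW_NUMBER ignores ties entirely.
ranking = conn.execute("""
    SELECT review_count,
           RANK()       OVER (PARTITION BY city ORDER BY review_count) AS rnk,
           DENSE_RANK() OVER (PARTITION BY city ORDER BY review_count) AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY review_count) AS rownum
    FROM business
    ORDER BY review_count, rownum
""").fetchall()

for row in ranking:
    print(row)
```

The three businesses with 4 reviews all get rank 1 and dense rank 1; the next business (5 reviews) gets rank 4 but dense rank 2.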

Compare each business's review count with the highest and lowest review counts in its city.

0: jdbc:drill:zk=local> SELECT name, city, review_count,
. . . . . . . . . . . > FIRST_VALUE(review_count)
. . . . . . . . . . . > OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
. . . . . . . . . . . > LAST_VALUE(review_count)
. . . . . . . . . . . > OVER(PARTITION BY city ORDER BY review_count DESC) AS bottom_review_count
. . . . . . . . . . . > FROM `business.json` limit 15;

+----------------------------------------+------------+---------------+-------------------+----------------------+
|                  name                  |    city    | review_count  | top_review_count  | bottom_review_count  |
+----------------------------------------+------------+---------------+-------------------+----------------------+
| My Wine Cellar                         | Ahwatukee  | 98            | 124               | 12                   |
| McDonald's                             | Ahwatukee  | 7             | 124               | 12                   |
| U-Haul                                 | Ahwatukee  | 5             | 124               | 12                   |
| Hi-Health                              | Ahwatukee  | 4             | 124               | 12                   |
| Healthy and Clean Living Environments  | Ahwatukee  | 4             | 124               | 12                   |
| Active Kids Pediatrics                 | Ahwatukee  | 4             | 124               | 12                   |
| Cupz N' Crepes                         | Ahwatukee  | 124           | 124               | 12                   |
| Kathy's Alterations                    | Ahwatukee  | 12            | 124               | 12                   |
| Q to U BBQ                             | Anthem     | 74            | 117               | 117                  |
| Dara Thai                              | Anthem     | 56            | 117               | 117                  |
| Cafe Provence                          | Anthem     | 53            | 117               | 117                  |
| Shanghai Club                          | Anthem     | 50            | 117               | 117                  |
| Two Brothers Kitchen                   | Anthem     | 43            | 117               | 117                  |
| The Tennessee Grill                    | Anthem     | 32            | 117               | 117                  |
| Dollyrockers Boutique and Salon        | Anthem     | 30            | 117               | 117                  |
+----------------------------------------+------------+---------------+-------------------+----------------------+
15 rows selected (0.516 seconds)
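
Note that bottom_review_count in the output above is 12 for Ahwatukee even though several Ahwatukee businesses have only 4 reviews, so that column should not be read as the per-city minimum. What LAST_VALUE returns depends on the window frame. In standard SQL, a window with ORDER BY and no explicit frame ends at the current row (and its ties), so LAST_VALUE does not look ahead to the end of the partition. The sketch below shows the standard behavior on SQLite via Python's sqlite3 module (3.25 or later assumed). It is not a reproduction of Drill 1.2's result, and whether Drill 1.2 accepts the explicit frame clause used here has not been verified. The business table is a stand-in with the eight Ahwatukee rows.

```python
import sqlite3

# The eight Ahwatukee rows from the output above; `business` is an assumed table name.
rows = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("McDonald's", "Ahwatukee", 7),
    ("U-Haul", "Ahwatukee", 5),
    ("Hi-Health", "Ahwatukee", 4),
    ("Healthy and Clean Living Environments", "Ahwatukee", 4),
    ("Active Kids Pediatrics", "Ahwatukee", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

frames = conn.execute("""
    SELECT name, review_count,
           -- Default frame: from the partition start up to the current row and its ties.
           LAST_VALUE(review_count) OVER (
               PARTITION BY city ORDER BY review_count DESC) AS default_frame,
           -- Explicit frame covering the whole partition.
           LAST_VALUE(review_count) OVER (
               PARTITION BY city ORDER BY review_count DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame,
           -- Frame-independent way to get the lowest count per city.
           MIN(review_count) OVER (PARTITION BY city) AS city_min
    FROM business
    ORDER BY review_count DESC, name
""").fetchall()

for row in frames:
    print(row)
```

With the default frame, LAST_VALUE simply echoes each row's own review_count; with the full-partition frame it returns 4 for every row, matching MIN(review_count) OVER (PARTITION BY city). When the goal is the lowest value in the partition, MIN over the partition avoids the frame question altogether.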

Compare each business's review count with those of the businesses immediately before and after it.

0: jdbc:drill:zk=local> SELECT city, review_count, name,
. . . . . . . . . . . > LAG(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC) 
. . . . . . . . . . . > AS preceding_count,
. . . . . . . . . . . > LEAD(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC) 
. . . . . . . . . . . > AS following_count
. . . . . . . . . . . > FROM `business.json` limit 15;
+------------+---------------+----------------------------------------+------------------+------------------+
|    city    | review_count  |                  name                  | preceding_count  | following_count  |
+------------+---------------+----------------------------------------+------------------+------------------+
| Ahwatukee  | 124           | Cupz N' Crepes                         | null             | 98               |
| Ahwatukee  | 98            | My Wine Cellar                         | 124              | 12               |
| Ahwatukee  | 12            | Kathy's Alterations                    | 98               | 7                |
| Ahwatukee  | 7             | McDonald's                             | 12               | 5                |
| Ahwatukee  | 5             | U-Haul                                 | 7                | 4                |
| Ahwatukee  | 4             | Hi-Health                              | 5                | 4                |
| Ahwatukee  | 4             | Healthy and Clean Living Environments  | 4                | 4                |
| Ahwatukee  | 4             | Active Kids Pediatrics                 | 4                | null             |
| Anthem     | 117           | Roberto's Authentic Mexican Food       | null             | 74               |
| Anthem     | 74            | Q to U BBQ                             | 117              | 64               |
| Anthem     | 64            | Outlets At Anthem                      | 74               | 56               |
| Anthem     | 56            | Dara Thai                              | 64               | 53               |
| Anthem     | 53            | Cafe Provence                          | 56               | 50               |
| Anthem     | 50            | Shanghai Club                          | 53               | 43               |
| Anthem     | 43            | Two Brothers Kitchen                   | 50               | 32               |
+------------+---------------+----------------------------------------+------------------+------------------+
15 rows selected (0.518 seconds)
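
Because LAG and LEAD return ordinary column values, they can be used directly in arithmetic, for example to measure how far each business trails the one ranked just above it. The sketch below does that on SQLite via Python's sqlite3 module (3.25 or later assumed) instead of Drill. The business table is a stand-in with five Ahwatukee rows from the output above, and gap_to_preceding is a column name invented for this example.

```python
import sqlite3

# Five Ahwatukee rows from the output above; `business` is an assumed table name.
rows = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("McDonald's", "Ahwatukee", 7),
    ("U-Haul", "Ahwatukee", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)

# LAG is NULL on the first row of a partition and LEAD is NULL on the last one,
# so any arithmetic on them is NULL there too (None in Python).
gaps = conn.execute("""
    SELECT name, review_count,
           LAG(review_count, 1)  OVER (PARTITION BY city
                                       ORDER BY review_count DESC) AS preceding_count,
           LEAD(review_count, 1) OVER (PARTITION BY city
                                       ORDER BY review_count DESC) AS following_count,
           LAG(review_count, 1)  OVER (PARTITION BY city
                                       ORDER BY review_count DESC)
               - review_count AS gap_to_preceding
    FROM business
    ORDER BY review_count DESC
""").fetchall()

for row in gaps:
    print(row)
```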

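For more details and documentation on window functions and the other Drill 1.2 features, see the Drill documentation and the MapR documentation. Congratulations to the Drill community on another important milestone, with more to come.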

Translated from: https://www.javacodegeeks.com/2015/10/leveraging-new-features-in-drill-1-2-ansi-sql-analyticwindow-functions.html
