Phoenix快速入门 - BigData工具篇

Phoenix是HBase的一个开源SQL外壳。您可以使用标准的JDBC api而不是常规的HBase客户机api来创建表、插入数据和查询HBase数据。

使用Phoenix,可以编写更少的代码,得到更好的处理性能:

  • 将SQL查询编译为本机HBase扫描
  • 确定您的扫描键的最佳开始和结束位置
  • 调度并行执行扫描
  • 为数据提供运算能力
  • 将where子句中的条件推到服务器端过滤器
  • 通过服务器端钩子(称为协处理器)执行聚合查询

除了这些项目,我们还有一些有趣的改进工作,以进一步优化性能:

  • 用于改进非行键列查询性能的二级索引
  • 统计数据收集以改进并行化,并指导优化之间的选择
  • 跳过扫描过滤器来优化IN,LIKE和OR查询
  • 可选的行键加盐以均匀分配写负载

下面先来一个官例:

1、环境搭建参考:DataX+Phoenix+Hbase大数据分析平台整合

2、创建自己的SQL脚本并使用命令行工具$PhoenixPaht/bin/psql.py执行脚本。

首先,创建us_population.sql文件,包含表定义

[admin@mvxl2429 examples]$ pwd
/mnt/yst/phoenix-5.0.0/examples
[admin@mvxl2429 examples]$ cat us_population.sql
CREATE TABLE IF NOT EXISTS us_population (
      state CHAR(2) NOT NULL,
      city VARCHAR NOT NULL,
      population BIGINT
      CONSTRAINT my_pk PRIMARY KEY (state, city));

然后,创建一个us_population.csv文件,包含要存入该表的数据

[admin@mvxl2429 examples]$ cat us_population.csv
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332

最后,创建一个us_population_queries.sql文件,包含要运行的查询语句

[admin@mvxl2429 examples]$ cat us_population_queries.sql
SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;

从命令终端执行以下命令:

[admin@mvxl2429 examples]$ ../bin/sqlline.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
usage: sqlline.py [-h] [-v VERBOSE] [-c COLOR] [-fc FASTCONNECT]
                  [zookeepers] [sqlfile]
sqlline.py: error: unrecognized arguments: us_population.csv us_population_queries.sql
[admin@mvxl2429 examples]$ ../bin/psql.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
19/08/08 14:51:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 1.508 sec(s)

csv columns from database.
CSV Upsert complete. 10 rows upserted
Time: 0.117 sec(s)

St                               City Count                           Population Sum 
-- ---------------------------------------- ---------------------------------------- 
NY                                        1                                  8143197 
CA                                        3                                  6012701 
TX                                        3                                  4486916 
IL                                        1                                  2842518 
PA                                        1                                  1463281 
AZ                                        1                                  1461575 
Time: 0.024 sec(s)

3、查看bin/performance.py脚本,为您想到的任何模式创建任意多的行,并对其运行定时查询。

性能测试的数量跟内存大小密切相关,基于机器配置不高,这里运行1000条数据作为测试。测试海量数据需要加大内存,避免OutOfMemoryError异常。

[admin@mvxl2429 bin]$ ./performance.py localhost:2182 1000
Phoenix Performance Evaluation Script 1.0
-----------------------------------------

Creating performance table...
19/08/08 16:04:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 0.943 sec(s)

Query # 1 - Count - SELECT COUNT(1) FROM PERFORMANCE_1000;
Query # 2 - Group By First PK - SELECT HOST FROM PERFORMANCE_1000 GROUP BY HOST;
Query # 3 - Group By Second PK - SELECT DOMAIN FROM PERFORMANCE_1000 GROUP BY DOMAIN;
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM PERFORMANCE_1000 GROUP BY TRUNC(DATE,'DAY');
Query # 5 - Filter + Count - SELECT COUNT(1) FROM PERFORMANCE_1000 WHERE CORE<10;

Generating and upserting data...

/mnt/yst/phoenix-5.0.0/bin/../phoenix-core-5.0.0-HBase-2.0-tests.jar
.

19/08/08 16:04:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
csv columns from database.
CSV Upsert complete. 1000 rows upserted
Time: 0.492 sec(s)

                                COUNT(1) 
---------------------------------------- 
                                    1000 
Time: 0.034 sec(s)

HO 
-- 
CS 
EU 
NA 
Time: 0.018 sec(s)

DOMAIN                                   
---------------------------------------- 
Apple.com                                
Google.com                               
Salesforce.com                           
Time: 0.024 sec(s)

DAY                     
----------------------- 
2019-08-08 00:00:00.000 
Time: 0.028 sec(s)

                                COUNT(1) 
---------------------------------------- 
                                      19 
Time: 0.019 sec(s)

[admin@mvxl2429 bin]$ ./sqlline.py localhost:2182 
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2182 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2182
19/08/08 15:21:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
146/146 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:localhost:2182> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
| TABLE_CAT  | TABLE_SCHEM  |    TABLE_NAME     |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  | INDEX_STATE  |  |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
|            | SYSTEM       | CATALOG           | SYSTEM TABLE  |          |            |                            |                 |              |  |
|            | SYSTEM       | FUNCTION          | SYSTEM TABLE  |          |            |                            |                 |              |  |
|            | SYSTEM       | LOG               | SYSTEM TABLE  |          |            |                            |                 |              |  |
|            | SYSTEM       | SEQUENCE          | SYSTEM TABLE  |          |            |                            |                 |              |  |
|            | SYSTEM       | STATS             | SYSTEM TABLE  |          |            |                            |                 |              |  |
|            |              | PERFORMANCE_1000  | TABLE         |          |            |                            |                 |              |  |
|            |              | PERSON            | TABLE         |          |            |                            |                 |              |  |
|            |              | US_POPULATION     | TABLE         |          |            |                            |                 |              |  |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
0: jdbc:phoenix:localhost:2182> select * from PERFORMANCE_1000 limit 3;
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| HOST  |   DOMAIN   |  FEATURE   |           DATE           | CORE  |  DB   | ACTIVE_VISITOR  |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:04:38.000  | 373   | 658   | 3847            |
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:04:47.000  | 465   | 29    | 4551            |
| CS    | Apple.com  | Dashboard  | 2019-08-08 16:05:05.000  | 298   | 1463  | 142             |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
3 rows selected (0.041 seconds)

你可能感兴趣的:(Phoenix快速入门 - BigData工具篇)