Phoenix is an open-source SQL skin for HBase. You can use standard JDBC APIs, instead of the regular HBase client APIs, to create tables, insert data, and query HBase data; a short JDBC sketch follows the feature lists below.
With Phoenix you write less code and get better performance, because it:
- compiles your SQL queries into native HBase scans
- determines the optimal start and stop keys for your scan
- orchestrates the parallel execution of your scans
- brings the computation to the data, by:
  - pushing the predicates in your WHERE clause into server-side filters
  - executing aggregate queries through server-side hooks (called coprocessors)
Beyond these, a set of interesting enhancements further optimizes performance:
- secondary indexes to improve the performance of queries on non-row-key columns
- statistics gathering to improve parallelization and to guide choices between optimizations
- a skip-scan filter to optimize IN, LIKE, and OR queries
- optional salting of row keys to evenly distribute write load
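To make the JDBC point concrete, here is a minimal sketch of creating a salted table through the standard JDBC API. This is illustrative only: the table name web_stat_demo is made up, and it assumes the Phoenix client jar is on the classpath with ZooKeeper reachable at localhost:2182, as in the sessions below.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixJdbcDemo {
    public static void main(String[] args) throws Exception {
        // The Phoenix JDBC URL names the ZooKeeper quorum, not an HBase master.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             Statement stmt = conn.createStatement()) {
            // SALT_BUCKETS prefixes each row key with a hash byte and pre-splits
            // the table, spreading write load evenly (the salting bullet above).
            stmt.execute("CREATE TABLE IF NOT EXISTS web_stat_demo ("
                    + " host VARCHAR NOT NULL,"
                    + " visit_date DATE NOT NULL,"
                    + " hits BIGINT"
                    + " CONSTRAINT pk PRIMARY KEY (host, visit_date))"
                    + " SALT_BUCKETS = 4");
        }
    }
}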
Let's walk through the official example first:
1. Environment setup: see "DataX + Phoenix + HBase big data analysis platform integration".
2. Create your own SQL scripts and run them with the command-line tool $PhoenixPath/bin/psql.py.
First, create a us_population.sql file containing the table definition:
[admin@mvxl2429 examples]$ pwd
/mnt/yst/phoenix-5.0.0/examples
[admin@mvxl2429 examples]$ cat us_population.sql
CREATE TABLE IF NOT EXISTS us_population (
state CHAR(2) NOT NULL,
city VARCHAR NOT NULL,
population BIGINT
CONSTRAINT my_pk PRIMARY KEY (state, city));
Next, create a us_population.csv file containing the data to load into the table:
[admin@mvxl2429 examples]$ cat us_population.csv
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332
Finally, create a us_population_queries.sql file containing the query to run:
[admin@mvxl2429 examples]$ cat us_population_queries.sql
SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum"
FROM us_population
GROUP BY state
ORDER BY sum(population) DESC;
Run everything from the terminal. Note that sqlline.py accepts at most one SQL file, so the first attempt below fails; psql.py is the tool designed to execute scripts and bulk-load CSV data:
[admin@mvxl2429 examples]$ ../bin/sqlline.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
usage: sqlline.py [-h] [-v VERBOSE] [-c COLOR] [-fc FASTCONNECT]
[zookeepers] [sqlfile]
sqlline.py: error: unrecognized arguments: us_population.csv us_population_queries.sql
[admin@mvxl2429 examples]$ ../bin/psql.py localhost:2182 us_population.sql us_population.csv us_population_queries.sql
19/08/08 14:51:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 1.508 sec(s)
csv columns from database.
CSV Upsert complete. 10 rows upserted
Time: 0.117 sec(s)
St City Count Population Sum
-- ---------------------------------------- ----------------------------------------
NY 1 8143197
CA 3 6012701
TX 3 4486916
IL 1 2842518
PA 1 1463281
AZ 1 1461575
Time: 0.024 sec(s)
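The same load could also be performed from application code through JDBC. Note that Phoenix uses UPSERT VALUES rather than INSERT, and that a plain Phoenix JDBC connection does not auto-commit by default (sqlline turns autocommit on for you). A minimal sketch, assuming the us_population table created above; the WA/Seattle row is a sample value, not part of the CSV:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UsPopulationUpsert {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             PreparedStatement ps = conn.prepareStatement(
                     "UPSERT INTO us_population (state, city, population) VALUES (?, ?, ?)")) {
            ps.setString(1, "WA");   // sample row for illustration
            ps.setString(2, "Seattle");
            ps.setLong(3, 563374L);
            ps.executeUpdate();
            conn.commit();           // required: autoCommit is false by default
        }
    }
}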
3. Take a look at the bin/performance.py script: it creates as many rows as you like for a canned schema and runs timed queries against them.
How many rows you can test is closely tied to available memory; since this machine's specs are modest, 1,000 rows are used here. Testing truly large volumes requires a bigger heap to avoid OutOfMemoryError.
[admin@mvxl2429 bin]$ ./performance.py localhost:2182 1000
Phoenix Performance Evaluation Script 1.0
-----------------------------------------
Creating performance table...
19/08/08 16:04:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
no rows upserted
Time: 0.943 sec(s)
Query # 1 - Count - SELECT COUNT(1) FROM PERFORMANCE_1000;
Query # 2 - Group By First PK - SELECT HOST FROM PERFORMANCE_1000 GROUP BY HOST;
Query # 3 - Group By Second PK - SELECT DOMAIN FROM PERFORMANCE_1000 GROUP BY DOMAIN;
Query # 4 - Truncate + Group By - SELECT TRUNC(DATE,'DAY') DAY FROM PERFORMANCE_1000 GROUP BY TRUNC(DATE,'DAY');
Query # 5 - Filter + Count - SELECT COUNT(1) FROM PERFORMANCE_1000 WHERE CORE<10;
Generating and upserting data...
/mnt/yst/phoenix-5.0.0/bin/../phoenix-core-5.0.0-HBase-2.0-tests.jar
.
19/08/08 16:04:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
csv columns from database.
CSV Upsert complete. 1000 rows upserted
Time: 0.492 sec(s)
COUNT(1)
----------------------------------------
1000
Time: 0.034 sec(s)
HO
--
CS
EU
NA
Time: 0.018 sec(s)
DOMAIN
----------------------------------------
Apple.com
Google.com
Salesforce.com
Time: 0.024 sec(s)
DAY
-----------------------
2019-08-08 00:00:00.000
Time: 0.028 sec(s)
COUNT(1)
----------------------------------------
19
Time: 0.019 sec(s)
[admin@mvxl2429 bin]$ ./sqlline.py localhost:2182
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:2182 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:2182
19/08/08 15:21:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 5.0)
Driver: PhoenixEmbeddedDriver (version 5.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
146/146 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:localhost:2182> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | |
| | | PERFORMANCE_1000 | TABLE | | | | | | |
| | | PERSON | TABLE | | | | | | |
| | | US_POPULATION | TABLE | | | | | | |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+--+
0: jdbc:phoenix:localhost:2182> select * from PERFORMANCE_1000 limit 3;
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| HOST | DOMAIN | FEATURE | DATE | CORE | DB | ACTIVE_VISITOR |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
| CS | Apple.com | Dashboard | 2019-08-08 16:04:38.000 | 373 | 658 | 3847 |
| CS | Apple.com | Dashboard | 2019-08-08 16:04:47.000 | 465 | 29 | 4551 |
| CS | Apple.com | Dashboard | 2019-08-08 16:05:05.000 | 298 | 1463 | 142 |
+-------+------------+------------+--------------------------+-------+-------+-----------------+
3 rows selected (0.041 seconds)
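The same query can be issued from application code as well. Here is a minimal JDBC read sketch against the PERFORMANCE_1000 table created by the run above; note that Phoenix upper-cases unquoted identifiers, so the columns come back as HOST, DOMAIN, and so on:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Performance1000Query {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2182");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT host, domain, active_visitor FROM PERFORMANCE_1000 LIMIT 3")) {
            while (rs.next()) {
                System.out.printf("%s  %s  %d%n",
                        rs.getString("HOST"),
                        rs.getString("DOMAIN"),
                        rs.getLong("ACTIVE_VISITOR"));
            }
        }
    }
}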