Phoenix定位为OLTP和操作型分析(operational analytics),大多用于在线业务,稳定性要求第一位。Phoenix的功能很强大,也很灵活,Phoenix SQL基于SQL-92标准,但是还是有很多方言,使用时需要特别注意。
DataX是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能。
这里主要用到oraclereader和hbase11xwriter。
基础软件版本:
Apache Hadoop:hadoop-2.8.5
Apache HBase:hbase-1.4.10
Apache Phoenix:phoenix-4.14.3-HBase-1.4-bin
Phoenix版本支持:
Phoenix Current release 4.15.0 can run on Apache HBase 1.3, 1.4 and 1.5. CDH HBase 5.11, 5.12, 5.13 and 5.14 is supported by 4.14.0. Apache HBase 2.0 is supported by 5.0.0.
hbase客户端
hbase shell
phoenix客户端
phoenix-4.14.3-HBase-1.4-bin/bin/sqlline.py
# 进入hbase shell客户端$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/admin/hbase-1.4.10/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/admin/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.4.10, r76ab087819fe82ccf6f531096e18ad1bed079651, Wed Jun 5 16:48:11 PDT 2019
# 创建命名空间ZLXX
hbase(main):006:0> create_namespace 'ZLXX'
0 row(s) in 1.1150 seconds
# 创建hbase表,使用命名空间
hbase(main):001:0> create 'ZLXX:WT_TRADE_REFUND', 'INFO'
0 row(s) in 1.7510 seconds
=> Hbase::Table - ZLXX:WT_TRADE_REFUND
查看表描述信息(desc)
hbase(main):009:0> desc 'ZLXX:WT_TRADE_REFUND'
Table ZLXX:WT_TRADE_REFUND is ENABLED
ZLXX:WT_TRADE_REFUND
COLUMN FAMILIES DESCRIPTION
{NAME => 'INFO', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VE
RSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0730 seconds
# 使用DataX往hbase表里面添加测试数据(存量批量数据采集)
数据源来自Oracle,数据目标系统是HBase原生创建的表。
使用oraclereader和hbase11xwriter。
具体配置zlxx_test.json
$ vi zlxx_test.json
{
"job": {
"content": [
{
"reader": {
"name": "oraclereader",
"parameter": {
"column": [],
"connection": [
{
"fetchSize": "1024",
"jdbcUrl": [
"$DW_ORCL_ZLXX_JDBCURL"
],
"querySql": [
"select id, trade_id, amount, status, TO_CHAR(modify_time,'YYYYMMDDhh24miss') AS modify_time from zlxx.trade_refund"
]
}
],
"password": "$DW_ORCL_ZLXX_PASSWORD",
"username": "$DW_ORCL_ZLXX_USERNAME"
}
},
"writer": {
"name": "hbase11xwriter",
"parameter": {
"hbaseConfig": {
"hbase.zookeeper.quorum": "192.168.10.211:12181,192.168.10.212:12181,192.168.10.213:12181"
},
"table": "ZLXX:WT_TRADE_REFUND",
"mode": "normal",
"nullMode": "empty",
"rowkeyColumn": [
{
"index":0,
"type":"string"
}
],
"column": [
{
"index":1,
"name": "INFO:TRADE_ID",
"type": "string"
},
{
"index":2,
"name": "INFO:AMOUNT",
"type": "string"
},
{
"index":3,
"name": "INFO:STATUS",
"type": "string"
},
{
"index":4,
"name": "INFO:MODIFY_TIME",
"type": "string"
}
],
"encoding": "utf-8"
}
}
}
],
"setting": {
"speed": {
"channel": "10"
}
}
}
}
执行DataX的shell脚本
数据库连接和用户名密码等配置在服务器的环境变量中
$ vi zlxx_test.sh
source ~/.bash_profile
python datax/bin/datax.py -j "-Xms1g -Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=datax/log" -p"-DDW_ORCL_ZLXX_JDBCURL=$DW_ORCL_ZLXX_JDBCURL -DDW_ORCL_ZLXX_PASSWORD=$DW_ORCL_ZLXX_PASSWORD -DDW_ORCL_ZLXX_USERNAME=$DW_ORCL_ZLXX_USERNAME"
执行shell脚本
$ sh zlxx_test.sh
......
任务总计耗时 : 10s
任务平均流量 : 1.77KB/s
记录写入速度 : 52rec/s
读出记录总数 : 528
读写失败总数 : 0
查看一下数据
hbase(main):012:0* scan 'ZLXX:WT_TRADE_REFUND', {LIMIT=>1}
ROW COLUMN+CELL
1005528 column=INFO:AMOUNT, timestamp=1594783514554, value=100
1005528 column=INFO:MODIFY_TIME, timestamp=1594783514554, value=20200228185803
1005528 column=INFO:STATUS, timestamp=1594783514554, value=01
1005528 column=INFO:TRADE_ID, timestamp=1594783514554, value=T12059592
1 row(s) in 0.1220 seconds
Tips:Phoenix不管你输入的是大写还是小写都默认把它转成大写的,如果要小写的话必须加上引号;所以上面从hbase shell创建的表和列均使用了大写,便于Phoenix自然映射。
# Phoenix使用as select创建视图映射到HBase表
先创建Phoenix schema:
0: jdbc:phoenix:> create schema ZLXX;
No rows affected (0.012 seconds)
创建视图,由于Phoenix大小写最终都是大写,所以这里可以使用小写
create view ZLXX.WT_TRADE_REFUND (
id varchar primary key,
info.trade_id varchar,
info.amount varchar,
info.status varchar,
info.modify_time varchar
) as select * from zlxx.wt_trade_refund;
视图创建效果:
0: jdbc:phoenix:> create view ZLXX.WT_TRADE_REFUND (
. . . . . . . . > id varchar primary key,
. . . . . . . . > info.trade_id varchar,
. . . . . . . . > info.amount varchar,
. . . . . . . . > info.status varchar,
. . . . . . . . > info.modify_time varchar
. . . . . . . . > ) as select * from zlxx.wt_trade_refund;
No rows affected (5.956 seconds)
# 走一个查询试试看(完美)
0: jdbc:phoenix:> select * from zlxx.wt_trade_refund limit 5;
+----------+------------+---------+---------+-----------------+
| ID | TRADE_ID | AMOUNT | STATUS | MODIFY_TIME |
+----------+------------+---------+---------+-----------------+
| 1005528 | T12059592 | 100 | 01 | 20200228185803 |
| 1005529 | T12059591 | 10 | 03 | 20200228185846 |
| 1005530 | T12059606 | 10 | 01 | 20200228195442 |
| 1005531 | T12059605 | 10 | 01 | 20200228195442 |
| 1005532 | noTradeId | 100 | 03 | 20200229224015 |
+----------+------------+---------+---------+-----------------+
5 rows selected (0.062 seconds)
# Phoenix使用column_encoded_bytes=0
创建视图映射到HBase表(测试下多视图)
由于需要名字相同,先把上面的view ZLXX.WT_TRADE_REFUND删除,稍后再使用as select创建不同名称的视图。
删除视图:
drop view ZLXX.WT_TRADE_REFUND;
创建视图:
create view ZLXX.WT_TRADE_REFUND (
id varchar primary key,
info.trade_id varchar,
info.amount varchar,
info.status varchar,
info.modify_time varchar
) column_encoded_bytes=0;
也是正常的,能够看到有数据查询出来
0: jdbc:phoenix:> drop view ZLXX.WT_TRADE_REFUND;
No rows affected (0.046 seconds)
0: jdbc:phoenix:> create view ZLXX.WT_TRADE_REFUND (
. . . . . . . . > id varchar primary key,
. . . . . . . . > info.trade_id varchar,
. . . . . . . . > info.amount varchar,
. . . . . . . . > info.status varchar,
. . . . . . . . > info.modify_time varchar
. . . . . . . . > ) column_encoded_bytes=0;
No rows affected (0.041 seconds)
limit查询数据:
0: jdbc:phoenix:> select * from zlxx.wt_trade_refund limit 5;
+----------+------------+---------+---------+-----------------+
| ID | TRADE_ID | AMOUNT | STATUS | MODIFY_TIME |
+----------+------------+---------+---------+-----------------+
| 1005528 | T12059592 | 100 | 01 | 20200228185803 |
| 1005529 | T12059591 | 10 | 03 | 20200228185846 |
| 1005530 | T12059606 | 10 | 01 | 20200228195442 |
| 1005531 | T12059605 | 10 | 01 | 20200228195442 |
| 1005532 | noTradeId | 100 | 03 | 20200229224015 |
+----------+------------+---------+---------+-----------------+
5 rows selected (0.056 seconds)
下面使用as select创建名称不同的视图,假设不需要status字段
create view ZLXX.WT_TRADE_REFUND2 (
id varchar primary key,
info.trade_id varchar,
info.amount varchar,
info.modify_time varchar
) as select INFO.TRADE_ID, INFO.AMOUNT, INFO.MODIFY_TIME from zlxx.wt_trade_refund;
会报错
0: jdbc:phoenix:> create view ZLXX.WT_TRADE_REFUND2 (
. . . . . . . . > id varchar primary key,
. . . . . . . . > info.trade_id varchar,
. . . . . . . . > info.amount varchar,
. . . . . . . . > info.modify_time varchar
. . . . . . . . > ) as select INFO.TRADE_ID, INFO.AMOUNT, INFO.MODIFY_TIME from zlxx.wt_trade_refund;
Error: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "ASTERISK", got "INFO" at line 6, column 13. (state=42P00,code=604)
org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "ASTERISK", got "INFO" at line 6, column 13.
同一张表创建名称不同的视图:
create view ZLXX.WT_TRADE_REFUND2 (
id varchar primary key,
info.trade_id varchar,
info.amount varchar,
info.status varchar,
info.modify_time varchar
) as select * from zlxx.wt_trade_refund;
则是可以的
0: jdbc:phoenix:> select * from zlxx.wt_trade_refund2 limit 5;
+----------+------------+---------+---------+-----------------+
| ID | TRADE_ID | AMOUNT | STATUS | MODIFY_TIME |
+----------+------------+---------+---------+-----------------+
| 1005528 | T12059592 | 100 | 01 | 20200228185803 |
| 1005529 | T12059591 | 10 | 03 | 20200228185846 |
| 1005530 | T12059606 | 10 | 01 | 20200228195442 |
| 1005531 | T12059605 | 10 | 01 | 20200228195442 |
| 1005532 | noTradeId | 100 | 03 | 20200229224015 |
+----------+------------+---------+---------+-----------------+
5 rows selected (0.093 seconds)
简单查看一下Phoenix视图的描述信息(列信息)
0: jdbc:phoenix:> !describe zlxx.wt_trade_refund
+------------+--------------+------------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLU |
+------------+--------------+------------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+------+
| | ZLXX | WT_TRADE_REFUND | ID | 12 | VARCHAR | null | null | null | null | 0 | | |
| | ZLXX | WT_TRADE_REFUND | TRADE_ID | 12 | VARCHAR | null | null | null | null | 1 | | |
| | ZLXX | WT_TRADE_REFUND | AMOUNT | 12 | VARCHAR | null | null | null | null | 1 | | |
| | ZLXX | WT_TRADE_REFUND | STATUS | 12 | VARCHAR | null | null | null | null | 1 | | |
| | ZLXX | WT_TRADE_REFUND | MODIFY_TIME | 12 | VARCHAR | null | null | null | null | 1 | | |
+------------+--------------+------------------+--------------+------------+------------+--------------+----------------+-----------------+-----------------+-----------+----------+------+
简单查看一下Phoenix表元数据信息(TABLE_TYPE为VIEW)
0: jdbc:phoenix:> !tables
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+-+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+-+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | |
| | SYSTEM | LOG | SYSTEM TABLE | | | | | | true | 32 | |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | null | |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | |
| | ZLXX | WT_TRADE_REFUND | VIEW | | | | | | false | null | |
| | ZLXX | WT_TRADE_REFUND2 | VIEW | | | | | | false | null | |
+------------+--------------+-------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+-+
# 在HBase里插入点数据看看Phoenix的视图能否实时更新?答案是YES!
put 'ZLXX:WT_TRADE_REFUND', '2009999', 'INFO:TRADE_ID', 'ZXR3099999'
put 'ZLXX:WT_TRADE_REFUND', '2009999', 'INFO:AMOUNT', '100'
put 'ZLXX:WT_TRADE_REFUND', '2009999', 'INFO:STATUS', '00'
put 'ZLXX:WT_TRADE_REFUND', '2009999', 'INFO:MODIFY_TIME', '20200715124015'
hbase shell端put数据
Phoenix端查询数据
总结:
1.hbase shell这边创建表时,命名空间namespace和表名称都要大写,这样方便Phoenix这边小写。
2.在hbase shell这边,命名空间与表名称之间需要注意是双引号,DataX里的全表名称也是如此。
3.Phoenix这边创建相同命名空间名称的schema。
4.Phoenix这边创建view视图映射有两种方式:as select和column_encoded_bytes=0。
附录hbase整合Phoenix的基本配置:
hbase-site.xml
phoenix.schema.isNamespaceMappingEnabled
true
phoenix.schema.mapSystemTablesToNamespace
true
【Phoenix视图映射和DataX数据同步到HBase、END】
往期推荐:
Kafka基础知识总结
Hadoop YARN:ApplicationMaster向ResourceManager注册AM源码调试
Apache Hadoop YARN:Client<-->ResourceManager源码DEBUG
Hadoop YARN:ApplicationMaster与ResourceManager交互源码解析
Hive企业级调优
HiveQL查询连续三天有销售记录的店铺
HiveQL实战蚂蚁森林低碳用户排名分析:解法一
HiveQL实战蚂蚁森林低碳用户排名分析:解法二
HiveQL实战蚂蚁森林植物申领统计分析
Hive-函数
Hive-查询
Hive-DML(Data Manipulation Language)数据操作语言
Hive-DDL(Data Definition Language)数据定义
Hive优化(整理版)
Spark Core之Shuffle解析
数据仓库开发规范