1. Document Purpose
Yesterday we gave a brief introduction to Apache Phoenix; see the earlier post "Phoenix in Cloudera Labs". Today we cover how to install and configure Phoenix on CDH, along with some usage examples:
1. Installing and configuring Phoenix
2. Basic Phoenix operations
3. Bulk-loading data into HBase with Phoenix
4. Exporting data from HBase to HDFS with Phoenix
Test environment:
1. CDH 5.11.2
2. RedHat 7.2
3. Phoenix 4.7.0
Prerequisites:
1. The CDH cluster is running normally
2. The HBase service is installed and running
3. The test CSV data has been prepared
4. The httpd service on RedHat 7 is installed and working (a quick check is sketched below)
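For the httpd prerequisite, a quick sanity check on RedHat 7 might look like the following (a sketch, assuming the service is managed by systemd; adjust to your environment):

$ sudo systemctl status httpd
$ sudo systemctl start httpd     # start it if it is not running
$ sudo systemctl enable httpd    # start it automatically on boot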
2. Installing Phoenix on the CDH Cluster
1. Download the Phoenix parcel from the Cloudera website. Be sure to choose the build that matches your operating system; this test uses RedHat 7, so pick the files with the el7 suffix. The download address is:
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/
The three files to download are:
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/manifest.json
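For example, the three files can be fetched with wget (a sketch; run this on the node that will host the httpd repository):

$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
$ wget http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/manifest.json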
2. Publish the downloaded files through the httpd service; you can open the page in a browser to verify.
[ec2-user@ip-172-31-22-86 phoenix]$ pwd
/var/www/html/phoenix
[ec2-user@ip-172-31-22-86 phoenix]$ ll
total 192852
-rw-r--r-- 1 root root        41 Jun 24  2016 CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel.sha1
-rw-r--r-- 1 root root 197466534 Jun 24  2016 CLABS_PHOENIX-4.7.0-1.clabs_phoenix1.3.0.p0.000-el7.parcel
-rw-r--r-- 1 root root      4687 Jun 24  2016 manifest.json
[ec2-user@ip-172-31-22-86 phoenix]$
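The published repository can also be checked from the command line instead of a browser (a sketch; ip-172-31-22-86 is this test's httpd host and is an assumption for your environment):

$ curl -s http://ip-172-31-22-86/phoenix/manifest.json | head -5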
3. In Cloudera Manager, click "Parcels" to open the parcel management page.
Click "Configuration" and add the HTTP address of the Phoenix parcel repository published above.
Click "Save Changes" to return to the parcel page; CM now detects the Phoenix parcel.
Click "Download" -> "Distribute" -> "Activate".
4. Back on the CM home page, the HBase service now indicates that it needs its client configuration deployed and a restart.
Restart the HBase service.
The installation is complete.
3. Using Phoenix on the CDH Cluster
3.1 Basic Phoenix Operations
Go to the Phoenix script directory:
[ec2-user@ip-172-31-22-86 bin]$ cd /opt/cloudera/parcels/CLABS_PHOENIX/bin
[ec2-user@ip-172-31-22-86 bin]$ ll
total 16
-rwxr-xr-x 1 root root 672 Jun 24  2016 phoenix-performance.py
-rwxr-xr-x 1 root root 665 Jun 24  2016 phoenix-psql.py
-rwxr-xr-x 1 root root 668 Jun 24  2016 phoenix-sqlline.py
-rwxr-xr-x 1 root root 674 Jun 24  2016 phoenix-utils.py
Log in to HBase with Phoenix:
[ec2-user@ip-172-31-22-86 bin]$ ./phoenix-sqlline.py
Zookeeper not specified.
Usage: sqlline.py
Example:
1. sqlline.py localhost:2181:/hbase
2. sqlline.py localhost:2181:/hbase ../examples/stock_symbol.sql
The ZooKeeper quorum must be specified:
[ec2-user@ip-172-31-22-86 bin]$ ./phoenix-sqlline.py ip-172-31-21-45:2181:/hbase
...
sqlline version 1.1.8
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> !tables
+------------+--------------+-------------+---------------+----------+------------+--------------------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE_NAME  | SELF_REFERENCING_C |
+------------+--------------+-------------+---------------+----------+------------+--------------------+
|            | SYSTEM       | CATALOG     | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | FUNCTION    | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | SEQUENCE    | SYSTEM TABLE  |          |            |                    |
|            | SYSTEM       | STATS       | SYSTEM TABLE  |          |            |                    |
|            |              | ITEM        | TABLE         |          |            |                    |
+------------+--------------+-------------+---------------+----------+------------+--------------------+
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Create a test table.
Note: a primary key must be specified when creating a table.
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> create table hbase_test
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .> s1 varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .> s2 varchar,
. . . . . . . . . . . . . . . . . . . . . .> s3 varchar,
. . . . . . . . . . . . . . . . . . . . . .> s4 varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.504 seconds)
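By contrast, a CREATE TABLE without a primary key is rejected by Phoenix. A minimal sketch (no_pk_test is a hypothetical table name used only for illustration):

-- This statement fails: no column is declared as the primary key
create table no_pk_test (s1 varchar, s2 varchar);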
Check in hbase shell.
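The original checks were shown as screenshots, which are not reproduced here. A minimal sketch of the check (Phoenix upper-cases unquoted identifiers, so the underlying HBase table is named HBASE_TEST); the same scan serves for the later hbase shell checks as well:

hbase(main):001:0> list
hbase(main):002:0> scan 'HBASE_TEST'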
Insert a row of data. Note: Phoenix has no INSERT syntax; use UPSERT instead. Reference: http://phoenix.apache.org/language/index.html
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test values('1','testname','testname1','testname2');
1 row affected (0.088 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
1 row selected (0.049 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Check in hbase shell.
Delete the row to test DELETE:
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> delete from hbase_test where s1='1';
1 row affected (0.018 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----+-----+-----+
| S1  | S2  | S3  | S4  |
+-----+-----+-----+-----+
+-----+-----+-----+-----+
No rows selected (0.045 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Check in hbase shell.
Test updating data. Note: Phoenix has no UPDATE syntax either; UPSERT serves that purpose as well. Inserting multiple rows requires one UPSERT statement per row; you cannot list all the rows after a single VALUES clause.
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test values('1','testname','testname1','testname2');
1 row affected (0.017 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test values('2','testname','testname1','testname2');
1 row affected (0.007 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test values('3','testname','testname1','testname2');
1 row affected (0.008 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | testname  | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
3 rows selected (0.067 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test values('1','fayson','testname1','testname2');
1 row affected (0.009 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
3 rows selected (0.037 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Check in hbase shell.
Test batch updating. Create another table, hbase_test1, with the same schema as hbase_test, and insert five rows: two that do not exist in hbase_test (primary keys 4 and 5), one whose data differs from hbase_test (primary key 1), and two that are identical (primary keys 2 and 3).
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> create table hbase_test1
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .> s1 varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .> s2 varchar,
. . . . . . . . . . . . . . . . . . . . . .> s3 varchar,
. . . . . . . . . . . . . . . . . . . . . .> s4 varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.268 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test1 values('1','fayson','testname1','testname2');
1 row affected (0.031 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test1 values('2','testname','testname1','testname2');
1 row affected (0.006 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test1 values('3','testname','testname1','testname2');
1 row affected (0.005 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test1 values('4','testname','testname1','testname2');
1 row affected (0.005 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test1 values('5','testname','testname1','testname2');
1 row affected (0.007 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test1;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
| 4   | testname  | testname1  | testname2  |
| 5   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
5 rows selected (0.038 seconds)
For the batch update, use the data in hbase_test1 to update hbase_test:
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> upsert into hbase_test select * from hbase_test1;
5 rows affected (0.03 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from hbase_test;
+-----+-----------+------------+------------+
| S1  |    S2     |     S3     |     S4     |
+-----+-----------+------------+------------+
| 1   | fayson    | testname1  | testname2  |
| 2   | testname  | testname1  | testname2  |
| 3   | testname  | testname1  | testname2  |
| 4   | testname  | testname1  | testname2  |
| 5   | testname  | testname1  | testname2  |
+-----+-----------+------------+------------+
5 rows selected (0.039 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
The batch update shows that existing rows with different values are overwritten, identical rows remain unchanged, and rows that do not yet exist are inserted as new data.
3.2 Bulk-Loading Data into HBase with Phoenix
Prepare the test data for the bulk load; here we use the item table data from TPC-DS.
[ec2-user@ip-172-31-22-86 ~]$ ll item.dat
-rw-r--r-- 1 root root 28855325 Oct  3 10:23 item.dat
[ec2-user@ip-172-31-22-86 ~]$ head -1 item.dat
1|AAAAAAAABAAAAAAA|1997-10-27||Powers will not get influences. Electoral ports should show low, annual chains. Now young visitors may pose now however final pages. Bitterly right children suit increasing, leading el|27.02|23.23|5003002|exportischolar #2|3|pop|5|Music|52|ableanti|N/A|3663peru009490160959|spring|Tsp|Unknown|6|ought|
Because Phoenix's bulkload tool ingests CSV, we first change the field delimiter of the data to commas and rename the file with a .csv suffix:
[ec2-user@ip-172-31-22-86 ~]$ sed -i 's/|/,/g' item.dat
[ec2-user@ip-172-31-22-86 ~]$ mv item.dat item.csv
[ec2-user@ip-172-31-22-86 ~]$ ll item.csv
-rw-r--r-- 1 ec2-user ec2-user 28855325 Oct  3 10:26 item.csv
[ec2-user@ip-172-31-22-86 ~]$ head -1 item.csv
1,AAAAAAAABAAAAAAA,1997-10-27,,Powers will not get influences. Electoral ports should show low, annual chains. Now young visitors may pose now however final pages. Bitterly right children suit increasing, leading el,27.02,23.23,5003002,exportischolar #2,3,pop,5,Music,52,ableanti,N/A,3663peru009490160959,spring,Tsp,Unknown,6,ought,
Upload the file to HDFS:
[ec2-user@ip-172-31-22-86 ~]$ hadoop fs -mkdir /fayson
[ec2-user@ip-172-31-22-86 ~]$ hadoop fs -put item.csv /fayson
[ec2-user@ip-172-31-22-86 ~]$ hadoop fs -ls /fayson
Found 1 items
-rw-r--r-- 3 ec2-user supergroup 28855325 2017-10-03 10:28 /fayson/item.csv
[ec2-user@ip-172-31-22-86 ~]$
Create the item table through Phoenix. Note: for readability, only 4 columns are created.
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> create table item
. . . . . . . . . . . . . . . . . . . . . .> (
. . . . . . . . . . . . . . . . . . . . . .> i_item_sk varchar not null primary key,
. . . . . . . . . . . . . . . . . . . . . .> i_item_id varchar,
. . . . . . . . . . . . . . . . . . . . . .> i_rec_start_varchar varchar,
. . . . . . . . . . . . . . . . . . . . . .> i_rec_end_date varchar
. . . . . . . . . . . . . . . . . . . . . .> );
No rows affected (1.268 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Run the bulkload command to import the data:
[ec2-user@ip-172-31-22-86 ~]$ HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-1.2.0-cdh5.12.1.jar:/opt/cloudera/parcels/CDH/lib/hbase/conf hadoop jar /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -t item -i /fayson/item.csv
17/10/03 10:32:24 INFO util.QueryUtil: Creating connection with the jdbc url: jdbc:phoenix:ip-172-31-21-45.ap-southeast-1.compute.internal,ip-172-31-22-86.ap-southeast-1.compute.internal,ip-172-31-26-102.ap-southeast-1.compute.internal:2181:/hbase;
...
17/10/03 10:32:24 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-172-31-21-45.ap-southeast-1.compute.internal:2181,ip-172-31-22-86.ap-southeast-1.compute.internal:2181,ip-172-31-26-102.ap-southeast-1.compute.internal:2181 sessionTimeout=60000 watcher=hconnection-0x7a9c0c6b0x0, quorum=ip-172-31-21-45.ap-southeast-1.compute.internal:2181,ip-172-31-22-86.ap-southeast-1.compute.internal:2181,ip-172-31-26-102.ap-southeast-1.compute.internal:2181, baseZNode=/hbase
17/10/03 10:32:24 INFO zookeeper.ClientCnxn: Opening socket connection to server ip-172-31-21-45.ap-southeast-1.compute.internal/172.31.21.45:2181. Will not attempt to authenticate using SASL (unknown error)
...
17/10/03 10:32:30 INFO mapreduce.Job: Running job: job_1507035313248_0001
17/10/03 10:32:38 INFO mapreduce.Job: Job job_1507035313248_0001 running in uber mode : false
17/10/03 10:32:38 INFO mapreduce.Job:  map 0% reduce 0%
17/10/03 10:32:52 INFO mapreduce.Job:  map 100% reduce 0%
17/10/03 10:33:01 INFO mapreduce.Job:  map 100% reduce 100%
17/10/03 10:33:01 INFO mapreduce.Job: Job job_1507035313248_0001 completed successfully
17/10/03 10:33:01 INFO mapreduce.Job: Counters: 50
...
17/10/03 10:33:01 INFO mapreduce.AbstractBulkLoadTool: Loading HFiles from /tmp/fef0045b-8a31-4d95-985a-bee08edf2cf9
...
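As an aside, CsvBulkLoadTool also accepts an explicit ZooKeeper quorum (-z) and a custom field delimiter (-d). Assuming these flags behave as documented for Phoenix 4.7, the original pipe-delimited file could have been loaded directly, skipping the sed conversion above (the same HADOOP_CLASSPATH setting still applies):

$ hadoop jar /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    -t item -i /fayson/item.dat -d '|' -z ip-172-31-21-45:2181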
Query the table in Phoenix:
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select * from item limit 10;
+------------+-------------------+----------------------+-----------------+
| I_ITEM_SK  |     I_ITEM_ID     | I_REC_START_VARCHAR  | I_REC_END_DATE  |
+------------+-------------------+----------------------+-----------------+
| 1          | AAAAAAAABAAAAAAA  | 1997-10-27           |                 |
| 10         | AAAAAAAAKAAAAAAA  | 1997-10-27           | 1999-10-27      |
| 100        | AAAAAAAAEGAAAAAA  | 1997-10-27           | 1999-10-27      |
| 1000       | AAAAAAAAIODAAAAA  | 1997-10-27           | 1999-10-27      |
| 10000      | AAAAAAAAABHCAAAA  | 1997-10-27           | 1999-10-27      |
| 100000     | AAAAAAAAAKGIBAAA  | 1997-10-27           | 1999-10-27      |
| 100001     | AAAAAAAAAKGIBAAA  | 1999-10-28           | 2001-10-26      |
| 100002     | AAAAAAAAAKGIBAAA  | 2001-10-27           |                 |
| 100003     | AAAAAAAADKGIBAAA  | 1997-10-27           |                 |
| 100004     | AAAAAAAAEKGIBAAA  | 1997-10-27           | 2000-10-26      |
+------------+-------------------+----------------------+-----------------+
10 rows selected (0.054 seconds)
0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase>
Query the table in hbase shell:
hbase(main):002:0> scan 'ITEM', LIMIT => 10
ROW        COLUMN+CELL
 1         column=0:I_ITEM_ID, timestamp=1507041176470, value=AAAAAAAABAAAAAAA
 1         column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27
 1         column=0:_0, timestamp=1507041176470, value=
 10        column=0:I_ITEM_ID, timestamp=1507041176470, value=AAAAAAAAKAAAAAAA
 10        column=0:I_REC_END_DATE, timestamp=1507041176470, value=1999-10-27
 10        column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27
 10        column=0:_0, timestamp=1507041176470, value=
...
 100004    column=0:I_REC_START_VARCHAR, timestamp=1507041176470, value=1997-10-27
 100004    column=0:_0, timestamp=1507041176470, value=
10 row(s) in 0.2360 seconds
Check the number of rows loaded.
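The original comparison was shown as screenshots; a minimal way to compare the source and loaded counts (output omitted) is to count lines in the CSV and rows in the table:

$ wc -l item.csv

and in sqlline:

0: jdbc:phoenix:ip-172-31-21-45:2181:/hbase> select count(*) from item;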
The counts match; all rows were loaded successfully.
3.3 Exporting Data from HBase to HDFS with Phoenix
Phoenix also supports exporting data to HDFS via MapReduce, executed as a Pig script. First, prepare the Pig script:
[ec2-user@ip-172-31-22-86 ~]$ cat export.pig
REGISTER /opt/cloudera/parcels/CLABS_PHOENIX/lib/phoenix/phoenix-4.7.0-clabs-phoenix1.3.0-client.jar;
rows = load 'hbase://query/SELECT * FROM ITEM' USING org.apache.phoenix.pig.PhoenixHBaseLoader('ip-172-31-21-45:2181');
STORE rows INTO 'fayson1' USING PigStorage(',');
[ec2-user@ip-172-31-22-86 ~]$
Run the script:
[ec2-user@ip-172-31-22-86 ~]$ pig -x mapreduce export.pig
...
Counters:
Total records written : 102000
Total bytes written : 4068465
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1507035313248_0002
2017-10-03 10:45:38,905 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
After the export succeeds, check the data in HDFS:
[ec2-user@ip-172-31-22-86 ~]$ hadoop fs -ls /user/ec2-user/fayson1
Found 2 items
-rw-r--r-- 3 ec2-user supergroup       0 2017-10-03 10:45 /user/ec2-user/fayson1/_SUCCESS
-rw-r--r-- 3 ec2-user supergroup 4068465 2017-10-03 10:45 /user/ec2-user/fayson1/part-m-00000
[ec2-user@ip-172-31-22-86 ~]$ hadoop fs -cat /user/ec2-user/fayson1/part-m-00000 | head -2
1,AAAAAAAABAAAAAAA,1997-10-27,
10,AAAAAAAAKAAAAAAA,1997-10-27,1999-10-27
cat: Unable to write to output stream.
[ec2-user@ip-172-31-22-86 ~]$
The exported row count is 102,000 (see the Pig counter above), which matches the source data; the export succeeded in full.
4. Summary