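1. Configuration
1.1 If you built the Hadoop cluster yourself from tar packages, two configuration steps are required:
1.1.1 Copy the jar files
(1) Copy hive-hbase-handler-*.jar from the Hive lib directory to the HBase lib directory
(2) Copy all jar files from the HBase lib directory to the Hive lib directory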
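The two copy steps above can be sketched as a small shell function. This is only a minimal sketch: the function name copy_integration_jars and both directory arguments are hypothetical, and the real lib paths depend on where Hive and HBase are installed.

```shell
# Hypothetical helper (not part of Hive or HBase).
# $1: Hive lib directory, $2: HBase lib directory (both are assumptions).
copy_integration_jars() {
  hive_lib="$1"
  hbase_lib="$2"
  # Step (1): copy the storage handler jar from Hive into HBase.
  cp "$hive_lib"/hive-hbase-handler-*.jar "$hbase_lib"/ || return 1
  # Step (2): copy every HBase jar into Hive.
  cp "$hbase_lib"/*.jar "$hive_lib"/ || return 1
}
```

For example, with Hive and HBase installed under /opt (an assumed location), the call would be: copy_integration_jars /opt/hive/lib /opt/hbase/lib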
1.1.2 Add the following property to the Hive configuration file hive-site.xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>your_zookeeper_nodes_name</value>
</property>
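A quick way to confirm that the property was actually added is to search the file for its name. The sketch below assumes the name element sits on a single line, as is usual in Hadoop-style configuration files; the function name has_zk_quorum and the file path are hypothetical.

```shell
# Hypothetical helper (not part of Hive).
# $1: path to hive-site.xml (the location is an assumption).
# Succeeds if the file declares hbase.zookeeper.quorum.
has_zk_quorum() {
  grep -q '<name>hbase.zookeeper.quorum</name>' "$1"
}
```

It only checks that the property is declared; it does not validate the ZooKeeper host names in the value.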
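1.2 If the Hadoop cluster was set up through Ambari, Hive and HBase are already integrated once both are installed and can be used directly.
2. Create a managed (internal) table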
CREATE TABLE hbasetbl(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");
The table xyz must not already exist in HBase; otherwise the statement fails.
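The "hbase.columns.mapping" value is matched to the Hive columns by position: the first entry (:key, the HBase row key) goes with the first Hive column, the second entry with the second column, and so on. The sketch below only illustrates that pairing; pair_columns is a hypothetical helper, not something Hive or HBase provides.

```shell
# Hypothetical helper that prints which Hive column pairs with which
# HBase mapping entry.
# $1: comma-separated Hive column names, $2: hbase.columns.mapping value.
pair_columns() {
  hive_cols="$1"
  hbase_map="$2"
  old_ifs="$IFS"
  IFS=','
  # Split the Hive column names into positional parameters.
  set -- $hive_cols
  for entry in $hbase_map; do
    if [ "$#" -eq 0 ]; then
      IFS="$old_ifs"
      echo "more mapping entries than Hive columns" >&2
      return 1
    fi
    echo "$1 -> $entry"
    shift
  done
  IFS="$old_ifs"
  if [ "$#" -ne 0 ]; then
    echo "more Hive columns than mapping entries" >&2
    return 1
  fi
}
```

Calling pair_columns "key,value" ":key,cf1:val" pairs key with :key and value with cf1:val. An HBase column that is absent from the mapping has no Hive column paired with it.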
After the table is created, test it:
2.1 Hive -> HBase
Insert data from Hive
insert into hbasetbl values(1,'zhangsan');
hive> select * from hbasetbl;
OK
1 zhangsan
Time taken: 0.185 seconds, Fetched: 1 row(s)
Query the data in HBase and flush it to disk
hbase(main):015:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1554105191166, value=zhangsan
1 row(s) in 0.0280 seconds
hbase(main):016:0> flush 'xyz'
0 row(s) in 0.4490 seconds
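Although hbasetbl is a managed table, whose data would normally be stored under the Hive path, integration with HBase makes it a special case: the data is stored in HBase.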
2.2 HBase -> Hive
Insert data from HBase
hbase(main):017:0> put 'xyz','2','cf1:val','lisi'
0 row(s) in 0.0720 seconds
Query from Hive
hive> select * from hbasetbl;
OK
1 zhangsan
2 lisi
Time taken: 0.167 seconds, Fetched: 2 row(s)
2.3 HBase -> Hive (unmapped column)
If a column that was not mapped at integration time is inserted in HBase, its data cannot be queried from Hive
hbase(main):018:0> put 'xyz','2','cf1:value','lisi'
0 row(s) in 0.0080 seconds
hbase(main):019:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1554105191166, value=zhangsan
2 column=cf1:val, timestamp=1554105536250, value=lisi
2 column=cf1:value, timestamp=1554105661416, value=lisi
2 row(s) in 0.0270 seconds
Query from Hive
hive> select * from hbasetbl;
OK
1 zhangsan
2 lisi
Time taken: 0.162 seconds, Fetched: 2 row(s)
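3. Create an external table
In contrast to a managed table, which requires that the mapped HBase table not exist, an external table requires that the mapped HBase table already exist; otherwise the statement fails. So create the t_order table first: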
hbase(main):020:0> create 't_order','order'
0 row(s) in 1.3790 seconds
CREATE EXTERNAL TABLE tmp_order
(key string, id string, user_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,order:order_id,order:user_id")
TBLPROPERTIES ("hbase.table.name" = "t_order");
Test:
3.1 HBase -> Hive
Insert data from HBase
hbase(main):021:0> put 't_order','111','order:order_id','1'
0 row(s) in 0.0600 seconds
hbase(main):022:0> put 't_order','111','order:user_id','1'
0 row(s) in 0.0070 seconds
hbase(main):023:0> scan 't_order'
ROW COLUMN+CELL
111 column=order:order_id, timestamp=1554105932366, value=1
111 column=order:user_id, timestamp=1554105942562, value=1
1 row(s) in 0.0530 seconds
Create the external table in Hive (the statement above, as run in the Hive shell):
hive> CREATE EXTERNAL TABLE tmp_order
> (key string, id string, user_id string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,order:order_id,order:user_id")
> TBLPROPERTIES ("hbase.table.name" = "t_order");
OK
Time taken: 0.246 seconds
3.2 Hive -> HBase
Insert data from Hive
insert into tmp_order values(2,'2222','2222');
hive> select * from tmp_order;
OK
111 1 1
2 2222 2222
Time taken: 0.175 seconds, Fetched: 2 row(s)
Query from HBase
hbase(main):024:0> scan 't_order'
ROW COLUMN+CELL
111 column=order:order_id, timestamp=1554105932366, value=1
111 column=order:user_id, timestamp=1554105942562, value=1
2 column=order:order_id, timestamp=1554106121347, value=2222
2 column=order:user_id, timestamp=1554106121347, value=2222
2 row(s) in 0.0320 seconds
hbase(main):025:0> flush 't_order'
0 row(s) in 0.2800 seconds
After the flush, the actual data is likewise stored in HBase.