Hortonworks HDP releases and their corresponding Hadoop, Hive, and HBase versions:
HDP 2.0.6 / hadoop 2.2.0, hive 0.12.0, hbase 0.96.1
HDP 1.3.3 / hadoop 1.2.0, hive 0.11.0, hbase 0.94.6
HDP 2.1 / hadoop 2.4.0, hive 0.13, hbase 0.98
HDP 2.2 / hadoop 2.6.0, hive 0.14, hbase 0.98
Installing and configuring Hive is straightforward: download it from its website (http://hive.apache.org/downloads.html) and unpack it on a machine that already has Hadoop installed. Adding hadoop/bin to the system PATH is all Hive needs to work. Since 0.11, Hive supports both the Hadoop 0.20 series and the Hadoop 0.23 series.
Simply run ${hive-install}/bin/hive to enter its shell.
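Spelled out as concrete commands, the setup boils down to the sketch below. The version number and extraction directories are assumptions for illustration, not HDP defaults; adjust them to your own download.

```shell
# Hedged sketch: version and directories are assumptions, adjust to your setup.
HIVE_HOME=/opt/apache-hive-0.13.0-bin   # assumed Hive extraction directory
HADOOP_HOME=/opt/hadoop-2.4.0           # assumed existing Hadoop install
export HIVE_HOME HADOOP_HOME
export PATH="$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH"
# Sanity check: hive's bin directory should now lead PATH.
case "$PATH" in
  "$HIVE_HOME/bin"*) echo "hive on PATH" ;;
esac
```

With this in your shell profile, typing `hive` from any directory starts the Hive shell.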
1.1 show tables:
Enter the hive shell by running hive (or $HIVE_HOME/bin/hive), then:
hive> show tables;
1.2 create tables:
hive> CREATE TABLE pokes (foo INT, bar STRING);
1.3 drop table:
hive> DROP TABLE pokes;
1.4 query tables:
hive> SELECT * FROM pokes p;
Please note that Hive is a data warehouse: there is no INSERT INTO ... VALUES (...) statement (prior to Hive 0.14), but data can be inserted or loaded from other sources.
1.5 execute hsql without entering the hive shell:
hive -e "show tables;"           -- or $HIVE_HOME/bin/hive -e "show tables;"
hive -e "select * from pokes;"   -- or $HIVE_HOME/bin/hive -e "select * from pokes;"
1.6 execute a DDL/DML file without entering the hive shell:
hive -f hsql.ddl
1.7 load data
1) load data from a key/value file separated by ^A (Ctrl-A, Hive's default field delimiter)
create table pokes1 (id int, name string);
pokes1.data:
1^Atony
2^ASmith
2) load data from a key/value file separated by '\t' (Tab)
CREATE TABLE pokes2 ( userid INT,name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
pokes2.data:
36 Smith
40 Tony
64 Huang
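The two data files above can be produced directly from the shell; printf with an octal escape writes the ^A (\001) byte. The /tmp paths are assumptions, and the commented LOAD DATA lines sketch the hive-side step without being executed here.

```shell
# Create the sample data files; \001 is the ^A (Ctrl-A) byte,
# Hive's default field delimiter.
printf '1\001tony\n2\001Smith\n' > /tmp/pokes1.data
printf '36\tSmith\n40\tTony\n64\tHuang\n' > /tmp/pokes2.data
# In the hive shell you would then load them (not executed here):
#   hive> LOAD DATA LOCAL INPATH '/tmp/pokes1.data' OVERWRITE INTO TABLE pokes1;
#   hive> LOAD DATA LOCAL INPATH '/tmp/pokes2.data' OVERWRITE INTO TABLE pokes2;
wc -l < /tmp/pokes1.data   # rows written to pokes1.data
```

LOAD DATA LOCAL INPATH copies the file into the table's warehouse directory as-is, which is why the file's delimiter must match the table's ROW FORMAT.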
===================sample hsql.ddl================================
-- -------------- Create hive hbase table -----------------
DROP TABLE if exists hbase_drug1n2row;
CREATE EXTERNAL TABLE hbase_drug1n2row(rowid STRING,age STRING,sex STRING,bp STRING,cholesterol STRING,na STRING,k STRING,drug STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,drug:age,drug:sex,drug:bp,drug:cholesterol,drug:na,drug:k,drug:drug")
TBLPROPERTIES("hbase.table.name" = "drug1n2row");
2.1 enter the hbase shell:
./bin/hbase shell
hbase(main):001:0>
2.2 show all tables:
hbase> list
2.3 list table 'test':
hbase> list 'test'
TABLE
test
1 row(s) in 0.0350 seconds
----------------
2.4 create a table named 'test' with a column family named 'cf'
hbase> create 'test', 'cf'
0 row(s) in 1.2200 seconds
----------------
=> ["test"]
2.5 put data into a table
1) put data into a table; the row key is row1 and the value is value1:
hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
2) put multiple cells into a table, example 1:
hbase> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
hbase> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds
hbase> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0260 seconds
3) put multiple cells into a table, example 2:
hbase> put 'test2', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1770 seconds
hbase> put 'test2', 'row1', 'cf:b', 'value2'
0 row(s) in 0.0160 seconds
hbase> put 'test2', 'row2', 'cf:a', 'value3'
0 row(s) in 0.0260 seconds
2.6 scan table for all data:
hbase>scan 'test'
2.7 disable/enable table:
hbase>disable 'test'
hbase>enable 'test'
2.8 drop table (a table must be disabled before it can be dropped):
hbase> disable 'test'
hbase> drop 'test'
2.9 get specific row(s):
hbase>get 'test', 'row1'
Hive can access HBase tables in several ways: you can create the associated HBase table (living in HBase) at the same time you create the Hive table, or you can create a Hive table on top of an already existing HBase table.
3.1 create a Hive table from an existing HBase table:
Make sure the HBase table test has already been created in HBase, with column family cf.
export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/usr/lib/zookeeper/zookeeper.jar:$HADOOP_CLASSPATH
Enter the hive shell and run the following command:
hive> create external table hbase_test (id STRING, value STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a,cf:b") TBLPROPERTIES("hbase.table.name"="test");
This fails with: org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements. The reason is that hbase_test declares 2 columns (id, value) while the HBase mapping lists 3 (:key,cf:a,cf:b). Changing :key,cf:a,cf:b to :key,cf:a fixes it:
create external table hbase_test (id STRING, value STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:a") TBLPROPERTIES("hbase.table.name"="test");
This maps the HBase table's row key (in an hbase put statement, the argument right after the table name is the row key; e.g. put 'test', 'row1', 'cf:a', 'value1' writes a cell whose row key is row1) to the id column of the Hive table (here hbase_test), and column a of column family cf to the value column.
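The element-count rule can be checked outside hive before running the DDL. This small sketch (variable names are illustrative) reproduces the mismatch that triggered the error above:

```shell
# Compare the Hive column count against the hbase.columns.mapping count.
MAPPING=":key,cf:a,cf:b"   # the mapping that triggered the error
HIVE_COLS="id,value"       # the two declared Hive columns
n_map=$(printf '%s\n' "$MAPPING" | tr ',' '\n' | wc -l)
n_col=$(printf '%s\n' "$HIVE_COLS" | tr ',' '\n' | wc -l)
if [ "$n_map" -ne "$n_col" ]; then
  echo "mismatch: $n_col columns vs $n_map mapping entries"
else
  echo "ok"
fi
```

Running this prints the same 2-vs-3 mismatch that HBaseSerDe reports; dropping cf:b from MAPPING makes it print "ok".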
Common errors:
1) java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/MasterNotRunningException
This usually means the hbase-*.jar files are not on the classpath; add them to HADOOP_CLASSPATH before entering hive.
For example, HDP1.3.3 reports this error; run the following and then enter hive: export HADOOP_CLASSPATH=/etc/hbase/conf:/usr/lib/hbase/*:/usr/lib/hbase/lib/*:/usr/lib/zookeeper/zookeeper.jar:$HADOOP_CLASSPATH (this puts the jars under the HBase install directory /usr/lib/hbase, such as hbase-0.94.6.1.3.3.0-58-security.jar, on HADOOP_CLASSPATH).
2) java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
Replace libthrift-0.8.0.jar under ${hbase}/lib with libthrift-0.9.0.jar (available under $HIVE/lib).
1. HBase home: http://hbase.apache.org/#
2. HBase quickstart: http://hbase.apache.org/book/quickstart.html
3. Hive/HBase integration: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
4. Hive home: https://cwiki.apache.org/confluence/display/Hive/Home
5. Hive quickstart: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
6. Hortonworks HDP hive/hbase tutorials:
http://hortonworks.com/blog/using-hive-to-interact-with-hbase-part-2/
http://hortonworks.com/community/forums/topic/hive-external-table-pointing-to-hbase-2/