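My environment: Hadoop 2.6.0 and HBase 1.2.2, both running in pseudo-distributed mode
Reference: HBase: The Definitive Guide, p. 240
1. Start Hadoop and HBase.
2. Download apache-hive-1.2.1.
3. Edit hive-env.sh in Hive's conf directory (copy it from hive-env.sh.template if it does not exist yet):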
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/hadoop
HBASE_HOME=/home/hadoop/hbase-1.2.2
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CLASSPATH=/home/hadoop/hbase-1.2.2/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/hbase-1.2.2/lib
4. Start Hive.
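Before starting Hive, it can help to confirm that the directories referenced in hive-env.sh actually exist. The following is a minimal sketch, not part of the original setup: the function name check_hive_env is hypothetical, it defaults to the paths used above (override them through HADOOP_HOME / HBASE_HOME), and it assumes hbase-site.xml lives in HBase's conf directory, which is the directory HIVE_CLASSPATH points Hive at.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the paths referenced in hive-env.sh.
# Assumption: defaults are the install paths used in this article; override
# them from the environment, e.g. HBASE_HOME=/opt/hbase check_hive_env
check_hive_env() {
  local hadoop_home="${HADOOP_HOME:-/home/hadoop/hadoop}"
  local hbase_home="${HBASE_HOME:-/home/hadoop/hbase-1.2.2}"
  local status=0

  # HIVE_CLASSPATH points at HBase's conf dir, HIVE_AUX_JARS_PATH at its lib dir.
  for dir in "$hadoop_home" "$hbase_home/conf" "$hbase_home/lib"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      status=1
    fi
  done

  # Hive needs hbase-site.xml on its classpath to locate the HBase cluster.
  if [ ! -f "$hbase_home/conf/hbase-site.xml" ]; then
    echo "missing file: $hbase_home/conf/hbase-site.xml" >&2
    status=1
  fi
  return "$status"
}
```

Source the file and run check_hive_env; a non-zero return status means at least one of the paths is missing, and each missing path is printed to stderr.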
Note: if the following error appears when creating an HBase table through Hive, hive-hbase-handler-1.2.1.jar must be recompiled and used to replace the original jar under hive/lib:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
Session log:
hadoop@ubuntu:~/apache-hive-1.2.1-bin/bin$ ./hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> create table pokes(foo int,bar string);
OK
Time taken: 3.432 seconds
hive> load data local inpath '/home/hadoop/apache-hive-1.2.1-bin/examples/files/kv1.txt' overwrite into table pokes;
Loading data to table default.pokes
Table default.pokes stats: [numFiles=1, numRows=0, totalSize=5812, rawDataSize=0]
OK
Time taken: 1.353 seconds
hive> select * from pokes;
OK
238 val_238
86 val_86
311 val_311
27 val_27
165 val_165
409 val_409
...
Time taken: 1.143 seconds, Fetched: 500 row(s)
hive> create table hbase_table_1(key int,value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties("hbase.columns.mapping"=":key,cf1:val")
> tblproperties("hbase.table.name"="hbase_hive_t1");
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
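The error names a method that could not be found: HTableDescriptor.addFamily(HColumnDescriptor) returning void (the trailing V in the signature). This points to a mismatch between the HBase API that the bundled handler jar was compiled against and the HBase 1.2.2 jars on the classpath. Two fixes are commonly suggested online:
1. Switch to a newer Hive release, e.g. 2.x. In my tests, however, this did not solve the problem.
2. Recompile hive-hbase-handler-1.2.1.jar and replace the jar of the same name in hive/lib (this works).
A precompiled jar is also available for download: http://download.csdn.net/download/gao634209276/9530079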
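Once a rebuilt (or downloaded) jar is available, the swap described in option 2 can be scripted. This is a minimal sketch, assuming the Hive install path from the session log; replace_handler_jar is a hypothetical helper, not part of Hive. The original jar is renamed with a .orig suffix instead of being copied to another *.jar name, on the assumption that a second file ending in .jar inside lib/ could be put on Hive's classpath next to the new one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace Hive's stock HBase handler jar with a rebuilt one.
# Usage: replace_handler_jar <rebuilt-jar> [hive-lib-dir]
# Assumption: the default lib dir matches the install path used in this article.
replace_handler_jar() {
  local new_jar="$1"
  local hive_lib="${2:-/home/hadoop/apache-hive-1.2.1-bin/lib}"
  local target="$hive_lib/hive-hbase-handler-1.2.1.jar"

  if [ ! -f "$new_jar" ]; then
    echo "rebuilt jar not found: $new_jar" >&2
    return 1
  fi

  # Back up the original only once, so repeated runs never overwrite the backup.
  # The .orig suffix keeps the backup from matching a lib/*.jar glob.
  if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
    mv "$target" "$target.orig" || return 1
  fi

  cp "$new_jar" "$target"
}
```

Restart the Hive CLI after the swap so that the new jar is loaded; restoring the original is a matter of moving the .orig file back.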
hive> create table hbase_table_1(key int,value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties("hbase.columns.mapping"=":key,cf1:val")
> tblproperties("hbase.table.name"="hbase_hive_t1");
OK
Time taken: 4.788 seconds
hive>
> ;
hive> insert overwrite table hbase_table_1 select * from pokes;
Query ID = hadoop_20170117004636_520fee8b-9d6c-4b41-88a5-a58402e0b6af
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1484619043631_0001, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0001/
Kill Command = /home/hadoop/hadoop/bin/hadoop job -kill job_1484619043631_0001
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2017-01-17 00:47:53,388 Stage-0 map = 0%, reduce = 0%
2017-01-17 00:48:21,381 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 6.54 sec
MapReduce Total cumulative CPU time: 6 seconds 540 msec
Ended Job = job_1484619043631_0001
MapReduce Jobs Launched:
Stage-Stage-0: Map: 1 Cumulative CPU: 7.34 sec HDFS Read: 15889 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 340 msec
OK
Time taken: 108.485 seconds
hive> select count(*) from pokes;
Query ID = hadoop_20170117004939_099ed588-fbb4-4b9a-ac1c-1fb6259e7d11
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1484619043631_0002, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0002/
Kill Command = /home/hadoop/hadoop/bin/hadoop job -kill job_1484619043631_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-17 00:50:10,356 Stage-1 map = 0%, reduce = 0%
2017-01-17 00:50:30,514 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.94 sec
2017-01-17 00:50:49,055 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.38 sec
MapReduce Total cumulative CPU time: 6 seconds 380 msec
Ended Job = job_1484619043631_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.38 sec HDFS Read: 12409 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 380 msec
OK
500
Time taken: 72.3 seconds, Fetched: 1 row(s)
hive> select count(*) from hbase_table_1;
Query ID = hadoop_20170117005103_2fa584c7-0c2f-4b40-bc86-093f01e35a00
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1484619043631_0003, Tracking URL = http://ubuntu:8088/proxy/application_1484619043631_0003/
Kill Command = /home/hadoop/hadoop/bin/hadoop job -kill job_1484619043631_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-01-17 00:51:53,774 Stage-1 map = 0%, reduce = 0%
2017-01-17 00:52:16,564 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.42 sec
2017-01-17 00:52:36,997 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.93 sec
MapReduce Total cumulative CPU time: 9 seconds 930 msec
Ended Job = job_1484619043631_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.93 sec HDFS Read: 13551 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 930 msec
OK
309
Time taken: 95.345 seconds, Fetched: 1 row(s)
hive> drop table pokes;
OK
Time taken: 3.374 seconds
hive> select * from pokes;
FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'pokes'
hive> drop table hbase_table_1;
OK
Time taken: 4.64 seconds
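In the session above, count(*) returns 500 for pokes but only 309 for hbase_table_1. This is expected: the mapping ":key,cf1:val" makes the Hive key column the HBase row key, and HBase keeps one row per row key, so rows in kv1.txt that share a key collapse into a single HBase row. The minimal sketch below checks this against the source file. It assumes kv1.txt uses Hive's default field delimiter \001 (Ctrl-A), which is consistent with pokes having been created without a ROW FORMAT clause; row_and_key_counts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare total rows with distinct first-field keys in a
# Hive text file whose fields are separated by \001 (Hive's default delimiter).
# Assumption: the first field is the column mapped to the HBase row key (:key).
row_and_key_counts() {
  local file="$1"
  local rows keys
  rows=$(wc -l < "$file")
  # HBase stores one row per row key, so only distinct keys survive the insert.
  keys=$(cut -d $'\001' -f1 "$file" | sort -u | wc -l)
  # Arithmetic expansion strips any padding spaces that wc may emit.
  echo "rows=$((rows)) distinct_keys=$((keys))"
}
```

Running row_and_key_counts /home/hadoop/apache-hive-1.2.1-bin/examples/files/kv1.txt should report rows=500, and the distinct-key count is expected to match the 309 rows counted in hbase_table_1.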