Installing Hive 3.1.2 and Using Hive with HBase

I. Overview

Installation package: apache-hive-3.1.2-bin.tar.gz

II. Prerequisites

1. MySQL 5.7

Create a hive user (this `grant ... identified by` form works on MySQL 5.7; MySQL 8.0 removed it and requires a separate CREATE USER first):

mysql> grant all privileges on *.* to 'hive'@'%' identified by '123456';
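Before moving on, it is worth confirming the new account can actually connect. A quick check (assumes the mysql client is on the PATH and MySQL runs on node3, the host used later in this guide):

```shell
# Log in as the freshly granted hive user and run a trivial query
mysql -h node3 -uhive -p123456 -e "select version();"
```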
2. Other components

Zookeeper:zookeeper-3.4.14

Hadoop:hadoop-3.1.2

HBase:hbase-2.2.1-bin

III. Installing Hive

1. Extract the installation package

2. Change into the conf directory

3. Create hive-env.sh

[admin@centos7x3 conf]$ cp hive-env.sh.template hive-env.sh
[admin@centos7x3 conf]$ vim hive-env.sh

# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/opt/software/hadoop
# This line hooks HBase in; with it set, hive.aux.jars.path need not be declared in hive-site.xml
export HBASE_HOME=/opt/software/hbase
4. Create hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <!-- Hive directories on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/hive/warehouse</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/hive/tmp</value>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/hive/log</value>
    </property>

    <property>
        <name>hive.exec.mode.local.auto</name>
        <value>false</value>
        <description>Let Hive determine whether to run in local mode automatically</description>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node3:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node3:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <!-- Print column headers in CLI query results -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <!-- Show the current database in the CLI prompt -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

    <!-- HBase jars for the integration; unnecessary if HBASE_HOME is exported in hive-env.sh -->
    <property>
        <name>hive.aux.jars.path</name>
        <value>
        file:///opt/software/hive/lib/hbase-client-2.2.1.jar,
        file:///opt/software/hive/lib/hbase-common-2.2.1.jar,
        file:///opt/software/hive/lib/hbase-common-2.2.1-tests.jar,
        file:///opt/software/hive/lib/hbase-server-2.2.1.jar,
        file:///opt/software/hive/lib/hbase-server-2.2.1-tests.jar,
        file:///opt/software/hive/lib/hbase-protocol-2.2.1.jar,
        file:///opt/software/hive/lib/hbase-protocol-shaded-2.2.1.jar
        </value>
    </property>

    <!-- ZooKeeper quorum used by the HBase client -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node1,node2,node3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
5. HBase jar files

If HBASE_HOME=/opt/software/hbase is exported in hive-env.sh, this copy step is not needed.

Otherwise, copy the jars listed under hive.aux.jars.path above from hbase/lib into hive/lib.

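The copy can be scripted; a minimal sketch assuming the /opt/software layout and HBase 2.2.1 jar names used throughout this guide:

```shell
# Copy the HBase integration jars into Hive's lib directory.
# Paths and version are this guide's assumptions -- adjust to your layout.
HBASE_HOME=/opt/software/hbase
HIVE_HOME=/opt/software/hive
for jar in hbase-client hbase-common hbase-server hbase-protocol hbase-protocol-shaded; do
  # the -2.2.1*.jar glob also picks up the matching -tests jars where they exist
  cp "$HBASE_HOME"/lib/${jar}-2.2.1*.jar "$HIVE_HOME/lib/"
done
```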
6. JDBC driver

Place the JDBC driver jar, mysql-connector-java-5.1.48.jar, into hive/lib.

7. Initialize the metastore schema
[admin@centos7x3 hive]$ bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://node3:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql

Initialization script completed

schemaTool completed
8. Start the Hive metastore service
[admin@centos7x3 hive]$ bin/hive --service metastore &
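Started with a plain `&`, the service still dies with the terminal. A common variant (a sketch; nohup and ss are standard Linux tools, and 9083 is the Thrift port from hive.metastore.uris above):

```shell
# Start the metastore detached from the terminal, logging its output
nohup bin/hive --service metastore > metastore.log 2>&1 &
# Confirm it is listening on the configured Thrift port
ss -lnt | grep 9083
```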

IV. Usage

1. Using Hive directly from the shell

[admin@centos7x3 hive]$ bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/software/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/software/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 1fa22d72-4695-4fd6-a424-2ba39134c713

Logging initialized using configuration in jar:file:/opt/software/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = ef8cd1ad-f50c-4050-b2e6-99d5a71a9a9d
hive (default)> 

2. Working with HBase

(1) Internal (managed) tables

Create the table directly in Hive; when it is dropped in Hive, the corresponding data in HBase is deleted as well.

create table htest(key string,id int,name string) 
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
with serdeproperties ("hbase.columns.mapping"=":key,user:id,user:name") 
tblproperties("hbase.table.name"="htest");

Line 1: creates a Hive-managed table named htest with 3 columns (key, id, name); key is the mandatory rowkey virtual column.

Line 2: fixed boilerplate telling Hive to store the table in HBase.

Line 3: maps each column to HBase as column_family:qualifier, in the same order as line 1.

Line 4: the name of the backing table in HBase (htest).
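Writes made on the Hive side land in HBase as well. A quick check (the row values here are invented for illustration):

```shell
# Insert through Hive; the row becomes an HBase Put under rowkey rk0002
bin/hive -e "insert into table htest values('rk0002', 2, 'lisi');"
# Then, in the HBase shell:  get 'htest','rk0002'  shows user:id and user:name
```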

View the newly created htest table in HBase:

hbase(main):033:0> list 'htest'
TABLE                                                                                                                                                                  
htest                                                                                                                                                                  
1 row(s)
Took 0.0177 seconds                                                                                                                                                    
=> ["htest"]

hbase(main):034:0> desc 'htest'
Table htest is ENABLED                                                                                                                                                 
htest                                                                                                                                                                  
COLUMN FAMILIES DESCRIPTION                                                                                                                                            
{NAME => 'user', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DAT
A_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false
', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'CACHE_DATA_IN
_L1' => 'false'}}                                                                                                                                                      

1 row(s)

QUOTAS                                                                                                                                                                 
0 row(s)
Took 0.1073 seconds  

Insert data from the HBase side:

hbase(main):004:0> put 'htest','rk0001','user:id','1'
Took 0.1641 seconds                                                                                                                                                    
hbase(main):005:0> put 'htest','rk0001','user:name','zhangsan'
Took 0.0246 seconds   

Query it from Hive:

hive (default)> 
              > select * from htest;
OK
htest.key	htest.id	htest.name

rk0001	1	zhangsan 
Time taken: 0.676 seconds, Fetched: 1 row(s)

(2) External tables

An external table maps onto a table that already exists in HBase.

CREATE EXTERNAL TABLE user_info(key string,age int,name string,Hobbies string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,base_info:age,base_info:name,extra_info:Hobbies") 
TBLPROPERTIES ("hbase.table.name" = "user_info");

The only difference from the internal-table DDL is the EXTERNAL keyword; dropping an external table removes only the Hive metadata, leaving the underlying HBase table intact.

Insert data from Hive:

hive (default)> insert into table user_info values('user0002','33','lisisi','football');

Query from HBase:

hbase(main):036:0> get 'user_info','user0002'
COLUMN                                     CELL                                                                                                                        
 base_info:age                             timestamp=1573197839558, value=33                                                                                           
 base_info:name                            timestamp=1573197839558, value=lisisi                                                                                       
 extra_info:Hobbies                        timestamp=1573197839558, value=football                                                                                     
1 row(s)
Took 0.0169 seconds  
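The practical difference between the two table types shows up on DROP. A sketch using the two tables created above:

```shell
# Managed table: dropping it in Hive also deletes the htest table in HBase
bin/hive -e "drop table htest;"
# External table: only the Hive metadata is removed; user_info stays in HBase
bin/hive -e "drop table user_info;"
```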

V. Hive vs. HBase

  1. Hive

    (1) Data warehouse
    Hive essentially maintains a mapping in MySQL (the metastore) onto files already stored in HDFS, so that they can be managed and queried with HQL.

    (2) For data analysis and cleansing
    Hive is suited to offline data analysis and cleansing; latency is high.

    (3) Based on HDFS and MapReduce
    The data Hive stores still lives on DataNodes; HQL statements are compiled into MapReduce jobs for execution.

  2. HBase

    (1) Database
    A column-oriented, non-relational (NoSQL) database.

    (2) Stores structured and unstructured data
    Suited to single-table, non-relational storage; not suited to relational operations such as JOINs.

    (3) Based on HDFS
    Data is persisted as HFiles, stored on DataNodes and managed as regions by RegionServers.

    (4) Low latency, suitable for online services
    Facing large volumes of enterprise data, HBase supports very large single tables while providing fast data access.
