Fun with Big Data #17: Installing Hive (Hive 3.1.2)

I. Introduction


Hive is a data warehouse tool built on top of Hadoop for data extraction, transformation, and loading; it is a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive maps structured data files onto database tables and provides SQL query capability, translating SQL statements into MapReduce jobs for execution. Its advantage is the low learning curve: SQL-like statements give you quick MapReduce-based statistics without writing dedicated MapReduce programs, which makes Hive a good fit for the statistical analysis of a data warehouse. (Adapted from Baidu Baike.)
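To make that concrete, here is a minimal sketch (the table name, column layout, and file path are made up for illustration): a delimited text file is mapped onto a table, and a SQL query over it is compiled into a MapReduce job behind the scenes. Naturally, none of this works until the installation below is finished.

        hive -e "CREATE TABLE demo_logs (ts STRING, msg STRING)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"
        hive -e "LOAD DATA LOCAL INPATH '/tmp/logs.txt' INTO TABLE demo_logs;"
        hive -e "SELECT COUNT(*) FROM demo_logs;"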

II. Download


Download site: http://hive.apache.org/

The download yields: apache-hive-3.1.2-bin.tar.gz

[Screenshot: Hive home page]

As the downloads page shows, 3.1.2 is compatible with Hadoop 3.x.x.

[Screenshot: downloads page]

Pick the Tsinghua mirror, select 3.1.2, and choose the bin tarball.

III. Installation


        tar zxvf apache-hive-3.1.2-bin.tar.gz -C /mylab/soft

IV. Configuration

    This part of the setup is the most troublesome, dear reader, so please be patient, be patient, and be patient some more.


    1. Set environment variables

            Edit ~/.bashrc

                    vi ~/.bashrc

                        #hive 3.1.2

                        export HIVE_HOME=/mylab/soft/apache-hive-3.1.2-bin

                        export PATH=$PATH:$HIVE_HOME/bin                        

                        export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib

                    source ~/.bashrc
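            A quick sanity check that the variables took effect (hive --version only works once the tarball from section III is unpacked):

                    echo $HIVE_HOME
                    which hive
                    hive --version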

    2. Configure MySQL for Hive

            Create a new user

                    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';

            Create the database

                    create database hive;

            List all users

                    SELECT user, host FROM mysql.user;

            Grant privileges

                    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' with grant option;

            Reload the privilege tables

                    flush privileges;

            Show the account's privileges

                    SHOW GRANTS FOR 'hive'@'%';

            Test the login

                    mysql -u hive -p        (enter the password hive when prompted)
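            If you would rather script the whole MySQL setup than type it interactively, a sketch like this should be equivalent (assuming you can reach the server as root via sudo):

sudo mysql <<'EOF'
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
CREATE DATABASE hive;
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF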

    3. Modify the Hive configuration files

        a) Create working directories

Note: these are directories that the configuration file will reference. Some of them I honestly cannot explain; wherever the configuration wants a directory, I simply give it one, place everything under $HIVE_HOME/working, and grant read/write permissions on $HIVE_HOME/working and all of its subdirectories. (A loop version is sketched right after the commands below.)

mkdir -p $HIVE_HOME/working/exec.scratchdir

mkdir -p $HIVE_HOME/working/repl.rootdir

mkdir -p $HIVE_HOME/working/repl.cmrootdir

mkdir -p $HIVE_HOME/working/replica.functions.root

mkdir -p $HIVE_HOME/working/local.scratchdir

mkdir -p $HIVE_HOME/working/downloaded.resources

mkdir -p $HIVE_HOME/working/metastore.warehouse


mkdir -p $HIVE_HOME/working/operation.log

mkdir -p $HIVE_HOME/working/querylog

mkdir -p $HIVE_HOME/working/logs

chmod 777 -R $HIVE_HOME/working
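The same thing as a loop, if you would rather not repeat yourself (the list is exactly the directories above):

for d in exec.scratchdir repl.rootdir repl.cmrootdir replica.functions.root \
         local.scratchdir downloaded.resources metastore.warehouse \
         operation.log querylog logs; do
    mkdir -p "$HIVE_HOME/working/$d"
done
chmod -R 777 "$HIVE_HOME/working"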

   b) Copy mysql-connector-java

        To keep things simple, copy mysql-connector-java-8.0.21.jar straight into $HIVE_HOME/lib:

        cp /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib

        or

        ln -s /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib/mysql-connector-java-8.0.21.jar

        If you are not sure which directory the jar lives in, locate it with find:

         sudo find / -name mysql-connector-java-8.0.21.jar

   c) Modify hive-env.sh

        This file has to be copied from its template:

        cp $HIVE_HOME/conf/hive-env.sh.template  $HIVE_HOME/conf/hive-env.sh

        vi $HIVE_HOME/conf/hive-env.sh

            HADOOP_HOME=/mylab/soft/hadoop-3.2.1

            export HIVE_CONF_DIR=/mylab/soft/apache-hive-3.1.2-bin/conf

            export HIVE_AUX_JARS_PATH=/mylab/soft/apache-hive-3.1.2-bin/lib

   d) Modify hive-site.xml

            This one also has to be copied from a template (whose name, hive-default.xml.template, is a bit of a trap):

            cp $HIVE_HOME/conf/hive-default.xml.template   $HIVE_HOME/conf/hive-site.xml

            This configuration file is fairly long, so see hive-site.xml in the appendix.

   e) Remove a special character from hive-site.xml

    The offender is the description of hive.txn.xlock.iow in the stock template, where an invisible character is fused into the text right after the word "for":

        <property>
            <name>hive.txn.xlock.iow</name>
            <value>true</value>
            <description>
              Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for&#8;transactional tables.  This ensures that inserts (w/o overwrite) running concurrently
              are not hidden by the INSERT OVERWRITE.
            </description>
        </property>

    Delete that "for" together with the invisible &#8; character reference after it; the stray reference is what makes the XML unparseable.
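    If you would rather not hunt for the character by hand, something like this should locate and strip it (assuming, as in the stock 3.1.2 template, that it appears as the literal character reference &#8;):

        grep -n '&#8;' $HIVE_HOME/conf/hive-site.xml
        sed -i 's/&#8;//g' $HIVE_HOME/conf/hive-site.xml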

        f) Fix a bug in the hive launcher script

            Copy the file $HIVE_HOME/bin/hive into the shared folder, open it with Notepad on Windows, and replace every "file://" with "file:///" (three forward slashes; how does an error like this even happen?).

            Once done, put the file back under $HIVE_HOME/bin. (A sed alternative that avoids the Windows detour is sketched below.)
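            The Windows round trip can be skipped with a sed one-liner that performs the same blind replacement in place (keep a backup first, since it rewrites every occurrence exactly like the Notepad edit):

                cp $HIVE_HOME/bin/hive $HIVE_HOME/bin/hive.bak
                sed -i 's|file://|file:///|g' $HIVE_HOME/bin/hive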

        g) Resolve the guava version conflict

                ls /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava*

                        guava-27.0-jre.jar

                ls /mylab/soft/apache-hive-3.1.2-bin/lib/guava*

                        guava-19.0.jar

                Delete the lower version under Hive and copy the higher version from Hadoop into $HIVE_HOME/lib:

                    rm /mylab/soft/apache-hive-3.1.2-bin/lib/guava-19.0.jar

                    cp /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /mylab/soft/apache-hive-3.1.2-bin/lib

          Alternatively, fetch the latest guava from https://mvnrepository.com/artifact/com.google.guava/guava and use that as the replacement.
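          Either way, verify afterwards that exactly one guava jar remains under Hive's lib directory:

                ls $HIVE_HOME/lib/guava*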

        h) Create hive-log4j2.properties

 cp $HIVE_HOME/conf/hive-log4j2.properties.template   $HIVE_HOME/conf/hive-log4j2.properties

         i) Create hive-exec-log4j2.properties

cp $HIVE_HOME/conf/hive-exec-log4j2.properties.template $HIVE_HOME/conf/hive-exec-log4j2.properties


V. Verification


        Finally, time to take it for a spin.

        Initialize the metastore database

                schematool -dbType mysql -initSchema
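        To confirm the initialization took, schematool can report back what it finds in MySQL:

                schematool -dbType mysql -info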

        Start the metastore service

                hive --service metastore > $HIVE_HOME/working/logs/metastore.log 2>&1 &

        Start the hiveserver2 service

                hive --service hiveserver2 > $HIVE_HOME/working/logs/hiveserver2.log 2>&1 &

        Or run them in interactive (foreground) mode

                hive --service metastore

                hive --service hiveserver2

                jps

        Check the Hive processes

            Plain version

                    ps -ef | grep hive 

            Condensed version (the command is convoluted but the result is lean; output: Linux user name and process ID)

                    ps -ef | grep hive | grep -v grep | awk '{print $1 " " $2}'

        Start a client

                Method 1

                        hive

                Method 2

                                beeline -u jdbc:hive2://master:10000 -n root

                                beeline

                                        !connect jdbc:hive2://master:10000 root        (inside beeline, !connect takes the user as a plain argument, not -n)

                                Exit beeline

                                        !exit

                Check hiveserver2 (port 10000 by default)

                                netstat -tulnp | grep 10000
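                For a quick end-to-end smoke test without entering an interactive shell, beeline's -e flag runs a single statement and exits (host and user as above):

                                beeline -u jdbc:hive2://master:10000 -n root -e "show databases;"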

                For more on using beeline, see:

                https://www.cnblogs.com/lenmom/p/11218807.html

                https://www.cnblogs.com/xinfang520/p/7684598.html

        Stop the Hive services

            There is no stop command; you can only kill them.

            Plain version

                        ps -ef | grep hive

                        kill -9 <pid from the ps output above>

            Condensed version (a convoluted command, but one stroke finishes the job)

                        ps -ef | grep hive | grep -v grep | awk '{print "kill -9 " $2}' | sh        
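            A slightly less surgical alternative is pkill, which matches on the full command line; the metastore and HiveServer2 JVMs carry their main class names there (use with care, since it kills anything matching the pattern):

                        pkill -f HiveMetaStore
                        pkill -f HiveServer2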

VI. Appendix


    1. hive-site.xml

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/exec.scratchdir</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
    </property>

    <property>
        <name>hive.repl.rootdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.rootdir</value>
        <description>HDFS root dir for all replication dumps.</description>
    </property>

    <property>
        <name>hive.repl.cmrootdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.cmrootdir</value>
        <description>Root dir for ChangeManager, used for deleted files.</description>
    </property>

    <property>
        <name>hive.repl.replica.functions.root.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/replica.functions.root</value>
        <description>Root directory on the replica warehouse where the repl sub-system will store jars from the primary warehouse</description>
    </property>

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/local.scratchdir</value>
        <description>Local scratch space for Hive jobs</description>
    </property>

    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/downloaded.resources/${hive.session.id}_resources</value>
        <description>Temporary local directory for added resources in the remote file system.</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/metastore.warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>

    <property>
        <name>hive.aux.jars.path</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/lib</value>
        <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/querylog</value>
        <description>Location of Hive run time structured log file</description>
    </property>

    <property>
        <name>hive.server2.logging.operation.log.location</name>
        <value>/mylab/soft/apache-hive-3.1.2-bin/working/operation.log</value>
        <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>master</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>

    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
        <description>TCP port number to listen on, default 10000</description>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>

    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>


