Setting up a Hive environment on Hadoop

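Contents

      • 1. Download and extract the package
      • 2. Configure environment variables
      • 3. Install MySQL
      • 4. Configure hive-site.xml
      • 5. Configure hive-env.sh
      • 6. Initialize the database and start Hive
      • 7. Start and stop scripts
      • 8. Troubleshooting notes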

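1. Download and extract the package

Download the Hive binary package from the official site (apache-hive-3.1.3-bin.tar.gz was the latest release when this was written) and extract it: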

tar -zvxf apache-hive-3.1.3-bin.tar.gz

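2. Configure environment variables

Add the Hive environment variables.

# Edit this file; it was created during the Hadoop setup, so the new variables can go straight into it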
vim /etc/profile.d/hadoop.sh

export HIVE_HOME=/home/zxhy/hadoop-3.3.3/app/apache-hive-3.1.3-bin
export PATH=$PATH:$HIVE_HOME/bin

# Reload the environment
source /etc/profile
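
After reloading the profile, it is worth checking that the Hive bin directory really ended up on PATH. The function below is only a sketch; `path_contains` is a hypothetical helper name, not something shipped with Hive or Hadoop.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if DIR is one of the colon-separated
# entries of PATH_VALUE, 1 otherwise.
path_contains() {
    local dir="$1" path_value="$2"
    case ":$path_value:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains "$HIVE_HOME/bin" "$PATH" && echo "hive is on PATH"` prints the message only when the export above took effect in the current shell.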

3. Install MySQL

Any MySQL installation guide will do. I installed it with Docker, so here are the Dockerfile and Compose configuration I used.
Dockerfile

# Base image
FROM mysql:5.7
# Author (LABEL replaces the deprecated MAINTAINER instruction)
LABEL maintainer="zxhy"
# Set the time zone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo 'Asia/Shanghai' >/etc/timezone

Compose file

version : '3.8'
services:
  zxhy-mysql:
    container_name: zxhy-mysql
    image: mysql:5.7
    build:
      context: ./mysql
    ports:
      - "3306:3306"
    logging:
      driver: json-file
      options:
        max-size: "20M"
        max-file: "5"
    volumes:
      - ./mysql/conf:/etc/mysql/conf.d
      - ./mysql/logs:/logs
      - ./mysql/data:/var/lib/mysql
    command: [
          'mysqld',
          '--innodb-buffer-pool-size=80M',
          '--character-set-server=utf8mb4',
          '--collation-server=utf8mb4_unicode_ci',
          '--default-time-zone=+8:00',
          '--lower-case-table-names=1'
        ]
    environment:
      MYSQL_DATABASE: 'zx-cloud'
      MYSQL_ROOT_PASSWORD: 111111

Download the MySQL JDBC connector jar and copy it into Hive's lib directory, so that it ends up at /home/zxhy/hadoop-3.3.3/app/apache-hive-3.1.3-bin/lib/mysql-connector-java-8.0.27.jar
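
A quick way to confirm the connector is in place before initializing the schema is sketched below. `find_mysql_connector` is a hypothetical helper, and the `mysql-connector-*.jar` pattern is an assumption about how the jar is named.

```shell
#!/bin/bash
# Hypothetical helper: print every MySQL connector jar found in LIB_DIR.
# Returns non-zero when none is present.
find_mysql_connector() {
    local lib_dir="$1" jar found=1
    for jar in "$lib_dir"/mysql-connector-*.jar; do
        # When nothing matches, the glob stays literal and -f fails.
        if [ -f "$jar" ]; then
            echo "$jar"
            found=0
        fi
    done
    return "$found"
}
```

Usage: `find_mysql_connector "$HIVE_HOME/lib" || echo "connector jar missing"`.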

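4. Configure hive-site.xml

# Go to the configuration directory
cd /home/zxhy/hadoop-3.3.3/app/apache-hive-3.1.3-bin/conf
# Create a configuration file from the default template
cp hive-default.xml.template hive-site.xml
# Then edit the file
vi hive-site.xml

Change the following key settings in hive-site.xml.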

<configuration>
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/home/zxhy/hadoop-3.3.3/data/hive/warehouse</value>
		<description>location of default database for the warehouse</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://192.168.1.253:3306/zx_hive?createDatabaseIfNotExist=true</value>
		<description>JDBC connect string for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.cj.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
		<description>Username to use against metastore database</description>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>111111</value>
		<description>password to use against metastore database</description>
	</property>
	<property>
		<name>datanucleus.schema.autoCreateAll</name>
		<value>true</value>
		<description>Auto creates necessary schema on a startup if one doesn't exist. Set this to false, after creating it once. To enable auto create also set hive.metastore.schema.verification=false. Auto creation is not recommended for production use cases, run schematool command instead.</description>
	</property>
</configuration>
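
To double-check what a given property is set to after editing, something like the following can help. It is a rough sketch rather than a real XML parser: `hive_conf_get` is a hypothetical helper, and it assumes each `<name>` and `<value>` element sits on its own line.

```shell
#!/bin/bash
# Hypothetical helper: print the <value> that follows <name>KEY</name> in FILE.
# Assumes one element per line; not a general XML parser.
hive_conf_get() {
    local key="$1" file="$2"
    awk -v key="$key" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$file"
}
```

Usage: `hive_conf_get javax.jdo.option.ConnectionURL hive-site.xml` prints the JDBC URL, or nothing if the property is absent.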

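5. Configure hive-env.sh

# Create a configuration file from the default template
cp hive-env.sh.template hive-env.sh
# Then edit the file
vi hive-env.sh

If JAVA_HOME, HIVE_HOME, and HADOOP_HOME have not already been set in /etc/profile.d/hadoop.sh, add the following environment variables at the top of the file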

export JAVA_HOME=/opt/jdk1.8.0_202
export HIVE_HOME=/home/zxhy/hadoop-3.3.3/app/apache-hive-3.1.3-bin
export HADOOP_HOME=/home/zxhy/hadoop-3.3.3

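6. Initialize the database and start Hive

# Initialize the metastore database
schematool -dbType mysql -initSchema
# Start Hive (HiveServer2) in the background
nohup hiveserver2 -hiveconf hive.execution.engine=mr &
# Use beeline to check that the connection works
beeline -u jdbc:hive2://localhost:10000 -n root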
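
HiveServer2 can take a while before it starts listening on port 10000, so running beeline right after the nohup line may fail with a refused connection. The sketch below polls the port first. `wait_for_port` is a hypothetical helper, and it relies on bash's `/dev/tcp` redirection being available in your bash build.

```shell
#!/bin/bash
# Hypothetical helper: poll HOST:PORT once per second until a TCP connection
# succeeds or TIMEOUT seconds have passed (default 60).
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-60}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # The subshell opens the connection and closes it again on exit.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

Usage: `wait_for_port localhost 10000 120 && beeline -u jdbc:hive2://localhost:10000 -n root`.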

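7. Start and stop scripts

Add Hive start and stop scripts to the cluster-wide startup script created earlier.
Create the start script hive-start.sh in the bin directory with the following content: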

#! /bin/bash
while true 
do
	monitor=`hdfs dfsadmin -safemode get`
	if [[ $monitor =~ "OFF" ]]
	then
		nohup ${HIVE_HOME}/bin/hiveserver2 1>/dev/null 2>&1 &
		break
	else
		echo "hdfs Safe mode is ON"
	fi
	sleep 5
done
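
The loop above waits forever if HDFS never leaves safe mode. One possible variant, sketched here under my own assumptions, gives up after a fixed number of checks. `wait_safemode_off` is a hypothetical name, and the probe command is passed in as arguments so the loop can be tried out without a cluster.

```shell
#!/bin/bash
# Hypothetical variant of the safe-mode wait: run the probe command up to
# MAX_TRIES times, INTERVAL seconds apart, and succeed as soon as its output
# contains "OFF".
wait_safemode_off() {
    local max_tries="$1" interval="$2" tries=0
    shift 2
    while [ "$tries" -lt "$max_tries" ]; do
        if "$@" 2>/dev/null | grep -q "OFF"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    echo "HDFS still in safe mode after $max_tries checks" >&2
    return 1
}
```

In hive-start.sh this could be called as `wait_safemode_off 60 5 hdfs dfsadmin -safemode get`, starting hiveserver2 only when it returns 0.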

Create the stop script hive-stop.sh in the bin directory with the following content:

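#!/bin/bash

# HiveServer2 shows up in jps as RunJar. Note that this kills every RunJar
# process on the host, including any other job started with "hadoop jar".
jps | grep RunJar | awk '{print $1}' | xargs kill -9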
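
If other RunJar processes run on the same host, a narrower filter may be safer. The sketch below assumes that `jps -ml` prints the main-class arguments and that the HiveServer2 class name appears on that line; please verify this on your own installation. `hiveserver2_pids` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read `jps -ml`-style lines on stdin and print only
# the PIDs whose command line mentions HiveServer2.
hiveserver2_pids() {
    awk '/HiveServer2/ { print $1 }'
}
```

Usage, assuming GNU xargs for the `-r` flag (skip the kill when there is no input): `jps -ml | hiveserver2_pids | xargs -r kill`.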

Modify the earlier Hadoop startup script to add the Hive start and stop steps. The complete script is as follows

#!/bin/bash

if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit 1
fi

this="${BASH_SOURCE-$0}"
bin=$(cd -P -- "$(dirname -- "${this}")" >/dev/null && pwd -P)

if [[ -n "${HADOOP_HOME}" ]]; then
  HADOOP_HOME_DIR="${HADOOP_HOME}"
else
  HADOOP_HOME_DIR="${bin}/../"
fi

case $1 in
"start")
        echo " =================== Starting the Hadoop cluster ==================="

        echo " --------------- Starting hdfs ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/sbin/start-dfs.sh"
        echo " --------------- Starting yarn ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/sbin/start-yarn.sh"
        echo " --------------- Starting historyserver ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/bin/mapred --daemon start historyserver"
        echo " --------------- Starting zookeeper ---------------"
        for host in hadoop01 hadoop02 hadoop03
        do
                echo "starting $host zookeeper"
                ssh $host "${ZK_HOME}/bin/zkServer.sh start"
        done
        echo " --------------- Starting hbase ---------------"
        ssh hadoop01 "${HBASE_HOME}/bin/start-hbase.sh"
        echo " --------------- Starting hive ---------------"
        ssh hadoop01 "${HIVE_HOME}/bin/hive-start.sh"
;;
"stop")
        echo " =================== Stopping the Hadoop cluster ==================="
        echo " --------------- Stopping hive ---------------"
        # \$1 is escaped so that awk on the remote host receives it; unescaped,
        # the local shell would expand it to this script's first argument ("stop")
        ssh hadoop01 "jps | grep RunJar | awk '{print \$1}' | xargs kill -9"
        echo " --------------- Stopping hbase ---------------"
        ssh hadoop01 "${HBASE_HOME}/bin/stop-hbase.sh"
        echo " --------------- Stopping zookeeper ---------------"
        for host in hadoop01 hadoop02 hadoop03
        do
                echo "stopping $host zookeeper"
                ssh $host "${ZK_HOME}/bin/zkServer.sh stop"
        done
        echo " --------------- Stopping historyserver ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/bin/mapred --daemon stop historyserver"
        echo " --------------- Stopping yarn ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/sbin/stop-yarn.sh"
        echo " --------------- Stopping hdfs ---------------"
        ssh hadoop01 "${HADOOP_HOME_DIR}/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac

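8. Troubleshooting notes

1. Avoid special characters such as "-" in the database name; underscores are fine.
2. Be sure to put mysql-connector-java-8.0.27.jar into Hive's lib directory, otherwise connecting to the metastore database fails.
3. Jar conflicts: for example, when SLF4J reports two logging bindings as in the output below, removing the copy under hive/lib resolves it.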

SLF4J: Found binding in [jar:file:/home/zxhy/hadoop-3.3.3/app/apache-hive-3.1.3-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zxhy/hadoop-3.3.3/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
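
Rather than deleting the conflicting jar outright, one cautious option is to rename it so it can be restored later. This is only a sketch: `disable_jar` is a hypothetical helper, and it assumes the Hive launcher only picks up files ending in `.jar` from the lib directory.

```shell
#!/bin/bash
# Hypothetical helper: rename JAR to JAR.bak so it is no longer matched by
# a *.jar classpath glob but can still be restored.
disable_jar() {
    local jar="$1"
    [ -f "$jar" ] || return 1
    mv -- "$jar" "$jar.bak"
}
```

Usage: `disable_jar "$HIVE_HOME/lib/log4j-slf4j-impl-2.17.1.jar"`.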

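4. Running beeline -u jdbc:hive2://localhost:10000 -n root fails with: User: root is not allowed to impersonate root
Edit the Hadoop configuration file etc/hadoop/core-site.xml and add the following properties, then restart Hadoop so the change is picked up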

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

5. Connection error: Name node is in safe mode.
Run the following command to leave safe mode

hdfs dfsadmin -safemode leave
