I recently set out to learn Hive and found that just getting the environment running took a surprising amount of time, even though most people never need to build one themselves at work. So I am sharing my virtual machine with the environment (Java & Hadoop & MySQL & Hive) already set up for anyone to use directly, along with the notes I took during the setup, below.
Because of the network drive's size limit, the files are uploaded as a split (multi-volume) archive.
The OVF directory contains the exported VM files; these require the network adapter to be reconfigured. The VirtualBox_VMs and Virtual_Machines directories contain the complete working directories of the Linux VMs created in VirtualBox and VMware Workstation respectively, and should not need any network configuration. All passwords inside the system are Hadoop. Hadoop runs in pseudo-distributed mode, so the archive contains only a single VM. The environment was built on VMware Workstation 16; in theory both VMware Workstation and Oracle VM VirtualBox can load it.
Software | Version | Software | Version |
---|---|---|---|
Java | 1.8 | Hadoop | 2.7.1 |
MySQL | 8.0 | Hive | 2.3.8 |
(Optional) Ubuntu ships vim-tiny by default, whose arrow keys and backspace misbehave in insert mode; fix this in its config file:
sudo gedit /etc/vim/vimrc.tiny
set nocompatible
set backspace=2
Download the JDK into the /opt directory:
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
Still in /opt:
tar -zxvf jdk-8u131-linux-x64.tar.gz # extract
mv jdk-8u131-linux-x64/ jdk # rename
Edit the profile file as the root user (prompt #):
vi /etc/profile
export JAVA_HOME=/opt/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
Apply the changes and verify:
source /etc/profile
java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Because the later steps (passwordless login, granting permissions, and so on) all use hadoop as the example username, note that if you use a different username you must substitute it in the later permission commands.
sudo useradd -m hadoop -s /bin/bash # create a user named hadoop
sudo passwd hadoop # set the hadoop user's password
sudo adduser hadoop sudo # grant the hadoop user administrator rights
sudo chown -R hadoop /opt # give the hadoop user read/write access to /opt
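Before moving on, a quick sanity check of the new account may be useful (a sketch, assuming the username hadoop as above):

```bash
# The user should exist and be a member of the sudo group
id hadoop
# /opt should now be owned by hadoop
ls -ld /opt
```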
Install the ssh server:
sudo apt-get install openssh-server
Log in to localhost as the hadoop user (prompt $); at this point a password should still be required:
ssh localhost
exit
logout
Connection to localhost closed.
Back in the home directory (~), generate a key pair and authorize it:
cd ~/.ssh/
ssh-keygen -t rsa
cat ./id_rsa.pub >> ./authorized_keys
Log in to localhost as the hadoop user again; this time it should succeed without a password:
ssh localhost
exit
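For a non-interactive confirmation that key-based login works, a sketch using ssh's BatchMode (which fails instead of falling back to a password prompt):

```bash
# Prints the message only if login succeeds without a password
ssh -o BatchMode=yes localhost true && echo "passwordless SSH OK"
```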
Download Hadoop into /opt (other versions are available under http://archive.apache.org/dist/hadoop/core/):
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -zxvf hadoop-2.7.1.tar.gz # extract
mv hadoop-2.7.1/ hadoop # rename
rm -f hadoop-2.7.1.tar.gz # remove the downloaded archive
chown -R hadoop ./hadoop # fix directory ownership
mkdir /opt/hadoop/tmp # create the working directories
mkdir /opt/hadoop/hdfs
mkdir /opt/hadoop/hdfs/data
mkdir /opt/hadoop/hdfs/name
If these directories were created as root, grant the hadoop user read/write access:
chown -R hadoop /opt/hadoop
Configure the Hadoop environment variables for the hadoop user:
vi ~/.bash_profile
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bash_profile
hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
All of the following can be executed as the hadoop user (prompt $).
vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Change
export JAVA_HOME=${JAVA_HOME}
to the absolute path of the JDK:
export JAVA_HOME=/opt/jdk
This can be left unset, but starting the NameNode then sometimes fails with
Error: JAVA_HOME is not set and could not be found.
so it is better to set it up front.
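If you prefer to make the edit non-interactively, a one-line sed sketch should do (assuming the JDK really is at /opt/jdk):

```bash
# Replace the JAVA_HOME line in hadoop-env.sh with the absolute JDK path
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk|' /opt/hadoop/etc/hadoop/hadoop-env.sh
```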
vi /opt/hadoop/etc/hadoop/core-site.xml
Add the following:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>HDFS URI: filesystem://namenode-host:port</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
    <description>Local Hadoop temporary directory on the namenode</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hdfs/name</value>
    <description>Where the namenode stores HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hdfs/data</value>
    <description>Physical storage location of data blocks on the datanode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replication factor; the default is 3, and it should not exceed the number of datanodes</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>
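The values Hadoop actually resolves from these XML files can be double-checked with hdfs getconf (a sketch; fs.default.name is a deprecated alias, so it is queried here under its current name fs.defaultFS):

```bash
# Print the effective configuration values
/opt/hadoop/bin/hdfs getconf -confKey fs.defaultFS
/opt/hadoop/bin/hdfs getconf -confKey dfs.replication
```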
vi /opt/hadoop/etc/hadoop/mapred-site.xml
Add the following:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
vi /opt/hadoop/etc/hadoop/yarn-site.xml
Add the following:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Format the NameNode:
/opt/hadoop/bin/hdfs namenode -format
21/06/05 07:41:01 INFO common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
21/06/05 07:41:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/06/05 07:41:01 INFO util.ExitUtil: Exiting with status 0
21/06/05 07:41:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Start HDFS:
/opt/hadoop/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out
Start YARN:
/opt/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out
Check that everything started correctly:
jps
49121 NodeManager
49329 Jps
48546 DataNode
48995 ResourceManager
48730 SecondaryNameNode
48395 NameNode
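A small script makes this check repeatable (a sketch; the five daemon names match the jps output above):

```bash
#!/usr/bin/env bash
# Verify that every expected Hadoop daemon shows up in jps
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if jps | grep -q "$daemon"; then
    echo "$daemon: running"
  else
    echo "$daemon: NOT running"
  fi
done
```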
Download Hive into /opt:
wget http://archive.apache.org/dist/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
tar -zxvf apache-hive-2.3.8-bin.tar.gz # extract
mv apache-hive-2.3.8-bin/ hive # rename
rm -f apache-hive-2.3.8-bin.tar.gz # remove the downloaded tarball
chown -R hadoop /opt/hive # give the hadoop user read/write access
mv /opt/hive/conf/hive-env.sh.template /opt/hive/conf/hive-env.sh
vi /opt/hive/conf/hive-env.sh
Append the following two lines, i.e. the Hadoop path and the Hive config directory:
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh
Create the required HDFS directories and grant the needed permissions (this step must be run after Hadoop has been started):
/opt/hadoop/bin/hadoop fs -mkdir /tmp
/opt/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/opt/hadoop/bin/hadoop fs -chmod g+w /tmp
/opt/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
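To confirm the directories exist and carry the group write bit, a quick look with hadoop fs -ls:

```bash
# Both paths should show permissions like drwxrwxr-x (note the group w)
/opt/hadoop/bin/hadoop fs -ls -d /tmp /user/hive/warehouse
```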
Initialize the metastore schema (here with the embedded Derby database) and start Hive:
/opt/hive/bin/schematool -initSchema -dbType derby
/opt/hive/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
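A quick smoke test is possible at this point (a sketch using hive -e, Hive's non-interactive mode; run it from the same directory where schematool was executed, since Derby keeps its metastore_db there):

```bash
# Should print the built-in "default" database
/opt/hive/bin/hive -e 'SHOW DATABASES;'
```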
Install MySQL
wget https://dev.mysql.com/get/mysql-apt-config_0.8.17-1_all.deb
sudo dpkg -i mysql-apt-config_0.8.17-1_all.deb
In the dialog, select (all other prompts can be left at OK):
MySQL Server & Cluster (Currently selected: mysql 8.0)
mysql-8.0
sudo apt update
sudo apt install mysql-server
During installation, when prompted for the authentication method, choose: Use Legacy Authentication Method (Retain MySQL 5.x ...
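Once the installation finishes, a quick check that the server is running (a sketch; the -p flag prompts for the root password you chose, which in this VM is Hadoop):

```bash
# Service status and a trivial query
sudo systemctl status mysql
mysql -uroot -p -e 'SELECT VERSION();'
```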
Configure the Metastore to use MySQL
vi /opt/hive/conf/hive-site.xml
Add the following (note that & inside an XML value must be escaped as &amp;):
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false&amp;serverTimezone=GMT&amp;createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
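Because the escaped &amp; entities are easy to get wrong, it may be worth validating the file (a sketch assuming xmllint from the libxml2-utils package is installed):

```bash
# Prints nothing on success; reports the offending line if the XML is malformed
xmllint --noout /opt/hive/conf/hive-site.xml
```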
Install the JDBC driver, working in the /opt/hive directory:
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.11.tar.gz
tar -zxvf mysql-connector-java-8.0.11.tar.gz
mv /opt/hive/mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar /opt/hive/lib/mysql-connector-java-8.0.11.jar
rm -f /opt/hive/mysql-connector-java-8.0.11.tar.gz
rm -rf /opt/hive/mysql-connector-java-8.0.11
Initialize the schema, from the /opt/hive/bin directory:
./schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=GMT&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
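If the initialization succeeded, the metastore tables should now exist in the hive database in MySQL; a quick way to inspect them (using the root credentials from hive-site.xml):

```bash
# Lists metastore tables such as DBS, TBLS and COLUMNS_V2
mysql -uroot -p -e 'USE hive; SHOW TABLES;'
```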
Set the environment variables as the hadoop user (prompt $; they only take effect for the hadoop user):
vi ~/.bash_profile
Add the following:
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
Apply the changes, as the hadoop user (prompt $):
source ~/.bash_profile
Start Hive:
hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
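As a final end-to-end check, a small HiveQL session run through hive -e (a sketch with a hypothetical table name; the INSERT launches a MapReduce job, so it may take a moment):

```bash
# Create a table, load one row, query it, then clean up
hive -e "
CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
INSERT INTO smoke_test VALUES (1, 'hello');
SELECT * FROM smoke_test;
DROP TABLE smoke_test;
"
```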