Hadoop & Hive Environment Setup (with a download link for a pre-configured virtual machine)

I recently wanted to learn Hive, and it turned out that setting up the environment took a surprisingly long time, even though in practice most people never need to build it themselves at work. So I am sharing my virtual machine, with the environment (Java & Hadoop & MySQL & Hive) already set up, for anyone to use directly, and I have also written up my setup notes below.

  • System download --> Baidu Netdisk, extraction code: xkuy

Because of the network-disk size limit, the files were uploaded as a split (multi-volume) archive. The OVF directory contains the exported virtual-machine files, for which the network adapter has to be reconfigured; the VirtualBox_VMs and Virtual_Machines directories contain the complete working directories of the Linux virtual machine created in VirtualBox and VMware Workstation respectively, and should not need any network configuration. All passwords in the system are hadoop. Hadoop runs in pseudo-distributed mode, so the archive contains only one virtual machine. The environment was built on VMware Workstation 16; in principle both VMware Workstation and Oracle VM VirtualBox can load it.


  • Environment details --> Ubuntu 20.04

    Software   Version     Software   Version
    Java       1.8         Hadoop     2.7.1
    MySQL      8.0         Hive       2.3.8

  • Setup process

    Table of contents

    • Garbled arrow keys in vi
    • Install JDK 1.8
      • Download & install
      • Configure the environment
      • Test the Java environment
    • Hadoop
      • Create a Hadoop user (optional)
      • Configure passwordless SSH login
      • Download & install Hadoop
      • Configure Hadoop
        • 1. Configure hadoop-env.sh
        • 2. Configure core-site.xml
        • 3. Configure hdfs-site.xml
        • 4. Configure mapred-site.xml
        • 5. Configure yarn-site.xml
      • Start Hadoop
    • Hive
      • Install Hive 2.3.8
      • Start Hive
        • Initialize the default Derby database (skip this step if using MySQL)
        • Connect to a MySQL (8.0) database

Garbled arrow keys in vi

sudo gedit /etc/vim/vimrc.tiny
  • Add the following settings
    set nocompatible
    set backspace=2

Install JDK 1.8

Download & install

  • Download, as the root user --> # <--, directory: /opt
    wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie"  http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
    
  • Extract, as the root user --> # <--, directory: /opt
    tar -zxvf jdk-8u131-linux-x64.tar.gz    # extract
    mv jdk1.8.0_131/ jdk                    # rename (the archive extracts to jdk1.8.0_131)
    

Configure the environment

  • Edit /etc/profile, as the root user --> # <--
    vi /etc/profile
    
  • Add the following settings
    export JAVA_HOME=/opt/jdk
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
    export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
    export PATH=$PATH:${JAVA_PATH}
  • Apply the environment variables, as the root user --> # <--
    source /etc/profile
    

Test the Java environment

java -version

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

Hadoop

Create a Hadoop user (optional)

The later steps (passwordless login, granting directory permissions, and so on) all use the user name hadoop as the example. If you work under a different user, remember to substitute that user name whenever permissions are granted below.

sudo useradd -m hadoop -s /bin/bash # create a user named hadoop
sudo passwd hadoop                  # set the hadoop user's password
sudo adduser hadoop sudo            # grant the hadoop user sudo privileges
sudo chown -R hadoop /opt           # give the hadoop user read/write access to /opt
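
To quickly confirm that the user and the permissions look right, a minimal check such as the following can be run (the exact group list will vary by system):

id hadoop       # the hadoop user should exist
groups hadoop   # sudo should appear among its groups
ls -ld /opt     # /opt should now be owned by hadoop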

Configure passwordless SSH login

  • Install the SSH server
    sudo apt-get install openssh-server
    
  • Test logging in from localhost, as the hadoop user --> $ <--; at this point a password should still be required
    ssh localhost
    
    exit
    

    logout
    Connection to localhost closed.

  • Generate the key pair, as the hadoop user --> $ <--
    cd ~/.ssh/
    ssh-keygen -t rsa
    cat ./id_rsa.pub >> ./authorized_keys   
    
  • Test logging in from localhost again, as the hadoop user --> $ <--; this time it should log in without asking for a password
    ssh localhost
    
    exit
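
  • If the login still asks for a password, the most common cause is that sshd rejects keys stored with overly permissive file modes; tightening the permissions is a safe fix
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys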
    

Download & install Hadoop

  • Download, as the root user --> # <--, directory: /opt; other versions are available from the same Apache archive
    wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
    tar -zxvf hadoop-2.7.1.tar.gz       # extract
    mv hadoop-2.7.1/ hadoop             # rename
    rm -f hadoop-2.7.1.tar.gz           # remove the downloaded archive
    chown -R hadoop ./hadoop            # change the directory ownership
    
  • Create directories, as the hadoop user --> $ <--
    mkdir /opt/hadoop/tmp               # create directories
    mkdir /opt/hadoop/hdfs
    mkdir /opt/hadoop/hdfs/data
    mkdir /opt/hadoop/hdfs/name
    
    If the directories were created as root instead, grant the hadoop user read/write access:
    chown -R hadoop /opt/hadoop
  • Set environment variables, as the hadoop user --> $ <-- (these variables take effect only for the hadoop user)
    vi ~/.bash_profile
    
  • Add the following configuration
    export HADOOP_HOME=/opt/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
  • Apply the environment variables, as the hadoop user --> $ <--
    source ~/.bash_profile
    
  • Verify that the environment variables work
    hadoop version
    

    Hadoop 2.7.1
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
    Compiled by jenkins on 2015-06-29T06:04Z
    Compiled with protoc 2.5.0
    From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
    This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar

Configure Hadoop

All of the following can be run as the hadoop user --> $ <--

1. Configure hadoop-env.sh

vi /opt/hadoop/etc/hadoop/hadoop-env.sh

Change export JAVA_HOME=${JAVA_HOME} to the absolute path of the JDK: export JAVA_HOME=/opt/jdk

You can leave this unset, but starting the NameNode sometimes fails with
Error: JAVA_HOME is not set and could not be found.
so it is safer to set it here.
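
As a non-interactive alternative to editing the file by hand, the same change can be scripted; this is only a convenience sketch and assumes the stock export JAVA_HOME=${JAVA_HOME} line is still present:

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk|' /opt/hadoop/etc/hadoop/hadoop-env.sh
grep '^export JAVA_HOME' /opt/hadoop/etc/hadoop/hadoop-env.sh   # verify the result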

2. Configure core-site.xml

vi /opt/hadoop/etc/hadoop/core-site.xml

Add the following:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>HDFS URI, filesystem://namenode-host:port</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
        <description>Local Hadoop temporary directory on the namenode</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>

3. Configure hdfs-site.xml

vi /opt/hadoop/etc/hadoop/hdfs-site.xml

Add the following:

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop/hdfs/name</value>
        <description>Where the namenode stores the HDFS namespace metadata</description>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/opt/hadoop/hdfs/data</value>
        <description>Physical storage location of data blocks on the datanode</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Replication factor; the default is 3, and it should not exceed the number of datanodes</description>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
    </property>
</configuration>

4. Configure mapred-site.xml

vi /opt/hadoop/etc/hadoop/mapred-site.xml

Add the following (in Hadoop 2.7.1 this file may not exist yet; if so, copy mapred-site.xml.template to mapred-site.xml first):

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5. Configure yarn-site.xml

vi /opt/hadoop/etc/hadoop/yarn-site.xml

Add the following:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start Hadoop

  • Format the NameNode

    /opt/hadoop/bin/hdfs namenode -format
    

    21/06/05 07:41:01 INFO common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
    21/06/05 07:41:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    21/06/05 07:41:01 INFO util.ExitUtil: Exiting with status 0
    21/06/05 07:41:01 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
    ************************************************************/

  • Start HDFS

    /opt/hadoop/sbin/start-dfs.sh
    

    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out
    localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out

  • Start YARN

    /opt/hadoop/sbin/start-yarn.sh
    

    starting yarn daemons
    starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out
    localhost: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out

  • Check that all daemons started correctly

    jps
    

    49121 NodeManager
    49329 Jps
    48546 DataNode
    48995 ResourceManager
    48730 SecondaryNameNode
    48395 NameNode

    • Visit http://localhost:50070 in a browser to view NameNode, DataNode, and HDFS information.
    • Visit http://localhost:8088 in a browser to view job status.
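
  • As a quick smoke test that HDFS and YARN work end to end, a file can be written to HDFS and the example job bundled with Hadoop can be run; the jar path below assumes the 2.7.1 layout used in this guide

    /opt/hadoop/bin/hdfs dfs -mkdir -p /user/hadoop
    /opt/hadoop/bin/hdfs dfs -put /opt/hadoop/etc/hadoop/core-site.xml /user/hadoop/
    /opt/hadoop/bin/hdfs dfs -ls /user/hadoop
    /opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10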

Hive

Install Hive 2.3.8

  • Download links for other versions
  • Download & install, as the root user --> # <--, directory: /opt
    wget http://archive.apache.org/dist/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
    tar -zxvf apache-hive-2.3.8-bin.tar.gz  # extract
    mv apache-hive-2.3.8-bin/ hive          # rename
    rm -f apache-hive-2.3.8-bin.tar.gz      # remove the downloaded tarball
    chown -R hadoop /opt/hive               # give the hadoop user read/write access
    
  • Configure; the hadoop user is sufficient --> $ <--
    mv /opt/hive/conf/hive-env.sh.template /opt/hive/conf/hive-env.sh
    vi /opt/hive/conf/hive-env.sh
    
    Append the following two lines, i.e., the Hadoop path and the Hive configuration directory:
    export HADOOP_HOME=/opt/hadoop
    export HIVE_CONF_DIR=/opt/hive/conf
    Start Hadoop (do not forget this step):
    /opt/hadoop/sbin/start-dfs.sh
    /opt/hadoop/sbin/start-yarn.sh
    
    Create the required directories and grant the permissions (this step must be run after Hadoop has been started; a quick check follows the commands below):
    /opt/hadoop/bin/hadoop fs -mkdir /tmp
    /opt/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
    /opt/hadoop/bin/hadoop fs -chmod g+w /tmp
    /opt/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
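
    As referenced above, a quick listing confirms the directories exist with the group-write bit set (exact output will vary):
    /opt/hadoop/bin/hadoop fs -ls /
    /opt/hadoop/bin/hadoop fs -ls /user/hive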
    

Start Hive

Initialize the default Derby database (skip this step if using MySQL)

/opt/hive/bin/schematool -initSchema -dbType derby
/opt/hive/bin/hive

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
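
As a minimal sanity check that this Derby-backed setup can actually create and read data, a throwaway table can be exercised with hive -e, run from the same directory in which the metastore was initialized (demo_check is only an example name):

/opt/hive/bin/hive -e "CREATE TABLE IF NOT EXISTS demo_check (id INT, msg STRING); INSERT INTO demo_check VALUES (1, 'hello'); SELECT * FROM demo_check; DROP TABLE demo_check;"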

Connect to a MySQL (8.0) database

  • Install MySQL

    wget https://dev.mysql.com/get/mysql-apt-config_0.8.17-1_all.deb
    sudo dpkg -i mysql-apt-config_0.8.17-1_all.deb
    

    Select (choose OK for the other prompts):
    MySQL Server & Cluster (Currently selected: mysql 8.0)
    mysql-8.0

    sudo apt update
    sudo apt install mysql-server
    

    Use Legacy Authentication Method (Retain MySQL 5.x ...
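
    The root password chosen here is referenced again later in hive-site.xml (this guide uses hadoop). If it needs to be changed afterwards, a sketch like the following works with the legacy authentication plugin selected above:
    mysql -uroot -p -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'hadoop';"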

  • Configure the Metastore to use MySQL

    vi /opt/hive/conf/hive-site.xml
    

    Add the following:

    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive?useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false&amp;serverTimezone=GMT&amp;createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.cj.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
            <description>username to use against metastore database</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hadoop</value>
            <description>password to use against metastore database</description>
        </property>
    </configuration>
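
    Note that the & characters in the JDBC URL must be escaped as &amp; inside the XML file. If the libxml2-utils package is installed, well-formedness can be checked with:
    xmllint --noout /opt/hive/conf/hive-site.xml && echo "hive-site.xml is well-formed"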
    
    
  • Install the JDBC driver, directory: /opt/hive

    wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.11.tar.gz
    tar -zxvf mysql-connector-java-8.0.11.tar.gz
    mv /opt/hive/mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar /opt/hive/lib/mysql-connector-java-8.0.11.jar
    rm -f /opt/hive/mysql-connector-java-8.0.11.tar.gz
    rm -rf /opt/hive/mysql-connector-java-8.0.11
    
  • Initialize the schema, directory: /opt/hive/bin

    ./schematool -dbType mysql -initSchema
    

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL: jdbc:mysql://localhost:3306/hive?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=GMT&createDatabaseIfNotExist=true
    Metastore Connection Driver : com.mysql.cj.jdbc.Driver
    Metastore connection User: root
    Starting metastore schema initialization to 2.3.0
    Initialization script hive-schema-2.3.0.mysql.sql
    Initialization script completed
    schemaTool completed
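
  • To double-check that schematool really created the metastore schema, the hive database in MySQL can be inspected (enter the root password chosen during the MySQL installation); tables such as DBS, TBLS and VERSION should be listed

    mysql -uroot -p -e "USE hive; SHOW TABLES;"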

  • Set environment variables, as the hadoop user --> $ <-- (these variables take effect only for the hadoop user)

    vi ~/.bash_profile
    
  • Add the following configuration
    export HIVE_HOME=/opt/hive
    export PATH=$PATH:$HIVE_HOME/bin

  • Apply the environment variables, as the hadoop user --> $ <--

    source ~/.bash_profile
    
  • Start Hive

    hive
    

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

    Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive>
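
  • As a final end-to-end check, sketched below, a table created through Hive should show up in the MySQL-backed metastore (test_mysql_store is only an example name)

    hive -e "CREATE TABLE IF NOT EXISTS test_mysql_store (id INT);"
    mysql -uroot -p -e "SELECT TBL_NAME FROM hive.TBLS;"    # the new table should be listed
    hive -e "DROP TABLE test_mysql_store;"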
