Note04: Hadoop 2.7.2 Fully Distributed Installation and Configuration (CentOS 6.8)

Virtual Machine Environment Preparation

  • 3 virtual machines (each with a user that has root privileges; firewall disabled, static IP, hostname set)
  • Edit the /etc/hosts file
  • Create the user and the common directories
  • Install and configure the JDK
  • Add the xsync.sh and xcall.sh helper scripts
  • Configure passwordless SSH login
  • Synchronize the cluster clocks
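The xsync.sh helper referenced above is not shown in these notes. A hypothetical sketch of what such a script typically does (rsync a path to the same location on the other nodes; the hostnames and the kevin user follow this guide's conventions, and ssh/rsync is echoed rather than executed so the sketch runs standalone):

```shell
#!/bin/bash
# Hypothetical sketch of xsync.sh: copy a file or directory to the same
# absolute path on the other cluster nodes. Adjust hostnames and user
# for your own cluster.
xsync() {
    local target=$1
    local fname pdir
    fname=$(basename "$target")
    pdir=$(cd -P "$(dirname "$target")" && pwd)
    for host in hadoop113 hadoop114; do
        # The real script would run rsync here; echoed as a dry run.
        echo "rsync -av $pdir/$fname kevin@$host:$pdir"
    done
}

xsync /etc/hosts
```

In the real script the echo would be dropped so rsync actually runs; rsync only transfers differences, which is why it is preferred over scp for repeated distribution.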

Hadoop Installation

  • Copy the pre-compiled tarball (extraction code: j9oe) into the /opt/software directory (the official release is 32-bit only, so it has to be recompiled as 64-bit)

  • Extract Hadoop into the /opt/module directory

[kevin@hadoop112 software]$ tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
  • Configure the Hadoop environment variables

First, get the Hadoop path:

[kevin@hadoop112 hadoop-2.7.2]$ pwd

/opt/module/hadoop-2.7.2

Open the /etc/profile.d/myPath.sh file:

[kevin@hadoop112 hadoop-2.7.2]$ vim /etc/profile.d/myPath.sh

Append the Hadoop paths at the end of the file:

##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Reload the profile so the change takes effect:

[kevin@hadoop112 hadoop-2.7.2]$ source /etc/profile

Reboot (unnecessary if hadoop version already works):

[kevin@hadoop112 hadoop-2.7.2]$ sync
[kevin@hadoop112 hadoop-2.7.2]$ reboot

Verify that Hadoop installed successfully:

[kevin@hadoop112 hadoop-2.7.2]$ hadoop version
Hadoop 2.7.2

Cluster Configuration

  • Cluster deployment plan

           hadoop112            hadoop113                      hadoop114
    HDFS   NameNode, DataNode   DataNode                       SecondaryNameNode, DataNode
    YARN   NodeManager          ResourceManager, NodeManager   NodeManager

  • Configure the cluster

Configure core-site.xml

    <!-- Address of the NameNode (HDFS entry point) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop112:9000</value>
    </property>

    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>

Configure hadoop-env.sh

# Set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241

Configure hdfs-site.xml

    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <!-- SecondaryNameNode host and port -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop114:50090</value>
    </property>

Configure yarn-env.sh

# Set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241

Configure yarn-site.xml

    <!-- How the Reducer fetches map output -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop113</value>
    </property>

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Keep aggregated logs for 7 days (in seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    <!-- Disable the physical-memory check on containers -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>

    <!-- Disable the virtual-memory check on containers -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

Configure mapred-env.sh

# Set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241

Configure mapred-site.xml

[kevin@hadoop112 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[kevin@hadoop112 hadoop]$ vim mapred-site.xml

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

Configure the Job History Server

Configure mapred-site.xml

    <!-- History server IPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop112:10020</value>
    </property>

    <!-- History server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop112:19888</value>
    </property>

Configure Log Aggregation

Log aggregation: after an application finishes, its run logs are uploaded to HDFS.

Benefit: job run details can be inspected conveniently, which helps development and debugging.

Configure yarn-site.xml (these two properties were already added above):

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Keep aggregated logs for 7 days (in seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

Distribute the configured Hadoop directory across the cluster

[kevin@hadoop112 hadoop]$ xsync.sh /opt/module/hadoop-2.7.2/

Check that the files were distributed

[kevin@hadoop113 hadoop]$ cat /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml

If this is the cluster's first startup, the NameNode must be formatted (delete the data and logs directories first if they exist):

[kevin@hadoop112 hadoop-2.7.2]$ hadoop namenode -format

To reset HDFS later, delete the data and logs directories, then format the NameNode again.

Starting the Cluster One Daemon at a Time (the group-start scripts below replace this)

  • Start the NameNode on hadoop112
[kevin@hadoop112 hadoop-2.7.2]$ hadoop-daemon.sh start namenode
[kevin@hadoop112 hadoop-2.7.2]$ jps
3461 NameNode
3608 Jps
  • Start the DataNodes on hadoop112, hadoop113 and hadoop114
[kevin@hadoop112 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop112 hadoop-2.7.2]$ jps
3461 NameNode
3608 Jps
3561 DataNode
[kevin@hadoop113 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop113 hadoop-2.7.2]$ jps
3190 DataNode
3279 Jps
[kevin@hadoop114 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop114 hadoop-2.7.2]$ jps
3237 Jps
3163 DataNode

Group-Start Configuration

  • Configure slaves
[kevin@hadoop112 ~]$cd /opt/module/hadoop-2.7.2/etc/hadoop/
[kevin@hadoop112 hadoop]$ vim slaves

# Clear the file, then add:
hadoop112
hadoop113
hadoop114

Note: entries in this file must have no trailing spaces, and the file must contain no blank lines.
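The constraint above can be checked mechanically. A small sketch (it writes an example file under /tmp, a hypothetical path, and greps for blank lines or trailing whitespace):

```shell
# Sketch: verify a slaves file has no blank lines and no trailing whitespace.
f=/tmp/slaves.check
printf 'hadoop112\nhadoop113\nhadoop114\n' > "$f"   # example content
if grep -qE '(^$|[[:space:]]+$)' "$f"; then
    echo "slaves file has a blank line or trailing whitespace"
else
    echo "slaves file looks clean"
fi
```

Run the same grep against the real etc/hadoop/slaves before distributing it; any match means a line that will silently break the group-start scripts.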

Distribute the file to all nodes

[kevin@hadoop112 hadoop]$ xsync.sh slaves

Start HDFS (to stop it, change start to stop)

[kevin@hadoop112 hadoop-2.7.2]$ sbin/start-dfs.sh

Then run jps on each machine and check that the HDFS daemons match the plan above (a mismatch indicates a configuration problem).

Start YARN (read the note below before doing this) (to stop it, change start to stop)

[kevin@hadoop113 hadoop-2.7.2]$ sbin/start-yarn.sh

Then run jps on each machine and check that the YARN daemons match the plan above (a mismatch indicates a configuration problem).

Note: if the NameNode and ResourceManager are on different machines, YARN must not be started on the NameNode's machine; start it on the ResourceManager's machine, which here is hadoop113.

Useful web UIs:

HDFS web UI

http://hadoop112:50070/explorer.html#/

YARN web UI

http://hadoop113:8088/cluster

SecondaryNameNode web UI

http://hadoop114:50090/status.html

JobHistory web UI

http://hadoop112:19888/jobhistory

NodeManager web UI (port 8042 is the NodeManager's; the NameNode UI is the 50070 link above)

http://hadoop112:8042/node


Check the cluster state (verify it matches the plan):

[kevin@hadoop101 ~]$ xcall.sh jps
=================    jps on hadoop101   ===============
1920 NodeManager
1639 DataNode
1530 NameNode
1995 JobHistoryServer
2124 Jps
=================    jps on hadoop102   ===============
1448 ResourceManager
1561 NodeManager
1356 DataNode
1900 Jps
=================    jps on hadoop103   ===============
1680 Jps
1363 DataNode
1430 SecondaryNameNode
1516 NodeManager
[kevin@hadoop101 ~]$
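The xcall.sh helper used above simply runs the same command on every node over SSH and labels each block of output. A hypothetical sketch (ssh is echoed rather than executed so the sketch runs standalone; hostnames follow this guide):

```shell
#!/bin/bash
# Hypothetical sketch of xcall.sh: run one command on every cluster node.
xcall() {
    local host
    for host in hadoop112 hadoop113 hadoop114; do
        echo "=================    $host    ==============="
        # The real script would run: ssh "kevin@$host" "$@"
        echo ssh "kevin@$host" "$@"
    done
}

xcall jps
```

Dropping the echo in front of ssh turns this into the working version; passwordless SSH (configured earlier) is what lets it run non-interactively.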

Basic Cluster Tests

Prepare a file, test.txt

[kevin@hadoop112 hadoop-2.7.2]$ touch test.txt
[kevin@hadoop112 hadoop-2.7.2]$ vim test.txt

kevin kevin
Spark Hive Hello

Upload the file to the cluster

[kevin@hadoop112 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/kevin/input
[kevin@hadoop112 hadoop-2.7.2]$ hdfs dfs -put test.txt /user/kevin/input

Check it in the HDFS web UI

http://hadoop112:50070/explorer.html#/user/kevin/input

Run a simple MapReduce job (note: the /user/kevin/output directory must not already exist in HDFS)

[kevin@hadoop112 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/kevin/input/ /user/kevin/output

View the output

[kevin@hadoop112 hadoop-2.7.2]$ bin/hdfs dfs -cat /user/kevin/output/*
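With the test.txt contents above, the expected counts can be reproduced locally with a plain shell pipeline (an approximation of wordcount's whitespace splitting, not the Hadoop job itself):

```shell
# Local sanity check: count words in the same input as test.txt.
printf 'kevin kevin\nSpark Hive Hello\n' \
    | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c \
    | awk '{print $2"\t"$1}'
# Prints: Hello 1, Hive 1, Spark 1, kevin 2 (tab-separated)
```

The HDFS cat above should show the same four word/count pairs, which is a quick way to confirm the job actually processed the uploaded file.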

Scripts

hadoop-cluster.sh
[kevin@hadoop101 ~]$ cd ~/bin
[kevin@hadoop101 bin]$ touch hadoop-cluster.sh
[kevin@hadoop101 bin]$ chmod 775 hadoop-cluster.sh

Edit hadoop-cluster.sh:

#!/bin/bash

case $1 in
"start"){

    echo "=================       Starting HDFS on hadoop112           ==============="
    ssh kevin@hadoop112 "/opt/module/hadoop-2.7.2/sbin/start-dfs.sh"
    echo "=================       Starting YARN on hadoop113           ==============="
    ssh kevin@hadoop113 "/opt/module/hadoop-2.7.2/sbin/start-yarn.sh"
    echo "=================      Starting the history server on hadoop112      ==============="
    ssh kevin@hadoop112 "/opt/module/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver"
};;
"stop"){

    echo "=================      Stopping the history server on hadoop112      ==============="
    ssh kevin@hadoop112 "/opt/module/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver"
    echo "=================        Stopping YARN on hadoop113         ==============="
    ssh kevin@hadoop113 "/opt/module/hadoop-2.7.2/sbin/stop-yarn.sh"
    echo "=================        Stopping HDFS on hadoop112          ==============="
    ssh kevin@hadoop112 "/opt/module/hadoop-2.7.2/sbin/stop-dfs.sh"
};;
esac

Usage:

# Start the cluster
[kevin@hadoop101 ~]$ hadoop-cluster.sh start
# Stop the cluster
[kevin@hadoop101 ~]$ hadoop-cluster.sh stop

A recorded run:

# Startup
[kevin@node101 ~]$ hadoop-cluster.sh start
=================       Starting HDFS on node101           ===============
Starting namenodes on [node101]
node101: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-namenode-node101.out
node101: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-node101.out
node102: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-node102.out
node103: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-node103.out
Starting secondary namenodes [node103]
node103: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-secondarynamenode-node103.out
=================       Starting YARN on node102           ===============
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-resourcemanager-node102.out
node102: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-node102.out
node103: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-node103.out
node101: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-node101.out
=================      Starting the history server on node101      ===============
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kevin-historyserver-node101.out
[kevin@node101 ~]$

# Shutdown
[kevin@node101 ~]$ hadoop-cluster.sh stop
=================       Stopping the history server on node101      ===============
stopping historyserver
=================        Stopping YARN on node102         ===============
stopping yarn daemons
stopping resourcemanager
node103: stopping nodemanager
node101: stopping nodemanager
node102: stopping nodemanager
no proxyserver to stop
=================        Stopping HDFS on node101          ===============
Stopping namenodes on [node101]
node101: stopping namenode
node103: stopping datanode
node101: stopping datanode
node102: stopping datanode
Stopping secondary namenodes [node103]
node103: stopping secondarynamenode
[kevin@node101 ~]$
