Fully Distributed HA Deployment of Hadoop 3.x on CentOS 7

Table of Contents

  • Preparation
  • Deployment plan
  • Install and configure ZooKeeper
    • Download
    • Install
      • Directory layout
      • Move the downloaded binary package to /usr/local/hadoop and extract it
      • Configure
    • Set environment variables
    • Start
  • Install and configure Hadoop
    • Create the hadoop user and group and grant passwordless sudo
    • Directory layout
    • Download and extract
    • Configure environment variables
    • Configure
    • Copy the configured Hadoop package to the other 5 servers
  • Start the ZooKeeper cluster
  • Start the JournalNodes
  • Format HDFS (first startup only)
  • Format ZKFC (first startup only)
  • Start the Hadoop cluster
  • Verification
    • Web access
    • HA verification
  • Notes
  • Problems and solutions
    • NameNode fails to switch between active and standby
    • MR job fails with: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    • MR job reports: WARN No appenders could be found for logger
    • Restarting the Hadoop cluster reports: Your password has expired
    • Kernel crash on the master server

Preparation

  1. Configure hostnames
  2. Configure the IP-to-hostname mappings
  3. Configure passwordless SSH login
  4. Configure the firewall
  5. Install the JDK (a combined sketch of these steps follows this list)
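
The commands below are a minimal sketch of these preparation steps on CentOS 7. The hostnames and IPs come from the deployment plan in the next section; adjust them to your environment.

# 1. set the hostname on each node (substitute the node's own name)
hostnamectl set-hostname master

# 2. map IPs to hostnames on every node
cat >> /etc/hosts <<'EOF'
10.62.84.37 master
10.62.84.38 master2
10.62.84.39 worker1
10.62.84.40 worker2
10.62.84.41 worker3
10.62.84.42 worker4
EOF

# 3. passwordless SSH from the node that drives the installation to all nodes
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for h in master master2 worker1 worker2 worker3 worker4; do ssh-copy-id "$h"; done

# 4. either open the required ports or simply disable firewalld on every node
systemctl stop firewalld && systemctl disable firewalld

# 5. install a JDK 8 (this guide assumes it ends up under /usr/java/jdk1.8.0_111) and set JAVA_HOME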

Deployment plan

IP          | hostname | installed software                        | processes
10.62.84.37 | master   | hadoop, zookeeper                         | NameNode, ResourceManager, ZKFC
10.62.84.38 | master2  | hadoop, zookeeper, mysql, hive, spark, hue | NameNode, ResourceManager, MR HistoryServer, ZKFC, mysql, metastore, hiveserver2, Master, Spark HistoryServer, hue
10.62.84.39 | worker1  | hadoop, zookeeper, spark                  | DataNode, NodeManager, zookeeper, JournalNode, Worker
10.62.84.40 | worker2  | hadoop, zookeeper, spark                  | DataNode, NodeManager, zookeeper, JournalNode, Worker
10.62.84.41 | worker3  | hadoop, zookeeper, spark                  | DataNode, NodeManager, zookeeper, JournalNode, Worker
10.62.84.42 | worker4  | hadoop, spark                             | DataNode, NodeManager, Worker

Install and configure ZooKeeper

Download

Download URL: https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/stable/

Install

Directory layout

data
|__ zookeeper
      |__ data
      |      |__ myid
      |__ logs
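
A sketch of creating this layout on the ZooKeeper nodes (paths match the zoo.cfg settings below; run as root or with sudo):

mkdir -p /data/zookeeper/data /data/zookeeper/logs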

Move the downloaded binary package to /usr/local/hadoop and extract it

tar zxvf apache-zookeeper-3.5.5.tar.gz
mv apache-zookeeper-3.5.5 zookeeper

Configure

cd zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg

After opening the file, modify the configuration as follows:

dataDir=/data/zookeeper/data
# ZooKeeper's transaction log directory is set by the dataLogDir entry in zoo.cfg
dataLogDir=/data/zookeeper/logs

server.1=worker1:2888:3888
server.2=worker2:2888:3888
server.3=worker3:2888:3888

The number after server. (which must not repeat) uniquely identifies this server within the ZooKeeper ensemble.
The value after = describes that server, with fields separated by ":":
the first field is the hostname of the machine the server runs on;
the second and third fields are the ports 2888 and 3888, which together with 2181 are used as follows:
2181 —> the port the ZooKeeper server opens for client connections
2888 —> the port ZooKeeper servers use to communicate with each other (followers connect to the leader on it)
3888 —> the port ZooKeeper servers use for leader election

Create an empty myid file under dataDir=/data/zookeeper/data (this machine will be server 3, i.e. worker3; worker1 and worker2 are set below):

touch /data/zookeeper/data/myid  
echo 3 > /data/zookeeper/data/myid

Log configuration
Edit zkEnv.sh

if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    # output directory for the service's runtime logs
    ZOO_LOG_DIR="/data/zookeeper/logs"
fi

if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi

Edit log4j.properties

zookeeper.root.logger=INFO,ROLLINGFILE
# roll the log file over daily, by date
log4j.appender.ROLLINGFILE=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.ROLLINGFILE.MaxFileSize=${zookeeper.log.maxfilesize}
#log4j.appender.ROLLINGFILE.MaxBackupIndex=${zookeeper.log.maxbackupindex}

Set environment variables

vi /etc/profile
After opening the file, append the following at the end:

export ZOOKEEPER_HOME=/usr/local/hadoop/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH

After making the change, save and exit, then run the following command to apply it:
source /etc/profile
Then copy the configured zookeeper directory to the other nodes (worker1 and worker2); see the scp sketch after this block.
Note: update the contents of /data/zookeeper/data/myid on worker1 and worker2 accordingly:
worker1:
echo 1 > /data/zookeeper/data/myid
worker2:
echo 2 > /data/zookeeper/data/myid
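
A sketch of the copy step mentioned above, run from the node where ZooKeeper was configured (it assumes /usr/local/hadoop already exists and is writable on worker1 and worker2):

scp -r /usr/local/hadoop/zookeeper worker1:/usr/local/hadoop/
scp -r /usr/local/hadoop/zookeeper worker2:/usr/local/hadoop/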

Start

Run the following on worker1, worker2, and worker3:

zkServer.sh --config /usr/local/hadoop/zookeeper/conf start

Install and configure Hadoop

We first extract and configure Hadoop on the master server and then distribute it to the other servers to install the cluster.

Create the hadoop user and group and grant passwordless sudo

Create the hadoop user
As root, create the user first; the adduser command is recommended:

adduser hadoop
passwd hadoop

Enter the password (hadoop), then just press Enter through the remaining prompts and finally enter y to confirm.
Add the hadoop user to the hadoop group
Creating the hadoop user also created a hadoop group; now add the hadoop user to it:

usermod -a -G hadoop hadoop

Grant the hadoop user root privileges so that it can use sudo

chmod u+w /etc/sudoers # make the sudoers file writable
vi /etc/sudoers

Below the line  root  ALL=(ALL)  ALL  add:

hadoop ALL=(root) NOPASSWD:ALL
chmod u-w /etc/sudoers # restore the sudoers file permissions
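
A quick sanity check that the passwordless sudo grant works (sudo -n fails instead of prompting if a password would be required):

su - hadoop
sudo -n whoami    # should print: root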

Directory layout

data
|__ hadoop
         |__ hdfs
         |      |__ name
         |      |__ data
         |__ tmp
         |__ pids
         |__ logs
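
A sketch of creating this layout on every node so that it is owned by the hadoop user (paths match the hadoop-env.sh and *-site.xml settings below):

sudo mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data /data/hadoop/tmp /data/hadoop/pids /data/hadoop/logs
sudo chown -R hadoop:hadoop /data/hadoop
# on worker1, worker2, and worker3 also create the JournalNode edits directory:
# sudo mkdir -p /data/hadoop/journal && sudo chown hadoop:hadoop /data/hadoop/journal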

Download and extract

Upload the downloaded package to /usr/local/hadoop, extract it, and rename the directory:

cd /usr/local/hadoop
tar zxvf hadoop-3.1.0.tar.gz
mv hadoop-3.1.0 hadoop

Configure environment variables

[hadoop@master hadoop]$ vi ~/.bashrc

After opening the file, append the following at the end:

# hadoop
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

After making the change, save and exit, then run the following command to apply it:

[hadoop@master hadoop]$ source ~/.bashrc

Configure

Configure the Hadoop runtime environment
Set the JDK path for Hadoop and define the users that run the cluster daemons by adding the following to hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.8.0_111

export HDFS_NAMENODE_USER=hadoop
export HDFS_DATANODE_USER=hadoop
export HDFS_JOURNALNODE_USER=hadoop
export HDFS_ZKFC_USER=hadoop
export YARN_RESOURCEMANAGER_USER=hadoop
export YARN_NODEMANAGER_USER=hadoop

export HADOOP_PID_DIR=/data/hadoop/pids
export HADOOP_LOG_DIR=/data/hadoop/logs

Configure core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/data/hadoop/tmp</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
        <description>The user name to filter as, on static web filters while rendering content. </description>
    </property>
    <property>
        <name>hadoop.zk.address</name>
        <value>worker1:2181,worker2:2181,worker3:2181</value>
        <description>Host:Port of the ZooKeeper server to be used.</description>
    </property>    
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>worker1:2181,worker2:2181,worker3:2181</value>
        <description>A list of ZooKeeper server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover.</description>
    </property>
</configuration>

Configure hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/data/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/data/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
        <description>Comma-separated list of nameservices</description>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
        <description>a comma-separated list of namenodes for nameservice ns1</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>master:8020</value>
        <description>The RPC address for namenode nn1</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>master:9870</value>
        <description>The address and the base port where the dfs namenode nn1 web ui will listen on</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>master2:8020</value>
        <description>The RPC address for namenode nn2</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>master2:9870</value>
        <description>The address and the base port where the dfs namenode nn2 web ui will listen on</description>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://worker1:8485;worker2:8485;worker3:8485/ns1</value>
        <description>A directory on shared storage between the multiple namenodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized</description>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/journal</value>
        <description>The directory where the journal edit files are stored</description>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Whether automatic failover is enabled</description>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        <description>The prefix (plus a required nameservice ID) for the class name of the configured Failover proxy provider for the host</description>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
        <description>A list of scripts or Java classes which will be used to fence the Active NameNode during a failover.</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

Configure mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master2:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master2:19888</value>
    </property>
</configuration>

Configure yarn-site.xml

<configuration>
    <!-- Configuring the External Shuffle Service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
        <description>Enable RM high-availability</description>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>rmcluster</value>
        <description>Name of the cluster</description>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
        <description>The list of RM nodes in the cluster when HA is enabled</description>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master</value>
        <description>The hostname of the rm1</description>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>master2</value>
        <description>The hostname of the rm2</description>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
        <description>Enable RM to recover state after starting</description>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        <description>The class to use as the persistent store</description>
    </property>
    <!-- YARN-Fair Scheduler. Start -->
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>/usr/local/hadoop/hadoop/etc/hadoop/fair-scheduler.xml</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.preemption</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>false</value>
        <description>default is True</description>
    </property>
    <property>
        <name>yarn.scheduler.fair.allow-undeclared-pools</name>
        <value>false</value>
        <description>default is True</description>
    </property>
    <!-- YARN-Fair Scheduler. End -->
    <!-- YARN nodemanager resource config. Start -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>36864</value>
        <description>Physical memory, in MB, to be made available to running containers</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>24</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>3</value>
    </property>
    <!-- YARN nodemanager resource config. End -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://master2:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/hadoop/share/hadoop/common/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs:/usr/local/hadoop/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/hadoop/share/hadoop/yarn:/usr/local/hadoop/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/hadoop/share/hadoop/yarn/*</value>
    </property>
    <!-- Clients submit applications to the RM through these addresses -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>  
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>master2:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>master2:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>master2:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>master2:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>master2:8033</value>
    </property>    
</configuration>

Configure workers

[hadoop@master hadoop]$ vi workers
worker1
worker2
worker3
worker4

Configure the fair scheduler
Create a fair-scheduler.xml file under /usr/local/hadoop/hadoop/etc/hadoop with the following content:

<?xml version="1.0"?>
<allocations>
    <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>

    <queue name="prod">
        <weight>40</weight>
    </queue>

    <queue name="dev">
        <weight>60</weight>
    </queue>

    <queuePlacementPolicy>
        <rule name="specified" create="false" />
        <rule name="primaryGroup" create="false" />
        <rule name="default" queue="dev" />
    </queuePlacementPolicy>
</allocations>
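
With yarn.scheduler.fair.user-as-default-queue and allow-undeclared-pools disabled above, a job lands in the queue it explicitly names, otherwise in its primary-group queue, otherwise in dev. A sketch of submitting a test job to the prod queue (the examples jar path and version are assumptions; adjust to your build):

hadoop jar /usr/local/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar \
    pi -Dmapreduce.job.queuename=prod 2 10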

Copy the configured Hadoop package to the other 5 servers

scp -r /usr/local/hadoop/hadoop hadoop@master2:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop hadoop@worker1:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop hadoop@worker2:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop hadoop@worker3:/usr/local/hadoop/
scp -r /usr/local/hadoop/hadoop hadoop@worker4:/usr/local/hadoop/

Start the ZooKeeper cluster

Start ZooKeeper on worker1, worker2, and worker3:

zkServer.sh --config /usr/local/hadoop/zookeeper/conf start

Check the status; there should be one leader and two followers:

zkServer.sh --config /usr/local/hadoop/zookeeper/conf status

Start the JournalNodes

Start the JournalNode on worker1, worker2, and worker3.
Note: the JournalNodes make up the qjournal (quorum journal) service that stores the shared edits log for the NameNodes; in this deployment the three JournalNodes are simply co-located on the ZooKeeper nodes.

hdfs --daemon start journalnode 

Run jps to verify; worker1, worker2, and worker3 should each now show a JournalNode process.

Format HDFS (first startup only)

Run the following command on master:

[hadoop@master data]$ hdfs namenode -format

Start the NameNode on master:

hdfs --daemon start namenode

On master2, sync the NameNode metadata from master:

[hadoop@master2 data]$ hdfs namenode -bootstrapStandby

Stop the NameNode on master:

[hadoop@master hadoop]$ hdfs --daemon stop namenode

Format ZKFC (first startup only)

Run this on master only.
Note: ZKFC is the process that manages failover between the two NameNodes, and it also relies on ZooKeeper. When the active NameNode becomes unhealthy, the ZKFC on that node reports the state to ZooKeeper; the ZKFC on the standby NameNode sees the unhealthy state, sends a fencing command to the old active over SSH (kill -9 <pid>) to kill the process, and promotes its own NameNode to active. This prevents split-brain if the old active has only appeared to die; if the SSH command fails, a custom .sh script can be invoked to forcibly kill the old active NameNode process.

In Hadoop 3.x such a NameNode pair is configured as a nameservice (multiple nameservices can be combined into an HDFS federation). Here the nameservice is named ns1 and contains nn1 (the active NameNode) and nn2 (the standby NameNode).

[hadoop@master hadoop]$ hdfs zkfc -formatZK
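
A sketch of verifying that the format created the HA znode in ZooKeeper (zkCli.sh comes with the ZooKeeper installation configured earlier):

zkCli.sh -server worker1:2181 ls /hadoop-ha
# should list the nameservice, e.g. [ns1]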

Start the Hadoop cluster

Start HDFS
First stop the JournalNodes (run hdfs --daemon stop journalnode on worker1, worker2, and worker3; start-dfs.sh will start them again), then run the following on master:

[hadoop@master hadoop]$ /usr/local/hadoop/hadoop/sbin/start-dfs.sh
Starting namenodes on [master master2]
Starting datanodes
Starting journal nodes [worker1 worker2 worker3]
Starting ZK Failover Controllers on NN hosts [master master2]

You can see that DFSZKFailoverController has been started on both master and master2.
Start YARN
Run the following on master:

[hadoop@master hadoop]$ /usr/local/hadoop/hadoop/sbin/start-yarn.sh 
Starting resourcemanagers on [ master master2]
Starting nodemanagers

Start the MR history server
On master2, start the MapReduce history server:

[hadoop@master2 logs]$ mapred --daemon start historyserver

Verification

Web access

HDFS
http://master:9870
http://master2:9870
One of the two is active and the other is standby.
YARN
http://master:8088
http://master2:8088
When you browse the standby ResourceManager, it redirects to the page of the active one.

HA verification

NameNode HA
Visit
http://master:9870
http://master2:9870
One of the two is active and the other is standby.
Failover test
On master, kill -9 the NameNode process; master2 should then take over as active (check http://master2:9870).
YARN HA
Failover test
On master, kill -9 the ResourceManager process.
You can then access http://master2:8088.
Then restart the ResourceManager on master (yarn --daemon start resourcemanager); when you visit http://master:8088 again, it automatically redirects to http://master2:8088.
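
In addition to the web UIs, the HA state can be checked from the command line with the standard admin tools (nn1/nn2 and rm1/rm2 are the IDs defined in hdfs-site.xml and yarn-site.xml above):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2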

Notes

  1. When HA is enabled, do not start a SecondaryNameNode; doing so causes errors.

Problems and solutions

NameNode fails to switch between active and standby

Problem
During the NameNode HA test, the NameNode failed to switch between active and standby.
Solution
Checking the ZKFC log on master2 showed:

bash: fuser: command not found

The fuser program was not found, so fencing could not be performed. It can be installed with:

yum install psmisc

Note: the psmisc package provides the fuser, killall, and pstree programs. The problem occurred because CentOS 7 was installed with the minimal installation option, which does not include psmisc by default.

MR job fails with: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Problem

2019-05-29 11:09:37,100 INFO mapreduce.Job: Job job_1559046243109_0004 failed with state FAILED due to: Application application_1559046243109_0004 failed 2 times due to AM Container for appattempt_1559046243109_0004_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2019-05-29 11:08:53.805]Exception from container-launch.
Container id: container_e02_1559046243109_0004_02_000001
Exit code: 1

[2019-05-29 11:08:53.851]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

[2019-05-29 11:08:53.852]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Solution
This is caused by YARN not finding the main class when launching MapReduce tasks. On master, run:

[hadoop@master hadoop]$ hadoop classpath
/usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/hadoop/share/hadoop/common/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs:/usr/local/hadoop/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/hadoop/share/hadoop/yarn:/usr/local/hadoop/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/hadoop/share/hadoop/yarn/*

Add the output above as the value of the yarn.application.classpath property in $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/hadoop/share/hadoop/common/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs:/usr/local/hadoop/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/hadoop/share/hadoop/yarn:/usr/local/hadoop/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/hadoop/share/hadoop/yarn/*</value>
    </property>

Then restart YARN and rerun the MapReduce job.

MR job reports: WARN No appenders could be found for logger

Problem

18/08/16 17:02:54 INFO mapreduce.Job: Job job_1534406793739_0005 failed with state FAILED due to: Application application_1534406793739_0005 failed 2 times due to AM Container for appattempt_1534406793739_0005_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2018-08-16 17:02:48.561]Exception from container-launch.
Container id: container_e27_1534406793739_0005_02_000001
Exit code: 1
[2018-08-16 17:02:48.562]
[2018-08-16 17:02:48.574]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Solution
The log shows that the container running the AM exited without going to the RM to acquire resources for the task, which suggested a communication problem between the AM and the RM. One RM is standby and one is active: inside YARN, requests that reach the active RM succeed, but requests that end up at the standby RM fail with this error.
Add the following configuration to $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>  
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>master2:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>master2:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>master2:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>master2:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>master2:8033</value>
    </property>

Then restart YARN and rerun the MapReduce job.

Restarting the Hadoop cluster reports: Your password has expired

Problem

[hadoop@master sbin]$ ./start-all.sh 
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [master master2]
Starting datanodes
worker6: WARNING: Your password has expired.
worker6: Password change required but no TTY available.
worker8: WARNING: Your password has expired.
worker5: WARNING: Your password has expired.
worker8: Password change required but no TTY available.
worker5: Password change required but no TTY available.
worker9: WARNING: Your password has expired.
worker9: Password change required but no TTY available.
worker10: WARNING: Your password has expired.
worker10: Password change required but no TTY available.
worker7: WARNING: Your password has expired.
worker7: Password change required but no TTY available.
worker11: WARNING: Your password has expired.
worker14: WARNING: Your password has expired.
worker11: Password change required but no TTY available.
worker14: Password change required but no TTY available.
worker12: WARNING: Your password has expired.
worker12: Password change required but no TTY available.
worker13: WARNING: Your password has expired.
worker13: Password change required but no TTY available.
Starting journal nodes [worker1 worker2 worker3]
Starting ZK Failover Controllers on NN hosts [master master2]
Starting resourcemanagers on [ master master2]
Starting nodemanagers
worker5: WARNING: Your password has expired.
worker5: Password change required but no TTY available.
worker9: WARNING: Your password has expired.
worker6: WARNING: Your password has expired.
worker9: Password change required but no TTY available.
worker6: Password change required but no TTY available.
worker10: WARNING: Your password has expired.
worker10: Password change required but no TTY available.
worker8: WARNING: Your password has expired.
worker8: Password change required but no TTY available.
worker7: WARNING: Your password has expired.
worker7: Password change required but no TTY available.
worker11: WARNING: Your password has expired.
worker11: Password change required but no TTY available.
worker14: WARNING: Your password has expired.
worker14: Password change required but no TTY available.
worker12: WARNING: Your password has expired.
worker12: Password change required but no TTY available.
worker13: WARNING: Your password has expired.
worker13: Password change required but no TTY available.

Solution
The error messages indicate that the Linux user's password has expired. Fix it as follows.
1. Check whether the user's password has expired:

[root@worker6 ~]# chage -l hadoop
Last password change                               : Sep 03, 2019
Password expires                                   : Dec 02, 2019
Password inactive                                  : never
Account expires                                    : never
Minimum number of days between password change    : 2
Maximum number of days between password change    : 90
Number of days of warning before password expires : 14

2. Increase the user's password expiry interval; once changed, nothing else is needed:

[root@worker6 ~]# chage -M 3600 hadoop
[root@worker6 ~]# chage -l hadoop
最近一次密码修改时间					:9月 03, 2019
密码过期时间					:7月 12, 2029
密码失效时间					:从不
帐户过期时间						:从不
两次改变密码之间相距的最小天数		:2
两次改变密码之间相距的最大天数		:3600
在密码过期之前警告的天数	:14

Kernel crash on the master server

Solution
On the master server, restart the NameNode, DFSZKFailoverController, and ResourceManager processes:

hdfs --daemon start namenode
hdfs --daemon start zkfc
yarn --daemon start resourcemanager
