Hadoop: Running Hadoop 3.x's Built-in WordCount Fails with "Container exited with a non-zero exit code 1"

Please credit the source when reposting: https://blog.csdn.net/l1028386804/article/details/93750832

Problem:

Today I set up a Hadoop cluster based on Hadoop 3.2.0, with HA configured for both the NameNode and YARN. However, running the WordCount program that ships with Hadoop failed with the following error:

2019-06-26 16:08:50,513 INFO mapreduce.Job: Job job_1561536344763_0001 failed with state FAILED due to: Application application_1561536344763_0001 failed 2 times due to AM Container for appattempt_1561536344763_0001_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2019-06-26 16:08:48.218]Exception from container-launch.
Container id: container_1561536344763_0001_02_000001
Exit code: 1

[2019-06-26 16:08:48.287]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


[2019-06-26 16:08:48.288]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


For more detailed output, check the application tracking page: http://binghe104:8088/cluster/app/application_1561536344763_0001 Then click on links to logs of each attempt.
. Failing the application.
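Before touching any configuration, it is worth pulling the full logs of the failed containers. Assuming YARN log aggregation is enabled, the logs for the application id shown above can be fetched with:

yarn logs -applicationId application_1561536344763_0001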

Analysis and attempted fixes:

After searching online for quite a while, almost every answer pointed to a classpath problem, so I started by setting up the classpath as follows:

1. Check the YARN classpath

Run the following command to view YARN's classpath:

-bash-4.1$ yarn classpath
/usr/local/hadoop-3.2.0/etc/hadoop:/usr/local/hadoop-3.2.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.2.0/share/hadoop/common/*:/usr/local/hadoop-3.2.0/share/hadoop/hdfs:/usr/local/hadoop-3.2.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.2.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.2.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-3.2.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.2.0/share/hadoop/yarn:/usr/local/hadoop-3.2.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.2.0/share/hadoop/yarn/*

Note: check the classpath values printed above.

If the classpath output above is empty, continue with the steps below.
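Besides checking whether the output is empty, it can help to confirm that the MapReduce jars actually appear on the reported classpath. A minimal filter over the same command:

yarn classpath | tr ':' '\n' | grep mapreduce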

2. Modify mapred-site.xml

Add:

<property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
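This value references $HADOOP_MAPRED_HOME, so that variable must resolve on the nodes (it is exported in step 4 below). As a quick sanity check, assuming HADOOP_MAPRED_HOME is already set in your shell, the referenced jars can be listed:

# List the MapReduce jars the classpath entry points to (HADOOP_MAPRED_HOME assumed to be set)
ls $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*.jar | head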

3. Modify yarn-site.xml

Add:

<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
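To double-check what ended up in the two files, the entries can be grepped directly; the paths below assume the configuration directory is $HADOOP_HOME/etc/hadoop:

# Show the classpath properties as written to the config files
grep -A1 'mapreduce.application.classpath' $HADOOP_HOME/etc/hadoop/mapred-site.xml
grep -A1 'yarn.application.classpath' $HADOOP_HOME/etc/hadoop/yarn-site.xml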

4. Modify the environment variables

sudo vim /etc/profile

Append the following to the end of the file:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

Then make the environment variables take effect:

source /etc/profile
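A quick way to confirm the variables are now visible in the current shell:

echo $HADOOP_MAPRED_HOME
echo $HADOOP_CONF_DIR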

But even after all of this, the problem was still NOT solved!

The final fix:

After calming down and analyzing the problem carefully, the log makes one thing clear: the container running the ApplicationMaster exited before it ever requested resources from the ResourceManager for the tasks. So I suspected a communication problem between the AM and the RM. With HA there is one active RM and one standby RM; inside YARN, when the MR job requests resources from the active RM everything works, but when it goes to the standby RM this error appears. That pointed to the fix: add the per-RM address configuration to yarn-site.xml.
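One way to test this suspicion is to ask YARN which ResourceManager is currently active and which is standby; rm1 and rm2 below are the RM ids configured in this cluster's yarn-site.xml:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2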

Open yarn-site.xml and add the following configuration:

<property>
	<name>yarn.resourcemanager.address.rm1</name>
	<value>binghe103:8032</value>
</property>
<property>
	<name>yarn.resourcemanager.scheduler.address.rm1</name>
	<value>binghe103:8030</value>
</property>
<property>
	<name>yarn.resourcemanager.webapp.address.rm1</name>
	<value>binghe103:8088</value>
</property>
<property>
	<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
	<value>binghe103:8031</value>
</property>
<property>
	<name>yarn.resourcemanager.admin.address.rm1</name>
	<value>binghe103:8033</value>
</property>
<property>
	<name>yarn.resourcemanager.ha.admin.address.rm1</name>
	<value>binghe103:23142</value>
</property>

<property>
	<name>yarn.resourcemanager.address.rm2</name>
	<value>binghe104:8032</value>
</property>
<property>
	<name>yarn.resourcemanager.scheduler.address.rm2</name>
	<value>binghe104:8030</value>
</property>
<property>
	<name>yarn.resourcemanager.webapp.address.rm2</name>
	<value>binghe104:8088</value>
</property>
<property>
	<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
	<value>binghe104:8031</value>
</property>
<property>
	<name>yarn.resourcemanager.admin.address.rm2</name>
	<value>binghe104:8033</value>
</property>
<property>
	<name>yarn.resourcemanager.ha.admin.address.rm2</name>
	<value>binghe104:23142</value>
</property>

The complete yarn-site.xml configuration is as follows:

<configuration>
	<property>
	   <name>yarn.resourcemanager.ha.enabled</name>
	   <value>true</value>
	</property>
	<property>
	   <name>yarn.resourcemanager.cluster-id</name>
	   <value>yrc</value>
	</property>
	<property>
	   <name>yarn.resourcemanager.ha.rm-ids</name>
	   <value>rm1,rm2</value>
	</property>
	<property>
	   <name>yarn.resourcemanager.hostname.rm1</name>
	   <value>binghe103</value>
	</property>
	<property>
	   <name>yarn.resourcemanager.hostname.rm2</name>
	   <value>binghe104</value>
	</property>
	<property>
	   <name>yarn.resourcemanager.zk-address</name>
	   <value>binghe105:2181,binghe106:2181,binghe107:2181</value>
	</property>
	<property>
	   <name>yarn.nodemanager.aux-services</name>
	   <value>mapreduce_shuffle</value>
	</property>

	<property>
		<name>yarn.resourcemanager.address.rm1</name>
		<value>binghe103:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm1</name>
		<value>binghe103:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm1</name>
		<value>binghe103:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
		<value>binghe103:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm1</name>
		<value>binghe103:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.admin.address.rm1</name>
		<value>binghe103:23142</value>
	</property>

	<property>
		<name>yarn.resourcemanager.address.rm2</name>
		<value>binghe104:8032</value>
	</property>
	<property>
		<name>yarn.resourcemanager.scheduler.address.rm2</name>
		<value>binghe104:8030</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address.rm2</name>
		<value>binghe104:8088</value>
	</property>
	<property>
		<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
		<value>binghe104:8031</value>
	</property>
	<property>
		<name>yarn.resourcemanager.admin.address.rm2</name>
		<value>binghe104:8033</value>
	</property>
	<property>
		<name>yarn.resourcemanager.ha.admin.address.rm2</name>
		<value>binghe104:23142</value>
	</property>
</configuration>
After adding the configuration, copy yarn-site.xml to every server in the cluster and rerun the job.

Note: the YARN RM HA pair runs on the servers with hostnames binghe103 and binghe104, but yarn-site.xml must be copied to every server in the cluster.
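A minimal sketch for distributing the file and rerunning the built-in example; the hostnames, install path, and HDFS input/output directories are assumptions based on this cluster and should be adjusted to match your environment:

# Copy the updated yarn-site.xml to every node in the cluster (hostnames assumed)
for host in binghe103 binghe104 binghe105 binghe106 binghe107; do
    scp /usr/local/hadoop-3.2.0/etc/hadoop/yarn-site.xml ${host}:/usr/local/hadoop-3.2.0/etc/hadoop/
done

# Rerun the built-in WordCount example (/input and /output are placeholder HDFS paths)
hadoop jar /usr/local/hadoop-3.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /input /output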

 
