Connecting to ResourceManager at /0.0.0.0:8032

While studying Spark on YARN tuning today, I found that launching a job from Spark produced the error Connecting to ResourceManager at /0.0.0.0:8032.

The current cluster layout is as follows (each node has 1 core and 2 GB of memory):

node002: namenode, datanode, zookeeper, zkfc, journalnode, resourcemanager, nodemanager, worker, historyserver
node003: namenode, datanode, zookeeper, zkfc, journalnode, resourcemanager, nodemanager, master, historyserver
node004: namenode, datanode, zookeeper, journalnode, nodemanager, worker, historyserver

With all of these daemons started, free -h shows that very little memory is left: node002 and node003 have about 300 MB free, and node004 about 400 MB.
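The free-memory figures above come from running free -h on each node. A minimal sketch for collecting them from one machine, assuming passwordless ssh between the nodes (the loop itself is not from the original post):

for h in node002 node003 node004; do
    echo "== $h =="
    ssh $h free -h    # total/used/free memory on that node
done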

Log file

The corresponding log file can be found through the Hadoop cluster web UI; it shows the following messages:

org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2019-09-17 14:44:42,602 INFO org.apache.hadoop.conf.Configuration: found resource yarn-site.xml at file:/opt/hadoop/etc/hadoop/yarn-site.xml
INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root	OPERATION=refreshAdminAcls	TARGET=AdminService	RESULT=SUCCESS
INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in standby state
2019-09-17 14:44:42,626 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root	OPERATION=transitionToStandby	TARGET=RM	RESULT=SUCCESS
2019-09-17 14:54:30,189 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Release request cache is cleaned up

The message is ResourceManager: Already in standby state. The cluster was running in this state:

node002: namenode standby, resourcemanager active
node003: namenode active, resourcemanager standby

The Spark job was submitted on node003, and the YARN web UI at node003:8088 was reachable. Running yarn node -list showed every node in the cluster connected normally, while yarn rmadmin -getServiceState rm2 confirmed that rm2 was indeed in standby state.
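These checks can be reproduced with the YARN CLI; rm1 and rm2 are the IDs defined in the yarn-site.xml shown later in this post:

# list the NodeManagers known to the active ResourceManager
yarn node -list

# query the HA state of each ResourceManager (rm1 = node002, rm2 = node003)
yarn rmadmin -getServiceState rm1    # active at this point
yarn rmadmin -getServiceState rm2    # standby at this point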

Solution approach

The log suggests that a job submitted on node003 cannot connect to the ResourceManager because that ResourceManager is currently in standby state. So what happens if we swap the ResourceManager states?
yarn rmadmin -transitionToStandby rm1 --forcemanual
yarn rmadmin -transitionToActive rm2 --forcemanual
Checking the state again, yarn rmadmin -getServiceState rm2 now returns active. Re-running the job produces the log below; the exact root cause is still unknown.

2019-09-17 15:06:08 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2019-09-17 15:06:09 INFO  Client:54 - Requesting a new application from cluster with 1 NodeManagers
2019-09-17 15:06:09 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2019-09-17 15:06:09 INFO  Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2019-09-17 15:06:09 INFO  Client:54 - Setting up container launch context for our AM
2019-09-17 15:06:09 INFO  Client:54 - Setting up the launch environment for our AM container
2019-09-17 15:06:09 INFO  Client:54 - Preparing resources for our AM container
2019-09-17 15:06:13 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-09-17 15:06:19 INFO  Client:54 - Uploading resource file:/tmp/spark-be140bd0-a19a-4642-bf5e-9e1634f4f5a2/__spark_libs__280942556981659362.zip -> hdfs://mycluster/user/root/.sparkStaging/application_1568703940458_0001/__spark_libs__280942556981659362.zip

This time there is no longer any indication of failing to connect to the ResourceManager.
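The numbers in this log follow from defaults: in client mode the ApplicationMaster requests spark.yarn.am.memory (512 MB by default) plus an overhead of max(10%, 384 MB), which gives the 896 MB allocation, and the 8192 MB ceiling is YARN's default yarn.scheduler.maximum-allocation-mb. With only 2 GB of RAM per node (and a few hundred MB actually free), it may be worth setting these explicitly. The command below is only a sketch with assumed values, an assumed HDFS path, and a placeholder example class/jar, not settings taken from this cluster; pre-uploading the Spark jars and pointing spark.yarn.archive at them also avoids the warning about falling back to uploading libraries under SPARK_HOME on every submission.

# sketch only: explicit memory settings for a small 2 GB-per-node cluster
# (spark-libs.zip is an assumed archive of $SPARK_HOME/jars uploaded to HDFS beforehand)
spark-submit \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.am.memory=512m \
    --conf spark.yarn.am.memoryOverhead=384m \
    --conf spark.executor.memory=512m \
    --conf spark.executor.instances=1 \
    --conf spark.yarn.archive=hdfs://mycluster/spark/spark-libs.zip \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100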

YARN HA configuration file

This is the yarn-site.xml, without any tuning:

<property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-rm-cluster</value>
</property>
<property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
</property>
<property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node002</value>
</property>
<property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node003</value>
</property>
<property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
</property>
<property>
        <name>yarn.resourcemanager.zk.state-store.address</name>
        <value>node002:2181,node003:2181,node004:2181</value>
</property>
<property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node002:2181,node003:2181,node004:2181</value>
</property>
<property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>node002:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>node003:8032</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>node002:8034</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node002:8088</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>node003:8034</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node003:8088</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

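One thing worth double-checking (only a guess; the post does not establish the root cause): the client printed Connecting to ResourceManager at /0.0.0.0:8032, and 0.0.0.0:8032 is YARN's built-in default for yarn.resourcemanager.address rather than either of the rm1/rm2 addresses above. That would be consistent with the Spark client not reading this yarn-site.xml at submission time and therefore only talking to the local ResourceManager on node003, which happened to be standby. A quick sanity check on the submitting node:

# which Hadoop/YARN config directory does the Spark client actually see?
echo $HADOOP_CONF_DIR
echo $YARN_CONF_DIR

# the HA properties must be present in that directory's yarn-site.xml
grep -A 1 'yarn.resourcemanager.ha.enabled' $HADOOP_CONF_DIR/yarn-site.xml
grep -A 1 'yarn.resourcemanager.ha.rm-ids' $HADOOP_CONF_DIR/yarn-site.xml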