Fixing "connection refused" from the JobHistory server after a job completes successfully under MRv2

Operating system: Linux 14.04

Hadoop: Hadoop 2.2.0

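Problem description: I have recently been using MRv2 (YARN) for data mining. After a job completes successfully, the client reports that the JobHistory server refused the connection, as shown below: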

2015-01-04 18:32:17INFO:[main] -  map 100% reduce 82%
2015-01-04 18:32:21INFO:[main] -  map 100% reduce 100%
2015-01-04 18:32:25INFO:[main] - Job job_1420333234620_0006 completed successfully
2015-01-04 18:32:37INFO:[main] - Counters: 48
	File System Counters
		FILE: Number of bytes read=151965690
		FILE: Number of bytes written=328940451
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=27573775513
		HDFS: Number of bytes written=297551633
		HDFS: Number of read operations=648
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=6
		Launched map tasks=221
		Launched reduce tasks=1
		Data-local map tasks=112
		Rack-local map tasks=109
		Total time spent by all maps in occupied slots (ms)=49141351
		Total time spent by all reduces in occupied slots (ms)=3734678
	Map-Reduce Framework
		Map input records=972799356
		Map output records=5386844
		Map output bytes=297551633
		Map output materialized bytes=159468566
		Input split bytes=24510
		Combine input records=5386844
		Combine output records=5386844
		Reduce input groups=5379074
		Reduce shuffle bytes=159468566
		Reduce input records=5386844
		Reduce output records=5386844
		Spilled Records=10773688
		Shuffled Maps =215
		Failed Shuffles=0
		Merged Map outputs=215
		GC time elapsed (ms)=3006431
		CPU time spent (ms)=6609050
		Physical memory (bytes) snapshot=35531751424
		Virtual memory (bytes) snapshot=83810611200
		Total committed heap usage (bytes)=33706455040
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	com.PickPoint$Statics
		EVENT_IS_ZAIKE=5386844
		EVENT_STATUS_IS_ZAIKE=5386844
		STATUS_IS_ZAIKE=309698250
	File Input Format Counters 
		Bytes Read=27573751003
	File Output Format Counters 
		Bytes Written=297551633
2015-01-04 18:32:39INFO:[main] - Retrying connect to server: slave1/192.168.1.101:36290. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2015-01-04 18:32:40INFO:[main] - Retrying connect to server: slave1/192.168.1.101:36290. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2015-01-04 18:32:41INFO:[main] - Retrying connect to server: slave1/192.168.1.101:36290. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS)
2015-01-04 18:32:41INFO:[main] - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-01-04 18:32:42INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:43INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:44INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:45INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:46INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:47INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:48INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:49INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:50INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:51INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:51INFO:[main] - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-01-04 18:32:52INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:53INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:54INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:55INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:56INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:57INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:58INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:32:59INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:00INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:01INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:01INFO:[main] - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-01-04 18:33:02INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:03INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:04INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:05INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:06INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:07INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:08INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:09INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:10INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:11INFO:[main] - Retrying connect to server: master/192.168.1.100:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2015-01-04 18:33:12ERROR:[main] - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: java.net.ConnectException: Call From master/192.168.1.100 to master:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
Exception in thread "main" java.io.IOException: java.net.ConnectException: Call From master/192.168.1.100 to master:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:331)
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)
	at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
	at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:611)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1301)
	at com.PickPoint.main(PickPoint.java:111)
Caused by: java.net.ConnectException: Call From master/192.168.1.100 to master:10020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1351)
	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy10.getJobReport(Unknown Source)
	at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
	... 11 more
Caused by: java.net.ConnectException: 拒绝连接
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
	at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
	at org.apache.hadoop.ipc.Client.call(Client.java:1318)
	... 19 more

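The job itself succeeded; the exception is thrown when the client is redirected to the JobHistory server (master:10020 in the log) to fetch the final job status. Analysis shows there are two possible causes: either mapreduce.jobhistory.address has not been configured (the default is 0.0.0.0:10020 and needs to be changed), or the JobHistory server has not been started. First, a quick look at what the JobHistory server is.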
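To tell the two causes apart, it can help to first check whether anything is accepting connections on the JobHistory RPC port at all. The following is a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available; the helper name `port_open` and the variables `JHS_HOST` and `JHS_PORT` are hypothetical, with defaults taken from the host and port in the log above.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# port_open is a hypothetical helper, not part of Hadoop.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Defaults come from the log above (master:10020); adjust for your cluster.
JHS_HOST="${JHS_HOST:-master}"
JHS_PORT="${JHS_PORT:-10020}"

if port_open "$JHS_HOST" "$JHS_PORT"; then
  echo "JobHistory server port ${JHS_HOST}:${JHS_PORT} is reachable"
else
  echo "Nothing is accepting connections on ${JHS_HOST}:${JHS_PORT}"
fi
```

If the port is unreachable, the server is probably either not running or bound to a different address than the one the client is using.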

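Hadoop ships with a history server that lets you inspect records of completed MapReduce jobs: how many map tasks and reduce tasks were used, when the job was submitted, when it started, when it finished, and so on. The history server is not started by default; it can be started with the following command.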

$ sbin/mr-jobhistory-daemon.sh   start historyserver
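One way to confirm that the daemon actually came up is to look for a JobHistoryServer process in the output of the JDK's `jps` tool. This is a rough sketch; the helper name `has_jobhistory` is hypothetical, and it assumes `jps` is on the PATH of the user who started the daemon.

```shell
# Succeeds if the `jps` output read from stdin lists a JobHistoryServer process.
# has_jobhistory is a hypothetical helper, not part of Hadoop.
has_jobhistory() {
  grep -qw 'JobHistoryServer'
}

if command -v jps >/dev/null 2>&1 && jps | has_jobhistory; then
  echo "JobHistoryServer is running"
else
  echo "JobHistoryServer not found; start it with sbin/mr-jobhistory-daemon.sh start historyserver"
fi
```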

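Once it is running, the history server's web UI is available on port 19888 of the machine it runs on, where you can browse completed jobs. The history server can run on a machine of its own; it is configured mainly through the following parameters in mapred-site.xml: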

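<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
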
Here, master is the hostname of my NameNode.
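After editing mapred-site.xml, it may be worth checking which address the file on the client machine really contains, since the client uses its own copy of the configuration. The sketch below pulls a property value out of the file with standard text tools; `get_prop` is a hypothetical helper, it assumes the usual name/value layout inside each property element, and the fallback path etc/hadoop is an assumption about where the configuration directory lives relative to the current directory.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# get_prop is a hypothetical helper; it is a plain-text match, not a real XML parser.
get_prop() {
  local file="$1" name="$2"
  tr -d '\n' < "$file" \
    | grep -o "<name>[[:space:]]*${name}[[:space:]]*</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>.*//'
}

# Assumed location: $HADOOP_CONF_DIR if set, otherwise etc/hadoop under the current directory.
CONF="${HADOOP_CONF_DIR:-etc/hadoop}/mapred-site.xml"
if [ -f "$CONF" ]; then
  echo "mapreduce.jobhistory.address = $(get_prop "$CONF" mapreduce.jobhistory.address)"
  echo "mapreduce.jobhistory.webapp.address = $(get_prop "$CONF" mapreduce.jobhistory.webapp.address)"
fi
```

If the printed address is empty, the property is not set in that file and the default applies.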
