Problem 1: When YARN is started in Hadoop-2.2.0, the ResourceManager process never comes up.
Check the end of the log file with tail -n 50 yarn-dell-resourcemanager-master1.log. It shows the following exception:
2016-09-09 14:41:09,341 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state STARTED; cause: org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:262)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:623)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:655)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:872)
Caused by: java.net.BindException: Port in use: 192.168.1.120:8088
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:742)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:686)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:257)
... 4 more
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:738)
... 6 more
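The innermost causes in this trace are "Port in use: 192.168.1.120:8088" and "Address already in use", which means another process is already listening on the ResourceManager web UI port. Below is a minimal sketch for confirming that. It assumes bash with lsof installed (falling back to ss from iproute2), and the function names are hypothetical helpers, not part of Hadoop:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for diagnosing "Port in use" when the ResourceManager starts.

# Reads a log on stdin and prints the port from the last "Port in use: host:port" line.
port_in_use_from_log() {
  sed -n 's/.*Port in use: [^:]*:\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Shows the process listening on a TCP port.
# Assumes lsof is installed; falls back to ss from iproute2 otherwise.
who_listens() {
  local port="$1"
  if command -v lsof >/dev/null 2>&1; then
    lsof -nP -iTCP:"$port" -sTCP:LISTEN
  else
    ss -ltnp "sport = :$port"
  fi
}
```

For example, tail -n 50 yarn-dell-resourcemanager-master1.log | port_in_use_from_log should print 8088, and who_listens 8088 then lists the PID that holds the port.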
Solution:
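1. Run ps aux | grep -i resourcemanager to count the ResourceManager processes on the master host. The BindException above means port 8088 is already taken, most likely by a leftover ResourceManager.
2. Kill the extra process(es) with kill -9 <PID>.
3. Restart YARN from the sbin directory to recover:
./stop-yarn.sh
./start-yarn.sh
The ResourceManager process should then appear on the master node.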
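Steps 1 to 3 can be scripted roughly as follows. This is a sketch that assumes bash, that HADOOP_HOME points at the Hadoop-2.2.0 installation, and that the ResourceManager JVM command line contains its main class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager as a separate argument; rm_pids and restart_yarn are hypothetical names. Matching the exact class name instead of grep -i resourcemanager keeps the grep process itself out of the PID list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 1-3: kill stale ResourceManager JVMs, then restart YARN.

RM_CLASS="org.apache.hadoop.yarn.server.resourcemanager.ResourceManager"

# Reads `ps aux` output on stdin and prints the PID (column 2) of every process
# whose command line (columns 11 and later) contains RM_CLASS as a separate argument.
rm_pids() {
  awk -v cls="$RM_CLASS" '{
    for (i = 11; i <= NF; i++)
      if ($i == cls) { print $2; next }
  }'
}

# Kills all ResourceManager PIDs found, then runs stop-yarn.sh and start-yarn.sh.
restart_yarn() {
  local sbin="${HADOOP_HOME:?HADOOP_HOME must be set}/sbin"
  local pids
  pids=$(ps aux | rm_pids)
  if [ -n "$pids" ]; then
    echo "Killing ResourceManager PID(s): $pids"
    # $pids is left unquoted on purpose so each PID becomes its own argument.
    kill -9 $pids
  fi
  "$sbin/stop-yarn.sh"
  "$sbin/start-yarn.sh"
}
```

To use it, source the file on the master node, export HADOOP_HOME, and call restart_yarn. Sending a plain kill (SIGTERM) before kill -9 gives the JVM a chance to shut down cleanly.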
Problem 2: Sometimes the HRegionServer starts and then dies shortly afterwards. Check the HBase log:
dell@master1:/usr/local/hbase-0.98.7-hadoop2/logs$ tail -n 100 hbase-dell-regionserver-master1.log
at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1286)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:862)
at java.lang.Thread.run(Thread.java:745)
2017-01-12 10:02:23,347 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server master1,60020,1484186540447: Unhandled: Cannot create directory /hbase/WALs/master1,60020,1484186540447. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3355)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3330)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59598)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /hbase/WALs/master1,60020,1484186540447. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3355)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3330)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:502)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59598)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:467)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:294)
at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2394)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2365)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:817)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:813)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:813)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:806)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1933)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.
at org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createHLog(HLogFactory.java:58)
at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1552)
at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1531)
at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1286)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:862)
at java.lang.Thread.run(Thread.java:745)
2017-01-12 10:02:23,350 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2017-01-12 10:02:23,367 INFO [regionserver60020] ipc.RpcServer: Stopping server on 60020
2017-01-12 10:02:23,368 INFO [regionserver60020] regionserver.HRegionServer: Stopping infoServer
2017-01-12 10:02:23,373 INFO [regionserver60020] mortbay.log: Stopped [email protected]:60030
2017-01-12 10:02:23,475 INFO [regionserver60020] snapshot.RegionServerSnapshotManager: Stopping RegionServerSnapshotManager abruptly.
2017-01-12 10:02:23,475 INFO [regionserver60020] regionserver.HRegionServer: aborting server master1,60020,1484186540447
2017-01-12 10:02:23,475 DEBUG [regionserver60020] catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@58465d50
2017-01-12 10:02:23,475 INFO [regionserver60020] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x358d3e5582442fb
2017-01-12 10:02:23,485 INFO [regionserver60020] zookeeper.ZooKeeper: Session: 0x358d3e5582442fb closed
2017-01-12 10:02:23,485 INFO [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-01-12 10:02:23,488 INFO [regionserver60020] regionserver.HRegionServer: stopping server master1,60020,1484186540447; all regions closed.
2017-01-12 10:02:23,588 INFO [regionserver60020] regionserver.Leases: regionserver60020 closing leases
2017-01-12 10:02:23,588 INFO [regionserver60020] regionserver.Leases: regionserver60020 closed leases
2017-01-12 10:02:23,589 INFO [regionserver60020] regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2017-01-12 10:02:23,589 INFO [regionserver60020] regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2017-01-12 10:02:23,589 INFO [regionserver60020] regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2017-01-12 10:02:23,589 INFO [regionserver60020] regionserver.CompactSplitThread: Waiting for Small Compaction Thread to finish...
2017-01-12 10:02:23,636 INFO [regionserver60020] zookeeper.ZooKeeper: Session: 0x558d3e6026242f9 closed
2017-01-12 10:02:23,636 INFO [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-01-12 10:02:23,636 INFO [regionserver60020] regionserver.HRegionServer: stopping server master1,60020,1484186540447; zookeeper connection closed.
2017-01-12 10:02:23,636 INFO [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting
2017-01-12 10:02:23,636 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2489)
2017-01-12 10:02:23,639 INFO [Thread-10] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@68ee3eb2
2017-01-12 10:02:23,640 INFO [Thread-10] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2017-01-12 10:02:23,641 INFO [Thread-10] regionserver.ShutdownHook: Shutdown hook finished.
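Two messages in this log decide the fix: the SafeModeException says the NameNode is in safe mode, and "Resources are low on NN" says it entered safe mode because free disk space on the volumes it monitors ran low, so leaving safe mode without freeing space will not stick. A minimal pre-check sketch, assuming bash, hdfs on PATH, and that you pass the local directory configured as dfs.namenode.name.dir; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check before running `hdfs dfsadmin -safemode leave`.

# Reads `hdfs dfsadmin -safemode get` output on stdin; succeeds if safe mode is ON.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Prints the free space, in KB, of the filesystem that holds directory $1.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Reports safe-mode state and free space for the given NameNode metadata directory.
check_namenode() {
  local nn_dir="${1:?usage: check_namenode NN_METADATA_DIR}"
  if hdfs dfsadmin -safemode get | safemode_is_on; then
    echo "NameNode is in safe mode; free space under $nn_dir: $(free_kb "$nn_dir") KB"
  else
    echo "NameNode is not in safe mode"
  fi
}
```

Run check_namenode on the NameNode host, free up space if it is low, and only then leave safe mode.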
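Solution:
1. Leave safe mode with hdfs dfsadmin -safemode leave. As the log itself warns, the NameNode is low on resources, so free up disk space first; otherwise it will immediately return to safe mode.
2. Then start all RegionServers in the cluster.
References: http://stackoverflow.com/questions/26704763/yarn-resourcetrackerservice-failed-in-state-started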