Hadoop startup error - NameNode fails to start - GC overhead limit exceeded

Error scenario: around 4:30 AM

Error log:

2016-03-22 04:30:29,075 WARN org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 10.10.10.43:54994 Call#7 Retry#0: error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:30:43,111 WARN org.apache.hadoop.ipc.Server: Error serializing call response for call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:55003 Call#4 Retry#0
2016-03-22 04:30:39,756 WARN org.apache.hadoop.ipc.Server: Error serializing call response for call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:54997 Call#4 Retry#0
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:30:34,398 FATAL org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:30:34,398 WARN org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 10.10.10.43:55000 Call#4 Retry#0: error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:30:34,398 ERROR org.mortbay.log: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:35:37,684 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-03-22 04:37:27,793 ERROR org.mortbay.log: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:37:27,793 WARN org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 10.10.10.148:45637 Call#5441484 Retry#0: error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:37:27,793 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
java.lang.OutOfMemoryError: Java heap space
2016-03-22 04:37:27,793 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://bis-newnamenode-s-02:50070/getimage?getimage=1&txid=424042092&storageInfo=-47:1574903840:0:CID-d29c5605-82ec-474f-950a-fd106ad23daa
2016-03-22 04:37:27,804 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

I then restarted the service, and it failed with the following GC error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.CharBuffer.allocate(CharBuffer.java:331)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
        at org.apache.hadoop.io.Text.decode(Text.java:405)
        at org.apache.hadoop.io.Text.decode(Text.java:377)
        at org.apache.hadoop.io.Text.readString(Text.java:470)
        at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:90)
        at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:105)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:682)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:616)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:453)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:495)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:398)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:339)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:823)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:664)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:633)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
2016-03-22 06:54:29,716 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2016-03-22 06:54:29,758 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG

I adjusted mapred.child.java.opts in mapred-site.xml to 2000 and set HADOOP_HEAPSIZE=2000 in hadoop-env.sh.
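For reference, a minimal sketch of those two changes (the value 2000 is in MB; in mapred-site.xml it must be expressed as a JVM flag such as -Xmx2000m, and file locations depend on your installation):

    # hadoop-env.sh -- maximum heap, in MB, for Hadoop daemons
    export HADOOP_HEAPSIZE=2000

    <!-- mapred-site.xml -- JVM options for MapReduce child tasks -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2000m</value>
    </property>

Note that mapred.child.java.opts only affects MapReduce task JVMs, not the NameNode, which is one reason this change alone could not fix the startup failure.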

The NameNode still failed to start. The error messages showed the crash happened while loading the FSImage, which had grown too large for the heap. Going back through the metadata backup directory (backed up hourly), the FSImage file had been growing steadily.
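One quick way to confirm the growth is to list the image files in the backup directory by time; the backup path here is hypothetical:

    # List fsimage files in the (hypothetical) hourly metadata backup
    # directory, newest first, to watch the image size grow over time.
    ls -lht /data/backup/namenode/current/ | grep fsimage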

2016-03-22 09:07:46,848 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 5189987
2016-03-22 09:10:03,405 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:369)
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:392)
        at org.apache.hadoop.io.Text.decode(Text.java:377)
        at org.apache.hadoop.io.Text.readString(Text.java:470)
        at org.apache.hadoop.fs.permission.PermissionStatus.readFields(PermissionStatus.java:91)
        at org.apache.hadoop.fs.permission.PermissionStatus.read(PermissionStatus.java:105)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINode(FSImageFormat.java:682)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadINodeWithLocalName(FSImageFormat.java:616)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadChildren(FSImageFormat.java:453)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:495)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadDirectoryWithSnapshot(FSImageFormat.java:504)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadLocalNameINodesWithSnapshot(FSImageFormat.java:398)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:339)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:823)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:664)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:633)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)


Finally, I raised HADOOP_NAMENODE_INIT_HEAPSIZE in hadoop-env.sh, and the NameNode started successfully!
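A minimal sketch of that change; the 4096 is my assumption, not the value from the incident, and should be sized comfortably above what the FSImage needs:

    # hadoop-env.sh -- dedicated startup heap for the NameNode. The exact
    # format (bare MB value vs. a full JVM flag like -Xms4096m) varies
    # between Hadoop versions and distributions, so check the hadoop-env.sh
    # template shipped with yours.
    export HADOOP_NAMENODE_INIT_HEAPSIZE="4096"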

After discussing it with friends in a chat group, the likely root cause was too many small files. A quick check confirmed it: over 5 million files, matching the "Number of files = 5189987" in the log above. So I merged or deleted the small files under some of the tables.
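A sketch of how to check the file count; the warehouse path is hypothetical:

    # Count directories, files, and bytes under a (hypothetical) Hive
    # warehouse path. Output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    hdfs dfs -count /user/hive/warehouse

    # Rank table directories by file count to find the worst offenders.
    hdfs dfs -count /user/hive/warehouse/* | sort -k2 -nr | head -20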

The FSImage file shrank from 600 MB to 330 MB, roughly half its original size. I then promptly added rules to our jobs to handle small files going forward.
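One common way to fold many small files into fewer namespace objects is a Hadoop Archive (HAR); the table and destination paths below are hypothetical:

    # Pack a table directory's small files into one HAR (this launches a
    # MapReduce job). The archive stores them as a few part files, so after
    # deleting the originals the NameNode's namespace shrinks accordingly.
    hadoop archive -archiveName logs_2016.har -p /user/hive/warehouse logs /user/hive/archived

    # The archived files remain readable through the har:// scheme.
    hdfs dfs -ls har:///user/hive/archived/logs_2016.har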
