namenode gc导致的故障一例

周末处理的一次故障，这里简单记录下。

故障现象：

6点1 分左右开始， Hadoop集群异常，所有的hdfs操作都出现问题。

几十个 job报以下错

 
           FAILED: 
          
           RuntimeException  
          
           org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
          
           Operation category WRITE is not supported in state standby 
          
           或者： FAILED: 
          
           RuntimeException  
          
           org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
          
           Operation category READ is not supported in state standby

故障原因分析:

1)看nn的日志，可以发现是写jn导致的，怀疑是网络问题导致，由于是5台jn，如果是网络问题的话，根据major的算法，需要3台写入失败才会导致出现问题，虽然jn和nn有些机器不在同一个交换机下，但是，从监控上来看，网络没有异常，排除网络问题

2)仔细查看nn的日志，和对源码的分析，发现是由于 namenode这个时间点进行了一次时间比较长的 full gc （96s ），导致写 journalnode 超时（默认是20s）， namenode进程退出。同时从jn的日志可以看出，后面是没有nn的请求过来的（因为nn已经挂掉了）。

Gc日志：

 
      
       
         
         
           2014 
           - 
           05 
           -25T05: 
           59 
           : 
           53.578  
           + 
           0800 
           : 
          
 
             
           2478553.316 
           : [Full GC [PSYoungGen: 2784K->0K(4179328K)] [PSOldGen:  
          
 
           28566944K->9208608K(28573696K)] 28569728K->9208608K(32753024K) 
          
 
           [PSPermGen: 41405K->41405K(524288K)],  
           95.6706740  
           secs] [Times: user= 
           95.62  
           sys= 
           0.00 
           , real=  
           95.67  
           secs] 
          
 
           Heap 
          
 
           PSYoungGen      total 4179328K, used 87037K [ 
           0x00007f70e5000000 
           ,  
           0x00007f71e5000000 
           ,  
           0x00007f71e5000000 
           ) 
          
 
              
           eden space 4164800K,  
           2 
           % used [ 
           0x00007f70e5000000 
           , 
           0x00007f70ea4ff590 
           , 
           0x00007f71e3330000 
           ) 
          
 
              
           from space 14528K,  
           0 
           % used [ 
           0x00007f71e3330000 
           , 
           0x00007f71e3330000 
           , 
           0x00007f71e4160000 
           ) 
          
 
              
           to   space 14336K,  
           0 
           % used [ 
           0x00007f71e4200000 
           , 
           0x00007f71e4200000 
           , 
           0x00007f71e5000000 
           ) 
          
 
           PSOldGen        total 28573696K, used 9208608K [ 
           0x00007f6a15000000 
           ,  
           0x00007f70e5000000 
           ,  
           0x00007f70e5000000 
           ) 
          
 
              
           object space 28573696K,  
           32 
           % used [ 
           0x00007f6a15000000 
           , 
           0x00007f6c470c8078 
           , 
           0x00007f70e5000000 
           ) 
          
 
           PSPermGen       total 524288K, used 41466K [ 
           0x00007f69f5000000 
           ,  
           0x00007f6a15000000 
           ,  
           0x00007f6a15000000 
           ) 
          
 
              
           object space 524288K,  
           7 
           % used [ 
           0x00007f69f5000000 
           , 
           0x00007f69f787e8f8 
           , 
           0x00007f6a15000000 
           ) 
          
 
       
 
      
    

jn日志：

 
      
       
         
         
           2014 
           - 
           05 
           - 
           25  
           05 
           : 
           58 
           : 
           45 
           , 
           432  
           INFO  
          
 
           org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing  
          
 
           edits file  
          
 
           /home/vipshop/hard_disk/nn_shared_disk/bipcluster/current/edits_inprogress_0000000001665741687 
          
 
             
           ->  
          
 
           /home/vipshop/hard_disk/nn_shared_disk/bipcluster/current/edits_0000000001665741687- 
           0000000001665777062 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           44 
           : 
           54 
           , 
           299  
           ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL  
           15 
           : SIGTERM 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           44 
           : 
           54 
           , 
           302  
           INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG: 
          
 
       
 
      
    

nn错误日志：

 
      
       
         
         
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           258  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited  
          
 
           96275  
           ms (timeout= 
           20000  
           ms)  
           for  
           a response  
           for  
           sendEdits. No responses  
          
 
           yet. 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           WARN  
          
 
           org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host  
          
 
           machine (eg GC): pause of approximately 96145ms 
          
 
           GC pool  
           'PS MarkSweep'  
           had collection(s): count= 
           1  
           time=95670ms 
          
 
           GC pool  
           'PS Scavenge'  
           had collection(s): count= 
           1  
           time=592ms 
          
 
           2014 
           - 
           05 
           - 
           25 
          
 
             
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took  
          
 
           96281ms to send a batch of  
           2  
           edits ( 
           358  
           bytes) to remote journal  
          
 
           xxx: 
           8485 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took  
          
 
           96283ms to send a batch of  
           2  
           edits ( 
           358  
           bytes) to remote journal  
          
 
           xxx: 
           8485 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
          
 
           Number of transactions:  
           20019 
          
 
           Total time  
           for  
           transactions(ms):  
           436 
          
 
           Number of transactions batched in Syncs:  
           9441 
          
 
           Number of syncs:  
           5160 
          
 
           SyncTimes(ms):  
           41623  
           9305 
          
 
           2014 
           - 
           05 
           - 
           25 
          
 
             
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took  
          
 
           96275ms to send a batch of  
           13  
           edits ( 
           1638  
           bytes) to remote journal  
          
 
           xxx: 
           8485 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           FATAL  
          
 
           org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed  
          
 
           for  
           required journal (JournalAndStream(mgr=QJM to [xxx: 
           8485 
           , xxx: 
           8485 
           ,  
          
 
           xxx: 
           8485 
           , xxx: 
           8485 
           , xxx: 
           8485 
           ], stream=QuorumOutputStream starting at  
          
 
           txid  
           1665777063 
           )) 
          
 
           java.io.IOException: Timed out waiting 20000ms  
           for  
           a quorum of nodes to respond. 
          
 
                    
           at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java: 
           137 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java: 
           107 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java: 
           113 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java: 
           107 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$ 
           8 
           .apply(JournalSet.java: 
           490 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java: 
           350 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$ 
           100 
           (JournalSet.java: 
           53 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java: 
           486 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java: 
           581 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java: 
           1879 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java: 
           1845 
           ) 
          
 
                    
           at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java: 
           439 
           ) 
          
 
                    
          
 
             
           at  
          
 
           org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java: 
           207 
           ) 
          
 
                    
          
 
             
           at  
          
 
           org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$ 
           2 
           .callBlockingMethod(ClientNamenodeProtocolProtos.java: 
           44942 
           ) 
          
 
                    
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java: 
           453 
           ) 
          
 
                    
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java: 
           1002 
           ) 
          
 
                    
           at org.apache.hadoop.ipc.Server$Handler$ 
           1 
           .run(Server.java: 
           1752 
           ) 
          
 
                    
           at org.apache.hadoop.ipc.Server$Handler$ 
           1 
           .run(Server.java: 
           1748 
           ) 
          
 
                    
           at java.security.AccessController.doPrivileged(Native Method) 
          
 
                    
           at javax.security.auth.Subject.doAs(Subject.java: 
           396 
           ) 
          
 
                    
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java: 
           1438 
           ) 
          
 
                    
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java: 
           1746 
           ) 
          
 
           2014 
           - 
           05 
           - 
           25 
          
 
             
           06 
           : 
           01 
           : 
           29 
           , 
           259  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took  
          
 
           96276ms to send a batch of  
           13  
           edits ( 
           1638  
           bytes) to remote journal  
          
 
           10.201 
           . 
           202.21 
           : 
           8485 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           263  
           INFO  
          
 
           org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock:  
          
 
           /tmp/hive-hdfs/hive_2014- 
           05 
           -25_05- 
           59 
           -16_296_5866222407565920996/_task_tmp.-ext- 
           10002 
           /_tmp.001379_0. 
          
 
             
           BP- 
           180494678 
           -xxx- 
           1366627257763  
          
 
           blk_-3717787322078480343_437088977{blockUCState=UNDER_CONSTRUCTION,  
          
 
           primaryNodeIndex=- 
           1 
           , replicas=[ReplicaUnderConstruction[xxx: 
           50010 
           |RBW],  
          
 
           ReplicaUnderConstruction[xxx: 
           50010 
           |RBW],  
          
 
           ReplicaUnderConstruction[xxx: 
           50010 
           |RBW]]} 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           263  
          
 
           WARN  
          
 
           org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: 
          
 
             
           Not able to place enough replicas, still in need of  
           2  
           to reach  
           3 
          
 
           For more information, please enable DEBUG log level on org.apache.commons.logging.impl.Log4JLogger 
          
 
           2014 
           - 
           05 
           - 
           25 
          
 
             
           06 
           : 
           01 
           : 
           29 
           , 
           263  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting  
          
 
           QuorumOutputStream starting at txid  
           1665777063 
          
 
           ...... 
          
 
           2014 
           - 
           05 
           - 
           25  
          
 
           06 
           : 
           01 
           : 
           29 
           , 
           268  
           WARN  
          
 
           org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took  
          
 
           96285ms to send a batch of  
           13  
           edits ( 
           1638  
           bytes) to remote journal  
          
 
           xxx: 
           8485 
          
 
           ...... 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           324  
           INFO org.apache.hadoop.util.ExitUtil: Exiting with status  
           1 
          
 
           2014 
           - 
           05 
           - 
           25  
           06 
           : 
           01 
           : 
           29 
           , 
           364  
           INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
          
 
       
 
      
    

在写journalnode超时时，触发了 ExitUtil类的terminate 方法，终止当前的进程：

JournalSet类中：

 
           for  
           (JournalAndStream jas :  journals) { 
          
           try  
           { 
          
           closure.apply(jas); 
          
           }   
           catch  
           (Throwable t) { 
          
           if  
           (jas.isRequired()) { 
          
           final  
           String msg =  
           "Error: "   
           + status +   
           " failed for required journal (" 
          
           + jas +   
           ")"  
           ; 
          
           LOG .fatal(msg, t); 
          
           // If we fail on *any* of the required journals, then we must not 
          
           // continue on any of the other journals. Abort them to ensure that 
          
           // retry behavior doesn't allow them to keep going in any way. 
          
           abortAllJournals(); 
          
           // the current policy is to shutdown the NN on errors to shared edits 
          
           // dir  . There are many code paths to shared edits failures - syncs , 
          
           // roll of edits etc. All of them go through this common function 
          
           // where the isRequired() check is made. Applying exit policy here 
          
           // to catch all code paths. 
          
           terminate( 
           1 
           , msg); 
          
           }   
           else  
           { 
          
           LOG .error( 
           "Error: "   
           + status +   
           " failed for (journal "  
           + jas +  
           ")"   
           , t); 
          
           badJAS.add(jas);          
          
           } 
          
           } 
          
           }

ExitUtil类的terminate方法，调用了System.exit方法：

 
           /** 
          
           * Terminate the current process. Note that terminate is the *only* method 
          
           * that should be used to terminate the daemon processes. 
          
           * @param status exit code 
          
           * @param msg message used to create the ExitException 
          
           * @throws ExitException if System.exit is disabled for test purposes 
          
           */ 
          
           public  
           static  
           void  
           terminate( 
           int  
           status, String msg)  
           throws  
           ExitException { 
          
           LOG.info(  
           "Exiting with status "  
           + status); 
          
           if  
           (systemExitDisabled) { 
          
           ExitException ee =  
           new  
           ExitException(status, msg); 
          
           LOG.fatal(  
           "Terminate called" 
           , ee); 
          
           if  
           ( 
           null  
           == firstExitException) { 
          
           firstExitException = ee; 
          
           } 
          
           throw  
           ee; 
          
           } 
          
           System.exit(status); 
          
           }

处理方法：

重启集群，在 6:50左右恢复正常

后续解决方法：

1）调节journalnode 的写入超时时间

 
           dfs.qjournal.write-txns.timeout.ms

2）调整namenode 的java参数，提前触发 full gc，这样full gc 的时间就会小一些。

3）默认namenode的fullgc方式是parallel gc，是stw模式的，更改为cms的格式。调整namenode的启动参数：

 
           -XX:+UseCompressedOops  
          
           -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled  
          
           -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction= 
           0  
          
           -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC  
          
           -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction= 
           75 
          
           -XX:SoftRefLRUPolicyMSPerMB= 
           0

另外一个比较坑的是flume不会自动恢复，需要重启。。

否则会一直报java.lang.InterruptedException: sleep interrupted 错误。

本文转自菜菜光 51CTO博客，原文链接：http://blog.51cto.com/caiguangguang/1418895，如需转载请自行联系原作者

namenode gc导致的故障一例

你可能感兴趣的:(namenode gc导致的故障一例)