Cluster environment: master.apache.org runs the namenode; slave1.apache.org and slave2.apache.org run the datanodes.
1 datanodes fail to start after start-dfs.sh: Incompatible clusterIDs
After running start-dfs.sh, the datanodes on slave1 and slave2 failed to start (verified with jps). The datanode logs report the following error:
2015-01-31 19:47:46,714 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1281906893-192.168.131.130-1422721940012 (storage id DS-1777570355-192.168.131.132-50010-1413772380962) service to master.apache.org/192.168.131.130:8020
java.io.IOException: Incompatible clusterIDs in /opt/soft/hadoop-2.2.0/data/tmp/dfs/data: namenode clusterID = CID-418d88d8-300b-484d-9160-3c1c2e43ef08; datanode clusterID = CID-848361e5-fb20-41e3-af15-3030ae133dae
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:391)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:191)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:745)
2015-01-31 19:47:46,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1281906893-192.168.131.130-1422721940012 (storage id DS-1777570355-192.168.131.132-50010-1413772380962) service to master.apache.org/192.168.131.130:8020
2015-01-31 19:47:46,724 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1281906893-192.168.131.130-1422721940012 (storage id DS-1777570355-192.168.131.132-50010-1413772380962)
2015-01-31 19:47:48,725 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2015-01-31 19:47:48,727 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2015-01-31 19:47:48,729 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
The log makes the cause clear: the datanode's clusterID does not match the namenode's clusterID.
Open the datanode and namenode directories configured in hdfs-site.xml and look at the VERSION file inside each one's current folder. The clusterID entries really do differ, exactly as the log reports. Change the clusterID in the datanode's VERSION file to match the namenode's, restart dfs (run start-dfs.sh), and jps then shows the datanode running normally.
Why this happens: after dfs was formatted the first time, hadoop was started and used; the format command (hdfs namenode -format) was later run again. Reformatting generates a fresh clusterID on the namenode, while each datanode keeps its old clusterID, producing the mismatch. The sketch below shows one way to realign them.
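A minimal sketch of the fix, assuming the data directory from the log above and a namenode directory of /opt/soft/hadoop-2.2.0/data/tmp/dfs/name (both paths are assumptions; use whatever dfs.namenode.name.dir and dfs.datanode.data.dir actually point to in hdfs-site.xml):

# On the namenode: read the authoritative clusterID from its VERSION file.
grep clusterID /opt/soft/hadoop-2.2.0/data/tmp/dfs/name/current/VERSION

# On each datanode: overwrite the stale clusterID with the namenode's value
# (CID-418d88d8-... is the namenode clusterID reported in the log above).
sed -i 's/^clusterID=.*/clusterID=CID-418d88d8-300b-484d-9160-3c1c2e43ef08/' \
    /opt/soft/hadoop-2.2.0/data/tmp/dfs/data/current/VERSION

# Restart HDFS and confirm the DataNode process appears in jps.
start-dfs.sh
jps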
2 namenode fails to start: There appears to be a gap in the edit log
2015-01-31 11:23:52,693 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 52.
    at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:184)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:647)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
2015-01-31 11:23:52,696 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-01-31 11:23:52,698 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
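The namenode refuses to start because its edit log is missing transactions (it expected txid 1 but found txid 52), which usually means files under the namenode's metadata directory were damaged or partially deleted. One option in Hadoop 2.x, offered here only as a sketch and not as a guaranteed fix, is the namenode's recovery mode, which tries to read past gaps at the cost of possibly losing recent namespace changes:

# Back up the name directory first (path is an assumption; use dfs.namenode.name.dir).
cp -r /opt/soft/hadoop-2.2.0/data/tmp/dfs/name /opt/soft/hadoop-2.2.0/data/tmp/dfs/name.bak

# Recovery mode prompts interactively when it hits the gap in the edit log.
hdfs namenode -recover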
3 eclipse job fails in fully distributed mode: job.waitForCompletion(true) returns false
Disable HDFS user permission checking: open conf/hdfs-site.xml and set the dfs.permissions property to false (the default is true). After that the job runs.
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
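hdfs-site.xml is read when the daemons start, so the new value only takes effect after a restart, for example:

stop-dfs.sh
start-dfs.sh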
4 eclipse local development: tracking down why job.waitForCompletion(true) returns false
First configure log4j so error messages can be seen locally (very important).
Create a file named log4j.properties under src with the following content:
log4j.rootLogger=WARN,stdout,logfile
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=hadoop.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
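With this in place, rerun the job from eclipse: warnings and errors appear on the console and are also appended to hadoop.log in the project's working directory, which can be followed from a terminal while the job runs:

tail -f hadoop.log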