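When Hadoop starts, the other processes come up normally, but the DataNode process on the node starts and then stops by itself. The DataNode log follows: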


2014-09-26 10:20:14,225 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmpdir/dfs/data/in_use.lock acquired by nodename 2376@sn
2014-09-26 10:20:14,237 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to nn/192.168.1.105:9000. Exiting.
java.io.IOException: Incompatible clusterIDs in /tmpdir/dfs/data: namenode clusterID = CID-ed60dc6d-e40c-436e-99c9-8b2c55493e68; datanode clusterID = CID-4bade8cd-f4b4-472f-b64a-3b101c04d952
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:477)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:226)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:254)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:974)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:945)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
        at java.lang.Thread.run(Thread.java:745)
2014-09-26 10:20:14,243 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to nn/192.168.1.105:9000
2014-09-26 10:20:14,251 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2014-09-26 10:20:16,252 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2014-09-26 10:20:16,257 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2014-09-26 10:20:16,263 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at sn/192.168.1.106
************************************************************/

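I fixed this the same way as problem 2 below. The two appear to share a cause: in both, the ID stored in the DataNode's VERSION file (clusterID here, namespaceID in problem 2) no longer matches the NameNode's.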
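
As a quick check, the two IDs can be pulled straight out of the exception message. This is only a minimal sketch: it assumes the message sits on a single line (as Hadoop writes it, before any wrapping by a terminal or web page), and the function names are made up for illustration.

```shell
# Both functions read log text on stdin and print the matching clusterID.
# They assume the "Incompatible clusterIDs" message is on one line.

# clusterID that the NameNode reported
namenode_cluster_id() {
    sed -n 's/.*namenode clusterID = \([^;]*\);.*/\1/p'
}

# clusterID stored in the DataNode's storage directory
datanode_cluster_id() {
    sed -n 's/.*datanode clusterID = \(CID-[0-9a-f-]*\).*/\1/p'
}

# Hypothetical usage (the log file name is a placeholder):
#   grep 'Incompatible clusterIDs' datanode.log | namenode_cluster_id
```

If the two values differ, the DataNode refuses to join, which is what the FATAL line above reports.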



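I have been learning Hadoop lately and have installed it many times, running into plenty of errors at startup. Here is a short summary:

  Errors when formatting HDFS are usually related to hostname resolution, ${dfs.name.dir}/current/VERSION, or JAVA_HOME.


Here are a few typical examples: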


1. Error when formatting HDFS at Hadoop startup:

[Screenshots 1 and 2: error output from the format step; images not preserved]

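Cause: I had previously run the format once as root, so the files under dfs.name.dir were created with root ownership, and the hadoop user has no permission on them. Hence the error.

Fix: delete the ${dfs.name.dir}/current directory, or change its owner to the hadoop user.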
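
To see whether ownership really is the problem before deleting anything, something like the following can help. It is a sketch, not a definitive procedure: the function name, the path and the user name "hadoop" are all placeholders for whatever your installation uses.

```shell
# Print every entry under DIR ($1) that is NOT owned by USER ($2).
# USER may be a name or a numeric uid. Empty output means USER owns
# the whole tree.
list_not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Hypothetical usage; adjust the path and user to your setup:
#   list_not_owned_by /path/to/dfs/name hadoop
#   sudo chown -R hadoop /path/to/dfs/name    # change the owner, or
#   rm -rf /path/to/dfs/name/current          # delete and format again
```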


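2. Hadoop used to start normally with all processes running. After I reformatted HDFS with hadoop namenode -format and ran start-all.sh again, no errors were printed, but jps showed that the DataNode had not started. The DataNode log contained the following: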

# vim hadoop-hduser-datanode-cm134.jaybing.com.log

2014-08-10 10:16:06,693 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = cm134.jaybing.com/202.106.199.38
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u5
STARTUP_MSG:   build = git://ubuntu-slave02/var/lib/jenkins/workspace/CDH3u5-Full-RC/build/cdh3/hadoop20/0.20.2-cdh3u5/source -r 30233064aaf5f2492bc687d61d72956876102109; compiled by 'jenkins' on Fri Oct  5 17:21:34 PDT 2012
************************************************************/
2014-08-10 10:16:08,098 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2014-08-10 10:16:09,453 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /hadoop/tmp/dfs/data: namenode namespaceID = 2024141122; datanode namespaceID = 1824410798

        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:238)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:153)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:423)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:314)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1683)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1623)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1641)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1767)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1784)

2014-08-10 10:16:09,461 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at cm134.jaybing.com/202.106.199.38
************************************************************/


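Fix:

Edit the VERSION file under the DataNode storage directory, ${dfs.data.dir}/current/VERSION (here /hadoop/tmp/dfs/data/current/VERSION).

Change its namespaceID to the namenode namespaceID from the error log above (2024141122 here),

then restart the DataNode; it should now stay up.

Note: if the cluster has many DataNodes, the file has to be changed on every one of them.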

In a test environment you can also just delete the data directory; this discards the blocks stored on that node.
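
The VERSION edit can be scripted, which helps when many DataNodes need it. The sketch below assumes the VERSION file is plain key=value lines and that the value contains no "/" characters (true for namespaceID and clusterID values like the ones in the logs above). The helper names are hypothetical.

```shell
# Print the value of KEY ($2) from a key=value file ($1).
version_get() {
    sed -n "s/^$2=//p" "$1"
}

# Set KEY ($2) to VALUE ($3) in FILE ($1).
# The original is kept next to it as FILE.bak.
version_set() {
    cp "$1" "$1.bak" &&
    sed "s/^$2=.*/$2=$3/" "$1.bak" > "$1"
}

# Hypothetical usage, with the IDs from the namespaceID log above:
#   version_get /hadoop/tmp/dfs/data/current/VERSION namespaceID
#   version_set /hadoop/tmp/dfs/data/current/VERSION namespaceID 2024141122
```

Stop the DataNode before editing and restart it afterwards. The same helpers should work for the clusterID case by passing clusterID as the key.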



3. On the newer 2.4.1 release, JAVA_HOME was already set in both the profile and hadoop-env.sh, and java -version worked fine.

I never hit this error on 0.20.2.


Error at startup:

[root@nn ~]# start-all.sh

Starting namenodes on []

localhost: Error: JAVA_HOME is not set and could not be found.

localhost: Error: JAVA_HOME is not set and could not be found.

...

starting yarn daemons
starting resourcemanager, logging to /home/lihanhui/open-source/hadoop-2.1.0-beta/logs/yarn-admin-resourcemanager-localhost.out
localhost: Error: JAVA_HOME is not set and could not be found


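Running export JAVA_HOME=/PATH/TO/JDK directly on the command line did not help either, probably because the start scripts launch each daemon over ssh in a new shell that does not inherit the variable.


Eventually I searched for the message and found "JAVA_HOME is not set and could not be found" in hadoop-2.4.1/libexec/hadoop-config.sh.

So I added export JAVA_HOME=/PATH/JDK to that script,


and the problem was solved.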
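
Before patching a script, it may be worth checking what value a config file actually assigns. The helper below is a rough sketch with a made-up name; it only looks at uncommented lines of the form export JAVA_HOME=... and prints the last one, since the last assignment wins when the file is sourced.

```shell
# Print the right-hand side of the last uncommented
# "export JAVA_HOME=..." line in FILE ($1); prints nothing if none exists.
java_home_setting() {
    sed -n 's/^[[:space:]]*export[[:space:]][[:space:]]*JAVA_HOME=//p' "$1" | tail -n 1
}

# Hypothetical usage:
#   java_home_setting hadoop-2.4.1/etc/hadoop/hadoop-env.sh
# To see what a shell started over ssh sees (single quotes make the
# variable expand on the remote side):
#   ssh localhost 'echo "JAVA_HOME=$JAVA_HOME"'
```

If the printed value is a self-reference such as ${JAVA_HOME}, it resolves to an empty string in any shell where the variable is not already set. That would match the symptom here.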