Hadoop Cluster Problems

After deploying the Hadoop cluster I tried to start it, but the DataNode would not come up. The log shows:
2013-04-09 21:56:28,196 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 1993765830; datanode namespaceID = 1375972635
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)


Searching online, this turns out to be a common problem:
after re-formatting the NameNode, a new namenode namespaceID is generated on the NameNode host. If you then start the HDFS cluster, the DataNodes fail with the error above: the datanode namespaceID no longer matches the namenode namespaceID.
Reference: http://stackoverflow.com/questions/10097246/no-data-nodes-are-started
Fix 1:
1. Stop the cluster
2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
In my case I did not set the dfs.data.dir property in hdfs-site.xml; I only set hadoop.tmp.dir in core-site.xml on the NameNode machine, and neither property is set in core-site.xml or hdfs-site.xml on the DataNode machines. As a result, the NameNode's temporary directory is the configured /home/hadoop/tmp/, while each DataNode's temporary directory falls back to the default /tmp/hadoop-<user_name>.
3. Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
4. Restart the cluster
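The four steps above can be sketched as a shell sequence. This is a sketch, not a script to run blindly: it assumes Hadoop 1.x command names (stop-all.sh / start-all.sh on the PATH), and it assumes dfs.data.dir is unset so the DataNode data directory defaults to ${hadoop.tmp.dir}/dfs/data, i.e. /tmp/hadoop-<user>/dfs/data. Adjust the path if you configured dfs.data.dir explicitly.

```shell
# Fix 1 sketch -- destructive: ALL HDFS DATA IS LOST at step 3.
# Default datanode data dir when dfs.data.dir is unset:
DATA_DIR="/tmp/hadoop-$(whoami)/dfs/data"

stop-all.sh                 # 1. stop the cluster
rm -rf "$DATA_DIR"          # 2. delete the datanode's data directory
hadoop namenode -format     # 3. reformat the NameNode (all HDFS data lost!)
start-all.sh                # 4. restart the cluster
```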
Fix 2:
1. Stop the DataNode
2. Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
3. Restart the DataNode
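Step 2 of this fix can be done with a one-line sed. The demonstration below edits a mock VERSION file so the effect is visible; on a real cluster the file lives at <dfs.data.dir>/current/VERSION (in my setup that would be under /tmp/hadoop-<user>/dfs/data), and the mock path and field values here are just illustrative, taken from the error log above.

```shell
# Create a mock datanode VERSION file with the stale namespaceID
mkdir -p /tmp/demo-dfs-data/current
cat > /tmp/demo-dfs-data/current/VERSION <<'EOF'
namespaceID=1375972635
storageType=DATA_NODE
layoutVersion=-32
EOF

# Overwrite the stale datanode namespaceID with the NameNode's current one
NN_ID=1993765830
sed -i "s/^namespaceID=.*/namespaceID=${NN_ID}/" /tmp/demo-dfs-data/current/VERSION

grep '^namespaceID=' /tmp/demo-dfs-data/current/VERSION   # namespaceID=1993765830
```

Unlike fix 1, this keeps the blocks already stored on the DataNode, which is why it is usually preferable when the data matters.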

The second problem I hit:
Log on the NameNode:
 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: File /home/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
2013-04-09 22:06:09,658 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call addBlock(/home/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_NONMAPREDUCE_278324081_1, null) from 192.168.1.121:36492: error: java.io.IOException: File /home/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /home/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Fix: edit core-site.xml on the DataNodes and add hadoop.tmp.dir with the same value as on the NameNode, then reformat and restart the cluster. Problem solved.
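For reference, the property added to core-site.xml on each DataNode looked roughly like this (the value is the path from my NameNode; substitute your own directory):

```xml
<!-- core-site.xml on each DataNode: match the NameNode's hadoop.tmp.dir -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
</property>
```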

The error occurs in the call to addBlock(); it looks as if the hadoop user/group lacks permission to write.
The addBlock() method is responsible for allocating a new block and choosing the DataNodes that will store its replicas.
But I could not find /home/hadoop/tmp/mapred/system/jobtracker.info anywhere, which puzzles me. If anyone knows the answer, please explain.
