Exception notes from Hadoop applications

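1. File could only be replicated to 0 nodes, instead of 1

      Cause (1): the “$HADOOP_HOME/bin/hadoop namenode -format” command may also have been run on a slave node.

      Solution (1): delete the directory that the format created on that node; by default it is “/tmp/hadoop-username”.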
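The cleanup described above can be sketched as a small shell function. This is only a sketch: the function name `clean_format_dir` is made up for illustration, the default path follows the “/tmp/hadoop-username” pattern from the text, and it assumes the Hadoop daemons on the slave have already been stopped.

```shell
# Remove the directory left behind by an accidental "namenode -format".
# $1 (optional): directory to remove. Defaults to /tmp/hadoop-<current user>,
# the default location named above. Pass another path if hadoop.tmp.dir
# was changed in the configuration.
clean_format_dir() {
    dir="${1:-/tmp/hadoop-$(id -un)}"
    if [ -d "$dir" ]; then
        rm -rf -- "$dir"
        echo "removed $dir"
    else
        echo "nothing to remove at $dir"
    fi
}
```

Calling `clean_format_dir` with no argument on the affected slave targets the default location; restart the datanode afterwards.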

2. MapReduce parameter relationship diagram

[Figure 1: MapReduce parameter relationship diagram]

 

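3. Incompatible namespaceIDs in /home/hadoop/hadoop-1.0.3/data: namenode

Thanks to the original poster. Source: http://f.dataguru.cn/thread-24378-1-1.html

***After restarting the virtual machine recently, I started Hadoop and found that the datanode would not start. The log showed the following error: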
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/hadoop-1.0.3/data: namenode namespaceID = 691360530; datanode namespaceID = 2008526552
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2012-10-18 18:58:16,365 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

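Analysis: posts online say this is caused by a mismatch between the namenode namespaceID and the datanode namespaceID.
Solution: following the post, I deleted all files and directories under data in the hadoop directory and restarted hadoop, which fixed the problem.
Question: what actually causes this? Does the data have to be deleted like this every time? Does anyone understand the underlying cause?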
*****************************************************************************************************************************************************

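The cause is that your hadoop.tmp.dir is under /tmp, and Linux may clean out the contents of /tmp periodically. Hadoop then appears to stop working, and repeatedly reformatting the namenode leads to the problem above. It can also happen when a datanode has not started properly for a long time.
I looked around and found three solutions:
Solution 1: delete all data on the datanode, mainly the tmp and data directories. This is only suitable for an HDFS that has never stored any data.
Solution 2: change the datanode's namespaceID
Edit hadoop.tmp.dir/hadoop/hadoop-root/dfs/data/current/VERSION on every datanode, change the ID to match the namenode's, and restart the datanode. The data will be lost.
Solution 3: change the namenode's namespaceID (found online)
Edit hadoop.tmp.dir/hadoop/hadoop-root/dfs/name/current/VERSION on the namenode, change the ID to match the datanode's, and restart the namenode. I tested this and the third method does not work. My preliminary guess is that a time-based random number may go into the namespaceID when it is generated. In my test I changed the namenode's namespaceID so that the namenode and datanode matched, but after a restart the ID was checked automatically and changed back. With no alternative, I used the second solution. I then read the namenode startup log carefully and found the block registration messages: after registration, the namenode found data on the datanode that did not belong to it and issued delete commands.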

2012-10-19 16:57:20,980 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.80.84:50010 storage DS-584796903-192.168.80.84-50010-1350015221338
2012-10-19 16:57:21,142 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.80.84:50010
2012-10-19 16:57:21,618 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 192.168.80.83:50010 storage DS-942449248-192.168.80.83-50010-1350015230758
2012-10-19 16:57:21,618 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/192.168.80.83:50010
2012-10-19 16:57:21,866 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-8214438839875239556_1105 on 192.168.80.84:50010 size 67108864 does not belong to any file.
2012-10-19 16:57:21,882 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-8214438839875239556 is added to invalidSet of 192.168.80.84:50010
2012-10-19 16:57:21,882 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: block blk_-4821437377619945111_1112 on 192.168.80.84:50010 size 4 does not belong to any file.
2012-10-19 16:57:21,882 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-4821437377619945111 is added to invalidSet of 192.168.80.84:50010
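Solution 2 above (copying the namenode's namespaceID into the datanode's VERSION file) might be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, the VERSION file is assumed to contain a `namespaceID=<number>` line, and both file paths are supplied by the caller because they depend on hadoop.tmp.dir.

```shell
# Print the namespaceID stored in a VERSION file.
get_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Copy the namenode's namespaceID into a datanode VERSION file.
# $1: namenode VERSION file (or a copy of it fetched from the namenode)
# $2: datanode VERSION file
sync_namespace_id() {
    nn_id=$(get_namespace_id "$1")
    dn_id=$(get_namespace_id "$2")
    if [ -z "$nn_id" ]; then
        echo "no namespaceID found in $1" >&2
        return 1
    fi
    if [ "$nn_id" = "$dn_id" ]; then
        echo "namespaceID already matches: $nn_id"
        return 0
    fi
    # Keep a backup of the original file, then rewrite only the ID line.
    cp -- "$2" "$2.bak"
    sed "s/^namespaceID=.*/namespaceID=$nn_id/" "$2.bak" > "$2"
    echo "datanode namespaceID changed from $dn_id to $nn_id"
}
```

Run it on each datanode while the datanode process is stopped, then restart the datanode. As the log above shows, the namenode will still invalidate any blocks it does not recognize.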

 

4. FSUtils: Waiting for dfs to exit safe mode

 

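Problem notes:

1. Before starting HBase, make sure HDFS has left safe mode where possible. If it has not, HBase may fail to create files on HDFS, and the log will show entries like the following: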

2012-04-10 21:37:01,999 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:37:12,003 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:37:22,006 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:37:32,011 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:37:42,014 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:37:52,019 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:38:02,022 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:38:12,029 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:38:22,032 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2012-04-10 21:38:32,036 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
 Solution: run hadoop dfsadmin -safemode leave to make Hadoop leave safe mode.
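A pre-start check for HBase could look something like the sketch below. The function name `leave_safemode_if_needed` is made up for illustration. It relies on `hadoop dfsadmin -safemode get`, which reports whether safe mode is ON or OFF, and assumes `hadoop` is on the PATH.

```shell
# Leave HDFS safe mode only if the namenode reports that it is on.
leave_safemode_if_needed() {
    state=$(hadoop dfsadmin -safemode get)
    case "$state" in
        *ON*)
            echo "safe mode is on, leaving"
            hadoop dfsadmin -safemode leave
            ;;
        *)
            echo "safe mode already off"
            ;;
    esac
}
```

If forcing the namenode out of safe mode is not desirable, `hadoop dfsadmin -safemode wait` blocks until the namenode leaves safe mode on its own, and HBase can be started after it returns.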

 

5. Cannot connect to hadoop.main

2014-01-28 04:50:07,968 INFO org.apache.hadoop.ipc.RPC: Server at hadoop.main/192.168.1.90:9000 not available yet, Zzzzz...
2014-01-28 04:50:09,973 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop.main/192.168.1.90:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-01-28 04:50:10,975 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop.main/192.168.1.90:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

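Re-running the following command resolved the problem. Note that formatting the namenode erases the existing HDFS metadata, so this only suits a cluster that holds no data worth keeping.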

bin/hadoop namenode -format
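Because formatting is destructive, it may be worth first confirming that the NameNode process is in fact not running on hadoop.main. The sketch below is one way to do that, not part of the original fix. The function names are made up for illustration, and it assumes the JDK tool `jps` is on the PATH.

```shell
# Report whether a NameNode JVM is running on this machine.
# jps prints one "<pid> <main class>" line per JVM; the leading space in the
# pattern keeps SecondaryNameNode from matching.
namenode_running() {
    jps | grep -q ' NameNode$'
}

check_namenode() {
    if namenode_running; then
        echo "NameNode is running"
    else
        echo "NameNode is not running; check its log before formatting"
    fi
}
```

Run `check_namenode` on the host that serves port 9000. If the process is missing, the namenode log usually explains why it failed to start.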

6. java.io.EOFException: Premature EOF from inputStream

Take care when using LZO compression with MapReduce.

 
