1:Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
The job needs to open many files for its analysis. The system default limit is usually 1024 (check with ulimit -a), which is enough for ordinary use but far too low for this kind of program.
Fix:
Modify two files.
/etc/security/limits.conf
vi /etc/security/limits.conf
Add:
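The specific limits were not given here; a typical pair of entries looks like the following (the values are only illustrative, adjust them for your workload):
*    soft    nofile    65536
*    hard    nofile    65536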
$cd /etc/pam.d/
$sudo vi login
Add: session required /lib/security/pam_limits.so
A correction to the answer for problem 1:
This error means that during the shuffle, the reduce side failed to fetch the completed map outputs more times than the allowed limit (the default limit is 5). There are many possible causes: abnormal network connections, connection timeouts, poor bandwidth, blocked ports, and so on. When the network inside the cluster is healthy, this error normally does not occur.
2:Too many fetch-failures
Answer:
This problem is mainly caused by incomplete connectivity between the nodes.
3: Processing is very slow; map finishes quickly but reduce is very slow and repeatedly falls back to reduce=0%
Answer:
Combine this with the fix for problem 2, and then
modify conf/hadoop-env.sh to set export HADOOP_HEAPSIZE=4000
4: The datanode can be started, but it cannot be accessed and cannot be shut down
When re-formatting a new distributed filesystem, you must delete the local path configured as dfs.name.dir on the NameNode (the directory where the NameNode persistently stores the namespace and transaction log), and also delete the dfs.data.dir directories on every DataNode (the local paths where DataNodes store block data). In this configuration that means deleting /home/hadoop/NameData on the NameNode and /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed filesystem, each stored namespace is tied to the version created at format time (see the VERSION file under /home/hadoop/NameData/current, which records the version information). So when re-formatting a new distributed filesystem it is best to delete the NameData directory first, and you must delete dfs.data.dir on every DataNode, so that the version information recorded by the namenode and the datanodes matches.
Warning: deleting data is dangerous. Do not delete anything unless you are sure, and back up everything before you delete it!
5:java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This usually means a node went down or could not be reached.
6:java.lang.OutOfMemoryError: Java heap space
This exception clearly means the JVM does not have enough memory; increase the JVM heap size on all the datanodes.
java -Xms1024m -Xmx4096m
As a rule the JVM's maximum heap should be about half of total memory; our machines have 8GB of RAM, so we set 4096m, although this may still not be the optimal value.
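One way to apply this is in conf/hadoop-env.sh on each datanode; a minimal sketch, using the heap values discussed above (HADOOP_DATANODE_OPTS is the standard hook for datanode JVM flags):
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"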
How to add a node to Hadoop
My actual procedure for adding a node:
Other notes:
A better suggestion:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).
mapred.tasktracker.reduce.tasks.maximum (default: 2): The maximum number of reduce tasks that will be run simultaneously by a task tracker.
Adding a new disk to a single node
1. On the node that gets the new disk, modify dfs.data.dir, listing the old and new data directories separated by a comma (see the config sketch after this list).
2. Restart dfs.
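A minimal hadoop-site.xml sketch for step 1, assuming the new disk is mounted at /disk2 (both paths are illustrative):
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/DataNode1,/disk2/hadoop/data</value>
</property>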
Synchronizing the Hadoop code across nodes
hadoop-env.sh
Merging small HDFS files with a single command
hadoop fs -getmerge
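Usage sketch (the paths are illustrative): this pulls every file under the HDFS directory down into one local file.
bin/hadoop fs -getmerge /user/hive/warehouse/src_20090724_log /tmp/src_20090724_log.merged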
How to restart a reduce job
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters “mapred.jobtracker.restart.recover”, “mapred.jobtracker.job.history.block.size”, and “mapred.jobtracker.job.history.buffer.size”.
Not verified yet.
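If you do want to try it, a hadoop-site.xml sketch would be (unverified, as noted above):
<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>true</value>
</property>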
Problems with IO write operations
0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/
172.16.100.165:50010 remote=/172.16.100.165:50930]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
at java.lang.Thread.run(Thread.java:619)
It seems there are many reasons that it can timeout, the example given in
HADOOP-3831 is a slow reading client.
Workaround: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.
My understanding is that this issue should be fixed in Hadoop 0.19.1 so that
we should leave the standard timeout. However until then this can help
resolve issues like the one you’re seeing.
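The corresponding hadoop-site.xml entry for the workaround above looks like this (0 disables the write timeout):
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>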
How to decommission (retire) HDFS nodes
The dfsadmin help text in the current release does not explain this clearly (a bug has already been filed). The correct procedure is as follows (a sketch follows the list below):
Incidentally, the -refreshNodes command has three other uses:
2. Add allowed nodes to the list (add the hostname to dfs.hosts).
3. Remove a node outright, without re-replicating its data (remove the hostname from dfs.hosts).
4. The reverse of decommissioning: for a node listed both in the exclude file and in dfs.hosts, stop an in-progress decommission, i.e. turn a node from "Decommission in progress" back to Normal ("in service" in the web UI).
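A sketch of the usual decommission flow (the file name and hostname are illustrative; dfs.hosts.exclude and -refreshNodes are the standard mechanism). On the namenode, point dfs.hosts.exclude at an exclude file in hadoop-site.xml:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/conf/excludes</value>
</property>
then add the node's hostname to that file and refresh:
echo "datanode-to-retire.example.com" >> /home/hadoop/conf/excludes
bin/hadoop dfsadmin -refreshNodes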
Hadoop lessons learned from others
Fixing the Hadoop OutOfMemoryError problem:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx800M -server</value>
</property>
With the right JVM size in your hadoop-site.xml , you will have to copy this
to all mapred nodes and restart the cluster.
Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
When I use Nutch 1.0, I get this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
This is also easy to fix:
Delete conf/log4j.properties, and then you can see the detailed error report.
In my case it was out of memory.
The fix was to add the parameters -Xms64m -Xmx512m when launching the main class org.apache.nutch.crawl.Crawl.
Your problem may not be the same, but once you can see the detailed error report, it becomes easy to solve.
Using the distributed cache
It acts like a global variable, but because the data is fairly large it cannot go into the config file, so the distributed cache is used instead.
Usage details: see "The Definitive Guide", p. 240.
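A minimal sketch of the old-API calls involved (the HDFS path, symlink name, and MyJob class are illustrative; see the book for the full treatment):
// Driver side: register a file that is already in HDFS.
// new URI(...) throws URISyntaxException, so declare or catch it in real code.
JobConf conf = new JobConf(MyJob.class);
DistributedCache.addCacheFile(new URI("hdfs:///shared/lookup.dat#lookup"), conf);
DistributedCache.createSymlink(conf);
// Task side, e.g. in Mapper.configure(): the cached copies are on local disk.
// getLocalCacheFiles throws IOException, so wrap it in try/catch in real code.
Path[] cached = DistributedCache.getLocalCacheFiles(conf);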
Hadoop job web UI
There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.
Hadoop monitoring
Use Nagios for alerting and Ganglia for the monitoring graphs.
status of 255 error
Error:
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)
Cause:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher value. By default, their values are 24 hours. These might be the reason for failure, though I’m not sure
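If you want to try raising them, a hadoop-site.xml sketch (the values are illustrative; the interval is in milliseconds, the retain setting in hours):
<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <value>259200000</value>
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>72</value>
</property>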
split size
FileInputFormat input splits (see "The Definitive Guide", p. 190):
mapred.min.split.size: default = 1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default = Long.MAX_VALUE, the largest valid size.
dfs.block.size: default = 64M; set to 128M on our system.
If the minimum split size is set larger than the block size, splits become larger than a block (my guess is that blocks then get merged into one split, with data fetched from other nodes).
If the maximum split size is set smaller than the block size, each block is split further.
split size = max(minimumSize, min(maximumSize, blockSize));
where minimumSize < blockSize < maximumSize.
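A worked example of the formula with the values above: with the defaults minimumSize = 1, maximumSize = Long.MAX_VALUE, and blockSize = 128M, split size = max(1, min(Long.MAX_VALUE, 128M)) = 128M, i.e. one split per block. Raising mapred.min.split.size to 256M gives max(256M, min(Long.MAX_VALUE, 128M)) = 256M (each split spans two blocks); lowering mapred.max.split.size to 64M gives max(1, min(64M, 128M)) = 64M (each block yields two splits).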
sort by value
Hadoop does not provide a direct sort-by-value mechanism, because it would hurt MapReduce performance.
It can be done with a composite-key technique; see "The Definitive Guide", p. 250 for the concrete implementation.
Basic idea:
Handling small input files
Using a large number of small files as the input lowers Hadoop's efficiency.
There are three ways to combine small files for processing:
To list the files in an archive:
bin/hadoop fs -lsr har://my/files.har
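For reference, an archive like the one listed above would be created with something along these lines (the paths are illustrative; the -p parent-path form is for 0.20+):
bin/hadoop archive -archiveName files.har -p /my small-files /my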
skip bad records
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);
For skipping failed tasks try : mapred.max.map.failures.percent
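A sketch of setting that on the JobConf used above (the 5% threshold is illustrative):
conf.setMaxMapTaskFailuresPercent(5); // tolerate up to 5% failed map tasks without failing the job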
Restarting a single datanode
If a datanode has a problem and, after fixing it, you want it to rejoin the cluster without restarting the whole cluster, do the following:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker
reduce exceed 100%
"Reduce Task Progress shows > 100% when the total size of map outputs (for a
single reducer) is high "
Cause:
During the reduce merge phase the progress check is slightly off, so the status can exceed 100%, and the statistics code then throws the following error: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.getReduceAvarageProgresses(StatusHttpServer.java:228)
at org.apache.hadoop.mapred.StatusHttpServer$TaskGraphServlet.doGet(StatusHttpServer.java:159)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
JIRA issue:
counters
Three kinds of counters:
reporter.incrCounter(Temperature.MISSING, 1)
Output:
09/04/20 06:33:36 INFO mapred.JobClient: Air Temperature Recor
09/04/20 06:33:36 INFO mapred.JobClient: Malformed=3
09/04/20 06:33:36 INFO mapred.JobClient: Missing=66136856
3. dynamic counters:
Usage:
reporter.incrCounter(“TemperatureQuality”, parser.getQuality(),1);
Output:
09/04/20 06:33:36 INFO mapred.JobClient: TemperatureQuality
09/04/20 06:33:36 INFO mapred.JobClient: 2=1246032
09/04/20 06:33:36 INFO mapred.JobClient: 1=973422173
09/04/20 06:33:36 INFO mapred.JobClient: 0=1
7: Namenode in safe mode
Solution:
bin/hadoop dfsadmin -safemode leave
8:java.net.NoRouteToHostException: No route to host
Solution:
sudo /etc/init.d/iptables stop
9: After changing the namenode, Hive select statements still point to the old namenode address
This is because: When you create a table, Hive actually stores the location of the table (e.g. hdfs://ip:port/user/root/…) in the SDS and DBS tables in the metastore. So when a new cluster is brought up and the master gets a new IP, Hive's metastore is still pointing to the locations within the old cluster. One could modify the metastore to update the IP every time a cluster is brought up, but the easier and simpler solution was to just use an elastic IP for the master.
So every occurrence of the old namenode address in the metastore must be replaced with the current namenode address.
10:Your DataNode is started and you can create directories with bin/hadoop dfs -mkdir, but you get an error message when you try to put files into the HDFS (e.g., when you run a command like bin/hadoop dfs -put).
Solution:
Go to the HDFS info web page (open your web browser and go to http://namenode:dfs_info_port, where namenode is the hostname of your NameNode and dfs_info_port is the port you chose for dfs.info.port; if you followed the QuickStart on your personal computer this URL will be http://localhost:50070). Once on that page, click on the number that tells you how many DataNodes you have, to see the list of DataNodes in your cluster.
If it says you have used 100% of your space, then you need to free up room on local disk(s) of the DataNode(s).
If you are on Windows then this number will not be accurate (there is some kind of bug either in Cygwin’s df.exe or in Windows). Just free up some more space and you should be okay. On one Windows machine we tried the disk had 1GB free but Hadoop reported that it was 100% full. Then we freed up another 1GB and then it said that the disk was 99.15% full and started writing data into the HDFS again. We encountered this bug on Windows XP SP2.
11:Your DataNodes won’t start, and you see something like this in logs/datanode:
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Cause:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.
Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format
12:You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won’t work.
Cause:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
-mapper $HOME/proj/hadoop/multifetch.py \
-reducer $HOME/proj/hadoop/reducer.py \
-input urls/* \
-output titles
13: 2009-01-08 10:02:40,709 ERROR metadata.Hive (Hive.java:getPartitions(499)) - javax.jdo.JDODataStoreException: Required table missing : ““PARTITIONS”” in Catalog “” Schema “”. JPOX requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable “org.jpox.autoCreateTables”
Cause: org.jpox.fixedDatastore was set to true in hive-default.xml.
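So the relevant entries in hive-default.xml (or an overriding hive-site.xml) should end up looking something like this (a sketch using the property names mentioned above and in the error message):
<property>
  <name>org.jpox.fixedDatastore</name>
  <value>false</value>
</property>
<property>
  <name>org.jpox.autoCreateTables</name>
  <value>true</value>
</property>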
starting namenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /home/hadoop/HadoopInstall/hadoop/bin/…/logs/hadoop-hadoop-secondarynamenode-hadoop.out
localhost: Exception in thread “main” java.lang.NullPointerException
localhost: at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
localhost: at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
localhost: at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:120)
localhost: at org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:124)
localhost: at org.apache.hadoop.dfs.SecondaryNameNode.<init>(SecondaryNameNode.java:108)
localhost: at org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)
14:09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010
09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001 bad datanode[2] nodes == null
09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file “/user/umer/8GB_input”
put: Bad connect ack with firstBadLink 192.168.1.16:50010
Solution:
I have resolved the issue:
What I did:
Possible causes of the "does not have a scheme" error when debugging Hive from Eclipse on Windows
1. The hive.metastore.local property in the Hive configuration file is set to false; change it to true, because this is a standalone setup.
2. The HIVE_HOME environment variable is not set, or is set incorrectly.
3. "does not have a scheme" most likely means hive-default.xml cannot be found. For how to deal with the missing hive-default.xml when debugging Hive from Eclipse, see: http://bbs.hadoopor.com/thread-292-1-1.html
1. Chinese character issues
Chinese text parsed out of a URL is still printed as garbage by Hadoop? We used to think Hadoop did not support Chinese; after reading the source code we found that Hadoop merely does not support writing its Chinese output in GBK.
The code below is from TextOutputFormat.class. Hadoop's default outputs all derive from FileOutputFormat, which has two subclasses: one for binary stream output, and one for text output, TextOutputFormat.
public class TextOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
    implements RecordWriter<K, V> {
    private static final String utf8 = "UTF-8"; // the encoding is hard-coded to UTF-8 here
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
…
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(utf8);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + utf8 + " encoding");
      }
    }
…
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength()); // this call also needs to be changed
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }
…
}
As you can see, Hadoop's default output is hard-wired to UTF-8. So as long as the Chinese text was decoded correctly, setting the Linux client's character encoding to UTF-8 will display the Chinese, because Hadoop writes it out in UTF-8.
Since most databases define their fields in GBK, what if we want Hadoop to output Chinese in GBK for database compatibility?
We can define a new class:
public class GbkOutputFormat<K, V> extends FileOutputFormat<K, V> {
  protected static class LineRecordWriter<K, V>
    implements RecordWriter<K, V> {
    // simply change the encoding to gbk
    private static final String gbk = "gbk";
    private static final byte[] newline;
    static {
      try {
        newline = "\n".getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
…
    public LineRecordWriter(DataOutputStream out, String keyValueSeparator) {
      this.out = out;
      try {
        this.keyValueSeparator = keyValueSeparator.getBytes(gbk);
      } catch (UnsupportedEncodingException uee) {
        throw new IllegalArgumentException("can't find " + gbk + " encoding");
      }
    }
…
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
//      Text to = (Text) o;
//      out.write(to.getBytes(), 0, to.getLength());
//      } else {
        out.write(o.toString().getBytes(gbk));
      }
    }
…
}
Then add conf1.setOutputFormat(GbkOutputFormat.class) to the MapReduce driver code,
and the Chinese output will be written in GBK.
2. A MapReduce job that normally runs fine throws this error:
java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
Investigation showed that the problem was caused by the Linux machine having too many open files. The ulimit -n command shows that the default open-file limit on Linux is 1024; edit /etc/security/limits.conf and raise the limit for the hadoop user (e.g. hadoop soft nofile 65535).
Then re-run the program (preferably after making the change on every datanode) and the problem is solved.
3. After running for a while, Hadoop can no longer be stopped with stop-all.sh; it reports:
no tasktracker to stop, no datanode to stop
The cause is that when Hadoop stops, it relies on the recorded pids of the mapred and dfs processes on the datanodes. By default the pid files are saved under /tmp, and Linux periodically (typically every month or roughly every 7 days) deletes files in that directory. Once the hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid files have been deleted, the namenode naturally can no longer find those processes on the datanodes.
Setting export HADOOP_PID_DIR in the configuration file (conf/hadoop-env.sh) solves this problem.
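For example, in conf/hadoop-env.sh (the directory is illustrative; pick one that is not cleaned out automatically):
export HADOOP_PID_DIR=/var/hadoop/pids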
Problem:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
Cause:
Every time hadoop namenode -format is run, a new namespaceID is generated for the NameNode, but the DataNode data under hadoop.tmp.dir still keeps the previous namespaceID. The mismatch prevents the DataNode from starting, so before each hadoop namenode -format, delete the hadoop.tmp.dir directory first and startup will succeed. Note that this means deleting the local directory that hadoop.tmp.dir points to, not an HDFS directory.
Problem: Storage directory does not exist
2010-02-09 21:37:53,203 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory D:\hadoop\run\dfs_name_dir does not exist.
2010-02-09 21:37:53,203 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory D:\hadoop\run\dfs_name_dir is in an inconsistent state: storage directory does not exist or is not accessible.
Solution: the storage directory D:\hadoop\run\dfs_name_dir does not exist; just create the directory by hand.
Problem: NameNode is not formatted
Solution: HDFS has not been formatted yet; run hadoop namenode -format once and then start it again.
Running bin/hadoop jps throws the following exception:
Exception in thread “main” java.lang.NullPointerException
at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
at sun.tools.jps.Jps.main(Jps.java:45)
Cause:
The /tmp directory in the system root was deleted. Recreate /tmp and the problem goes away.
The "unable to create log directory /tmp/…" error in bin/hive may also be caused by this.
Note (already resolved): when using the put command to copy into the /user directory, write /user/ rather than /user.
Fix for: INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net…
After installing Hadoop 2.7.x myself, I found that the bin/hadoop fs -put command did not work; it failed as follows:
[hadoop@master bin]$ ./hadoop fs -put …/logs/123 /wc/mytemp
16/07/27 01:29:26 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1537)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1313)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
16/07/27 01:29:26 INFO hdfs.DFSClient: Abandoning BP-555863411-172.16.95.100-1469590594354:blk_1073741825_1001
16/07/27 01:29:26 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[172.16.95.101:50010,DS-ee00e1f8-5143-4f06-9ef8-b0f862fce649,DISK]
16/07/27 01:29:26 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1537)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1313)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
16/07/27 01:29:26 INFO hdfs.DFSClient: Abandoning BP-555863411-172.16.95.100-1469590594354:blk_1073741826_1002
16/07/27 01:29:26 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[172.16.95.102:50010,DS-eea51eda-0a07-4583-9eee-acd7fc645859,DISK]
16/07/27 01:29:26 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /wc/mytemp/123.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1547)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
put: File /wc/mytemp/123.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
[hadoop@master bin]$ service firewall
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
Searching the web turned up several different fixes; the reference I found most useful is the article "hadoop常见错误解决办法" (common Hadoop errors and their fixes). Its item 14 gives the following solution:
I have resolved the issue:
What I did:
Points worth adding:
(1) On CentOS 7 the firewall has changed; stop it with service firewalld stop.
(2) The firewall must be stopped not only on the master but on every slave node as well.
(3) On every host, check SELINUX in /etc/selinux/config and set SELINUX=disabled.
(4) After stopping the firewall, it is also advisable to run systemctl disable firewalld.service to keep firewalld from starting at boot.
After that, the put succeeded.
Another blogger gives a different fix ("hadoop报错:hdfs.DFSClient: Exception in createBlockOutputStream"), which you can try. Roughly, it says beginners almost always hit this because after re-formatting the namenode the files on the slaves no longer match the namenode, so the DataNode fails to start; the proposed fix is to delete all data directories on both the slaves and the master and then re-run the namenode format. (Whether because of my own limited skill or because the problem was simply different, my preliminary verdict is that this method is misleading.)