Error: Exceeded limits on number of counters - Counters=120 Limit=120

Today I was developing a Hadoop MapReduce program in Eclipse that implements the TF-IDF algorithm. With the first 10 test files uploaded everything worked fine, but once the total number of files grew past a hundred or so, Eclipse started reporting errors. The failure occurs in the reduce phase, whose input at that point is the combined result of the initial processing of all the files. The exact error is as follows:

org.apache.hadoop.mapred.Counters$CountersExceededException: Error: Exceeded limits on number of counters - Counters=120 Limit=120
at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:316)
at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:450)
at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:601)
at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.getCounter(TaskInputOutputContext.java:88)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs$RecordWriterWithCounter.write(MultipleOutputs.java:303)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:393)
at TF_IDF$Reduce2.reduce(TF_IDF.java:136)
at TF_IDF$Reduce2.reduce(TF_IDF.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:262)
14/03/25 23:30:24 INFO mapred.JobClient: Job complete: job_local_0002
14/03/25 23:30:24 INFO mapred.JobClient: Counters: 21
14/03/25 23:30:24 INFO mapred.JobClient:   FileSystemCounters
14/03/25 23:30:24 INFO mapred.JobClient:     FILE_BYTES_READ=7223523
14/03/25 23:30:24 INFO mapred.JobClient:     HDFS_BYTES_READ=4970250
14/03/25 23:30:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=10111898
14/03/25 23:30:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2973713
14/03/25 23:30:24 INFO mapred.JobClient:   File Input Format Counters 
14/03/25 23:30:24 INFO mapred.JobClient:     Bytes Read=2973713
14/03/25 23:30:24 INFO mapred.JobClient:   Map-Reduce Framework
14/03/25 23:30:24 INFO mapred.JobClient:     Map output materialized bytes=3167273
14/03/25 23:30:24 INFO mapred.JobClient:     Map input records=96777
14/03/25 23:30:24 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/25 23:30:24 INFO mapred.JobClient:     Spilled Records=96777
14/03/25 23:30:24 INFO mapred.JobClient:     Map output bytes=2973713
14/03/25 23:30:24 INFO mapred.JobClient:     Total committed heap usage (bytes)=991494144
14/03/25 23:30:24 INFO mapred.JobClient:     CPU time spent (ms)=0
14/03/25 23:30:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=112
14/03/25 23:30:24 INFO mapred.JobClient:     Combine input records=0
14/03/25 23:30:24 INFO mapred.JobClient:     Reduce input records=0
14/03/25 23:30:24 INFO mapred.JobClient:     Reduce input groups=0
14/03/25 23:30:24 INFO mapred.JobClient:     Combine output records=0
14/03/25 23:30:24 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/03/25 23:30:24 INFO mapred.JobClient:     Reduce output records=0
14/03/25 23:30:24 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/03/25 23:30:24 INFO mapred.JobClient:     Map output records=96777
14/03/25 23:30:24 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/zcf/output/_temporary/_attempt_local_0002_r_000000_0/200610-547-r-00000 File does not exist. Holder DFSClient_NONMAPREDUCE_568041086_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1720)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1711)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1619)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)


at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)
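Before getting to the fix, it is worth asking where so many counters come from. Judging from the first stack trace (MultipleOutputs$RecordWriterWithCounter.write) and the per-document file name in the second one (200610-547-r-00000), the reducer presumably writes one output file per document through MultipleOutputs, and when MultipleOutputs counters are enabled each distinct output path gets a counter of its own, so a hundred-odd documents is enough to pass the default limit of 120. The sketch below is only my guess at the shape of that code, not the actual TF_IDF.java:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reconstruction of the write pattern, not the original Reduce2.
public class Reduce2Sketch extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The key is assumed to carry the document id, e.g. "200610-547".
        String docId = key.toString();
        for (Text value : values) {
            // Each distinct baseOutputPath becomes its own output file, and with
            // MultipleOutputs counters enabled it also gets its own counter,
            // so 100+ documents quickly exceed the default limit of 120.
            mos.write(key, value, docId);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}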

The first error is caused by the number of map/reduce counters exceeding the maximum of 120, so I modified Hadoop's configuration by appending the following to conf/mapred-site.xml:

  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>1400</value>
  </property>
After recompiling and packaging, running the job from the command line indeed worked without error. As for the second error, the missing file: my guess is that once the counter limit is exceeded Hadoop cleans up what that attempt had written, so the file can no longer be found when the data is accessed afterwards.

Running inside Eclipse, however, still produced the same error. My first thought was that although I had changed the configuration, Eclipse had not picked up the change, so I opened the Advanced parameters tab of the Map/Reduce location in Eclipse and checked: sure enough, mapreduce.job.counters.limit was still 120. I deleted the existing location and created a new Hadoop location, yet the parameter, and the error, stayed exactly the same. That suggested Eclipse does not read the configuration under conf/ at all, since many of the Advanced parameters cannot be found in the conf files. So where does it read its configuration from? It had to be somewhere under the Hadoop root directory, and it occurred to me that it might be the configuration packaged inside the jar, since Hadoop itself starts from jar files. I modified mapred-default.xml inside hadoop-core-1.1.2.jar, ran the job again, and this time there was no error. In hindsight it makes sense: Eclipse simply provides a convenient platform for running Hadoop and reads the Hadoop configuration rather mechanically, so you have to think a bit more carefully when developing MapReduce programs in Eclipse. I have only just started with Hadoop, so some of the above may well be wrong; corrections and discussion are very welcome, and I look forward to learning together!
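A closing note: patching mapred-default.xml inside hadoop-core-1.1.2.jar does work, but it is easy to lose on an upgrade. If the per-file counters that MultipleOutputs creates are not actually needed, a less invasive route is to switch them off in the job driver, so the job never comes close to the 120-counter limit in the first place. Below is a minimal sketch of that idea; the class name, job name, and the omitted mapper/reducer wiring are placeholders rather than the real TF_IDF driver, so treat it as an outline only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class TfIdfDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "tf-idf pass 2");
        job.setJarByClass(TfIdfDriver.class);
        // Mapper/Reducer classes omitted; they would be the real TF_IDF classes.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // With counters disabled, MultipleOutputs no longer creates one counter
        // per output file, so the 120-counter limit is never reached and neither
        // mapred-site.xml nor the jar's mapred-default.xml needs patching.
        MultipleOutputs.setCountersEnabled(job, false);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

This only helps, of course, if nothing relies on those per-output counters; if they are needed, raising mapreduce.job.counters.limit as described above remains the way to go.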
