Nutch org.apache.hadoop.util.DiskChecker$DiskErrorException

Today, while crawling data with Nutch, the job kept failing with the following error:

> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local
directory for taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/spill0.out
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>         at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:930)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:842)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2011-07-10 19:02:25,778 FATAL crawl.Generator - Generator: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:472)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:618)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:581)


After searching online, I found the cause: Hadoop had filled up the /tmp directory, so it could not allocate a local directory for the map task's spill file. Deleting the /tmp/hadoop-root/mapred directory freed the space, and re-running the crawl completed without errors.
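The cleanup boils down to two steps: confirm that the partition really is full, then remove Hadoop's stale local scratch data. A minimal sketch, assuming the default hadoop.tmp.dir layout of /tmp/hadoop-${user.name} (here /tmp/hadoop-root, since the job ran as root); adjust the path if your configuration differs:

```shell
#!/bin/sh
# The spill file could not be written because every configured local
# directory was out of space; check the partition holding /tmp first.
df -h /tmp

# Remove the stale jobcache/spill data. The path assumes the default
# hadoop.tmp.dir for the root user, as in this post. Make sure no
# MapReduce job is running before deleting.
rm -rf /tmp/hadoop-root/mapred
```

A more durable fix is to point hadoop.tmp.dir (or mapred.local.dir) at a partition with enough free space in your Hadoop configuration, so that /tmp does not fill up again on the next crawl.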
