[Problem] Summary of issues with Nutch 2.3.1 on Ubuntu 12.04

I ran into quite a few problems while installing and using Nutch. My platform is 32-bit Ubuntu 12.04, and the Nutch environment consists of jdk1.8.0_121, HBase 0.98.8, and Solr 4.10.3.

Reference blog posts:
1、http://blog.csdn.net/freedomboy319/article/details/44172277
2、http://blog.csdn.net/a973893384/article/details/49666063

The installation now basically works, but some problems still show up during crawling:

IndexingJob: done.
SOLR dedup -> http://localhost:8983/solr
~/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:~/lab1/NUTCH_HOME/runtime/local/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:~/lab1/NUTCH_HOME/runtime/local/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local365318350_0001
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
  ~/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr
Failed with exit value 1.

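After some searching I found that the cause is a pair of conflicting SLF4J binding jars on the classpath (slf4j-log4j12-1.6.4.jar and slf4j-log4j12-1.6.6.jar under runtime/local/lib, as the log above shows). Deleting either one of them resolves the conflict, and data can then be crawled normally.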
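A minimal sketch of that cleanup, assuming GNU coreutils (for `sort -V`) and assuming you want to keep the newer binding; the post itself only says to delete one of the two. The function name `dedupe_slf4j_bindings` and the backup directory are hypothetical, and the jars are moved aside rather than deleted so the change can be undone.

```shell
#!/bin/sh
# Keep only the newest slf4j-log4j12 jar in a lib directory and move the
# others into a backup directory (hypothetical helper, not part of Nutch).
# Usage: dedupe_slf4j_bindings LIB_DIR BACKUP_DIR
dedupe_slf4j_bindings() {
    lib_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1

    # sort -V (GNU coreutils) orders version numbers naturally,
    # so the last line is the highest version.
    newest=$(ls "$lib_dir"/slf4j-log4j12-*.jar 2>/dev/null | sort -V | tail -n 1)

    # Nothing to do if no binding jar is present.
    [ -n "$newest" ] || return 0

    for jar in "$lib_dir"/slf4j-log4j12-*.jar; do
        [ -e "$jar" ] || continue
        if [ "$jar" != "$newest" ]; then
            mv "$jar" "$backup_dir"/
            echo "moved $(basename "$jar")"
        fi
    done
}
```

With the paths from the log above, the call would look like `dedupe_slf4j_bindings ~/lab1/NUTCH_HOME/runtime/local/lib ~/lab1/slf4j-backup`, where the backup path is only an example.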

However, the index still cannot be built: the job keeps failing at the same place, the solrdedup step, so this still needs work.

SOLR dedup -> http://localhost:8983/solr/
/home/silvia/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/
Exception in thread "main" java.lang.RuntimeException: job failed: name=apache-nutch-2.3.1.jar, jobid=job_local2020123009_0001
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:383)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.run(SolrDeleteDuplicates.java:393)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.main(SolrDeleteDuplicates.java:403)
Error running:
  /home/silvia/lab1/NUTCH_HOME/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/
Failed with exit value 1.

To be updated...
