Compiling Nutch 1.4 in Eclipse

Official setup instructions: http://wiki.apache.org/nutch/RunNutchInEclipse

This article draws on: http://zettadata.blogspot.com/2011/12/eclipsenutch.html

 

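1. Install Subclipse 1.6 in Eclipse. If you install Subclipse 1.8 instead, you also need JavaHL 1.7; otherwise an incompatibility error is reported.

After installing Subclipse and restarting Eclipse, the following error appears (I am on 64-bit CentOS 5.6):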

Failed to load JavaHL Library.
These are the errors that were encountered:
no libsvnjavahl-1 in java.library.path
no svnjavahl-1 in java.library.path
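For the cause, see http://subclipse.tigris.org/wiki/JavaHL#head-5ccce53a67ca6c3965de863ae91e2642eab537de

The fix is to install the JavaHL package (version 1.6):

yum install subversion-javahl.x86_64

On a system that is not 64-bit, look up the package name with: yum search subversion-javahl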

 

After installation, list the files that subversion-javahl.x86_64 installed:

rpm -ql subversion-javahl

/usr/lib64/libsvnjavahl-1.la
/usr/lib64/libsvnjavahl-1.so
/usr/lib64/libsvnjavahl-1.so.0
/usr/lib64/libsvnjavahl-1.so.0.0.0
/usr/lib64/svn-javahl
/usr/lib64/svn-javahl/include
/usr/lib64/svn-javahl/svn-javahl.jar

 

Edit eclipse.ini and add the following line below -vmargs:

-Djava.library.path=/usr/lib64
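The eclipse.ini change can also be scripted. The following is only a minimal sketch: the function name add_library_path is hypothetical, and it assumes eclipse.ini contains a line that reads exactly -vmargs. It inserts the option right after that line and does nothing if a -Djava.library.path option is already present.

```shell
#!/bin/sh
# Hypothetical helper: insert -Djava.library.path=<dir> right after the
# -vmargs line of an eclipse.ini file, unless the option is already there.
add_library_path() {
    ini_file=$1
    lib_dir=$2
    if grep -q '^-Djava\.library\.path=' "$ini_file"; then
        return 0    # already configured; leave the file untouched
    fi
    tmp_file=$(mktemp) || return 1
    # Copy every line; after the -vmargs line, emit the new option.
    awk -v opt="-Djava.library.path=$lib_dir" '
        { print }
        $0 == "-vmargs" { print opt }
    ' "$ini_file" > "$tmp_file" && cat "$tmp_file" > "$ini_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

It would be called as add_library_path /path/to/eclipse.ini /usr/lib64, where the first argument is wherever your Eclipse installation keeps its eclipse.ini.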

 

After restarting Eclipse, the following error appears:

Subversion 1.6 contains a bug that causes Eclipse to crash when Subversion tries to interact with the GNOME keyring via the Subversion JavaHL API. We recommend that you disable this feature so that you can use Subversion from Eclipse.

To fix this, edit ~/.subversion/config and add the following line in the [auth] section:

password-stores =
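As a rough sketch, the same config change can be applied from the shell. The function name disable_password_stores is hypothetical. It assumes the setting belongs in the [auth] section of the Subversion config file, and it leaves the file alone if an uncommented password-stores line already exists, so an existing non-empty value would still need to be cleared by hand.

```shell
#!/bin/sh
# Hypothetical helper: make sure "password-stores =" (empty value) appears
# in the [auth] section of a Subversion config file.
disable_password_stores() {
    cfg=$1
    [ -f "$cfg" ] || : > "$cfg"       # create an empty file if it is missing
    if grep -q '^password-stores[[:space:]]*=' "$cfg"; then
        return 0                      # an uncommented setting exists; do not touch it
    fi
    if grep -q '^\[auth\]' "$cfg"; then
        tmp=$(mktemp) || return 1
        # Copy every line; right after the [auth] header, emit the setting.
        awk '{ print } /^\[auth\]/ { print "password-stores =" }' "$cfg" > "$tmp" \
            && cat "$tmp" > "$cfg"
        status=$?
        rm -f "$tmp"
        return $status
    fi
    # No [auth] section yet: append one with the setting.
    printf '[auth]\npassword-stores =\n' >> "$cfg"
}
```

Typical use would be disable_password_stores ~/.subversion/config, followed by a restart of Eclipse.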

 

2. Install IvyDE in Eclipse.

3. Install m2e in Eclipse:

http://eclipse.org/m2e/download/

http://download.eclipse.org/technology/m2e/releases

 

4. Install Nutch: in Eclipse, choose File -> New -> Project -> SVN.

5. Create a new repository location: https://svn.apache.org/repos/asf/nutch/trunk

 

 

Setting up the Nutch environment in Eclipse

1. In the project, choose Nutch --> Properties, then select Java Build Path.

2. On the Source tab, remove Nutch/src, then use Add Folder to add Nutch/src/bin, Nutch/src/java, Nutch/src/test and Nutch/src/testresources.

3. Expand Nutch/src/plugin and tick src/java and src/test in every subdirectory.

4. Switch to the Libraries tab, choose Add Class Folder, and add Nutch/conf.

5. Still on the Libraries tab, choose Add JARs and add src/plugin/urlfilter-automaton/lib/automaton.jar and src/plugin/parse-swf/lib/javaswf.jar.

6. Still on the Libraries tab, choose Add Library, select IvyDE Managed Dependencies, pick the file Nutch/ivy/ivy.xml, and tick every configuration in the next step.

7. Switch to the Order and Export tab, find the Nutch/conf folder, and move it to the top (Top).

 

Configuring Nutch

    See the tutorial on the official Nutch wiki: http://wiki.apache.org/nutch/NutchTutorial

Add the following to conf/nutch-site.xml, inside the <configuration> element:

<property>
 <name>http.agent.name</name>
 <value>My Nutch Spider</value>
</property>
<property>
 <name>plugin.folders</name>
 <value>./src/plugin</value>
</property>

Note: the value of plugin.folders is ./src/plugin, not ../src/plugin. With the wrong value, running the Crawl class fails with the following error:

Exception in thread "main" java.io.IOException: Job failed!
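A rough way to see why the leading ./ matters is to check which value resolves to an existing directory from the launch's working directory. This sketch assumes the Eclipse launch uses the project root as its working directory (the Eclipse default for Java launches) and that a relative plugin.folders value ends up being resolved against it. The function name plugin_dir_exists is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does a relative plugin.folders value point at an
# existing directory when resolved from the given working directory?
# Assumption: the crawl is launched with the project root as working directory.
plugin_dir_exists() {
    work_dir=$1
    plugin_value=$2
    [ -d "$work_dir/$plugin_value" ]
}
```

For a project checked out at /path/to/Nutch, plugin_dir_exists /path/to/Nutch ./src/plugin should succeed, while ../src/plugin points one level above the project and normally fails.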

 

 

Create a directory named urls, and inside it create a file seed.txt containing http://nutch.apache.org/
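From a shell, the seed list can be created as follows. This is a small sketch that assumes the commands are run from the project root, so that urls/ ends up next to conf/ and src/.

```shell
#!/bin/sh
# Create the seed directory and a one-line seed file.
# Assumption: the current directory is the Nutch project root.
mkdir -p urls
printf '%s\n' 'http://nutch.apache.org/' > urls/seed.txt
```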

 

Edit conf/regex-urlfilter.txt and change

 

# accept anything else
+.

to

+^http://([a-z0-9]*\.)*nutch.apache.org/
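To get a feel for which URLs the new rule accepts, the pattern can be tried out with grep -E. This is only an approximation: Nutch evaluates the rule with Java regular expressions, not grep, although for a pattern this simple the two should agree. The function name url_accepted is hypothetical, and the pattern is the rule without its leading + sign.

```shell
#!/bin/sh
# Approximate the accept rule from regex-urlfilter.txt with grep -E.
# Assumption: grep -E stands in for the Java regex engine that Nutch uses.
pattern='^http://([a-z0-9]*\.)*nutch.apache.org/'

# Succeeds (exit status 0) when the URL matches the pattern.
url_accepted() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}
```

For example, url_accepted 'http://nutch.apache.org/' succeeds, while url_accepted 'http://example.com/' fails.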


    Make sure that the "plugin.folders" property in $NUTCH_HOME/conf/nutch-site.xml is set to "./src/plugin", as noted above.

 

Add the Ivy dependencies to the project:

Right-click the project, then Properties -> Java Build Path -> Libraries -> Add Library... -> IvyDE Managed Dependencies

Building Nutch with Ant

 

 

 
