Driving a Nutch crawl with Ant

Reposted from: http://nhy520.javaeye.com/blog/393804

 

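Running Nutch on Windows is simple: all you need is a way to execute the Crawl class. Write an Ant script (build.xml), put it in the Nutch root directory, and run it. Its contents are as follows: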

<project name="nutch-crawl" default="crawl" basedir=".">
  <property name="lib.dir" location="lib"/>
  <property name="conf.dir" location="conf"/>

  <!-- Classpath: the Nutch jar, everything under lib, the root directory, and conf -->
  <path id="project.classpath">
    <fileset dir="." includes="nutch-*.jar"/>
    <fileset dir="${lib.dir}"/>
    <pathelement path="."/>
    <pathelement path="${conf.dir}"/>
  </path>

  <target name="crawl">
    <echo>crawling starting</echo>
    <property name="JVM.extra.args" value="-Xmx512m"/>
    <!-- Run the Crawl class in a forked JVM: seed URLs, output directory, depth, threads -->
    <java classname="org.apache.nutch.crawl.Crawl" classpathref="project.classpath" fork="true">
      <jvmarg line="${JVM.extra.args}"/>
      <arg value="D:/nutch/urls"/>
      <arg value="-dir"/>
      <arg value="D:/nutch/crawl"/>
      <arg value="-depth"/>
      <arg value="3"/>
      <arg value="-threads"/>
      <arg value="15"/>
    </java>
    <echo>crawling finished</echo>
  </target>
</project>
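The script above hard-codes the seed directory, output directory, depth, and thread count. As a rough sketch of one possible variation (not part of the original post), those values could be lifted into Ant properties so they can be overridden from the command line. The property names urls.dir, crawl.dir, crawl.depth, and crawl.threads are made up for illustration, and the project name is changed only to keep the two files distinct. The failonerror attribute is an addition as well; it makes the build fail if the forked JVM exits with a non-zero code.

```xml
<project name="nutch-crawl-params" default="crawl" basedir=".">
  <property name="conf.dir" location="conf"/>

  <!-- Hypothetical property names; the defaults mirror the hard-coded values -->
  <property name="urls.dir" value="D:/nutch/urls"/>
  <property name="crawl.dir" value="D:/nutch/crawl"/>
  <property name="crawl.depth" value="3"/>
  <property name="crawl.threads" value="15"/>
  <property name="JVM.extra.args" value="-Xmx512m"/>

  <path id="project.classpath">
    <fileset dir="." includes="nutch-*.jar"/>
    <fileset dir="lib"/>
    <pathelement path="."/>
    <pathelement path="${conf.dir}"/>
  </path>

  <target name="crawl">
    <echo>crawling ${urls.dir} to depth ${crawl.depth}</echo>
    <!-- failonerror is an addition: fail the build when Crawl exits non-zero -->
    <java classname="org.apache.nutch.crawl.Crawl"
          classpathref="project.classpath"
          fork="true"
          failonerror="true">
      <jvmarg line="${JVM.extra.args}"/>
      <arg value="${urls.dir}"/>
      <arg value="-dir"/>
      <arg value="${crawl.dir}"/>
      <arg value="-depth"/>
      <arg value="${crawl.depth}"/>
      <arg value="-threads"/>
      <arg value="${crawl.threads}"/>
    </java>
  </target>
</project>
```

Because Ant properties cannot be changed once set, a value passed on the command line takes precedence over the default in the file. For example, ant -Dcrawl.depth=5 -Dcrawl.dir=D:/nutch/crawl2 would run a deeper crawl into a different directory without editing the script.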

 

To launch build.xml, use a batch file named run.bat (put it in the Nutch root directory; this assumes the project lives at D:\nutch):

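@echo off
rem Switch drive and directory in one step (assumes Nutch is in D:\nutch)
cd /d D:\nutch
rem ant is itself a batch script; call returns control here so pause still runs
call ant
pause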

 
