failed with: java.lang.NullPointerException




The following properties need to be set in the Nutch configuration file 'conf/nutch-site.xml'; otherwise the crawl fails with the error above.



In addition, crawl-urlfilter.txt must be configured to match the domains listed in urls/url.txt.





<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->



<configuration>

<property>

<name>http.agent.name</name>

<value>MySearch</value>

<description>My Search Engine</description>

</property>



<property>

<name>http.agent.description</name>

<value></value>

<description>Further description of our bot; this text is used in

the User-Agent header. It appears in parentheses after the agent name.

</description>

</property>



<property>

<name>http.agent.url</name>

<value></value>

<description>A URL to advertise in the User-Agent header. This will

appear in parentheses after the agent name. Custom dictates that this

should be a URL of a page explaining the purpose and behavior of this

crawler.

</description>

</property>



<property>

<name>http.agent.email</name>

<value></value>

<description>An email address to advertise in the HTTP 'From' request

header and User-Agent header. A good practice is to mangle this

address (e.g. 'info at example dot com') to avoid spamming.

</description>

</property>



</configuration>
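
For the crawl-urlfilter.txt step mentioned earlier, the rules might look roughly like the sketch below. The domain example.com is only a placeholder and is not from the original setup; replace it with the domain actually listed in urls/url.txt.

```
# accept URLs on hosts under example.com
# (hypothetical domain; use the one from urls/url.txt)
+^http://([a-z0-9]*\.)*example.com/

# skip everything else
-.
```

Rules are applied top to bottom and the first matching pattern wins, so the accept line has to come before the final catch-all reject line.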

 
