Adding field-specific search to Nutch

1. Add the following to WEB-INF/classes/custom-fields.xml:

  <entry key="field1.name">title</entry>
  <entry key="field1.indexed">yes</entry>
  <entry key="field1.stored">yes</entry>
  <entry key="field1.tokenized">yes</entry>
  <entry key="field1.boost">2.0</entry>
  <entry key="field1.multi">false</entry>

  <entry key="field2.name">content</entry>
  <entry key="field2.indexed">yes</entry>
  <entry key="field2.stored">no</entry>
  <entry key="field2.tokenized">yes</entry>
  <entry key="field2.boost">1.0</entry>
  <entry key="field2.multi">false</entry>

These settings must match the ones you used when building the index.
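To compare the search-side field settings against the index-side ones, it can help to print them grouped by field. The sketch below is only an illustration: it assumes custom-fields.xml is a standard java.util.Properties XML document (the `<entry key="...">` elements suggest this, but the post does not confirm it), and the class name `CustomFieldsDump` and the embedded sample are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: groups "fieldN.xxx" entries by field.
class CustomFieldsDump {

    // Parse a Properties-style XML document (assumed format of custom-fields.xml).
    static Properties parse(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Turn keys like "field1.boost" into {"field1" -> {"boost" -> "2.0"}}.
    static Map<String, Map<String, String>> group(Properties props) {
        Map<String, Map<String, String>> fields = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot < 0) {
                continue; // skip keys that do not follow the fieldN.attribute pattern
            }
            fields.computeIfAbsent(key.substring(0, dot), k -> new TreeMap<>())
                  .put(key.substring(dot + 1), props.getProperty(key));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample input; the XML header and DOCTYPE are assumptions about the real file.
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "  <entry key=\"field1.name\">title</entry>\n"
            + "  <entry key=\"field1.stored\">yes</entry>\n"
            + "  <entry key=\"field2.name\">content</entry>\n"
            + "  <entry key=\"field2.stored\">no</entry>\n"
            + "</properties>\n";
        group(parse(xml)).forEach((field, attrs) -> System.out.println(field + " " + attrs));
    }
}
```

Running the same dump against the configuration used at indexing time and comparing the two outputs is one way to spot a mismatch such as a field stored on one side but not the other.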

 

 

2. Modify plugin/query-custom/plugin.xml:
<extension id="org.apache.nutch.searcher.custom"
           name="Nutch Custom Field Query Filter"
           point="org.apache.nutch.searcher.QueryFilter">
  <implementation id="CustomQueryFilter"
                  class="org.apache.nutch.searcher.custom.CustomFieldQueryFilter">
    <parameter name="fields" value="lang,content,title" />
  </implementation>
</extension>

 

3. Enable the query-custom plugin in nutch-default.xml:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url|custom)|response-(json|xml)|summary-lucene|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description> </description>
</property>
4. Restart Tomcat.
After that, you can use content:XXX or title:XXX to restrict a search to the content field or the title field.
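
If you send such queries from code instead of typing them into the search box, the colon in `title:XXX` has to be URL-encoded. Here is a minimal sketch of building the request URL. The base URL `http://localhost:8080/search.jsp` and the `query` parameter name are assumptions about a typical Nutch 1.x web app deployment, and `FieldQueryUrl` is a hypothetical helper class, so adjust them to your setup. It needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: builds a field-restricted search URL.
class FieldQueryUrl {

    // Compose the field query exactly as typed into the search box, e.g. "title:nutch".
    static String fieldQuery(String field, String term) {
        return field + ":" + term;
    }

    // Append the URL-encoded query to the search page URL.
    // The "query" parameter name is an assumption about the deployed search page.
    static String searchUrl(String baseUrl, String field, String term) {
        String encoded = URLEncoder.encode(fieldQuery(field, term), StandardCharsets.UTF_8);
        return baseUrl + "?query=" + encoded;
    }

    public static void main(String[] args) {
        // Assumed local Tomcat deployment; replace with your own host and path.
        String base = "http://localhost:8080/search.jsp";
        System.out.println(searchUrl(base, "title", "nutch"));
        System.out.println(searchUrl(base, "content", "apache nutch"));
    }
}
```

Only fields listed in the `fields` parameter of plugin.xml (lang, content, title in the step 2 configuration) are handled by the custom query filter.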

 
