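Personal tech blog: http://demi-panda.com
Among open-source search frameworks, I started with Lucene. Over the past couple of days I looked at Solr, downloaded the latest version, and configured it, running into some problems along the way. I solved some of them and others are still open; here I share the ones I have already solved.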
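1. Download Solr 1.4: http://apache.freelamp.com/lucene/solr/ (note: the latest Solr releases are also published here)
2. Download IKAnalyzer3.2.3Stable: http://code.google.com/p/ik-analyzer/downloads/list (note: the latest IKAnalyzer releases are listed here; it can also be downloaded directly from the attachment)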
3. I am not sure whether versions before 1.4 require extending BaseTokenizerFactory, but with 1.4 you must extend BaseTokenizerFactory:
package com.analysis.util;

import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Chinese word segmentation
 * @author Denghaiping
 * @date 2010-8-14
 */
public class ChineseTokenizerFactory extends BaseTokenizerFactory
{
    /**
     * Overrides the parent class method
     */
    public Tokenizer create(Reader input) {
        // The cast relies on IKAnalyzer.tokenStream returning a Tokenizer instance.
        return (Tokenizer) new IKAnalyzer().tokenStream("text", input);
    }
}
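One thing worth noting about the factory above: it never looks at the attributes set on its <tokenizer> element, so an attribute such as the isMaxWordLength used in the schema.xml step has no effect with this class as written. As far as I know, Solr hands those attributes to the factory as a Map<String, String> when it initializes it. The sketch below is plain Java with no Solr or IK dependency; TokenizerArgs and readMaxWordLength are hypothetical names, used only to illustrate how such a flag could be parsed with a default.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: parses a boolean attribute the way a factory could. */
class TokenizerArgs {

    /**
     * Returns the value of "isMaxWordLength" from the attribute map,
     * or the given default when the attribute is absent.
     */
    public static boolean readMaxWordLength(Map<String, String> args, boolean defaultValue) {
        String raw = args.get("isMaxWordLength");
        if (raw == null) {
            return defaultValue;
        }
        // Boolean.parseBoolean ignores case; anything other than "true" is false.
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] argv) {
        // Attribute maps as they might look for the index and query analyzers.
        Map<String, String> indexArgs = new HashMap<String, String>();
        indexArgs.put("isMaxWordLength", "false");
        Map<String, String> queryArgs = new HashMap<String, String>();
        queryArgs.put("isMaxWordLength", "true");

        System.out.println("index: " + readMaxWordLength(indexArgs, false));
        System.out.println("query: " + readMaxWordLength(queryArgs, false));
    }
}
```

In a real factory the parsed value would then have to reach the IK tokenizer inside create(); I have left that part out because it depends on the IK version in use.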
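5. Then modify schema.xml. The modified parts were shown in bold in the original post; they are the two <tokenizer> lines introduced by the IKAnalyzer comments.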
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Default configuration
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
    <!-- Add the IKAnalyzer tokenizer -->
    <tokenizer class="com.analysis.util.ChineseTokenizerFactory" isMaxWordLength="false"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <!-- Default configuration
    <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
    <!-- Add the IKAnalyzer tokenizer -->
    <tokenizer class="com.analysis.util.ChineseTokenizerFactory" isMaxWordLength="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
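The two tokenizer lines differ only in isMaxWordLength: false for the index analyzer, true for the query analyzer. In IKAnalyzer 3.x this flag switches between finest-grained segmentation and maximum-word-length segmentation. The toy below is not IK's algorithm; it is only a sketch over a hypothetical four-word dictionary (SegmentationDemo, DICT and both methods are made up for illustration) to show why the fine-grained side can emit more, overlapping terms than the longest-match side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SegmentationDemo {
    // Hypothetical dictionary; a real analyzer ships a far larger one.
    static final Set<String> DICT = new HashSet<String>(
            Arrays.asList("中华", "人民", "共和国", "中华人民共和国"));

    /** Length of the longest dictionary entry, used to bound the search. */
    static int maxEntryLength() {
        int max = 0;
        for (String word : DICT) {
            max = Math.max(max, word.length());
        }
        return max;
    }

    /** Finest-grained style: emit every dictionary word found at any position. */
    static List<String> allWords(String text) {
        List<String> out = new ArrayList<String>();
        int maxLen = maxEntryLength();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + maxLen);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (DICT.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    /** Max-word-length style: at each position take the longest dictionary word. */
    static List<String> longestWords(String text) {
        List<String> out = new ArrayList<String>();
        int maxLen = maxEntryLength();
        int start = 0;
        while (start < text.length()) {
            int matchedEnd = -1;
            for (int end = Math.min(text.length(), start + maxLen); end > start; end--) {
                if (DICT.contains(text.substring(start, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd < 0) {
                // No dictionary word starts here: emit the single character.
                matchedEnd = start + 1;
            }
            out.add(text.substring(start, matchedEnd));
            start = matchedEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "中华人民共和国";
        System.out.println(allWords(text));     // four overlapping terms
        System.out.println(longestWords(text)); // one long term
    }
}
```

With this dictionary the first method yields four terms (the three short words plus the full phrase) and the second yields the full phrase only. Presumably the schema indexes with the fine-grained setting so that short terms stay searchable, while queries use the longest match to keep them precise.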
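6. Package the class as a jar and put it into solr.war together with the IK jar. If you do not want to build it yourself, download the prebuilt jar from the attachment. Alternatively, put the IK jar and the jar you built directly into apache-tomcat-6.0.26\webapps\solr\WEB-INF\lib.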
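For the last option (dropping the jars straight into WEB-INF\lib), the copy step could be scripted. This is a minimal sketch, assuming Java 7 or newer for java.nio.file. JarDeployer is a hypothetical helper, and the jar file names in main are placeholders, not the actual names of the attachment or of the IK download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class JarDeployer {

    /** Copies each jar into libDir, creating the directory if needed. */
    static void deploy(Path libDir, Path... jars) throws IOException {
        Files.createDirectories(libDir);
        for (Path jar : jars) {
            Path target = libDir.resolve(jar.getFileName());
            // Overwrite an older copy so a redeploy picks up the new build.
            Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory from the post; adjust to the local Tomcat installation.
        Path libDir = Paths.get("apache-tomcat-6.0.26", "webapps", "solr", "WEB-INF", "lib");
        // Placeholder file names (assumptions): substitute the real jars.
        deploy(libDir,
               Paths.get("IKAnalyzer.jar"),
               Paths.get("chinese-tokenizer-factory.jar"));
    }
}
```

The main method fails with an exception if the source jars do not exist, and Tomcat has to be restarted (or the webapp reloaded) before the new jars are picked up.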