Lucene Chinese Word Segmentation

I spent some free time today setting up IK Analyzer and lost half of it to the difference between mydict.dic and /mydict.dic: with the leading "/", neither the extension dictionary nor the stopword dictionary takes effect. Recording it here for future reference.

Lucene's built-in Chinese analyzers are not very good, so I downloaded IKAnalyzer2012_u6.zip from https://code.google.com/p/ik-analyzer/downloads/detail?name=IKAnalyzer2012_u6.zip&can=2&q=

@Test
public void testChinese2() {
    // CJKAnalyzer: bigram (overlapping two-character) segmentation
    String content = "我爱北京天安门";
    try {
        this.testAnalyzer(new CJKAnalyzer(Version.LUCENE_36), content);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

@Test
public void testChinese3() {
    // IK Analyzer: dictionary-based segmentation
    String content = "lucene 利用IK Analyzer对这一行进行分词";
    try {
        this.testAnalyzer(new IKAnalyzer(), content);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void testAnalyzer(Analyzer analyzer, String content) throws Exception {
    TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(content));
    // addAttribute returns the attribute instance, so fetch it once outside the loop
    CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    while (tokenStream.incrementToken()) {
        System.out.println(termAttribute.toString());
    }
}
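The CJKAnalyzer test above prints overlapping two-character tokens for CJK text. As an illustration only (this is a sketch of the bigram idea, not Lucene's actual implementation), the same effect can be written in plain Java:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    /** Emits overlapping two-character tokens, e.g. "北京天安门" -> 北京, 京天, 天安, 安门. */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("我爱北京天安门"));  // [我爱, 爱北, 北京, 京天, 天安, 安门]
    }
}
```

This is why bigram segmentation produces noise tokens like 爱北 and 京天: every adjacent pair becomes a token, which is exactly what a dictionary-based analyzer like IK avoids.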

IKAnalyzer.cfg.xml (placed at the classpath root):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionary here (classpath-relative, no leading "/") -->
	<entry key="ext_dict">mydict.dic</entry>

	<!-- Configure your own extension stopword dictionary here
	<entry key="ext_stopwords">stopword.dic</entry>
	-->
</properties>
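The mydict.dic vs /mydict.dic pitfall comes from how ClassLoader resources are resolved: a ClassLoader resource name is already classpath-absolute, so a leading "/" makes the lookup return null and the dictionary is silently not loaded. (The .dic files themselves are plain UTF-8 text, one entry per line.) A JDK-only sketch of the lookup behavior, with no IK dependency; a temp directory stands in for the classpath root:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourcePathDemo {

    /**
     * Creates a temp directory holding mydict.dic, treats it as a classpath
     * root, and reports whether each spelling of the resource name resolves.
     */
    static boolean[] lookupBothSpellings() throws IOException {
        Path root = Files.createTempDirectory("cp");
        Files.write(root.resolve("mydict.dic"), "天安门\n".getBytes("UTF-8"));
        try (URLClassLoader cl = new URLClassLoader(new URL[]{root.toUri().toURL()}, null)) {
            return new boolean[]{
                cl.getResource("mydict.dic") != null,   // resolved relative to the classpath root
                cl.getResource("/mydict.dic") != null   // leading "/" escapes the root -> not found
            };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = lookupBothSpellings();
        System.out.println("mydict.dic  found: " + r[0]);  // true
        System.out.println("/mydict.dic found: " + r[1]);  // false
    }
}
```

(Note the difference from Class.getResource, which strips a leading "/" and would accept both spellings; IK loads its dictionaries through a ClassLoader, hence the silent failure.)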
(IK Analyzer author: @linliangyi)
