Integrating IKAnalyzer with Lucene 6.5.0 via Maven

1.   Download IKAnalyzer6.5.0.jar
Baidu Pan link: http://pan.baidu.com/s/1jH4NY66

2.   Open cmd and run the following command to install the jar manually into the local Maven repository:

mvn install:install-file -Dfile=C:\Users\Karn\Desktop\大三下\信息检索\Project\IKAnalyzer6.5\IKAnalyzer6.5.0.jar -DgroupId=com.lucene -DartifactId=ikAnalyzer -Dversion=6.5.0 -Dpackaging=jar -DgeneratePom=true

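Here, -Dfile= takes the full path to the jar. Each -D option must be separated from the previous one by a space. My jar is stored at C:\Users\Karn\Desktop\大三下\信息检索\Project\IKAnalyzer6.5\IKAnalyzer6.5.0.jar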
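To confirm that the install worked, you can check that the jar landed where Maven's standard repository layout puts it. The following is a minimal sketch, not part of the original setup: the class name LocalRepoCheck and the helper artifactPath are made up for illustration, and it assumes the local repository is at the default location ~/.m2/repository (a custom settings.xml can move it).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LocalRepoCheck {
    /** Relative path of an artifact inside a Maven repository (standard layout). */
    static Path artifactPath(String groupId, String artifactId,
                             String version, String packaging) {
        // groupId dots become directories: com.lucene -> com/lucene
        return Paths.get(groupId.replace('.', '/'), artifactId, version,
                artifactId + "-" + version + "." + packaging);
    }

    public static void main(String[] args) {
        // Assumption: default local repository location, ~/.m2/repository.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path jar = repo.resolve(artifactPath("com.lucene", "ikAnalyzer", "6.5.0", "jar"));
        System.out.println(jar + (Files.exists(jar) ? " found" : " missing"));
    }
}
```

With the coordinates used in the command above, the relative path is com/lucene/ikAnalyzer/6.5.0/ikAnalyzer-6.5.0.jar. Note that the file is renamed to artifactId-version.jar regardless of the original file name.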


3.   Once the install finishes, add the following dependency to pom.xml:


<dependency>
    <groupId>com.lucene</groupId>
    <artifactId>ikAnalyzer</artifactId>
    <version>6.5.0</version>
</dependency>

Note:

When using IKAnalyzer with Lucene 4.6.0 or later, the following exception may occur:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.


The cause turned out to be a change in how TokenStream must be used starting with Lucene 4.6.0: reset() must be called before incrementToken(). See the API docs for details: http://lucene.apache.org/core/4_6_0/core/index.html


The workflow of the new TokenStream API is as follows:

1.     Instantiation of TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.

2.     The consumer calls reset().

3.     The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.

4.     The consumer calls incrementToken() until it returns false, consuming the attributes after each call.

5.     The consumer calls end() so that any end-of-stream operations can be performed.

6.     The consumer calls close() to release any resource when finished using the TokenStream.
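To make the contract concrete, here is a toy model of the check behind the exception. It is a sketch for illustration only, not Lucene's implementation: MiniTokenStream and TokenStreamContractDemo are hypothetical classes, tokens are simply whitespace-separated words, and the term() accessor stands in for CharTermAttribute. The point is only that incrementToken() refuses to run until reset() has been called, and that the consumer follows the reset / incrementToken / end / close order.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a token stream (NOT Lucene's TokenStream). */
class MiniTokenStream implements AutoCloseable {
    private final String[] words;  // whitespace-split "tokens"
    private int pos;
    private boolean resetCalled;
    private String term;           // stands in for CharTermAttribute

    MiniTokenStream(String text) {
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    void reset() {
        if (resetCalled) {
            throw new IllegalStateException("reset() called multiple times");
        }
        resetCalled = true;
        pos = 0;
    }

    boolean incrementToken() {
        if (!resetCalled) {
            throw new IllegalStateException("reset() call missing");
        }
        if (pos >= words.length) {
            return false;          // no more tokens
        }
        term = words[pos++];
        return true;
    }

    String term() { return term; }

    void end() { /* end-of-stream bookkeeping would go here */ }

    @Override
    public void close() { resetCalled = false; } // must be reset again before reuse
}

class TokenStreamContractDemo {
    /** Consumes the stream in the order the workflow prescribes. */
    static List<String> consume(String text) {
        List<String> out = new ArrayList<>();
        try (MiniTokenStream ts = new MiniTokenStream(text)) { // close() runs automatically
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", consume("full text search")));
        try {
            new MiniTokenStream("oops").incrementToken(); // reset() skipped on purpose
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints full|text|search and then rejected: reset() call missing. Using try-with-resources is one way to make sure close() is never forgotten.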

After changing the code as follows, it runs correctly:

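import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer; // package name as in the usual IKAnalyzer builds

// news holds the text to be segmented
IKAnalyzer analyzer = new IKAnalyzer(true); // true = smart segmentation mode
StringReader reader = new StringReader(news);
TokenStream ts = analyzer.tokenStream("", reader);
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
ts.reset(); // must be called before incrementToken()
while (ts.incrementToken()) {
    System.out.print(term.toString() + "|");
}
ts.end();   // perform end-of-stream operations
ts.close(); // release the stream so the analyzer can be reused
analyzer.close();
reader.close();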



