Programming Notes: Lucene Highlighting Code

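Search engines such as Google and Baidu highlight the query terms on the results page. The highlighting is eye-catching and lets us quickly focus on the information we need.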


Lucene's contrib modules already include this functionality.

Highlighter

Code that highlights matches in search results:

public void testHits() throws Exception {
    // TestUtil.getBookIndexDirectory() is a helper that is not shown here; it is
    // assumed to return the Directory of a prebuilt index of book records.
    IndexSearcher searcher = new IndexSearcher(TestUtil.getBookIndexDirectory());
    TermQuery query = new TermQuery(new Term("title", "action"));
    TopDocs hits = searcher.search(query, 10);

    // Score fragments by the query terms they contain in the "title" field.
    QueryScorer scorer = new QueryScorer(query, "title");
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));

    Analyzer analyzer = new SimpleAnalyzer();

    for (int i = 0; i < hits.scoreDocs.length; i++) {
        Document doc = searcher.doc(hits.scoreDocs[i].doc);
        String title = doc.get("title");

        // Uses stored term vectors when available, otherwise re-analyzes the stored text.
        TokenStream stream = TokenSources.getAnyTokenStream(
                searcher.getIndexReader(), hits.scoreDocs[i].doc, "title", doc, analyzer);

        String fragment = highlighter.getBestFragment(stream, title);
        System.out.println(fragment);
    }

    // Release the searcher's resources once all hits have been printed.
    searcher.close();
}

// Output

//JUnit in <B>Action</B>

//Lucene in <B>Action</B>

//Tapestry in <B>Action</B>
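To make the idea concrete without a Lucene dependency, here is a minimal sketch of what a highlighter does conceptually: find the query term in the stored text, ignoring case, and wrap each match in tags. The class `SimpleTermHighlighter` and its `highlight` method are hypothetical names used only for illustration. Lucene's Highlighter works on analyzer token streams and scores fragments, which this sketch does not attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of Lucene. */
final class SimpleTermHighlighter {

    /** Wraps every whole-word, case-insensitive occurrence of term in <B>...</B>. */
    static String highlight(String text, String term) {
        // \b keeps "action" from matching inside "transaction";
        // Pattern.quote treats the term as a literal string, not a regex.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // m.group() preserves the original casing of the matched text;
            // quoteReplacement protects any '$' or '\' it may contain.
            m.appendReplacement(out,
                    Matcher.quoteReplacement("<B>" + m.group() + "</B>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] titles = { "JUnit in Action", "Lucene in Action", "Tapestry in Action" };
        for (String title : titles) {
            System.out.println(highlight(title, "action"));
        }
    }
}
```

Run on the three titles above with the term "action", this sketch produces the same strings as the Lucene output shown, although it gets there by plain pattern matching rather than by analysis and scoring.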

 

FastVectorHighlighter

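As the name suggests, FastVectorHighlighter is a fast highlighting tool. Compared with Highlighter it has three advantages:

1. FastVectorHighlighter can support fields that are tokenized by n-gram tokenizers. Highlighter cannot support such fields very well.

2. FastVectorHighlighter can output highlights in different colors.

3. FastVectorHighlighter can highlight a phrase as a unit. (For example, when searching for the phrase lazy dog, FastVectorHighlighter produces <b>lazy dog</b>, whereas Highlighter marks each term separately, as in <b>lazy</b> <b>dog</b>.)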
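The difference between highlighting a phrase as one unit and highlighting its terms one by one can be sketched in plain Java. This is an illustration under simplifying assumptions (whitespace tokenization, exact phrase match, no slop); `PhraseHighlightSketch` is a hypothetical class and does not reflect how either Lucene highlighter is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene. */
final class PhraseHighlightSketch {

    /**
     * Highlights occurrences of the phrase in the text.
     * phraseUnit = true  wraps the whole phrase in one pair of tags;
     * phraseUnit = false wraps each term of the phrase separately.
     */
    static String highlight(String text, String phrase, boolean phraseUnit) {
        String[] tokens = text.split(" ");
        String[] query = phrase.split(" ");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (matchesAt(tokens, i, query)) {
                if (phraseUnit) {
                    String[] run = Arrays.copyOfRange(tokens, i, i + query.length);
                    out.add("<b>" + String.join(" ", run) + "</b>");
                } else {
                    for (int j = 0; j < query.length; j++) {
                        out.add("<b>" + tokens[i + j] + "</b>");
                    }
                }
                i += query.length; // skip past the matched phrase
            } else {
                out.add(tokens[i]);
                i++;
            }
        }
        return String.join(" ", out);
    }

    /** True if the query terms appear consecutively starting at tokens[start]. */
    private static boolean matchesAt(String[] tokens, int start, String[] query) {
        if (start + query.length > tokens.length) {
            return false;
        }
        for (int j = 0; j < query.length; j++) {
            if (!tokens[start + j].equalsIgnoreCase(query[j])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        System.out.println(highlight(text, "lazy dog", true));
        System.out.println(highlight(text, "lazy dog", false));
    }
}
```

The first call wraps the phrase in a single pair of tags, the second wraps each term on its own; everything outside the phrase is left untouched in both modes.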


FastVectorHighlighter code:

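// Imports assume the Lucene 3.0.x package layout.
import java.io.FileWriter;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field.TermVector;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.vectorhighlight.BaseFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldQuery;
import org.apache.lucene.search.vectorhighlight.FragListBuilder;
import org.apache.lucene.search.vectorhighlight.FragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.ScoreOrderFragmentsBuilder;
import org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FastVectorHighlighterSample {

    static final String[] DOCS = {                              // #A
        "the quick brown fox jumps over the lazy dog",          // #A
        "the quick gold fox jumped over the lazy black dog",    // #A
        "the quick fox jumps over the black dog",               // #A
        "the red fox jumped over the lazy dark gray dog"        // #A
    };
    static final String QUERY = "quick OR fox OR \"lazy dog\"~1"; // #B
    static final String F = "f";
    static Directory dir = new RAMDirectory();
    static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: FastVectorHighlighterSample <filename>");
            System.exit(-1);
        }
        makeIndex();            // #C
        searchIndex(args[0]);   // #D
    }

    static void makeIndex() throws IOException {
        IndexWriter writer = new IndexWriter(dir, analyzer, true, MaxFieldLength.LIMITED);
        for (String d : DOCS) {
            Document doc = new Document();
            doc.add(new Field(F, d, Store.YES, Index.ANALYZED,  // #E
                    TermVector.WITH_POSITIONS_OFFSETS));        // #E
            writer.addDocument(doc);
        }
        writer.close();
    }

    static void searchIndex(String filename) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, F, analyzer);
        Query query = parser.parse(QUERY);
        FastVectorHighlighter highlighter = getHighlighter();       // #F
        FieldQuery fieldQuery = highlighter.getFieldQuery(query);   // #G
        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs docs = searcher.search(query, 10);

        // Write the highlighted snippets to an HTML file.
        FileWriter writer = new FileWriter(filename);
        writer.write("<html>");
        writer.write("<body>");
        writer.write("<p>QUERY : " + QUERY + "</p>");
        for (ScoreDoc scoreDoc : docs.scoreDocs) {
            String snippet = highlighter.getBestFragment(           // #H
                    fieldQuery, searcher.getIndexReader(),          // #H
                    scoreDoc.doc, F, 100);                          // #H
            if (snippet != null) {                                  // #I
                writer.write(scoreDoc.doc + " : " + snippet + "<br/>"); // #I
            }
        }
        writer.write("</body></html>");
        writer.close();
        searcher.close();
    }

    static FastVectorHighlighter getHighlighter() {
        FragListBuilder fragListBuilder = new SimpleFragListBuilder();  // #J
        FragmentsBuilder fragmentBuilder =                              // #K
                new ScoreOrderFragmentsBuilder(                         // #K
                        BaseFragmentsBuilder.COLORED_PRE_TAGS,          // #K
                        BaseFragmentsBuilder.COLORED_POST_TAGS);        // #K
        return new FastVectorHighlighter(true, true,                    // #L
                fragListBuilder, fragmentBuilder);                      // #L
    }
}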

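#A Sample documents

#B Sample query

#C Build the index

#D Search and output the results

#E Store.YES together with TermVector.WITH_POSITIONS_OFFSETS (FastVectorHighlighter needs term vectors with positions and offsets)

#F Get a FastVectorHighlighter instance

#G Create the FieldQuery

#H Highlight the best fragment

#I Write out the highlighted fragment

#J Create a SimpleFragListBuilder

#K Create a ScoreOrderFragmentsBuilder with multicolored tags

#L Create the FastVectorHighlighter instance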
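The multicolored tags from step #K can be illustrated with a small standalone sketch. It assumes that each query term is given a number and that the opening tag is picked by cycling through a tag array with that number; whether Lucene's fragments builders select tags in exactly this way is an assumption here. The class `ColoredTagSketch` and the two tag strings are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene. */
final class ColoredTagSketch {

    // Illustrative opening tags; a real configuration may define more colors.
    static final String[] PRE_TAGS = {
        "<b style=\"background:yellow\">",
        "<b style=\"background:lawngreen\">"
    };
    static final String POST_TAG = "</b>";

    /** Wraps each query term found in the text, cycling colors by term number. */
    static String highlight(String text, List<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        String[] tokens = text.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            int termNumber = indexOfIgnoreCase(queryTerms, tokens[i]);
            if (termNumber >= 0) {
                // The modulo makes the colors repeat when there are more terms than tags.
                out.append(PRE_TAGS[termNumber % PRE_TAGS.length])
                   .append(tokens[i])
                   .append(POST_TAG);
            } else {
                out.append(tokens[i]);
            }
        }
        return out.toString();
    }

    /** Returns the position of token in terms, ignoring case, or -1 if absent. */
    private static int indexOfIgnoreCase(List<String> terms, String token) {
        for (int i = 0; i < terms.size(); i++) {
            if (terms.get(i).equalsIgnoreCase(token)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(highlight(
                "the quick brown fox jumps over the lazy dog",
                Arrays.asList("quick", "fox", "dog")));
    }
}
```

With three query terms and two tags, the third term reuses the first color, which shows why a longer tag array gives more distinct highlights.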

 

