边学边记(二) 索引查询

lucene 索引查询 步骤 首选我们要打开 索引的所在目录 索引目录可以是物理磁盘目录也可以是RAM 内存中的索引目录

Eclipse的全文检索好像也是lucene做的 具体的没去看过代码

Ecilpse中的Maven的插件页使用了lucene对依赖库做了索引。

根据定位的索引目录 创建IndexReader 来读取index了,具体方法:

 

IndexReader open(final Directory directory, boolean readOnly)

参数一就是第一步当中我们打开的索引目录了,第二个参数也很明白了 一般为true吧 单纯的读取器一般也不会来修改index信息吧

 

第三部就是创建索引检索器了IndexSearcher

lucene提供了三个构造方法来实例化检索器

/** Creates a searcher searching the index in the named * directory, with readOnly=true * @throws CorruptIndexException if the index is corrupt * @throws IOException if there is a low-level IO error * @param path directory where IndexReader will be opened */ public IndexSearcher(Directory path) throws CorruptIndexException, IOException { this(IndexReader.open(path, true), true); } /** Creates a searcher searching the index in the named * directory. You should pass readOnly=true, since it * gives much better concurrent performance, unless you * intend to do write operations (delete documents or * change norms) with the underlying IndexReader. * @throws CorruptIndexException if the index is corrupt * @throws IOException if there is a low-level IO error * @param path directory where IndexReader will be opened * @param readOnly if true, the underlying IndexReader * will be opened readOnly */ public IndexSearcher(Directory path, boolean readOnly) throws CorruptIndexException, IOException { this(IndexReader.open(path, readOnly), true); } /** Creates a searcher searching the provided index. */ public IndexSearcher(IndexReader r) { this(r, false); } /** Expert: directly specify the reader, subReaders and * their docID starts. * * <p><b>NOTE:</b> This API is experimental and * might change in incompatible ways in the next * release.</font></p> */ public IndexSearcher(IndexReader reader, IndexReader[] subReaders, int[] docStarts) { this.reader = reader; this.subReaders = subReaders; this.docStarts = docStarts; closeReader = false; }

 

看这几个公开的构造方法最终都是用的其私有构造方法

private IndexSearcher(IndexReader r, boolean closeReader)

从代码中得知 closeReader 在外部引入reader的时候是false 其他的指明了directory的都是为true以便由searcher来管理reader

 

编写检索的基类Searcher

 

/**************** * *Create Class:Searcher.java *Author:a276202460 *Create at:2010-5-25 */ package com.rich.lucene.searcher; import java.io.File; import java.text.DateFormat; import java.text.SimpleDateFormat; import java.util.Calendar; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.index.IndexReader; import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.Query; import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.TopDocs; import org.apache.lucene.search.TopScoreDocCollector; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.util.Version; import com.rich.lucene.util.IndexKeys; public class Searcher { private DateFormat df = new SimpleDateFormat("MM/dd/yyyy"); public Directory getDirectory() throws Exception { Directory directory = null; directory = FSDirectory.open(new File(IndexKeys.TXT_INDEX_BASE_DIR + "//index2")); return directory; } public IndexReader getReader(Directory directory) throws Exception { return IndexReader.open(directory, true); } public Analyzer getAnalyzer() throws Exception { return new StandardAnalyzer(Version.LUCENE_CURRENT); } public void Close(Directory directory, IndexReader reader) throws Exception { if (reader != null) reader.close(); if (directory != null) directory.close(); } public void Search(Query query) throws Exception { Directory directory = null; IndexReader reader = null; try { //获得索引目录 directory = this.getDirectory(); //创建索引的读取器 reader = this.getReader(directory); //根据读取器创建查询器 IndexSearcher searcher = new IndexSearcher(reader); /* 结果集合容器 一般必须将所有的查询结果都查询出来,如果期望的查询容器设置为 * 10000 甚至更多那么你就该想想内存的使用了 单纯的定义一个几万的空数组内存的 * 分配也不可小看。 * * TopScoreDocCollector .create(int numHits, boolean docsScoredInOrder) 第一个就是期望的最多查询结果数 第二个参数:是否按照doc的得分对结果排序 */ TopScoreDocCollector results = TopScoreDocCollector .create(10, true); long starttime = System.currentTimeMillis(); /* * 根据Query将查询结果放入到results集合中,results集合查询出来的信息 * 有总的结果数 TotalHits 和score 得分的前N个结果的doc的索引 N即为results * 集合中执行的结果数和真实结果数种的Max值 */ searcher.search(query, results); System.out.println(">>>>>>>>>>>> User Query:" + query); System.out.println(">>>>>>>>>>>> Query Use:" + (System.currentTimeMillis() - starttime) + "ms"); System.out.println(">>>>>>>>>>>> Total Result:" + results.getTotalHits()); TopDocs topdocs = results.topDocs(); ScoreDoc[] docs = topdocs.scoreDocs; Document doc = null; for (int i = 0; i < docs.length; i++) { doc = searcher.doc(docs[i].doc); System.out.println("************************************"); System.out.println("Result Score:" + docs[i].score +"Result Type:" + doc.get(IndexKeys.TEST_TYPE) + " Result Biz:" + doc.get(IndexKeys.TEST_BIZ) + "/nResult No:" + doc.get(IndexKeys.TEST_NUMBERIC) + " Result StartDate:" + getDate(doc.get(IndexKeys.TEST_STARTDATE)) + " Result EndDate:" + getDate(doc.get(IndexKeys.TEST_ENDDATE))); System.out.println("----Result Content:" + doc.get(IndexKeys.TEST_CONTENT) + "---"); System.out.println("************************************"); } } finally { this.Close(directory, reader); } } private String getDate(String longstr) { long time = Long.parseLong(longstr); Calendar c = Calendar.getInstance(); c.setTimeInMillis(time); return df.format(c.getTime()); } }

 

好了有了基类中的Search(Query query) 方法我们就可以测试各种的query 规则了

lucene的所有Query

边学边记(二) 索引查询_第1张图片

 

 

 

 

边学边记(二) 索引查询_第2张图片

 

API里两处subclass list 慢慢看吧 好些单词都不知道 还得查字典 那是以后的事了

你可能感兴趣的:(exception,calendar,Lucene,query,全文检索,Path)