hanyuanbo

Lucene 3.0.2 Code Analysis

Continuously updated

1 Document and Field

2 IndexWriter

3 IndexReader

4 The Inverted Index Implementation in Lucene

5 IndexSearcher

6 Analyzer

7 Directory

8 Query, Sort and Filter

9 Ranking Algorithms in Lucene and Improvements

1. Document and Field

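Document and Field are indispensable when building an index. They correspond roughly to the record and the column of a traditional relational database: a record can have many columns, and likewise a Document can hold many Fields, which makes it possible to support many different kinds of queries. For example, Fields can hold a file's content, its name, its creation time or its modification time. A Field has these attributes: whether it is stored (this.isStored = store.isStored()), whether it is indexed (this.isIndexed = index.isIndexed()), and whether it is tokenized (this.isTokenized = index.isAnalyzed()). Choose them according to your needs; for example, a document's content usually does not need to be stored but does need to be indexed. The source code also enforces some restrictions: for instance, a Field cannot be both unindexed and unstored.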

Document's main methods add, remove and look up Fields. The main API in 3.0.2 is:

 void	add(Fieldable field) 
          Adds a field to a document.
 String	get(String name) 
          Returns the string value of the field with the given name if any exist in this document, or null.
 Field	getField(String name) 
          Returns a field with the given name if any exist in this document, or null.
 List<Fieldable>	getFields() 
          Returns a List of all the fields in a document.
 Field[]	getFields(String name) 
          Returns an array of Fields with the given name.
 void	removeField(String name) 
          Removes field with the specified name from the document.
 void	removeFields(String name) 
          Removes all fields with the given name from the document.
 String	toString() 
          Prints the fields of a document for human consumption.
 ...

Field has two main constructors, shown below; they help explain the Field attributes (see the source file for the rest):


  /**
   * Create a field by specifying its name, value and how it will
   * be saved in the index.
   * 
   * @param name The name of the field
   * @param internName Whether to .intern() name or not
   * @param value The string to process
   * @param store Whether <code>value</code> should be stored in the index
   * @param index Whether the field should be indexed, and if so, if it should
   *  be tokenized before indexing 
   * @param termVector Whether term vector should be stored
   * @throws NullPointerException if name or value is <code>null</code>
   * @throws IllegalArgumentException in any of the following situations:
   * <ul> 
   *  <li>the field is neither stored nor indexed</li> 
   *  <li>the field is not indexed but termVector is <code>TermVector.YES</code></li>
   * </ul> 
   */ 
  public Field(String name, boolean internName, String value, Store store, Index index, TermVector termVector) {
    if (name == null)
      throw new NullPointerException("name cannot be null");
    if (value == null)
      throw new NullPointerException("value cannot be null");
    if (name.length() == 0 && value.length() == 0)
      throw new IllegalArgumentException("name and value cannot both be empty");
    if (index == Index.NO && store == Store.NO)
      throw new IllegalArgumentException("it doesn't make sense to have a field that "
         + "is neither indexed nor stored");
    if (index == Index.NO && termVector != TermVector.NO)
      throw new IllegalArgumentException("cannot store term vector information "
         + "for a field that is not indexed");
          
    if (internName) // field names are optionally interned
      name = StringHelper.intern(name);
    
    this.name = name; 
    
    this.fieldsData = value;

    this.isStored = store.isStored();
   
    this.isIndexed = index.isIndexed();
    this.isTokenized = index.isAnalyzed();
    this.omitNorms = index.omitNorms();
    if (index == Index.NO) {
      this.omitTermFreqAndPositions = false;
    }    

    this.isBinary = false;

    setStoreTermVector(termVector);
  }

 /**
   * Create a tokenized and indexed field that is not stored, optionally with 
   * storing term vectors.  The Reader is read only when the Document is added to the index,
   * i.e. you may not close the Reader until {@link IndexWriter#addDocument(Document)}
   * has been called.
   * 
   * @param name The name of the field
   * @param reader The reader with the content
   * @param termVector Whether term vector should be stored
   * @throws NullPointerException if name or reader is <code>null</code>
   */ 
  public Field(String name, Reader reader, TermVector termVector) {
    if (name == null)
      throw new NullPointerException("name cannot be null");
    if (reader == null)
      throw new NullPointerException("reader cannot be null");
    
    this.name = StringHelper.intern(name);        // field names are interned
    this.fieldsData = reader;
    
    this.isStored = false;
    this.isIndexed = true;
    this.isTokenized = true;
    this.isBinary = false;
    
    setStoreTermVector(termVector);
  }

All the other constructors simply call one of these two. For example, two commonly used ones:

  public Field(String name, String value, Store store, Index index) {
    this(name, value, store, index, TermVector.NO);
  }

  public Field(String name, Reader reader) {
    this(name, reader, TermVector.NO);
  }

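Reading the three nested enums Store, Index and TermVector in Field's source makes it clearer how each Field attribute is set (in earlier versions these were three static inner classes holding constants).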
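To see how the three enums turn into the flags set in the first constructor, here is a simplified, self-contained model. StoreOpt, IndexOpt, TermVectorOpt and FieldModel are hypothetical names modeled on Field.Store, Field.Index and Field.TermVector in 3.0.2; the sketch reproduces only the flag mapping and the consistency checks quoted above, not Lucene's actual classes.

```java
import java.util.Objects;

// Hypothetical model (not Lucene code) of how Field maps its Store / Index /
// TermVector arguments to boolean flags, including the constructor's checks.
enum StoreOpt {
    YES, NO;
    boolean isStored() { return this == YES; }
}

enum IndexOpt {
    NO(false, false, false),
    ANALYZED(true, true, false),
    NOT_ANALYZED(true, false, false),
    NOT_ANALYZED_NO_NORMS(true, false, true),
    ANALYZED_NO_NORMS(true, true, true);

    private final boolean indexed, analyzed, omitNorms;

    IndexOpt(boolean indexed, boolean analyzed, boolean omitNorms) {
        this.indexed = indexed;
        this.analyzed = analyzed;
        this.omitNorms = omitNorms;
    }

    boolean isIndexed() { return indexed; }
    boolean isAnalyzed() { return analyzed; }
    boolean omitNorms() { return omitNorms; }
}

enum TermVectorOpt { NO, YES }

final class FieldModel {
    final String name;
    final String value;
    final boolean isStored, isIndexed, isTokenized, omitNorms, storeTermVector;

    FieldModel(String name, String value, StoreOpt store, IndexOpt index, TermVectorOpt termVector) {
        Objects.requireNonNull(name, "name cannot be null");
        Objects.requireNonNull(value, "value cannot be null");
        if (name.isEmpty() && value.isEmpty())
            throw new IllegalArgumentException("name and value cannot both be empty");
        // The two consistency checks from the real constructor.
        if (index == IndexOpt.NO && store == StoreOpt.NO)
            throw new IllegalArgumentException("field is neither indexed nor stored");
        if (index == IndexOpt.NO && termVector != TermVectorOpt.NO)
            throw new IllegalArgumentException("term vectors require an indexed field");
        this.name = name;
        this.value = value;
        this.isStored = store.isStored();
        this.isIndexed = index.isIndexed();
        this.isTokenized = index.isAnalyzed();
        this.omitNorms = index.omitNorms();
        this.storeTermVector = termVector != TermVectorOpt.NO;
    }
}
```

For example, file content would typically be StoreOpt.NO with IndexOpt.ANALYZED, while an ID field might be StoreOpt.YES with IndexOpt.NOT_ANALYZED_NO_NORMS.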

2. IndexWriter
See also my earlier blog post: http://hanyuanbo.iteye.com/blog/812135
The following is quoted from the first three paragraphs of the IndexWriter Javadoc:

Quote:

An IndexWriter creates and maintains an index.

The create argument to the constructor determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also constructors with no create argument which will create a new index if there is not already an index at the provided path and otherwise open the existing index.

In either case, documents are added with addDocument and removed with deleteDocuments(Term) or deleteDocuments(Query). A document can be updated with updateDocument (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, close should be called.

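(Note one point here: if the constructor is not told whether to create or append, the index is created when it does not exist, and the existing index is opened otherwise.)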
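The Javadoc's remark that updateDocument just deletes and then re-adds the entire document can be pictured with a toy in-memory model. ToyIndex and its methods are hypothetical; documents are matched by exact field value, roughly what a NOT_ANALYZED key field gives you, and none of this is Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy "index" (not Lucene): updateDocument is a delete of every
// matching document followed by an add of the complete new document.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) { docs.add(doc); }

    // Removes every document whose field equals value; returns how many were removed.
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value); // step 1: delete the old version(s)
        addDocument(newDoc);           // step 2: add the whole new document
    }

    int numDocs() { return docs.size(); }

    List<Map<String, String>> docs() { return docs; }
}
```

Because the old document is removed entirely, any field not repeated in the new document is lost.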

Quote:

Expert: IndexWriter allows an optional IndexDeletionPolicy implementation to be specified.

Expert: IndexWriter allows you to separately change the MergePolicy and the MergeScheduler.

Of the five constructors below, three are marked Expert; the other two are enough for normal use.

IndexWriter(Directory d, Analyzer a, boolean create, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl)	Expert: constructs an IndexWriter with a custom IndexDeletionPolicy, for the index in d.
IndexWriter(Directory d, Analyzer a, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl)	Expert: constructs an IndexWriter with a custom IndexDeletionPolicy, for the index in d, first creating it if it does not already exist.
IndexWriter(Directory d, Analyzer a, IndexDeletionPolicy deletionPolicy, IndexWriter.MaxFieldLength mfl, IndexCommit commit)	Expert: constructs an IndexWriter on specific commit point, with a custom IndexDeletionPolicy, for the index in d.
IndexWriter(Directory d, Analyzer a, IndexWriter.MaxFieldLength mfl)	Constructs an IndexWriter for the index in d, first creating it if it does not already exist.
IndexWriter(Directory d, Analyzer a, boolean create, IndexWriter.MaxFieldLength mfl)	Constructs an IndexWriter for the index in d.

In the source, all of them actually call a private init method:


private void init(Directory d, Analyzer a, final boolean create,  
                    IndexDeletionPolicy deletionPolicy, int maxFieldLength,
                    IndexingChain indexingChain, IndexCommit commit)
    throws CorruptIndexException, LockObtainFailedException, IOException {
        ...//in earlier versions this was done in a private constructor
}

The IndexWriter methods that add documents to the index:

void addDocument(Document doc)	Adds a document to this index.
void addDocument(Document doc, Analyzer analyzer)	Adds a document to this index, using the provided analyzer instead of the value of getAnalyzer().

3. IndexReader

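IndexReader reads an existing index and can also modify it, for example by deleting documents. Instances are obtained through static open() methods rather than constructors: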

static IndexReader open(Directory directory)	Returns a IndexReader reading the index in the given Directory, with readOnly=true.
static IndexReader open(Directory directory, boolean readOnly)	Returns an IndexReader reading the index in the given Directory.
static IndexReader open(Directory directory, IndexDeletionPolicy deletionPolicy, boolean readOnly)	Expert: returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy.
static IndexReader open(Directory directory, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor)	Expert: returns an IndexReader reading the index in the given Directory, with a custom IndexDeletionPolicy.
static IndexReader open(IndexCommit commit, boolean readOnly)	Expert: returns an IndexReader reading the index in the given IndexCommit.
static IndexReader open(IndexCommit commit, IndexDeletionPolicy deletionPolicy, boolean readOnly)	Expert: returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy.
static IndexReader open(IndexCommit commit, IndexDeletionPolicy deletionPolicy, boolean readOnly, int termInfosIndexDivisor)	Expert: returns an IndexReader reading the index in the given Directory, using a specific commit and with a custom IndexDeletionPolicy.

These methods involve the Term class, whose constructors are simple:

Term(String fld)	Constructs a Term with the given field and empty text.
Term(String fld, String txt)	Constructs a Term with the given field and text.

Commonly used, easy-to-understand IndexReader methods:

Document document(int n)	Returns the stored fields of the nth Document in this index.
abstract int numDocs()	Returns the number of documents in this index.
abstract TermDocs termDocs()	Returns an unpositioned TermDocs enumerator.
TermDocs termDocs(Term term)	Returns an enumeration of all the documents which contain term.
abstract TermPositions termPositions()	Returns an unpositioned TermPositions enumerator.
TermPositions termPositions(Term term)	Returns an enumeration of all the documents which contain term.
abstract TermEnum terms()	Returns an enumeration of all the terms in the index.
abstract TermEnum terms(Term t)	Returns an enumeration of all terms starting at a given term.
void deleteDocument(int docNum)	Deletes the document numbered docNum.
int deleteDocuments(Term term)	Deletes all documents that have a given term indexed.

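The following code shows how to use IndexReader to access Terms and to delete documents (when deleting, remember to close the reader: the deletions are written to the index when it is closed):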

package com.eric.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermPositions;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class IndexReaderTest {
	private File path ;
	
	
	public IndexReaderTest(String path) {
		this.path = new File(path);
	}

	public void createIndex(){
		try {
			IndexWriter writer = new IndexWriter(FSDirectory.open(this.path),new StandardAnalyzer(
					Version.LUCENE_30), IndexWriter.MaxFieldLength.LIMITED);
			Document doc1 = new Document();
			Document doc2 = new Document();
			Document doc3 = new Document();
			doc1.add(new Field("bookname", "thinking in java -- java 4", Field.Store.YES, Field.Index.ANALYZED));
			doc2.add(new Field("bookname", "java core 2", Field.Store.YES, Field.Index.ANALYZED));
			doc3.add(new Field("bookname", "thinking in c++", Field.Store.YES, Field.Index.ANALYZED));
			writer.addDocument(doc1);
			writer.addDocument(doc2);
			writer.addDocument(doc3);
			writer.close();
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (LockObtainFailedException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public void test1(){
		try {
			IndexReader reader = IndexReader.open(FSDirectory.open(this.path));
			System.out.println("version:\t" + reader.getVersion());
			int num = reader.maxDoc();//maxDoc() also counts deleted documents
			for(int i=0;i<num;i++){
				if(reader.isDeleted(i)) continue;//skip deleted documents
				Document doc = reader.document(i);
				System.out.println(doc);
			}
			
			Term term = new Term("bookname","java");
			TermDocs docs = reader.termDocs(term);
			while(docs.next()){
				System.out.print("doc num:\t" + docs.doc() + "\t\t");
				System.out.println("frequency:\t" + docs.freq());
			}
			
			reader.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
//	version:	1289906350314
//	Document<stored,indexed,tokenized<bookname:thinking in java -- java 4>>
//	Document<stored,indexed,tokenized<bookname:java core 2>>
//	Document<stored,indexed,tokenized<bookname:thinking in c++>>
//	doc num:	0		frequency:	2
//	doc num:	1		frequency:	1
	
	public void test2(){
		try {
			IndexReader reader = IndexReader.open(FSDirectory.open(this.path));
			System.out.println("version:\t" + reader.getVersion());
			
			Term term = new Term("bookname","java");
			TermPositions pos = reader.termPositions(term);
			while(pos.next()){
				System.out.print("frequency: " + pos.freq() + "\t");
				for(int i=0;i<pos.freq();i++){
					System.out.print("pos: " + pos.nextPosition() + "\t");
				}
				System.out.println();
			}
			reader.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
//	version:	1289906350314
//	frequency: 2	pos: 2	pos: 3	
//	frequency: 1	pos: 0
//	createIndex() was not called again this time, so the version number is unchanged
	
	public void delete1(){
		try {
			IndexReader reader = IndexReader.open(FSDirectory.open(this.path), false);//readOnly must be false, otherwise deletions are not allowed
			System.out.println("version:\t" + reader.getVersion());
			System.out.println("num:\t" + reader.numDocs());
			reader.deleteDocument(2);//delete the "thinking in c++" Document (doc number 2)
			reader.close();
			
			
			reader = IndexReader.open(FSDirectory.open(this.path), false);
			System.out.println("version:\t" + reader.getVersion());
			System.out.println("num:\t" + reader.numDocs());
			reader.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
//	version:	1289906350314
//	num:	3
//	version:	1289906350315
//	num:	2

	public void delete2(){
		try {
			IndexReader reader = IndexReader.open(FSDirectory.open(this.path), false);//readOnly must be false, otherwise deletions are not allowed
			System.out.println("version:\t" + reader.getVersion());
			System.out.println("num:\t" + reader.numDocs());
			Term term = new Term("bookname","java");
			reader.deleteDocuments(term);//delete every Document containing the term "java"
			reader.close();
			
			
			reader = IndexReader.open(FSDirectory.open(this.path), false);
			System.out.println("version:\t" + reader.getVersion());
			System.out.println("num:\t" + reader.numDocs());
			reader.close();
			
		} catch (CorruptIndexException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
//	version:	1289906350315
//	num:	2
//	version:	1289906350316
//	num:	0

	
	public static void main(String[] args) {
		String path = "E:\\indexReaderTest";
		IndexReaderTest test = new IndexReaderTest(path);
//		test.createIndex();
//		test.test1();
//		test.test2();
//		test.delete1();
		test.delete2();
	}
}

Notes:
First run:

String path = "E:\\indexReaderTest";
IndexReaderTest test = new IndexReaderTest(path);
test.createIndex();
test.test1();

Then run:

String path = "E:\\indexReaderTest";
IndexReaderTest test = new IndexReaderTest(path);
test.test2();

Then run:

String path = "E:\\indexReaderTest";
IndexReaderTest test = new IndexReaderTest(path);
test.delete1();

Then run:

String path = "E:\\indexReaderTest";
IndexReaderTest test = new IndexReaderTest(path);
test.delete2();
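Running delete1 shows numDocs() going from 3 to 2. As a simplified picture: in Lucene 3.x a deleted document is only marked as deleted, its number is not reused, and its data is physically dropped later when segments are merged. DeletionModel below is a hypothetical class showing the consequence: maxDoc() still counts deleted documents while numDocs() does not, so visiting every live document means looping up to maxDoc() and skipping deleted numbers. Looping from 0 to numDocs() is only safe when nothing has been deleted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model (not Lucene's classes) of deletions in an IndexReader:
// deleting only sets a bit, so document numbers stay stable.
final class DeletionModel {
    private final List<String> storedDocs;       // stored field per document number
    private final BitSet deleted = new BitSet(); // bit n set => document n is deleted

    DeletionModel(List<String> storedDocs) {
        this.storedDocs = new ArrayList<>(storedDocs);
    }

    int maxDoc() { return storedDocs.size(); }                  // includes deleted docs
    int numDocs() { return maxDoc() - deleted.cardinality(); }  // live docs only
    boolean isDeleted(int n) { return deleted.get(n); }
    void deleteDocument(int n) { deleted.set(n); }

    // The safe way to visit live documents: iterate to maxDoc() and skip deleted numbers.
    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int n = 0; n < maxDoc(); n++) {
            if (!isDeleted(n)) live.add(storedDocs.get(n));
        }
        return live;
    }
}
```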

4. The Inverted Index Implementation in Lucene
The following blog post briefly explains how an inverted index works:
http://jackyrong.iteye.com/blog/238940
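The opening pages of the attached 《Lucene 3.0 原理与代码分析完整版.pdf》 (Lucene 3.0 Principles and Code Analysis, full edition) introduce the basic principles of information retrieval. It is only a few pages and easy to follow; Lucene is simply its own implementation of these principles, so this material helps a great deal in understanding how Lucene builds its inverted index.
Reading the source, you will find a static constant, static final IndexingChain DefaultIndexingChain, which is the indexing chain IndexWriter uses by default (it is defined in DocumentsWriter, the internal class IndexWriter hands documents to):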
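The outputs of test1 and test2 above can be reproduced with a toy inverted index: in doc 0, java occurs at positions 2 and 3 (frequency 2), and in doc 1 at position 0. TinyInvertedIndex is a hypothetical illustration, not Lucene's data structure, and its regex tokenizer is only a crude stand-in for StandardAnalyzer. It drops the stop word "in" but still advances the position, which is consistent with the gap in the test2 output when StandardAnalyzer is built with Version.LUCENE_30.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy inverted index (not Lucene's implementation):
// term -> (doc number -> positions of the term in that doc).
final class TinyInvertedIndex {
    // A few entries of StopAnalyzer.ENGLISH_STOP_WORDS_SET, enough for this example.
    private static final Set<String> STOP_WORDS = Set.of("in", "the", "a", "of");

    private final Map<String, SortedMap<Integer, List<Integer>>> postings = new TreeMap<>();
    private int nextDoc = 0;

    // Lowercase, split on non-alphanumerics, drop stop words but keep counting positions.
    int addDocument(String text) {
        int doc = nextDoc++;
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;           // a leading separator yields ""
            if (!STOP_WORDS.contains(token)) {
                postings.computeIfAbsent(token, t -> new TreeMap<>())
                        .computeIfAbsent(doc, d -> new ArrayList<>())
                        .add(position);
            }
            position++;                              // stop words still leave a gap
        }
        return doc;
    }

    // Like TermPositions: each doc containing the term and its positions;
    // the size of each list is the term frequency (what TermDocs.freq() reports).
    SortedMap<Integer, List<Integer>> positions(String term) {
        return postings.getOrDefault(term, Collections.emptySortedMap());
    }
}
```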

  static final IndexingChain DefaultIndexingChain = new IndexingChain() {

    @Override
    DocConsumer getChain(DocumentsWriter documentsWriter) {
      /*
      This is the current indexing chain:

      DocConsumer / DocConsumerPerThread
        --> code: DocFieldProcessor / DocFieldProcessorPerThread
          --> DocFieldConsumer / DocFieldConsumerPerThread / DocFieldConsumerPerField
            --> code: DocFieldConsumers / DocFieldConsumersPerThread / DocFieldConsumersPerField
              --> code: DocInverter / DocInverterPerThread / DocInverterPerField
                --> InvertedDocConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                  --> code: TermsHash / TermsHashPerThread / TermsHashPerField
                    --> TermsHashConsumer / TermsHashConsumerPerThread / TermsHashConsumerPerField
                      --> code: FreqProxTermsWriter / FreqProxTermsWriterPerThread / FreqProxTermsWriterPerField
                      --> code: TermVectorsTermsWriter / TermVectorsTermsWriterPerThread / TermVectorsTermsWriterPerField
                --> InvertedDocEndConsumer / InvertedDocConsumerPerThread / InvertedDocConsumerPerField
                  --> code: NormsWriter / NormsWriterPerThread / NormsWriterPerField
              --> code: StoredFieldsWriter / StoredFieldsWriterPerThread / StoredFieldsWriterPerField
    */

    // Build up indexing chain:

      final TermsHashConsumer termVectorsWriter = new TermVectorsTermsWriter(documentsWriter);
      final TermsHashConsumer freqProxWriter = new FreqProxTermsWriter();

      final InvertedDocConsumer  termsHash = new TermsHash(documentsWriter, true, freqProxWriter,
                                                           new TermsHash(documentsWriter, false, termVectorsWriter, null));
      final NormsWriter normsWriter = new NormsWriter();
      final DocInverter docInverter = new DocInverter(termsHash, normsWriter);
      return new DocFieldProcessor(documentsWriter, docInverter);
    }
  };

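The comment clearly lays out how the whole processing chain works. The Javadoc has no documentation for these internal classes such as DocInverter, so you have to read the source files.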

5. IndexSearcher
The interfaces implemented by Searcher and its class hierarchy are as follows (from the API docs; for simple usage, see my earlier blog post http://hanyuanbo.iteye.com/blog/812135):

Quote:

org.apache.lucene.search
Class Searcher
java.lang.Object
        org.apache.lucene.search.Searcher
All Implemented Interfaces:
        Closeable, Searchable
Direct Known Subclasses:
        IndexSearcher, MultiSearcher

The search method has many overloads; the following is from the API docs:

void search(Query query, Collector results)	Lower-level search API.
void search(Query query, Filter filter, Collector results)	Lower-level search API.
TopDocs search(Query query, Filter filter, int n)	Finds the top n hits for query, applying filter if non-null.
TopFieldDocs search(Query query, Filter filter, int n, Sort sort)	Search implementation with arbitrary sorting.
TopDocs search(Query query, int n)	Finds the top n hits for query.
abstract void search(Weight weight, Filter filter, Collector results)	Lower-level search API.
abstract TopDocs search(Weight weight, Filter filter, int n)	Expert: Low-level search implementation.
abstract TopFieldDocs search(Weight weight, Filter filter, int n, Sort sort)	Expert: Low-level search implementation with arbitrary sorting.

There are also two very useful methods (abstract in Searcher and implemented in subclasses):

abstract Document doc(int i)	Returns the stored fields of document i.
Explanation explain(Weight weight, int doc)	Expert: low-level implementation method Returns an Explanation that describes how doc scored against weight.

In the source of the abstract class Searcher, the search overloads look like this:

/** Search implementation with arbitrary sorting.  Finds
   * the top <code>n</code> hits for <code>query</code>, applying
   * <code>filter</code> if non-null, and sorting the hits by the criteria in
   * <code>sort</code>.
   * 
   * <p>NOTE: this does not compute scores by default; use
   * {@link IndexSearcher#setDefaultFieldSortScoring} to
   * enable scoring.
   *
   * @throws BooleanQuery.TooManyClauses
   */
  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort) throws IOException {
    return search(createWeight(query), filter, n, sort);
  }

  /** Lower-level search API.
  *
  * <p>{@link Collector#collect(int)} is called for every matching document.
  *
  * <p>Applications should only use this if they need <i>all</i> of the
  * matching documents.  The high-level search API ({@link
  * Searcher#search(Query, int)}) is usually more efficient, as it skips
  * non-high-scoring hits.
  * <p>Note: The <code>score</code> passed to this method is a raw score.
  * In other words, the score will not necessarily be a float whose value is
  * between 0 and 1.
  * @throws BooleanQuery.TooManyClauses
  */
 public void search(Query query, Collector results)
   throws IOException {
   search(createWeight(query), null, results);
 }

  /** Lower-level search API.
   *
   * <p>{@link Collector#collect(int)} is called for every matching
   * document.
   * <br>Collector-based access to remote indexes is discouraged.
   *
   * <p>Applications should only use this if they need <i>all</i> of the
   * matching documents.  The high-level search API ({@link
   * Searcher#search(Query, Filter, int)}) is usually more efficient, as it skips
   * non-high-scoring hits.
   *
   * @param query to match documents
   * @param filter if non-null, used to permit documents to be collected.
   * @param results to receive hits
   * @throws BooleanQuery.TooManyClauses
   */
  public void search(Query query, Filter filter, Collector results)
  throws IOException {
    search(createWeight(query), filter, results);
  }

  /** Finds the top <code>n</code>
   * hits for <code>query</code>, applying <code>filter</code> if non-null.
   *
   * @throws BooleanQuery.TooManyClauses
   */
  public TopDocs search(Query query, Filter filter, int n)
    throws IOException {
    return search(createWeight(query), filter, n);
  }

  /** Finds the top <code>n</code>
   * hits for <code>query</code>.
   *
   * @throws BooleanQuery.TooManyClauses
   */
  public TopDocs search(Query query, int n)
    throws IOException {
    return search(query, null, n);
  }
  ...
  abstract public void search(Weight weight, Filter filter, Collector results) throws IOException;

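The actual search work is not implemented in Searcher: the Weight-based methods are abstract and left to subclasses, and every path ultimately ends up in the version

search(Weight weight, Filter filter, Collector results)

The overloads that take a Query all implicitly call createWeight(query) first.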

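In IndexSearcher, searching comes down mainly to the two methods below: search(Weight, Filter, Collector), which the other overloads end up calling, and the private helper searchWithFilter, which it calls when a Filter is given: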

  @Override
  public void search(Weight weight, Filter filter, Collector collector)
      throws IOException {
    
    if (filter == null) {
      for (int i = 0; i < subReaders.length; i++) { // search each subreader
        collector.setNextReader(subReaders[i], docStarts[i]);
        Scorer scorer = weight.scorer(subReaders[i], !collector.acceptsDocsOutOfOrder(), true);
        if (scorer != null) {
          scorer.score(collector);
        }
      }
    } else {
      for (int i = 0; i < subReaders.length; i++) { // search each subreader
        collector.setNextReader(subReaders[i], docStarts[i]);
        searchWithFilter(subReaders[i], weight, filter, collector);
      }
    }
  }

  ...

private void searchWithFilter(IndexReader reader, Weight weight,
      final Filter filter, final Collector collector) throws IOException {
  ...
}

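As you can see, the main difference is whether a Filter is used for the search. The search methods that return results also go through the method above and simply return the collector's results at the end; for example, the sorted version ends with

 return (TopFieldDocs) collector.topDocs();

For simple use, calling the methods declared in the abstract parent class Searcher is enough.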
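The per-sub-reader loop in IndexSearcher.search(Weight, Filter, Collector) can be modeled in plain Java to show what docStarts is for: each sub-reader numbers its documents from 0, and the collector adds the docBase passed to setNextReader to get index-wide document numbers. SegmentedSearchModel and its members are hypothetical; scoring is left out, and the filter is simplified to a predicate on global numbers, whereas a real Filter produces a DocIdSet for each sub-reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model (not Lucene code) of searching sub-readers one by one.
final class SegmentedSearchModel {
    interface Collector {
        void setNextReader(int docBase);
        void collect(int localDoc);
    }

    // Records index-wide document numbers by adding the current docBase.
    static final class GlobalIdCollector implements Collector {
        final List<Integer> hits = new ArrayList<>();
        private int docBase;

        public void setNextReader(int docBase) { this.docBase = docBase; }
        public void collect(int localDoc) { hits.add(docBase + localDoc); }
    }

    // matches[i] holds the local doc numbers that match in sub-reader i;
    // docStarts[i] is the first global doc number of sub-reader i.
    static void search(int[][] matches, int[] docStarts, IntPredicate filter, Collector collector) {
        for (int i = 0; i < matches.length; i++) {       // search each sub-reader
            collector.setNextReader(docStarts[i]);
            for (int localDoc : matches[i]) {
                if (filter == null || filter.test(docStarts[i] + localDoc)) {
                    collector.collect(localDoc);
                }
            }
        }
    }
}
```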

Several other classes also help with searching:

QueryParser

Query

TopScoreDocCollector

TopDocs

ScoreDoc

Document

Note in particular TopScoreDocCollector, which collects the top-scoring hits of a search. Its class hierarchy is (from the API docs):

Quote:

org.apache.lucene.search
    Class TopScoreDocCollector
java.lang.Object
org.apache.lucene.search.Collector
      org.apache.lucene.search.TopDocsCollector<ScoreDoc>
          org.apache.lucene.search.TopScoreDocCollector

Commonly used methods include (from the API docs):

int getTotalHits()	The total number of documents that matched this query.
TopDocs topDocs()	Returns the top docs that were collected by this collector.
TopDocs topDocs(int start)	Returns the documents in the range [start ..
TopDocs topDocs(int start, int howMany)	Returns the documents in the range [start ..

The TopDocs class returned by topDocs() has two fields:

ScoreDoc[] scoreDocs	The top hits for the query.
int totalHits	The total number of hits for the query.

ScoreDoc in turn has two fields:

int doc	Expert: A hit document's number.
float score	Expert: The score of this document for the query.

From these you get each hit's doc (document number) and score.
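As a rough picture of how getTotalHits(), TopDocs and ScoreDoc fit together, the following sketch keeps the n best hits in a min-heap while counting every match. TopNCollectorModel and Hit are hypothetical names (Hit plays the role of ScoreDoc); this is not TopScoreDocCollector's code, and ties are broken in favor of the lower document number here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical model of a top-n collector (illustrative names, not Lucene's).
final class TopNCollectorModel {
    static final class Hit {                 // plays the role of ScoreDoc
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // "Weaker" hits sort first: lower score, or equal score and higher doc number.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) ->
            a.score != b.score ? Float.compare(a.score, b.score) : Integer.compare(b.doc, a.doc);

    private final int n;
    private final PriorityQueue<Hit> heap;   // min-heap holding at most n hits
    private int totalHits;

    TopNCollectorModel(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
        this.heap = new PriorityQueue<>(n, WEAKEST_FIRST);
    }

    void collect(int doc, float score) {
        totalHits++;                         // every match counts toward totalHits
        Hit hit = new Hit(doc, score);
        if (heap.size() < n) {
            heap.add(hit);
        } else if (WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
            heap.poll();                     // evict the weakest retained hit
            heap.add(hit);
        }
    }

    int getTotalHits() { return totalHits; }

    // Best hit first, the order used by TopDocs.scoreDocs.
    Hit[] topDocs() {
        Hit[] hits = heap.toArray(new Hit[0]);
        Arrays.sort(hits, WEAKEST_FIRST.reversed());
        return hits;
    }
}
```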

6. Analyzer
In Lucene 3.0.2 the Analyzer class hierarchy is as follows (from the API docs):
org.apache.lucene.analysis
Class Analyzer
java.lang.Object
        org.apache.lucene.analysis.Analyzer
All Implemented Interfaces:
        Closeable
Direct Known Subclasses:
        ArabicAnalyzer, BrazilianAnalyzer, ChineseAnalyzer, CJKAnalyzer, CollationKeyAnalyzer, CzechAnalyzer, DutchAnalyzer, FrenchAnalyzer, GermanAnalyzer, GreekAnalyzer, ICUCollationKeyAnalyzer, KeywordAnalyzer, PatternAnalyzer, PerFieldAnalyzerWrapper, PersianAnalyzer, QueryAutoStopWordAnalyzer, RussianAnalyzer, ShingleAnalyzerWrapper, SimpleAnalyzer, SmartChineseAnalyzer, SnowballAnalyzer, StandardAnalyzer, StopAnalyzer, ThaiAnalyzer, WhitespaceAnalyzer

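Many of these are language-specific, such as ChineseAnalyzer and RussianAnalyzer (several of them live in Lucene's contrib modules rather than the core jar). So far I have only used StandardAnalyzer and IKAnalyzer (a third-party analyzer, not part of Lucene). If you need to analyze a particular language, the others are worth trying.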

StandardAnalyzer has two main fields:

  private Set<?> stopSet;
  private final Version matchVersion;

It also has three more fields:

/**
   * Specifies whether deprecated acronyms should be replaced with HOST type.
   * See {@linkplain https://issues.apache.org/jira/browse/LUCENE-1068}
   */
  private final boolean replaceInvalidAcronym,enableStopPositionIncrements;

  /** An unmodifiable set containing some common English words that are usually not
  useful for searching. */
  public static final Set<?> STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

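Since I almost always use Version.LUCENE_30, I leave the first two (replaceInvalidAcronym and enableStopPositionIncrements) aside and have not studied them further. STOP_WORDS_SET is the default set of stop words to filter out, and StopAnalyzer.ENGLISH_STOP_WORDS_SET is defined as: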
//StopAnalyzer.java

  public static final Set<?> ENGLISH_STOP_WORDS_SET;
  
  static {
    final List<String> stopWords = Arrays.asList(
      "a", "an", "and", "are", "as", "at", "be", "but", "by",
      "for", "if", "in", "into", "is", "it",
      "no", "not", "of", "on", "or", "such",
      "that", "the", "their", "then", "there", "these",
      "they", "this", "to", "was", "will", "with"
    );
    final CharArraySet stopSet = new CharArraySet(stopWords.size(), false);
    stopSet.addAll(stopWords);  
    ENGLISH_STOP_WORDS_SET = CharArraySet.unmodifiableSet(stopSet); 
  }

StandardAnalyzer has four constructors:

StandardAnalyzer(Version matchVersion)	Builds an analyzer with the default stop words (STOP_WORDS_SET).
StandardAnalyzer(Version matchVersion, File stopwords)	Builds an analyzer with the stop words from the given file.
StandardAnalyzer(Version matchVersion, Reader stopwords)	Builds an analyzer with the stop words from the given reader.
StandardAnalyzer(Version matchVersion, Set<?> stopWords)	Builds an analyzer with the given stop words.

The main constructor is below (the other constructors all delegate to it):

  /** Builds an analyzer with the given stop words.
   * @param matchVersion Lucene version to match See {@link
   * <a href="#version">above</a>}
   * @param stopWords stop words */
  public StandardAnalyzer(Version matchVersion, Set<?> stopWords) {
    stopSet = stopWords;
    setOverridesTokenStreamMethod(StandardAnalyzer.class);
    enableStopPositionIncrements = StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion);
    replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_24);
    this.matchVersion = matchVersion;
  }

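I leave replaceInvalidAcronym and enableStopPositionIncrements aside (I use Version.LUCENE_30 wherever possible); each is just a boolean derived from the version (through a compareTo comparison in the Version enum), and I will look at them later. Apart from these two, the constructor only assigns the version and the stop-word set. The version is chosen from the Version enum, and stopWords has a default set, but Lucene makes it easy to supply your own stop-word set or file. Passing a set directly is the constructor shown above; the constructor that reads stop words from a file is: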

  /** Builds an analyzer with the stop words from the given file.
   * @see WordlistLoader#getWordSet(File)
   * @param matchVersion Lucene version to match See {@link
   * <a href="#version">above</a>}
   * @param stopwords File to read stop words from */
  public StandardAnalyzer(Version matchVersion, File stopwords) throws IOException {
    this(matchVersion, WordlistLoader.getWordSet(stopwords));
  }

WordlistLoader defines these methods:

  /**
   * Loads a text file and adds every line as an entry to a HashSet (omitting
   * leading and trailing whitespace). Every line of the file should contain only
   * one word. The words need to be in lowercase if you make use of an
   * Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
   *
   * @param wordfile File containing the wordlist
   * @return A HashSet with the file's words
   */
  public static HashSet<String> getWordSet(File wordfile) throws IOException {
    HashSet<String> result = new HashSet<String>();
    FileReader reader = null;
    try {
      reader = new FileReader(wordfile);
      result = getWordSet(reader);
    }
    finally {
      if (reader != null)
        reader.close();
    }
    return result;
  }

 /**
   * Reads lines from a Reader and adds every line as an entry to a HashSet (omitting
   * leading and trailing whitespace). Every line of the Reader should contain only
   * one word. The words need to be in lowercase if you make use of an
   * Analyzer which uses LowerCaseFilter (like StandardAnalyzer).
   *
   * @param reader Reader containing the wordlist
   * @return A HashSet with the reader's words
   */
  public static HashSet<String> getWordSet(Reader reader) throws IOException {
    HashSet<String> result = new HashSet<String>();
    BufferedReader br = null;
    try {
      if (reader instanceof BufferedReader) {
        br = (BufferedReader) reader;
      } else {
        br = new BufferedReader(reader);
      }
      String word = null;
      while ((word = br.readLine()) != null) {
        result.add(word.trim());
      }
    }
    finally {
      if (br != null)
        br.close();
    }
    return result;
  }

So it is enough to point StandardAnalyzer at a file with one stop word per line, and it will load the stop words from that file.
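The file format can be tried out with a JDK-only sketch that reads lines the same way as getWordSet(Reader) above. StopWordFileSketch is a hypothetical helper, not part of Lucene. The detail worth noticing is case: apart from trimming, lines are stored as-is, and the Javadoc above says the words must be lowercase when the analyzer uses LowerCaseFilter, so an entry such as "Java" would never match a lowercased token.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical JDK-only sketch of the format WordlistLoader.getWordSet(Reader) reads:
// one word per line, surrounding whitespace trimmed, no case conversion.
final class StopWordFileSketch {
    static Set<String> load(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                words.add(line.trim());  // like getWordSet, a blank line adds ""
            }
        }
        return words;
    }
}
```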

7. Directory
Directory's class hierarchy is as follows (from the API docs):
org.apache.lucene.store
Class Directory
java.lang.Object
        org.apache.lucene.store.Directory
All Implemented Interfaces:
        Closeable
Direct Known Subclasses:
        DbDirectory, FileSwitchDirectory, FSDirectory, JEDirectory, RAMDirectory

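The most commonly used are RAMDirectory and FSDirectory. RAMDirectory keeps the index in memory (with a large amount of data this is a bad idea and can end in java.lang.OutOfMemoryError: Java heap space); FSDirectory stores the index files on the local disk. In practice, make sure IndexWriter and IndexReader point at the same Directory (for RAMDirectory, the very same instance); otherwise you get an error like the following, produced when a reader is opened on a different, empty RAMDirectory: no segments* file found in org.apache.lucene.store.RAMDirectory@765291: files: []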

Most of Directory's methods operate on files; they are essentially a further wrapper around Java I/O and are easy to understand. Building an index requires a Directory instance; commonly used methods are listed below.

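These are FSDirectory's factory methods (from the API docs). FSDirectory is abstract and its constructor is protected, so it cannot be instantiated directly; call a static open() method to get an instance of a concrete subclass.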

static FSDirectory open(File path)	Creates an FSDirectory instance, trying to pick the best implementation given the current environment.
static FSDirectory open(File path, LockFactory lockFactory)	Just like open(File), but allows you to also specify a custom LockFactory.

Their implementation:

 /** Creates an FSDirectory instance, trying to pick the
   *  best implementation given the current environment.
   *  The directory returned uses the {@link NativeFSLockFactory}.
   *
   *  <p>Currently this returns {@link NIOFSDirectory}
   *  on non-Windows JREs and {@link SimpleFSDirectory}
   *  on Windows.
   *
   * <p><b>NOTE</b>: this method may suddenly change which
   * implementation is returned from release to release, in
   * the event that higher performance defaults become
   * possible; if the precise implementation is important to
   * your application, please instantiate it directly,
   * instead. On 64 bit systems, it may also good to
   * return {@link MMapDirectory}, but this is disabled
   * because of officially missing unmap support in Java.
   * For optimal performance you should consider using
   * this implementation on 64 bit JVMs.
   *
   * <p>See <a href="#subclasses">above</a> */
  public static FSDirectory open(File path) throws IOException {
    return open(path, null);
  }

  /** Just like {@link #open(File)}, but allows you to
   *  also specify a custom {@link LockFactory}. */
  public static FSDirectory open(File path, LockFactory lockFactory) throws IOException {
    /* For testing:
    MMapDirectory dir=new MMapDirectory(path, lockFactory);
    dir.setUseUnmap(true);
    return dir;
    */

    if (Constants.WINDOWS) {
      return new SimpleFSDirectory(path, lockFactory);
    } else {
      return new NIOFSDirectory(path, lockFactory);
    }
  }

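So if no lock factory is given, open(File) passes null along. According to the Javadoc above, the returned directory then uses NativeFSLockFactory. The operating system does not choose the LockFactory. It chooses the FSDirectory subclass: SimpleFSDirectory on Windows and NIOFSDirectory elsewhere. I have not used LockFactory yet; if I run into it, I will add more about it here.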
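The decision that open() makes can be sketched without Lucene. The helper below is hypothetical and returns class names instead of directories. It approximates Constants.WINDOWS by checking whether os.name starts with "Windows". The NativeFSLockFactory default for a null lock factory is taken from the Javadoc above.

```java
// Hypothetical sketch of the choice made by FSDirectory.open(File, LockFactory)
// in 3.0.2; it returns class names so it can run without Lucene.
class OpenChoice {
    // approximation of Lucene's Constants.WINDOWS check
    static boolean isWindows(String osName) {
        return osName.startsWith("Windows");
    }

    // the operating system picks the FSDirectory subclass
    static String implementationFor(String osName) {
        return isWindows(osName) ? "SimpleFSDirectory" : "NIOFSDirectory";
    }

    // a null lock factory ends up as NativeFSLockFactory, per the Javadoc above
    static String lockFactoryFor(String requested) {
        return requested == null ? "NativeFSLockFactory" : requested;
    }
}
```

For the current machine, call OpenChoice.implementationFor(System.getProperty("os.name")).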

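8. Query, Sort and Filter
Every search needs a Query, and the class deserves a closer look. Its class hierarchy is shown below (from the API docs). Query frequently uses the Term class mentioned earlier.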


org.apache.lucene.search
Class Query

java.lang.Object
    org.apache.lucene.search.Query
All Implemented Interfaces:
    Serializable, Cloneable
Direct Known Subclasses:
    BooleanQuery, BoostingQuery, ConstantScoreQuery, CustomScoreQuery, DisjunctionMaxQuery, FilteredQuery, FuzzyLikeThisQuery, MatchAllDocsQuery, MoreLikeThisQuery, MultiPhraseQuery, MultiTermQuery, PhraseQuery, SpanQuery, TermQuery, ValueSourceQuery

Each one is described below.
TermQuery
It has only one constructor and is simple to use:

TermQuery(Term t)

Constructs a query for the term t.

TermQuery query = new TermQuery(new Term("bookname","java"));
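Conceptually, a TermQuery is a single lookup in the inverted index. The toy below is a hypothetical class in plain JDK code, not Lucene code. It indexes the three book names from the test program later, using a crude stand-in for StandardAnalyzer: it lowercases the text, splits on non-alphanumeric characters, and drops a tiny assumed stop list. It also shows that a TermQuery's text is not analyzed, so "Java" finds nothing while "java" does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy inverted index: term -> sorted ids of the documents containing it.
class ToyTermIndex {
    // Assumed mini stop list; StandardAnalyzer's default English list is longer.
    private static final Set<String> STOP = new HashSet<String>(Arrays.asList("a", "in", "the"));

    private final Map<String, SortedSet<Integer>> postings = new HashMap<String, SortedSet<Integer>>();

    // Crude stand-in for an analyzer: lowercase, split, drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.length() > 0 && !STOP.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void addDocument(int docId, String text) {
        for (String t : analyze(text)) {
            SortedSet<Integer> docs = postings.get(t);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(t, docs);
            }
            docs.add(docId);
        }
    }

    // A TermQuery boils down to this lookup; the query text is used as-is.
    SortedSet<Integer> termQuery(String term) {
        SortedSet<Integer> docs = postings.get(term);
        return docs == null ? new TreeSet<Integer>() : docs;
    }
}
```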

BooleanQuery
(From the API docs:)


A Query that matches documents matching boolean combinations of other queries, e.g. TermQuerys, PhraseQuerys or other BooleanQuerys.

BooleanQuery has two constructors (from the API docs):

BooleanQuery()	Constructs an empty boolean query.
BooleanQuery(boolean disableCoord)	Constructs an empty boolean query.

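Fields worth noting (I may not have used all of them; these are the ones I know about):

private static int maxClauseCount = 1024;// maximum number of clauses, 1024 by default; adding more throws BooleanQuery.TooManyClauses
this.disableCoord = disableCoord;// set by the second constructor; tells scoring (Similarity) to skip the coord factor
protected int minNrShouldMatch = 0;// set via setMinimumNumberShouldMatch(int)
private ArrayList<BooleanClause> clauses = new ArrayList<BooleanClause>();// holds the BooleanClauses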

Commonly used methods:

void add(BooleanClause clause)	Adds a clause to a boolean query.
void add(Query query, BooleanClause.Occur occur)	Adds a clause to a boolean query.

The BooleanClause class is simple but very useful for BooleanQuery. It contains little code. What matters is the nested enum Occur and the two fields query and occur.

 public static enum Occur {

    /** Use this operator for clauses that <i>must</i> appear in the matching documents. */
    MUST     { @Override public String toString() { return "+"; } },

    /** Use this operator for clauses that <i>should</i> appear in the 
     * matching documents. For a BooleanQuery with no <code>MUST</code> 
     * clauses one or more <code>SHOULD</code> clauses must match a document 
     * for the BooleanQuery to match.
     * @see BooleanQuery#setMinimumNumberShouldMatch
     */
    SHOULD   { @Override public String toString() { return "";  } },

    /** Use this operator for clauses that <i>must not</i> appear in the matching documents.
     * Note that it is not possible to search for queries that only consist
     * of a <code>MUST_NOT</code> clause. */
    MUST_NOT { @Override public String toString() { return "-"; } };

  }

  /** The query whose matching documents are combined by the boolean query.
   */
  private Query query;
  private Occur occur;

  /** Constructs a BooleanClause.
  */ 
  public BooleanClause(Query query, Occur occur) {
    this.query = query;
    this.occur = occur;
    
  }
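The way the three Occur values combine can be modelled on plain doc-id sets. This hypothetical sketch ignores scoring and follows the Javadoc above:

- All MUST clauses must match.
- SHOULD clauses are optional when a MUST clause exists. Otherwise at least one must match, or minNrShouldMatch if that is higher.
- MUST_NOT clauses exclude documents.
- A query made only of MUST_NOT clauses matches nothing.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of BooleanQuery matching on doc-id sets (no scoring).
class ToyBoolean {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Set<Integer> docs; // documents matched by the clause's sub-query
        final Occur occur;

        Clause(Set<Integer> docs, Occur occur) {
            this.docs = docs;
            this.occur = occur;
        }
    }

    static Set<Integer> match(List<Clause> clauses, int minShouldMatch) {
        boolean hasMust = false;
        Set<Integer> candidates = new TreeSet<Integer>();
        for (Clause c : clauses) {
            if (c.occur == Occur.MUST) {
                hasMust = true;
            }
            if (c.occur != Occur.MUST_NOT) {
                candidates.addAll(c.docs); // only positive clauses can produce hits
            }
        }
        // without any MUST clause, at least one SHOULD clause has to match
        int requiredShould = hasMust ? minShouldMatch : Math.max(1, minShouldMatch);
        Set<Integer> result = new TreeSet<Integer>();
        for (Integer doc : candidates) {
            boolean ok = true;
            int shouldHits = 0;
            for (Clause c : clauses) {
                boolean hit = c.docs.contains(doc);
                if (c.occur == Occur.MUST && !hit) ok = false;
                if (c.occur == Occur.MUST_NOT && hit) ok = false;
                if (c.occur == Occur.SHOULD && hit) shouldHits++;
            }
            if (ok && shouldHits >= requiredShould) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

This uses the doc ids of the test program later (0: thinking in java, 1: thinking in java IV, 2: thinking in c++). With java SHOULD + thinking SHOULD, it returns all three documents, which agrees with the output of getBooleanQuery().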

PhraseQuery
PhraseQuery has only one constructor:

PhraseQuery()

Constructs an empty phrase query.

The constructor creates an empty PhraseQuery. Terms are added afterwards, which fills in the following fields:

  private String field;// all terms of a PhraseQuery must be in the same field
  private ArrayList<Term> terms = new ArrayList<Term>(4);// the terms of the phrase
  private ArrayList<Integer> positions = new ArrayList<Integer>(4);// relative position of each term
  private int maxPosition = 0;// largest position added so far
  private int slop = 0;// how far apart the terms may be; 0 means an exact phrase

The main methods:

  public void setSlop(int s) { slop = s; }

  /**
   * Adds a term to the end of the query phrase.
   * The relative position of the term is the one immediately after the last term added.
   */
  public void add(Term term) {
    int position = 0;
    if(positions.size() > 0)
        position = positions.get(positions.size()-1).intValue() + 1;

    add(term, position);
  }

  /**
   * Adds a term to the end of the query phrase.
   * The relative position of the term within the phrase is specified explicitly.
   * This allows e.g. phrases with more than one term at the same position
   * or phrases with gaps (e.g. in connection with stopwords).
   * 
   * @param term
   * @param position
   */
  public void add(Term term, int position) {
      if (terms.size() == 0)
          field = term.field();
      else if (term.field() != field)
          throw new IllegalArgumentException("All phrase terms must be in the same field: " + term);// all terms must share one field (field names are interned, so != compares references)

      terms.add(term);
      positions.add(Integer.valueOf(position));
      if (position > maxPosition) maxPosition = position;
  }
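The position bookkeeping of add(Term) and add(Term, int) can be sketched together with exact (slop 0) phrase matching. The class is hypothetical and uses plain JDK code; sloppy matching is not modelled. The sample document assumes "thinking in java" is indexed with "thinking" at position 0 and "java" at position 2. In other words, the removed stop word "in" leaves a gap, which is my understanding of how StandardAnalyzer behaves with Version.LUCENE_30.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of exact (slop = 0) phrase matching against one document.
class ToyPhrase {
    private final List<String> terms = new ArrayList<String>();
    private final List<Integer> positions = new ArrayList<Integer>();

    // like add(Term): one position after the last term added
    void add(String term) {
        int position = positions.isEmpty() ? 0 : positions.get(positions.size() - 1) + 1;
        add(term, position);
    }

    // like add(Term, int): explicit relative position, gaps allowed
    void add(String term, int position) {
        terms.add(term);
        positions.add(position);
    }

    // docPositions: term -> positions where it occurs in the field.
    // True if some start offset puts every term at offset + its relative position.
    boolean matches(Map<String, List<Integer>> docPositions) {
        if (terms.isEmpty()) {
            return false;
        }
        List<Integer> firstHits = docPositions.get(terms.get(0));
        if (firstHits == null) {
            return false;
        }
        for (int hit : firstHits) {
            int start = hit - positions.get(0);
            boolean all = true;
            for (int i = 0; i < terms.size() && all; i++) {
                List<Integer> hits = docPositions.get(terms.get(i));
                all = hits != null && hits.contains(start + positions.get(i));
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

This also explains why the Javadoc of add(Term, int) mentions gaps in connection with stop words. Adding "java" at position 2 matches the document even though the adjacent phrase does not.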

WildcardQuery (extends MultiTermQuery)
WildcardQuery is very simple to use. It has only one constructor (from the API docs):

WildcardQuery(Term term)

The class and its constructor:

/** Implements the wildcard search query. Supported wildcards are <code>*</code>, which
 * matches any character sequence (including the empty one), and <code>?</code>,
 * which matches any single character. Note this query can be slow, as it
 * needs to iterate over many terms. In order to prevent extremely slow WildcardQueries,
 * a Wildcard term should not start with one of the wildcards <code>*</code> or
 * <code>?</code>.
 * 
 * <p>This query uses the {@link
 * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}
 * rewrite method.
 *
 * @see WildcardTermEnum */
public class WildcardQuery extends MultiTermQuery {
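  private boolean termContainsWildcard;// true if the text contains * or ?
  private boolean termIsPrefix;// true if the only wildcard is a single trailing *; that case can be searched like a prefix query, which is much faster than a general WildcardQuery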
  protected Term term;
    
  public WildcardQuery(Term term) {
    this.term = term;
    String text = term.text();
    this.termContainsWildcard = (text.indexOf('*') != -1)
        || (text.indexOf('?') != -1);
    this.termIsPrefix = termContainsWildcard 
        && (text.indexOf('?') == -1) 
        && (text.indexOf('*') == text.length() - 1);
  }

...

}

So WildcardQuery supports exactly two wildcards, * and ?.
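Below is a minimal sketch of those two wildcards, plus the termIsPrefix test from the constructor. The class is hypothetical and uses plain JDK code; it is not Lucene's WildcardTermEnum. Its simple backtracking on * can be slow for patterns with many stars.

```java
// Hypothetical sketch: * matches any sequence (including the empty one),
// ? matches exactly one character.
class ToyWildcard {
    static boolean matches(String pattern, String text) {
        return matchFrom(pattern, 0, text, 0);
    }

    private static boolean matchFrom(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // let * swallow 0, 1, 2, ... characters
            for (int k = ti; k <= t.length(); k++) {
                if (matchFrom(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false;
        }
        return (c == '?' || c == t.charAt(ti)) && matchFrom(p, pi + 1, t, ti + 1);
    }

    // Same test as the termIsPrefix field: the only wildcard is one trailing *.
    static boolean isPrefixPattern(String text) {
        boolean containsWildcard = text.indexOf('*') != -1 || text.indexOf('?') != -1;
        return containsWildcard
                && text.indexOf('?') == -1
                && text.indexOf('*') == text.length() - 1;
    }
}
```

A pattern such as java* passes isPrefixPattern. It matches exactly the terms for which term.startsWith("java") holds, which is the check PrefixQuery is about.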

PrefixQuery (extends MultiTermQuery)
It, too, has only one constructor (from the API docs):

PrefixQuery(Term prefix)

Constructs a query for terms starting with prefix.

FuzzyQuery (extends MultiTermQuery)
Used for similarity (fuzzy) search. Its constructors (from the API docs):

FuzzyQuery(Term term)	Calls FuzzyQuery(term, 0.5f, 0).
FuzzyQuery(Term term, float minimumSimilarity)	Calls FuzzyQuery(term, minimumSimilarity, 0).
FuzzyQuery(Term term, float minimumSimilarity, int prefixLength)	Create a new FuzzyQuery that will match terms with a similarity of at least minimumSimilarity to term.

The implementation:

  public final static float defaultMinSimilarity = 0.5f;
  public final static int defaultPrefixLength = 0;
  
  private float minimumSimilarity;
  private int prefixLength;
  private boolean termLongEnough = false;
  
  protected Term term;
  
  /**
   * Create a new FuzzyQuery that will match terms with a similarity 
   * of at least <code>minimumSimilarity</code> to <code>term</code>.
   * If a <code>prefixLength</code> &gt; 0 is specified, a common prefix
   * of that length is also required.
   * 
   * @param term the term to search for
   * @param minimumSimilarity a value between 0 and 1 to set the required similarity
   *  between the query term and the matching terms. For example, for a
   *  <code>minimumSimilarity</code> of <code>0.5</code> a term of the same length
   *  as the query term is considered similar to the query term if the edit distance
   *  between both terms is less than <code>length(term)*0.5</code>
   * @param prefixLength length of common (non-fuzzy) prefix
   * @throws IllegalArgumentException if minimumSimilarity is &gt;= 1 or &lt; 0
   * or if prefixLength &lt; 0
   */
  public FuzzyQuery(Term term, float minimumSimilarity, int prefixLength) throws IllegalArgumentException {
    this.term = term;
    
    if (minimumSimilarity >= 1.0f)
      throw new IllegalArgumentException("minimumSimilarity >= 1");
    else if (minimumSimilarity < 0.0f)
      throw new IllegalArgumentException("minimumSimilarity < 0");
    if (prefixLength < 0)
      throw new IllegalArgumentException("prefixLength < 0");
    
    if (term.text().length() > 1.0f / (1.0f - minimumSimilarity)) {
      this.termLongEnough = true;
    }
    
    this.minimumSimilarity = minimumSimilarity;
    this.prefixLength = prefixLength;
    rewriteMethod = SCORING_BOOLEAN_QUERY_REWRITE;
  }
  
  /**
   * Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, minimumSimilarity, 0)}.
   */
  public FuzzyQuery(Term term, float minimumSimilarity) throws IllegalArgumentException {
      this(term, minimumSimilarity, defaultPrefixLength);
  }

  /**
   * Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, 0.5f, 0)}.
   */
  public FuzzyQuery(Term term) {
    this(term, defaultMinSimilarity, defaultPrefixLength);
  }

...

}

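So minimumSimilarity must lie in [0, 1), and prefixLength must be >= 0. The similarity is based on the Levenshtein distance between two strings, also called edit distance. It is the minimum number of edit operations needed to turn one string into the other. The allowed operations are replacing a character, inserting a character, and deleting a character. For example, turning kitten into sitting takes three operations:
sitten (k→s), sittin (e→i), sitting (insert g)
Each operation (substitution, insertion, deletion) costs 1.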
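The sketch below covers the edit distance and the similarity score, as I understand Lucene 3.0's FuzzyTermEnum computes it. The score is 1 - distance / (prefixLength + min(len1, len2)). The distance is taken over the parts after the shared prefix, and a term is accepted only when its similarity is greater than minimumSimilarity. Treat the formula as an assumption to check against the source; the class name is hypothetical. With this formula, the "jama" results of the test program later come out as observed:

- minimumSimilarity 0.5 matches "java".
- minimumSimilarity 0.9 does not.
- Prefix length 3 fails because "java" does not start with "jam".
- Prefix length 2 matches.

```java
// Hypothetical sketch, not Lucene code.
class ToyFuzzy {
    // Classic dynamic-programming Levenshtein distance (every edit costs 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // "" -> first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // first i chars of a -> "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int subst = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + subst);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed Lucene 3.0-style score; -1 if the candidate lacks the required prefix.
    static float similarity(String query, String candidate, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        if (!candidate.startsWith(prefix)) {
            return -1f;
        }
        String q = query.substring(prefix.length());
        String c = candidate.substring(prefix.length());
        int denominator = prefix.length() + Math.min(q.length(), c.length());
        if (denominator == 0) {
            return 0f; // edge case (empty strings, no prefix): return 0, an assumption
        }
        return 1f - (float) levenshtein(q, c) / denominator;
    }

    static boolean fuzzyMatches(String query, String candidate, float minSimilarity, int prefixLength) {
        return similarity(query, candidate, prefixLength) > minSimilarity;
    }
}
```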

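TermRangeQuery (extends MultiTermQuery)
TermRangeQuery has two constructors (from the API docs). By default, terms are compared as Strings, but the second constructor lets you pass your own java.text.Collator. The field is passed directly as a String rather than through a Term, presumably for a good reason.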

TermRangeQuery(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper)	Constructs a query selecting all terms greater/equal than lowerTerm but less/equal than upperTerm.
TermRangeQuery(String field, String lowerTerm, String upperTerm, boolean includeLower, boolean includeUpper, Collator collator)	Constructs a query selecting all terms greater/equal than lowerTerm but less/equal than upperTerm.
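Below is a hypothetical JDK-only sketch of the range test applied to one field's terms. It compares with String.compareTo by default, or with a supplied java.text.Collator, and treats a null bound as open. The term list in the test assumes the book names are analyzed into c, classic, iv, java and thinking. It reproduces the "jama" to "jaza" and "jama" to "jana" results of the test program later.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: pick the terms of one field that lie inside a range.
class ToyTermRange {
    static List<String> select(Collection<String> terms, String lower, String upper,
                               boolean includeLower, boolean includeUpper, Collator collator) {
        List<String> out = new ArrayList<String>();
        for (String t : new TreeSet<String>(terms)) {
            // a null bound is open: it never excludes anything
            boolean aboveLower = lower == null
                    || (includeLower ? compare(t, lower, collator) >= 0 : compare(t, lower, collator) > 0);
            boolean belowUpper = upper == null
                    || (includeUpper ? compare(t, upper, collator) <= 0 : compare(t, upper, collator) < 0);
            if (aboveLower && belowUpper) {
                out.add(t);
            }
        }
        return out;
    }

    // String order by default, collator order when one is given
    private static int compare(String a, String b, Collator collator) {
        return collator == null ? a.compareTo(b) : collator.compare(a, b);
    }
}
```

For locale-aware ordering, pass for example Collator.getInstance(Locale.ENGLISH) as the last argument.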

NumericRangeQuery (extends MultiTermQuery)
(From the API docs. I have not changed precisionStep from its default, NumericUtils.PRECISION_STEP_DEFAULT (4), and have not studied it in depth.)


A Query that matches numeric values within a specified range. To use this, you must first index the numeric values using NumericField (expert: NumericTokenStream). If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter equivalent of this query.

So to use NumericRangeQuery, you must index the values with NumericField. The API docs also warn that NumericRangeQuery and NumericField may change incompatibly in future versions:


NOTE: This API is experimental and might change in incompatible ways in the next release.

This query is created with one of the following 8 static factory methods:

static NumericRangeQuery<Double> newDoubleRange(String field, Double min, Double max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a double range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
static NumericRangeQuery<Double> newDoubleRange(String field, int precisionStep, Double min, Double max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a double range using the given precisionStep.
static NumericRangeQuery<Float> newFloatRange(String field, Float min, Float max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a float range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
static NumericRangeQuery<Float> newFloatRange(String field, int precisionStep, Float min, Float max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a float range using the given precisionStep.
static NumericRangeQuery<Integer> newIntRange(String field, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a int range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
static NumericRangeQuery<Integer> newIntRange(String field, int precisionStep, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a int range using the given precisionStep.
static NumericRangeQuery<Long> newLongRange(String field, int precisionStep, Long min, Long max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a long range using the given precisionStep.
static NumericRangeQuery<Long> newLongRange(String field, Long min, Long max, boolean minInclusive, boolean maxInclusive)	Factory that creates a NumericRangeQuery, that queries a long range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).

NumericField has four constructors (from the API docs):

NumericField(String name)	Creates a field for numeric values using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
NumericField(String name, Field.Store store, boolean index)	Creates a field for numeric values using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
NumericField(String name, int precisionStep)	Creates a field for numeric values with the specified precisionStep.
NumericField(String name, int precisionStep, Field.Store store, boolean index)	Creates a field for numeric values with the specified precisionStep.

Its value is set with one of the following methods:

NumericField setDoubleValue(double value)	Initializes the field with the supplied double value.
NumericField setFloatValue(float value)	Initializes the field with the supplied float value.
NumericField setIntValue(int value)	Initializes the field with the supplied int value.
NumericField setLongValue(long value)	Initializes the field with the supplied long value.

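This makes it possible to index and search not only the primitive numeric types but also anything that can be converted to them, such as Date/Calendar (for example, as a long via Date.getTime()).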
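Two ideas behind NumericField and NumericRangeQuery can be sketched with plain JDK code. The names are hypothetical, and the sketch does not produce Lucene's actual prefix-coded index terms.

1. A float is turned into an int whose signed order matches the float order.
2. With precisionStep = 4, each 32-bit value is also indexed at coarser precisions: shifted right by 0, 4, ..., 28 bits, which gives 8 levels. A range query can then cover large sub-ranges with a few coarse terms.

To the best of my knowledge, NumericUtils uses the same bit tricks, but check its source before relying on the details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the encoding ideas; not Lucene's NumericUtils.
class ToyNumeric {
    // Negative floats have the sign bit set and a magnitude that grows with the raw
    // bits, so flipping all non-sign bits makes signed int order match float order.
    static int floatToSortableInt(float value) {
        int bits = Float.floatToIntBits(value);
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        return bits;
    }

    // One entry per precision level: the value (sign bit flipped so that unsigned
    // order equals signed order) shifted right by 0, step, 2 * step, ... bits.
    static List<Integer> trieLevels(int value, int precisionStep) {
        int sortable = value ^ 0x80000000;
        List<Integer> levels = new ArrayList<Integer>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            levels.add(sortable >>> shift);
        }
        return levels;
    }
}
```

Nearby values such as 100 and 101 differ at shift 0 but already share the level at shift 4, which is what lets a range query replace many fine terms with a few coarse ones. Following the same idea, a Date would be indexed as a long, for example new NumericField("date").setLongValue(date.getTime()) (the field name "date" is hypothetical), and searched with NumericRangeQuery.newLongRange.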

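RegexQuery (extends MultiTermQuery)
The API docs describe this class, but it is not in the Lucene 3.0.2 core jar. As far as I can tell, it ships in the contrib regex module (package org.apache.lucene.search.regex), so that jar has to be on the classpath to use it.
The queries above should help in understanding the concept of Query. Below is some code that exercises them (the comments show the output).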

package com.eric.lucene;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/**
 * The comments show the program output
 * @author Yuanbo Han
 *
 */
public class QueryTest {
	
	public static Query getTermQuery(){
		TermQuery query = new TermQuery(new Term("bookname","java"));
		return query;
//		thinking in java		0.625
//		thinking in java IV(Java Classic)		0.61871845
	}
	
	public static Query getBooleanQuery(){
		TermQuery termQuery2 = new TermQuery(new Term("bookname", "thinking"));
		TermQuery termQuery1 = new TermQuery(new Term("bookname", "java"));
		BooleanQuery query = new BooleanQuery();
		query.add(termQuery1, BooleanClause.Occur.SHOULD);
		query.add(termQuery2, BooleanClause.Occur.SHOULD);
		
		return query;
//		thinking in java		0.76735055
//		thinking in java IV(Java Classic)		0.68474615
//		thinking in c++		0.12914689
	}
	
	public static Query getPhraseQuery(){
		PhraseQuery query = new PhraseQuery();

		//query.setSlop(1);
		//thinking in java		0.75674474
		//thinking in java IV(Java Classic)		0.5297213
		
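		query.setSlop(0);// no result: StandardAnalyzer drops the stop word "in" but (with Version.LUCENE_30) leaves a position gap, so "java" is not directly after "thinking"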
		query.add(new Term("bookname", "thinking"));
		query.add(new Term("bookname", "java"));
		return query;
	}
	
	public static Query getWildcardQuery(){
		//WildcardQuery query = new WildcardQuery(new Term("bookname","think*"));
		//thinking in java		1.0
		//thinking in java IV(Java Classic)		1.0
		//thinking in c++		1.0
		
		
		//WildcardQuery query = new WildcardQuery(new Term("bookname","ja?a"));
		//thinking in java		1.0
		//thinking in java IV(Java Classic)		1.0
		
		
		WildcardQuery query = new WildcardQuery(new Term("bookname","ja?a*"));
		//thinking in java		1.0
		//thinking in java IV(Java Classic)		1.0
		
		return query;
	}

	public static Query getPrefixQuery(){
		PrefixQuery query = new PrefixQuery(new Term("bookname","java"));// matches terms that start with java
		//thinking in java		1.0
		//thinking in java IV(Java Classic)		1.0

		return query;
	}
	
	public static Query getFuzzyQuery(){
		//FuzzyQuery query = new FuzzyQuery(new Term("bookname","jama"));// default: similarity = 0.5, prefixLength = 0.
		/* The Javadoc example is easy to misread. For a term of the same length as the query term, a match needs
		   edit distance < length(term) * (1 - minimumSimilarity). With minimumSimilarity = 0.5 that bound happens
		   to equal length(term) * 0.5, as the comment says. A higher minimumSimilarity therefore allows FEWER edits:
		   for "jama" and 0.9 the bound is 4 * 0.1 = 0.4, so even the single substitution m -> v (jama -> java)
		   is too much, which is why 0.9 returns no result. */
		//thinking in java		0.625
		//thinking in java IV(Java Classic)		0.61871845

//		FuzzyQuery query = new FuzzyQuery(new Term("bookname","jama"),0.9f);//no result
		
//		FuzzyQuery query = new FuzzyQuery(new Term("bookname","jama"),0.5f,3);//no result: "java" does not start with the required prefix "jam"
		
		FuzzyQuery query = new FuzzyQuery(new Term("bookname","jama"),0.5f,2);
		//thinking in java		0.625
		//thinking in java IV(Java Classic)		0.61871845

		
		return query;
	}
	
	public static Query getTermRangeQuery(){
//		TermRangeQuery query = new TermRangeQuery("bookname", "jama", "jaza", true, true);
		//thinking in java		1.0
		//thinking in java IV(Java Classic)		1.0

		TermRangeQuery query = new TermRangeQuery("bookname", "jama", "jana", true, true);// no result
		return query;
	}
	
	public static Query getNumericRangeQuery(){
//		Query query = NumericRangeQuery.newFloatRange("bookname", 0.3f, 0.10f, true, true);// no result: "bookname" holds no numeric terms, and min 0.3 > max 0.10 anyway
		
		
		/* If the documents are given the fields below (to use NumericRangeQuery, the index must be built with NumericField):
		 
		doc1.add(new NumericField("value", Field.Store.YES, true).setFloatValue(0.1f));
		doc2.add(new NumericField("value", Field.Store.YES, true).setFloatValue(0.5f));
		doc3.add(new NumericField("value", Field.Store.YES, true).setFloatValue(0.1f));
		
		
		and the line that prints the result is changed to System.out.print(doc.get("value") + "\t\t");
		the output is:
		0.1		1.0
		0.5		1.0
		0.1		1.0

		*/
		Query query = NumericRangeQuery.newFloatRange("value", null, null, true, true);// null bounds mean an open range; no result here because the index built in main() has no "value" field
		return query;
	}
	
	/**
	 * RegexQuery is listed in the API docs but is not in the Lucene 3.0.2 core jar;
	 * as far as I can tell it lives in the contrib regex module
	 * (org.apache.lucene.search.regex), so it is left out here.
	 * @return always null
	 */
	public static Query getRegexQuery(){
		
		return null;
	}
	
	public static void main(String[] args) throws Exception {
		Directory dir = new RAMDirectory();
		
		IndexWriter writer = new IndexWriter(
				dir, new StandardAnalyzer(Version.LUCENE_30), true,
				IndexWriter.MaxFieldLength.LIMITED);
		
		Document doc1 = new Document();
		Document doc2 = new Document();
		Document doc3 = new Document();
		
		doc1.add(new Field("bookname","thinking in java", Field.Store.YES, Field.Index.ANALYZED));
		doc2.add(new Field("bookname","thinking in java IV(Java Classic)", Field.Store.YES, Field.Index.ANALYZED));
		doc3.add(new Field("bookname","thinking in c++", Field.Store.YES, Field.Index.ANALYZED));
		
		writer.addDocument(doc1);
		writer.addDocument(doc2);
		writer.addDocument(doc3);
		
		writer.optimize();
		writer.close();
		
		IndexSearcher searcher = new IndexSearcher(dir);
		
		Query query = QueryTest.getNumericRangeQuery();
		
		TopScoreDocCollector collector = TopScoreDocCollector.create(100, false);
		searcher.search(query, collector);
		
		ScoreDoc[] hits = collector.topDocs().scoreDocs;
		for(int i=0; i<hits.length;i++){
			Document doc = searcher.doc(hits[i].doc);
			System.out.print(doc.get("bookname") + "\t\t");
			System.out.println(hits[i].score);
		}
		searcher.close();
	}
}

9. Ranking algorithms in Lucene and improvements

解决maven-dependency-plugin (goals "copy-dependencies","unpack") is not supported xp9802 dependency
解决办法：在plugins之前添加如下pluginManagement，二者前后顺序如下： [html] view plain copy <build> <pluginManagement
