一、域索引选项:
//Field.Index(索引域选项) //Index.ANALYZED:进行分词和索引,适用于标题、内容等 //Index.NOT_ANALYZED:进行索引,但是不进行分词,如果身份证号,姓名,ID等,适用于精确搜索 //Index.ANALYZED_NOT_NORMS:进行分词但是不存储norms信息,这个norms中包括了创建索引的时间和权值等信息 //Index.NOT_ANALYZED_NOT_NORMS:即不进行分词也不存储norms信息 //Index.NO:不进行索引
//Field.Store.YES或者NO(存储域选项) //设置为YES表示或把这个域中的内容完全存储到文件中,方便进行文本的还原 //设置为NO表示把这个域的内容不存储到文件中,但是可以被索引,此时内容无法完全还原(doc.get)
package org.it.index; import java.io.File; import java.io.IOException; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.DirectoryReader; import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.util.Version; @SuppressWarnings("all") public class IndexUtil { private String[] ids={"1","2","3","4","5","6"}; private String[] emails={"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]"}; private String[] contents={ "welcome to visited the space","hello boy","my name is cc","I like movie and swim","jijoij","wwwww","jjjjjjjjjjjj" }; private int[] attachs ={2,3,4,3,5,6}; private String[] names={"shanghai","zhangsan","john","mike","jake","jakedd"}; //1.创建Directory private Directory directory=null; public IndexUtil(){ try { directory=FSDirectory.open(new File("d:/lucene/index02")); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } public void query(){ try { IndexReader reader = IndexReader.open(directory); //存储的文档数 <span style="color:#FF0000;">//小知识点:maxDoc 和 numDocs()方法的区别:maxDoc() //返回索引中删除和未被删除的文档总数,后者返回索引中未被删除的文档总数</span> System.out.println("numDocs "+reader.numDocs()); //存储的文档总数 System.out.println("maxDocs "+reader.maxDoc()); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } public void index(){ //创建IndexWriter IndexWriter writer=null; try { writer=new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_4_9, new StandardAnalyzer(Version.LUCENE_4_9))); // 3.创建Document对象 Document doc=null; for(int i=0;i<ids.length;i++){ doc=new Document(); // 4.为Document添加Field doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS)); doc.add(new Field("emails",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS)); doc.add(new Field("contents",contents[i],Field.Store.NO,Field.Index.NOT_ANALYZED)); doc.add(new Field("names",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS)); // 4通过IndexWriter 添加文档到索引中 writer.addDocument(doc); } } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); }finally { if (writer != null) { try { // 关闭流 writer.close(); } catch (IOException e) { // TODO Auto-generated catch block e.printStackTrace(); } } } } }
以上内容主要是回顾创建索引---》怎么设置域--》查询索引
还有域存储 域索引的概念
查询的时候用IndexReader:
IndexReader reader = IndexReader.open(directory);
举一反三:在上一节中的HelloLucene中,
doc.add(new Field("content", new FileReader(file)));
传统方法Reader----->BufferReader,现在我们这样用:
加载jar包: commons-io-2.4.jar
利用工具类,先转换为字符串就OK,其它步骤和id,path一样存储实现的。
String content=FileUtils.readFileToString(file); System.out.println(content);