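A nobody gets a burst of youthful ambition again:
keyboard on the left, mouse on the right,
nothing better to do, so let's learn search.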
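I haven't picked up a new technology in a long time; I've just been drifting from project to project.
Lucene is something I already wanted to learn back in school. I still remember my graduation defense: my search feature queried the database directly, and the examiners looked down on it hard.
Below is an entry-level demo. Writing it up here may help someone else, but mostly it is a memo for my future self.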
step1: Download Lucene.
step2: Create a new project named helloworld and add the required JARs. [You need four JARs: the Lucene JAR, the queryparser JAR, the common analysis JAR, and the Lucene demo JAR. You should see the Lucene JAR file in the core/ directory you created when you extracted the archive -- it should be named something like lucene-core-{version}.jar. You should also see files called lucene-queryparser-{version}.jar, lucene-analyzers-common-{version}.jar and lucene-demo-{version}.jar under queryparser, analysis/common/ and demo/, respectively.] This demo only uses the following three:
lucene-analyzers-common-4.4.0.jar,
lucene-core-4.4.0.jar,
lucene-queryparser-4.4.0.jar.
step3: Prepare some documents to test with. My paths are:
public static final String PATH_INDEX = "E:\\temp\\index";
public static final String PATH_FILE = "E:\\temp\\document";
PATH_INDEX holds the index files; PATH_FILE holds the test documents.
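step4: The first three steps were preparation; now for the code. I won't walk through it line by line, since I have added comments at the key places.
code1: Create/update the index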
package org.i94livng.lucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Builds the index.
 * @author HuHongyu
 */
public class LuceneIndex {

    public static final String PATH_INDEX = "E:\\temp\\index";
    public static final String PATH_FILE = "E:\\temp\\document";

    /**
     * Creates (or updates) the index from every file directly under PATH_FILE.
     * @throws IOException if the documents or the index cannot be read or written
     */
    public void createIndex() throws IOException {
        File[] files = new File(PATH_FILE).listFiles();
        if (files == null) {
            // listFiles() returns null when the path is missing or is not a directory
            throw new IOException("Cannot list files in " + PATH_FILE);
        }
        IndexWriter indexWriter = getIndexWriter();
        try {
            for (File file : files) {
                if (!file.isFile()) {
                    continue; // skip subdirectories
                }
                Document document = new Document();
                document.add(new StringField("path", file.getPath(), Store.YES));
                document.add(new TextField("content", getDocumentContent(file), Store.YES));
                document.add(new StringField("fileName", file.getName(), Store.YES));
                // addDocument(document) would add a duplicate on every run.
                // To avoid indexing a file twice, the file name is used as the key for now:
                // updateDocument deletes documents with this fileName, then adds the new one.
                indexWriter.updateDocument(new Term("fileName", file.getName()), document);
            }
        } finally {
            indexWriter.close(); // close (and commit) even if indexing fails halfway
        }
    }

    /**
     * Reads the text of a file, line by line.
     * @param file the file to read
     * @return the file content, with every line terminated by "\n"
     * @throws IOException if the file cannot be read
     */
    public static String getDocumentContent(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Note: InputStreamReader without a charset uses the platform default encoding.
        try (BufferedReader bfr = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)))) {
            String s;
            while ((s = bfr.readLine()) != null) {
                sb.append(s).append("\n");
            }
        } // closing the reader also closes the underlying stream
        return sb.toString();
    }

    /**
     * Opens an IndexWriter on PATH_INDEX.
     * @return the IndexWriter
     * @throws IOException if the index directory cannot be opened
     */
    public static IndexWriter getIndexWriter() throws IOException {
        Directory directory = FSDirectory.open(new File(PATH_INDEX));
        Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_44);
        IndexWriterConfig indexWriterConfig =
                new IndexWriterConfig(Version.LUCENE_44, luceneAnalyzer);
        return new IndexWriter(directory, indexWriterConfig);
    }
}
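The updateDocument call in the listing above is what keeps repeated runs from piling up duplicates: it deletes every document containing the given term and then adds the new one. A toy model of that behavior in plain Java follows. It is only a sketch to show the idea, not Lucene code; ToyIndex and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "update = delete by key, then add". Hypothetical, not Lucene code. */
final class ToyIndex {

    // Each "document" is just a map of field name -> value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Plain add: always appends, so indexing the same file twice stores it twice. */
    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Removes every document whose field equals value, then appends the new one. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(new HashMap<>(doc));
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("fileName", "test3.txt");

        ToyIndex added = new ToyIndex();
        added.addDocument(doc);
        added.addDocument(doc); // second run of the indexer

        ToyIndex updated = new ToyIndex();
        updated.updateDocument("fileName", "test3.txt", doc);
        updated.updateDocument("fileName", "test3.txt", doc); // second run

        System.out.println(added.size() + " vs " + updated.size()); // prints "2 vs 1"
    }
}
```

The file name is only a safe key as long as all documents sit in one folder; with nested folders, the path field would probably be the better choice.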
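PS: Because I want to be able to read the file content back from the search results, for now I pull the content out of each file with a stream and store it in the document.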
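One thing to watch when reading the files through a stream: without an explicit charset, Java decodes with the platform default, so the same file can produce different text on different machines. Here is a minimal sketch of reading a whole file with a named charset, assuming the test files are UTF-8 (that is my assumption; files saved by Windows tools are often GBK instead, in which case Charset.forName("GBK") would be the value to pass). FileText and read are hypothetical names, not part of the demo.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a text file with an explicit charset. */
final class FileText {

    /** Reads all bytes of the file and decodes them with the given charset. */
    static String read(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file pure ASCII; the string is the
        // seven-character content of test3.txt shown in the post's output.
        String sample = "\u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd";

        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));

        String back = read(tmp, StandardCharsets.UTF_8);
        System.out.println(back.equals(sample)); // prints "true"
        Files.delete(tmp);
    }
}
```

Unlike the readLine loop, this keeps line endings exactly as they are in the file, and it loads the whole file into memory at once, which is fine for small test documents.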
code2: Search the documents
package org.i94livng.lucene;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneSearch {

    public static List<ResultModel> search(Map<String, String> searchMap)
            throws IOException, ParseException {
        String searchContent = searchMap.get("content");
        // try-with-resources closes the reader and the directory when the search is done
        try (Directory directory = FSDirectory.open(new File(LuceneIndex.PATH_INDEX));
             IndexReader reader = DirectoryReader.open(directory)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Use the same analyzer that was used when building the index
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
            // Build the query parser for the "content" field
            QueryParser parser = new QueryParser(Version.LUCENE_44, "content", analyzer);
            // Wrap the keyword in a Query object
            Query query = parser.parse(searchContent);
            // Fetch the 5 highest-scoring hits
            TopDocs topDocs = searcher.search(query, 5);
            // Get the list of hits
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            // Load each hit as a Document and copy its fields into a ResultModel
            List<ResultModel> resultList = new ArrayList<ResultModel>();
            for (ScoreDoc scoreDoc : scoreDocs) {
                ResultModel resultModel = new ResultModel();
                int docId = scoreDoc.doc;
                Document document = searcher.doc(docId);
                resultModel.setFilePath(document.get("path"));
                resultModel.setFileContent(document.get("content"));
                resultModel.setFileName(document.get("fileName"));
                resultList.add(resultModel);
            }
            return resultList;
        }
    }

    public static void main(String[] args) throws IOException, ParseException {
        LuceneIndex index = new LuceneIndex();
        index.createIndex();
        Map<String, String> searchMap = new HashMap<String, String>();
        // Search for documents whose content contains "中"
        searchMap.put("content", "中");
        List<ResultModel> result = search(searchMap);
        for (ResultModel resultModel : result) {
            System.out.println(resultModel.toString());
        }
    }
}
code3: The entity class that holds a search result.
package org.i94livng.lucene;

public class ResultModel {

    private String fileName;
    private String fileContent;
    private String filePath;

    public String getFileName() {
        return fileName;
    }

    public void setFileName(String fileName) {
        this.fileName = fileName;
    }

    public String getFileContent() {
        return fileContent;
    }

    public void setFileContent(String fileContent) {
        this.fileContent = fileContent;
    }

    public String getFilePath() {
        return filePath;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    @Override
    public String toString() {
        return "ResultModel [fileName=" + fileName + ", fileContent=" + fileContent
                + ", filePath=" + filePath + "]";
    }
}
OK, that's all the code. Running the main method gives me the following output:
ResultModel [fileName=test3.txt, fileContent=中华人民共和国, filePath=E:\temp\document\test3.txt]
ResultModel [fileName=test5.txt, fileContent=中华人民共和国, filePath=E:\temp\document\test5.txt]
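Why does a one-character query like 中 find both files? As far as I understand it, StandardAnalyzer in Lucene 4.x emits each Han ideograph as its own token, so 中华人民共和国 is indexed as seven single-character terms and 中 is simply one of them. The sketch below is a toy inverted index in plain Java that mimics this. It is a simplification for illustration, not Lucene's real tokenizer or index format; UnigramIndex and its methods are made-up names, and other.txt is a made-up file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy inverted index with per-ideograph tokens. Hypothetical, not Lucene code. */
final class UnigramIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new LinkedHashMap<>();

    /** Each ideograph becomes one token; letter/digit runs become lowercased words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace and punctuation end a word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    /** Records every token of content as occurring in the named document. */
    void add(String name, String content) {
        for (String token : tokenize(content)) {
            postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(name);
        }
    }

    /** Returns the names of the documents containing the exact term. */
    Set<String> search(String term) {
        Set<String> hits = postings.get(term);
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source pure ASCII; the string is the content
        // of test3.txt and test5.txt from the output above.
        String content = "\u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd";
        UnigramIndex index = new UnigramIndex();
        index.add("test3.txt", content);
        index.add("test5.txt", content);
        index.add("other.txt", "hello lucene"); // made-up document
        System.out.println(index.search("\u4e2d")); // prints "[test3.txt, test5.txt]"
    }
}
```

Real Lucene also scores and ranks the hits; this sketch only answers which documents contain the term.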
PS: The project code is in the attachment.