Lucene can write its own operation log. I just discovered this in the source code; here is a log file I just generated:
IFD [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@15dfd77
IW 0 [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream: dir=org.apache.lucene.store.SimpleFSDirectory@G:\package\lucene_test_dir lockFactory=org.apache.lucene.store.NativeFSLockFactory@1027b4d mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@c55e36 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@1ac3c08 ramBufferSizeMB=16.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
maxFieldLength 10000 reached for field contents, ignoring following tokens
maxFieldLength 10000 reached for field contents, ignoring following tokens
maxFieldLength 10000 reached for field contents, ignoring following tokens
... (the same warning is repeated many more times and is trimmed here) ...
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: optimize: index now
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: flush: segment=_0 docStoreSegment=_0 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=104 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: index before flush
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: DW: flush postings as segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW: oldRAMSize=2619392 newFlushedSize=1740286 docs/MB=62.663 new/old=66.439%
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushedFiles=[_0.nrm, _0.tis, _0.fnm, _0.tii, _0.frq, _0.prx]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP: findMerges: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP: level 6.2247195 to 6.2380013: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now flush at close
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush: segment=null docStoreSegment=_0 docStoreOffset=104 flushDocs=false flushDeletes=true flushDocStores=true numDocs=0 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: index before flush _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush shared docStore segment _0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: closeDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW: closeDocStore: 2 files to flush to segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores files=[_0.fdt, _0.fdx]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now call final commit()
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit(): start sizeInBytes=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit index=_0:C104->_0 changeCount=3
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.nrm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tis
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fnm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tii
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.frq
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.prx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdt
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: done all syncs
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: pendingCommit != null
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: wrote segments file "segments_2"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_2" [1 segments ; isCommit = true]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: deleteCommits: now decRef commit "segments_1"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: delete "segments_1"
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: done
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: at close: _0:C104->_0
Next is my indexing code; most of it is borrowed from the demo that ships with Lucene.
The Indexer class builds the index:
package my.firstest.copy;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    private static File INDEX_DIR = new File("G:/package/lucene_test_dir");
    private static final File docDir = new File("G:/package/lucene_test_docs");

    public static void main(String[] args) throws Exception {
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("The directory to be indexed does not exist or is not readable!");
            System.exit(1);
        }

        // delete any index files left over from a previous run
        if (!INDEX_DIR.exists()) {
            INDEX_DIR.mkdirs();
        }
        int fileCount = INDEX_DIR.list().length;
        if (fileCount != 0) {
            System.out.println("Old index files exist, deleting them first");
            File[] files = INDEX_DIR.listFiles();
            for (int i = 0; i < fileCount; i++) {
                files[i].delete();
                System.out.println("File " + files[i].getAbsolutePath() + " is deleted!");
            }
        }

        Date start = new Date();
        IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);
        writer.setUseCompoundFile(false);
        //writer.setMergeFactor(2);
        writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));
        System.out.println("MergeFactor -> " + writer.getMergeFactor());
        System.out.println("maxMergeDocs -> " + writer.getMaxMergeDocs());
        indexDocs(writer, docDir);
        writer.optimize();
        writer.close();
        Date end = new Date();
        System.out.println("takes " + (end.getTime() - start.getTime()) + " milliseconds");
    }

    protected static void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                System.out.println("adding " + file);
                try {
                    writer.addDocument(FileDocument.Document(file));
                } catch (FileNotFoundException fnfe) {
                    // skip files that disappear or cannot be opened
                }
            }
        }
    }
}
FileDocument:
package my.firstest.copy;

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FileDocument {

    public static Document Document(File f) throws java.io.FileNotFoundException {
        Document doc = new Document();
        // file path, stored and indexed as a single token
        doc.add(new Field("path", f.getPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // last-modified time, stored with minute resolution
        doc.add(new Field("modified",
                DateTools.timeToString(f.lastModified(), DateTools.Resolution.MINUTE),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // file contents, tokenized from a Reader (not stored)
        doc.add(new Field("contents", new FileReader(f)));
        return doc;
    }

    private FileDocument() {
    }
}
The key line is writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));
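A side note on this call (about java.io.PrintStream rather than Lucene itself): new PrintStream(new File(...)) throws a FileNotFoundException if the log directory does not exist yet, and the Indexer above never closes the stream. A minimal sketch of a safer setup, using the same paths as above; the class name LoggedIndexerSketch is made up for illustration:

package my.firstest.copy;

import java.io.File;
import java.io.PrintStream;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LoggedIndexerSketch {
    public static void main(String[] args) throws Exception {
        File indexDir = new File("G:/package/lucene_test_dir");
        File logDir = new File("G:/package/lucene_test_log");
        if (!logDir.exists()) {
            logDir.mkdirs(); // PrintStream(File) fails if the directory is missing
        }
        PrintStream infoLog = new PrintStream(new File(logDir, "log.txt"));

        IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir),
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);
        writer.setInfoStream(infoLog);

        // ... add documents here, exactly as in Indexer.indexDocs(...) ...

        writer.optimize();
        writer.close();
        infoLog.close(); // make sure the last buffered log lines reach log.txt
    }
}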
In Lucene's source code, many places are filled with code like this:
if (infoStream != null) {
    message("init: hit exception on init; releasing write lock");
}
The message method looks like this:
public void message(String message) {
    if (infoStream != null)
        infoStream.println("IW " + messageID + " [" + new Date() + "; "
                + Thread.currentThread().getName() + "]: " + message);
}
Here infoStream is a field of IndexWriter:
private PrintStream infoStream = null;
If you never set this field, it stays null and nothing is logged.
You can set it with writer.setInfoStream(PrintStream infoStream);.
Once it is set, the log messages are automatically written to the file you configured.
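Since System.out is itself a PrintStream, the same mechanism can also dump the messages straight to the console. Below is a minimal, self-contained sketch against the same Lucene 2.9/3.0-era API used above; the class name InfoStreamDemo and the sample field value are made up for illustration:

package my.firstest.copy;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamDemo {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory(); // in-memory index, no files on disk
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                true, IndexWriter.MaxFieldLength.LIMITED);

        // System.out is a PrintStream, so the IW/IFD messages go to the console
        writer.setInfoStream(System.out);

        Document doc = new Document();
        doc.add(new Field("contents", "hello lucene infoStream",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);

        writer.optimize();
        writer.close(); // the flush/merge/commit messages appear as the writer closes
    }
}

Running this prints the same kind of "IW 0 [...]" lines shown in the log file at the top of this post, just on standard output instead of log.txt.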