Near-Real-Time Search in Lucene with SearcherManager

Near-real-time (NRT) search makes documents searchable before the IndexWriter has committed them.

How index changes become visible:

Only a commit on the IndexWriter fully syncs the in-memory (RAM) data to files on disk.
IndexWriter also exposes an API for obtaining a real-time reader. That call triggers a flush, which produces new segments but does not commit (no fsync), greatly reducing I/O. The new segments are included in the newly opened reader, so the updates are visible through it.
Therefore, as long as each new search obtains a fresh reader from the IndexWriter, it can see the latest content. The cost of doing so is only a flush, which is very small compared to a commit.
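The flush-only reader described above maps directly onto the IndexWriter API. A minimal sketch, assuming Lucene 4.x is on the classpath (the class and method names here are illustrative, not part of the article's code):

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

public class NrtReaderSketch {
    // Opens a near-real-time reader directly from the writer: this flushes
    // pending documents into new segments (no fsync/commit), so un-committed
    // updates become searchable at a fraction of the cost of commit().
    static IndexSearcher nrtSearcher(IndexWriter writer) throws IOException {
        DirectoryReader reader = DirectoryReader.open(writer, true); // true = apply pending deletes
        return new IndexSearcher(reader);
    }
}
```

SearcherManager does essentially this internally; the reason to use it instead is that it additionally manages reference counting and safe searcher sharing across threads.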

A Lucene index is organized as multiple segments inside one index directory. New documents go into new, small segments, and these are periodically merged. Because of merging, the total segment count stays small and overall search remains fast.
To avoid read/write conflicts, Lucene only ever creates new segments; old segments are deleted only once no active reader is still using them.
A flush writes data into the operating system's buffer cache; as long as that cache is not full, no disk operation occurs.
A commit writes all buffered data to disk and is therefore a fully disk-bound, heavyweight operation. It is expensive because Lucene's main structure, the postings (inverted index), is stored tightly packed as delta-encoded VInts; merging requires reading out the postings for each term, merge-sorting them, and writing the result back out.
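The flush/commit distinction mirrors plain file I/O on the JVM: a write lands in the OS buffer cache, and only an explicit fsync forces it to durable storage. A minimal, Lucene-independent sketch (the file name is arbitrary):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVsSync {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("flush-demo", ".bin");
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            out.write(new byte[] { 1, 2, 3 }); // lands in the OS page cache: cheap, like Lucene's flush
            out.getFD().sync();                // forced to physical disk: expensive, like Lucene's commit
        }
        System.out.println(Files.size(p)); // 3
        Files.delete(p);
    }
}
```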

How SearcherManager implements near-real-time search:

In older Lucene versions, near-real-time search was implemented by the NRTManager class; since Lucene 4.4 it has been replaced by TrackingIndexWriter together with ControlledRealTimeReopenThread, which the code below uses. "Near real time" means that changes to the index are tracked by a background thread and become visible to the calling program within a short, bounded delay. The tracking writer wraps an IndexWriter and exposes its mutating methods, such as addDocument and deleteDocument, to the caller. All of these operations happen in memory, so as long as you never call IndexWriter's commit method, the index on your disk does not change. Remember to commit after each batch of updates so that the changes are written to disk together.

After the index is updated, the caller can obtain an up-to-date IndexSearcher in one of two ways:

The first is to run a reopen thread (NRTManagerReopenThread in old versions, ControlledRealTimeReopenThread since Lucene 4.4). This thread tracks changes to the in-memory index and reopens the underlying reader whenever needed, so the SearcherManager always holds a reasonably fresh IndexSearcher. The caller obtains an IndexSearcher from the SearcherManager via acquire(), and must hand it back via the SearcherManager's release() when done; finally, remember to close the reopen thread.
The second is to skip the reopen thread and refresh on demand, by calling the SearcherManager's maybeRefresh() (maybeReopen() on the old NRTManager) before acquiring a searcher, so that the latest index is picked up.
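The second approach can be sketched as an on-demand refresh driven by the application itself, again assuming Lucene 4.x (the class and method names are illustrative):

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;

public class ManualRefreshSketch {
    // Refresh on demand instead of running a reopen thread: maybeRefresh()
    // is a no-op when the index has not changed, otherwise it swaps in a
    // new searcher. The caller must sm.release(searcher) when finished.
    static IndexSearcher refreshAndAcquire(SearcherManager sm) throws IOException {
        sm.maybeRefresh();
        return sm.acquire();
    }
}
```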

    public void testSearch() throws IOException {

        Directory directory = FSDirectory.open(new File("/root/data/03"));
        SearcherManager sm = new SearcherManager(directory, null);
        IndexSearcher searcher = sm.acquire();
        // IndexReader reader = DirectoryReader.open(directory);
        // IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("title", "test"));
        TopDocs results = searcher.search(query, null, 100);
        System.out.println(results.totalHits);
        ScoreDoc[] docs = results.scoreDocs;
        for (ScoreDoc doc : docs) {
            System.out.println("doc internal id:" + doc.doc + " ,docscore:" + doc.score);
            Document document = searcher.doc(doc.doc);
            System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
        }
        sm.release(searcher);
        sm.close();
    }
    public void testUpdateAndSearch() throws IOException, InterruptedException {

        Directory directory = FSDirectory.open(new File("/root/data/03"));

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(directory, config);
        TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
        SearcherManager sm = new SearcherManager(writer, true, null);
        ControlledRealTimeReopenThread<IndexSearcher> thread = new ControlledRealTimeReopenThread<>(trackingWriter, sm, 60, 1);
        thread.setDaemon(true);
        thread.setName("NRT Index Manager Thread");
        thread.start();

        Document doc = new Document();
        Field idField = new StringField("id", "3", Store.YES);
        Field titleField = new TextField("title", "test for 3", Store.YES);
        doc.add(idField);
        doc.add(titleField);
        long generation = trackingWriter.updateDocument(new Term("id", "2"), doc);
        thread.waitForGeneration(generation); // block until the update is searchable
        IndexSearcher searcher = sm.acquire();
        Query query = new TermQuery(new Term("title", "test"));
        TopDocs results = searcher.search(query, null, 100);
        System.out.println(results.totalHits);
        ScoreDoc[] docs = results.scoreDocs;
        for (ScoreDoc scoreDoc : docs) {
            System.out.println("doc internal id:" + scoreDoc.doc + " ,docscore:" + scoreDoc.score);
            Document document = searcher.doc(scoreDoc.doc);
            System.out.println("id:" + document.get("id") + " ,title:" + document.get("title"));
        }
        sm.release(searcher);
        sm.close();
        thread.close();
        writer.close();

    }

Building the index:

    public void testBuildIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("/root/data/03"));
        // Directory directory=new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
        config.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(directory, config);
        Document doc1 = new Document();
        Field idField1 = new StringField("id", "1", Store.YES);
        Field titleField1 = new TextField("title", "test for 1", Store.YES);
        doc1.add(idField1);
        doc1.add(titleField1);
        writer.addDocument(doc1);

        Document doc2 = new Document();
        Field idField2 = new StringField("id", "2", Store.YES);
        Field titleField2 = new TextField("title", "test for 2", Store.YES);
        doc2.add(idField2);
        doc2.add(titleField2);
        writer.addDocument(doc2);

        writer.commit();
        writer.close();
    }
