Elasticsearch Term Vectors (term frequency statistics)

Term Vectors API returns information and statistics on terms in the fields of a particular document. The document could be stored in the index or artificially provided by the user.

Purpose: full-text search works by going from a word to the documents related to it, and term vectors expose the per-term data behind that.

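Information recorded in term vectors:

Term statistics: doc_freq (the number of documents in the index that contain the term), ttf (the total number of times the term occurs across all documents), and term_freq (how often the term occurs in the current document).

tokens: position (the term's position in the field), start_offset (the offset at which the term starts), and end_offset (the offset at which it ends).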
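To make these fields concrete, here is a minimal sketch that computes them by hand over a tiny in-memory corpus. It does not call Elasticsearch: the class name TermStatsSketch, the three sample documents, and the single-space tokenization are all assumptions made for illustration, and a real analyzer would tokenize differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TermStatsSketch {

    // Hypothetical corpus; tokens are assumed to be separated by single spaces.
    static final List<String> CORPUS = Arrays.asList(
            "quick brown fox",
            "quick quick dog",
            "lazy dog");

    // term_freq: occurrences of the term in one document.
    static int termFreq(String doc, String term) {
        int count = 0;
        for (String token : doc.split(" ")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // doc_freq: number of documents that contain the term at least once.
    static int docFreq(List<String> corpus, String term) {
        int count = 0;
        for (String doc : corpus) {
            if (termFreq(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    // ttf: total occurrences of the term across all documents.
    static long totalTermFreq(List<String> corpus, String term) {
        long total = 0;
        for (String doc : corpus) {
            total += termFreq(doc, term);
        }
        return total;
    }

    // tokens: one "position/start_offset/end_offset" entry per occurrence.
    static List<String> tokens(String doc, String term) {
        List<String> result = new ArrayList<>();
        int position = 0;
        int offset = 0;
        for (String token : doc.split(" ")) {
            if (token.equals(term)) {
                result.add(position + "/" + offset + "/" + (offset + token.length()));
            }
            position++;
            offset += token.length() + 1; // +1 skips the single space separator
        }
        return result;
    }

    public static void main(String[] args) {
        String current = CORPUS.get(1); // "quick quick dog"
        System.out.println("term_freq=" + termFreq(current, "quick"));
        System.out.println("doc_freq=" + docFreq(CORPUS, "quick"));
        System.out.println("ttf=" + totalTermFreq(CORPUS, "quick"));
        System.out.println("tokens=" + tokens(current, "quick"));
    }
}
```

For the term "quick" in the second document, this prints term_freq=2, doc_freq=2, ttf=3 and tokens=[0/0/5, 1/6/11]: the term occurs twice in the current document, appears in two documents, and occurs three times in the whole corpus.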


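Java:

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.core.TermVectorsRequest;
import org.elasticsearch.client.core.TermVectorsResponse;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector.FieldStatistics;
import org.elasticsearch.client.core.TermVectorsResponse.TermVector.Term;
import org.junit.Test;

    // `client` is assumed to be an initialized RestHighLevelClient field of the test class.
    @Test
    public void testTermVectors() throws IOException {
        // Ask for the term vectors of document "15" in "article_index", limited to the "keyword" field.
        TermVectorsRequest request = new TermVectorsRequest("article_index", "15");
        request.setFields("keyword");
        request.setFieldStatistics(true); // field-level statistics
        request.setTermStatistics(true);  // doc_freq and ttf for each term
        request.setPositions(true);       // token positions
        request.setOffsets(true);         // token start and end offsets
        request.setPayloads(false);       // skip payloads

        // Restrict which terms are returned.
        Map<String, Integer> filterSettings = new HashMap<>();
        filterSettings.put("max_num_terms", 3);
        filterSettings.put("min_term_freq", 1);
        filterSettings.put("max_term_freq", 10);
        filterSettings.put("min_doc_freq", 1);
        filterSettings.put("max_doc_freq", 100);
        filterSettings.put("min_word_length", 1);
        filterSettings.put("max_word_length", 10);
        request.setFilterSettings(filterSettings);

        TermVectorsResponse response = client.termvectors(request, RequestOptions.DEFAULT);

        // One TermVector per requested field; print every term with its statistics and tokens.
        List<TermVector> termVectorList = response.getTermVectorsList();
        for (TermVector termVector : termVectorList) {
            String fieldName = termVector.getFieldName();
            FieldStatistics fieldStatistics = termVector.getFieldStatistics();
            List<Term> terms = termVector.getTerms();
            for (Term term : terms) {
                System.out.println("----term---" + term.getTerm() + "  -DocFreq:-" + term.getDocFreq()
                        + "  -TermFreq:-" + term.getTermFreq() + "--" + term.getTokens());
            }
        }
    }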
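The filter settings in the request narrow down which terms come back. The sketch below illustrates the idea on hand-written statistics, without calling Elasticsearch. Everything in it is an assumption made for illustration: the TermFilterSketch and TermStat names, the sample numbers, the inclusive bounds, and above all the ranking, which here sorts by term_freq as a stand-in for the score-based ranking that Elasticsearch applies when it keeps the top max_num_terms terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TermFilterSketch {

    // Hypothetical holder for one term and its statistics.
    static final class TermStat {
        final String term;
        final int termFreq;
        final int docFreq;

        TermStat(String term, int termFreq, int docFreq) {
            this.term = term;
            this.termFreq = termFreq;
            this.docFreq = docFreq;
        }
    }

    // Hand-written sample statistics, not taken from any real index.
    static final List<TermStat> SAMPLE = Arrays.asList(
            new TermStat("elasticsearch", 5, 20), // 13 characters: too long
            new TermStat("search", 4, 50),
            new TermStat("index", 2, 10),
            new TermStat("the", 12, 90),          // term_freq above the maximum
            new TermStat("term", 3, 200),         // doc_freq above the maximum
            new TermStat("vector", 1, 5),
            new TermStat("java", 2, 8));

    // Keeps terms inside all bounds (treated as inclusive here), then returns at most
    // maxNumTerms of them, ranked by term_freq in descending order (a simplification).
    static List<String> filter(List<TermStat> stats, int maxNumTerms,
                               int minTermFreq, int maxTermFreq,
                               int minDocFreq, int maxDocFreq,
                               int minWordLength, int maxWordLength) {
        List<TermStat> kept = new ArrayList<>();
        for (TermStat s : stats) {
            boolean freqOk = s.termFreq >= minTermFreq && s.termFreq <= maxTermFreq;
            boolean docOk = s.docFreq >= minDocFreq && s.docFreq <= maxDocFreq;
            boolean lengthOk = s.term.length() >= minWordLength && s.term.length() <= maxWordLength;
            if (freqOk && docOk && lengthOk) {
                kept.add(s);
            }
        }
        // List.sort is stable, so terms with equal term_freq keep their input order.
        kept.sort(Comparator.comparingInt((TermStat s) -> s.termFreq).reversed());
        List<String> result = new ArrayList<>();
        for (TermStat s : kept.subList(0, Math.min(maxNumTerms, kept.size()))) {
            result.add(s.term);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same limits as the filterSettings map in the test above.
        System.out.println(filter(SAMPLE, 3, 1, 10, 1, 100, 1, 10));
    }
}
```

With the same limits as the request above, the sample yields [search, index, java]: three terms are rejected by the length and frequency bounds, and "vector" is cut by the max_num_terms cap.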

 


 

For more code, see: https://github.com/hsn999/Elasticsearch_7_springboot_demo

 

 
