Elasticsearch使用msearch提高聚合效率(与search检索对比)

描述

数据量共约3000万+,在使用es进行term聚合的时候,发现执行耗费时间巨大,因此采用了msearch的检索方式

多搜索接口编辑 多搜索 API 从单个 API 请求执行多个搜索。 请求的格式类似于批量 API 格式,并使用 换行符分隔的 JSON (NDJSON) 格式。
结构类型于下

GET my-index-000001/_msearch
{ }
{"query" : {"match" : { "message": "this is a test"}}}
{"index": "my-index-000002"}
{"query" : {"match_all" : {}}}

kibana查询操作

使用kibana进行msearch操作

POST cqu_dev_journal_paper/_msearch
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"year":{"terms":{"field":"year","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"subject":{"terms":{"field":"subject","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publisherName":{"terms":{"field":"publisherName.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"journalId":{"terms":{"field":"journalId","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publishCountry":{"terms":{"field":"publishCountry.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"language":{"terms":{"field":"language.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}


语句解析

msearch查询的数据结构如下,

header\n
body\n
header\n
body\n

因此下面语句中第一个json就是header,第二个是请求的body,

{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}

官网对于请求的说明:
地址: es 多搜索接口

请求正文包含以换行符分隔的搜索列表和 搜索对象。
(必填,对象) 用于限制或更改搜索的参数。 此对象对于每个搜索正文都是必需的,但可以为空 () 或空白 线。{} (可选,对象) 包含搜索请求的参数: 对象的属性 aggregations (可选,聚合对象) 您希望在搜索期间运行的聚合。请参阅聚合。 query (可选,查询 DSL 对象)您希望在 搜索。响应中将返回与此查询匹配的命中。 from (可选,整数) 返回命中的起始偏移量。默认值为 。0 size (可选,整数) 要返回的命中数。默认值为 。10

代码操作验证对比

使用SpringDataElaticsearch进行对es检索的操作,代码如下:

MultiSearch 检索

  //MultiSearch 查询
        List<Query> queryList = new ArrayList<>();

        NativeSearchQueryBuilder listNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
        //返回数据总数量
        listNativeSearchQueryBuilder.withTrackTotalHits(true);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.must(QueryBuilders.existsQuery("title"));

        //设置检索词和聚合项的查询条件
        listNativeSearchQueryBuilder.withQuery(boolQueryBuilder);

        //构造分页排序
        Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
        listNativeSearchQueryBuilder.withPageable(pageable);

        //设置返回字段
        listNativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");

        //列表查询的query放入查询列表
        queryList.add(listNativeSearchQueryBuilder.build());

        HashMap<String, String> aggFieldMap = new HashMap<>();
        aggFieldMap.put("subject", "subject");
        aggFieldMap.put("year", "year");
        aggFieldMap.put("journalId", "journalId");
        aggFieldMap.put("publisherName", "publisherName.raw");
        aggFieldMap.put("language", "language.raw");
        aggFieldMap.put("publishCountry", "publishCountry.raw");

        NativeSearchQueryBuilder aggNativeSearchQueryBuilder;
        for (String name : aggFieldMap.keySet()) {
            aggNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
            aggNativeSearchQueryBuilder.withQuery(boolQueryBuilder);
            //表示不返回列表数据
            aggNativeSearchQueryBuilder.withPageable(PageRequest.of(0, 1));
            aggNativeSearchQueryBuilder.withSearchType(SearchType.QUERY_THEN_FETCH);
            aggNativeSearchQueryBuilder.withFields("id");
            aggNativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms(name).field(aggFieldMap.get(name))
                    .collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST).size(50));
            queryList.add(aggNativeSearchQueryBuilder.build());
        }

        long  pageStart = System.currentTimeMillis();
        //multiSearch:允许在一次请求中执行多个查询操作,并将查询结果一起返回
        List<SearchHits<JournalPaperEsBean>> searchHitsList = elasticsearchOperations.multiSearch(queryList, JournalPaperEsBean.class);
        System.out.println("==========es agg search:" + (System.currentTimeMillis() - pageStart));

    }

执行耗费时间:3723ms

search检索

@Test
    void test() {
        //search 查询
        NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
        //返回数据总数量
        nativeSearchQueryBuilder.withTrackTotalHits(true);
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        boolQueryBuilder.must(QueryBuilders.existsQuery("title"));

        //设置检索词和聚合项的查询条件
        nativeSearchQueryBuilder.withQuery(boolQueryBuilder);

        //构造分页排序
        Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
        nativeSearchQueryBuilder.withPageable(pageable);

        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("subject").field("subject").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("year").field("year").size(50).order(BucketOrder.key(false)));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("journalId").field("journalId").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publisherName").field("publisherName.raw").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("language").field("language.raw").size(50));
        nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publishCountry").field("publishCountry.raw").size(50));

        //设置返回字段
        nativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");

        long start = System.currentTimeMillis();
        //执行检索操作
        SearchHits<JournalPaperEsBean> searchHits = elasticsearchOperations.search(nativeSearchQueryBuilder.build(), JournalPaperEsBean.class);

        System.out.println("=====agg:" + (System.currentTimeMillis() - start));
    }

耗时:15298ms

对比可以发现:使用msearch比search效率提高了近5倍!

你可能感兴趣的:(Elasticsearch,elasticsearch,搜索引擎,大数据)