数据量共约3000万+,在使用es进行term聚合的时候,发现执行耗费时间巨大,因此采用了msearch的检索方式
多搜索接口编辑 多搜索 API 从单个 API 请求执行多个搜索。 请求的格式类似于批量 API 格式,并使用 换行符分隔的 JSON (NDJSON) 格式。
结构类型于下
GET my-index-000001/_msearch
{ }
{"query" : {"match" : { "message": "this is a test"}}}
{"index": "my-index-000002"}
{"query" : {"match_all" : {}}}
使用kibana进行msearch操作
POST cqu_dev_journal_paper/_msearch
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"year":{"terms":{"field":"year","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"subject":{"terms":{"field":"subject","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publisherName":{"terms":{"field":"publisherName.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"journalId":{"terms":{"field":"journalId","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"publishCountry":{"terms":{"field":"publishCountry.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"query_then_fetch","ccs_minimize_roundtrips":true}
{"from":0,"size":1,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id"],"excludes":[]},"aggregations":{"language":{"terms":{"field":"language.raw","size":50,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}],"collect_mode":"breadth_first"}}}}
msearch查询的数据结构如下,
header\n
body\n
header\n
body\n
因此下面语句中第一个json就是header,第二个是请求的body,
{"index":["cqu_dev_journal_paper"],"types":[],"search_type":"dfs_query_then_fetch","ccs_minimize_roundtrips":true}
{"from":10,"size":10,"query":{"bool":{"must":[{"exists":{"field":"title","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"version":true,"explain":false,"_source":{"includes":["id","sid","title","author","journalId","journalVolumeId","publisherName","doi","pageInfo"],"excludes":[]},"sort":[{"createTime":{"order":"desc"}}],"track_total_hits":2147483647}
官网对于请求的说明:
地址: es 多搜索接口
请求正文包含以换行符分隔的搜索列表和 搜索对象。
(必填,对象) 用于限制或更改搜索的参数。
此对象对于每个搜索正文都是必需的,但可以为空 () 或空白 线。{}
(可选,对象) 包含搜索请求的参数:
对象的属性
aggregations
(可选,聚合对象) 您希望在搜索期间运行的聚合。请参阅聚合。
query
(可选,查询 DSL 对象)您希望在 搜索。响应中将返回与此查询匹配的命中。
from
(可选,整数) 返回命中的起始偏移量。默认值为 。0
size
(可选,整数) 要返回的命中数。默认值为 。10
使用SpringDataElaticsearch进行对es检索的操作,代码如下:
MultiSearch 检索
//MultiSearch 查询
List<Query> queryList = new ArrayList<>();
NativeSearchQueryBuilder listNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
//返回数据总数量
listNativeSearchQueryBuilder.withTrackTotalHits(true);
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.existsQuery("title"));
//设置检索词和聚合项的查询条件
listNativeSearchQueryBuilder.withQuery(boolQueryBuilder);
//构造分页排序
Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
listNativeSearchQueryBuilder.withPageable(pageable);
//设置返回字段
listNativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");
//列表查询的query放入查询列表
queryList.add(listNativeSearchQueryBuilder.build());
HashMap<String, String> aggFieldMap = new HashMap<>();
aggFieldMap.put("subject", "subject");
aggFieldMap.put("year", "year");
aggFieldMap.put("journalId", "journalId");
aggFieldMap.put("publisherName", "publisherName.raw");
aggFieldMap.put("language", "language.raw");
aggFieldMap.put("publishCountry", "publishCountry.raw");
NativeSearchQueryBuilder aggNativeSearchQueryBuilder;
for (String name : aggFieldMap.keySet()) {
aggNativeSearchQueryBuilder = new NativeSearchQueryBuilder();
aggNativeSearchQueryBuilder.withQuery(boolQueryBuilder);
//表示不返回列表数据
aggNativeSearchQueryBuilder.withPageable(PageRequest.of(0, 1));
aggNativeSearchQueryBuilder.withSearchType(SearchType.QUERY_THEN_FETCH);
aggNativeSearchQueryBuilder.withFields("id");
aggNativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms(name).field(aggFieldMap.get(name))
.collectMode(Aggregator.SubAggCollectionMode.BREADTH_FIRST).size(50));
queryList.add(aggNativeSearchQueryBuilder.build());
}
long pageStart = System.currentTimeMillis();
//multiSearch:允许在一次请求中执行多个查询操作,并将查询结果一起返回
List<SearchHits<JournalPaperEsBean>> searchHitsList = elasticsearchOperations.multiSearch(queryList, JournalPaperEsBean.class);
System.out.println("==========es agg search:" + (System.currentTimeMillis() - pageStart));
}
执行耗费时间:3723ms
search检索
@Test
void test() {
//search 查询
NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
//返回数据总数量
nativeSearchQueryBuilder.withTrackTotalHits(true);
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.must(QueryBuilders.existsQuery("title"));
//设置检索词和聚合项的查询条件
nativeSearchQueryBuilder.withQuery(boolQueryBuilder);
//构造分页排序
Pageable pageable = PageRequest.of(1, 10, Sort.by(Sort.Direction.DESC, "createTime"));
nativeSearchQueryBuilder.withPageable(pageable);
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("subject").field("subject").size(50));
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("year").field("year").size(50).order(BucketOrder.key(false)));
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("journalId").field("journalId").size(50));
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publisherName").field("publisherName.raw").size(50));
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("language").field("language.raw").size(50));
nativeSearchQueryBuilder.addAggregation(AggregationBuilders.terms("publishCountry").field("publishCountry.raw").size(50));
//设置返回字段
nativeSearchQueryBuilder.withFields("id", "sid", "title", "author", "journalId", "journalVolumeId", "publisherName", "doi", "pageInfo");
long start = System.currentTimeMillis();
//执行检索操作
SearchHits<JournalPaperEsBean> searchHits = elasticsearchOperations.search(nativeSearchQueryBuilder.build(), JournalPaperEsBean.class);
System.out.println("=====agg:" + (System.currentTimeMillis() - start));
}
耗时:15298ms
对比可以发现:使用msearch比search效率提高了近5倍!