【Background】
A typical DSL query for fetching all the data in an index looks like this:
POST /fcar_city/city/_search?scroll=10m
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  }
}
My intent was to retrieve every document in the index. The query above returns the following:
{
"_shards": {
"total": 5,
"failed": 0,
"successful": 5
},
"hits": {
"hits": [
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "扬州",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "60",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "扬州",
"t_b_city|en_name": "yz"
},
"_id": "60",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "通化",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "44",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "通化",
"t_b_city|en_name": "th"
},
"_id": "44",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|modify_time": "2016-10-09 08:40:00",
"t_b_city|center_lat": "28.656386",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "253",
"t_b_city|name": "台州",
"t_b_city|en_name": "tz",
"t_b_city|administrative_name": "台州",
"t_b_city|id": "48",
"t_b_city|operate_range": "2",
"t_b_city|channel_status": "2",
"t_b_city|status": "2",
"t_b_city|center_lon": "121.420757"
},
"_id": "48",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "咸阳",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "52",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "咸阳",
"t_b_city|en_name": "xiy"
},
"_id": "52",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "烟台",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "29",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "烟台",
"t_b_city|en_name": "yt"
},
"_id": "29",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "晋城",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "40",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "晋城",
"t_b_city|en_name": "jc"
},
"_id": "40",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "聊城",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "41",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "聊城",
"t_b_city|en_name": "lc"
},
"_id": "41",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "柳州",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "22",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "柳州",
"t_b_city|en_name": "lz"
},
"_id": "22",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "萍乡",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "24",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "萍乡",
"t_b_city|en_name": "px"
},
"_id": "24",
"_score": 1
},
{
"_index": "fcar_city",
"_type": "city",
"_source": {
"t_b_city|administrative_name": "随州",
"t_b_city|create_emp": "1",
"t_b_city|create_time": "2016-06-28 11:59:58",
"t_b_city|id": "25",
"t_b_city|modify_time": "2016-06-28 11:59:58",
"t_b_city|operate_range": "1",
"t_b_city|channel_status": "2",
"t_b_city|is_business": "1",
"t_b_city|modify_emp": "1",
"t_b_city|name": "随州",
"t_b_city|en_name": "sz"
},
"_id": "25",
"_score": 1
}
],
"total": 152,
"max_score": 1
},
"took": 3,
"timed_out": false
}
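It is easy to see that "total" reports 152 documents, yet only 10 are returned, because the default size is 10. This is the problem I ran into a few days ago.
Following my previous blog post, I rewrote the DSL to use from and size together: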
POST /fcar_city/city/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "from": 0,
  "size": 1000
}
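This time all 152 records come back. However, as I said in my previous post, paging with from/size can be expensive, and it can trigger the “Result window is too large” error, because from + size may not exceed index.max_result_window (10000 by default). While searching the official documentation for alternatives, I came across scroll.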
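To make that cost concrete: in a from/size search each shard returns its own top from + size hits, and the coordinating node has to merge all of them before it can cut out the requested page. The small arithmetic sketch below assumes index.max_result_window is left at its default; the function names are hypothetical, and the shard count of 5 is taken from the "_shards" section of the response above.

```python
# Sketch: how much work a from/size page costs, and when Elasticsearch rejects it.
# Assumption: index.max_result_window is left at its default of 10000.
DEFAULT_MAX_RESULT_WINDOW = 10000


def check_result_window(from_, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Mirror the rule behind "Result window is too large": from + size <= max_result_window."""
    window = from_ + size
    if window > max_result_window:
        raise ValueError(
            f"from + size = {window} exceeds max_result_window = {max_result_window}"
        )
    return window


def docs_merged_on_coordinator(from_, size, num_shards):
    """Each shard sends its top (from + size) hits; the coordinating node merges them all."""
    return num_shards * (from_ + size)


if __name__ == "__main__":
    # The request above: from=0, size=1000 on an index with 5 shards.
    print(check_result_window(0, 1000))             # 1000
    print(docs_merged_on_coordinator(0, 1000, 5))   # 5000
    # A deep page: only 10 hits are wanted, but 5 * 10000 entries get merged.
    print(docs_merged_on_coordinator(9990, 10, 5))  # 50000
```

The work grows with the page depth rather than with the page size, which is why deep pages hurt even when size is small.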
【Scroll】
The official Elasticsearch introduction to scroll opens with this sentence:
A scroll query is used to retrieve large numbers of documents from Elasticsearch efficiently, without paying the penalty of deep pagination.
In other words, scroll is designed for retrieving large amounts of data without the problems that deep pagination brings.
The basic form is:
GET /old_index/_search?scroll=1m
{
  "query": { "match_all": {} },
  "sort": ["_doc"],
  "size": 1000
}
Two things to note:
(1) scroll=1m keeps the scroll context alive for 1 minute;
(2) sorting by “_doc” is the most efficient sort order.
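The request above only returns the first batch; the remaining batches are pulled with the scroll_id from each response. Below is a minimal Python sketch of the whole loop, assuming the ES 5.x to 7.x style REST endpoints (POST /<index>/_search?scroll=..., POST /_search/scroll, DELETE /_search/scroll) and a node reachable over plain HTTP without authentication. The helper names http_send and scroll_all are hypothetical; the transport is passed in as a function so the loop can be exercised without a cluster.

```python
import json
import urllib.request


def http_send(base_url):
    """Return send(method, path, body) that talks to an ES node over HTTP.

    Assumption: base_url such as "http://localhost:9200", no authentication.
    """
    def send(method, path, body):
        req = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def scroll_all(send, index, query, size=1000, keep_alive="1m"):
    """Yield every hit matching query batch by batch, then clear the scroll context."""
    body = {"query": query, "sort": ["_doc"], "size": size}
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty batch means the scroll is exhausted
            yield from hits
            # Each follow-up request sends the latest scroll_id and renews keep_alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context now instead of waiting for it to time out.
            send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})


# Usage (assumes a reachable node at http://localhost:9200):
#   docs = list(scroll_all(http_send("http://localhost:9200"), "old_index", {"match_all": {}}))
```

Clearing the scroll in the finally block releases the context even if the caller stops early, rather than leaving it to the 1m timeout.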
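When scroll is added to “_search”, setting a very large “size” did not trigger the “Result window is too large” error in my own tests, and the excessive CPU usage did not show up either. The reason lies in how scroll works, which these two passages explain: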
Scrolling allows us to do an initial search and to keep pulling batches of results from Elasticsearch until there are no more results left. It’s a bit like a cursor in a traditional database.
A scrolled search takes a snapshot in time. It doesn’t see any changes that are made to the index after the initial search request has been made. It does this by keeping the old data files around, so that it can preserve its “view” on what the index looked like at the time it started.
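So what scroll reads is a “snapshot” of the index at one moment in time, somewhat like a view. That makes scroll a poor fit for scenarios with strict real-time requirements; for paginated list queries, from/size is fine. When you need all the data of a “dictionary table” (a lookup table), though, scroll is well worth using.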
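The point-in-time behaviour can be illustrated with a toy in-memory model. This is purely illustrative: ToyIndex is a hypothetical class, and copying a tuple is just the simplest way to model a frozen view, not how Elasticsearch does it (per the quote above, it keeps the old data files around).

```python
class ToyIndex:
    """Toy stand-in for an index, showing that a scroll sees a frozen view."""

    def __init__(self):
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)

    def search(self):
        # A normal search reads whatever is live right now.
        return list(self.docs)

    def scroll(self, size):
        # Freeze the contents at the moment the scroll is opened ...
        snapshot = tuple(self.docs)

        def batches():
            # ... then hand the frozen contents out batch by batch.
            for start in range(0, len(snapshot), size):
                yield list(snapshot[start:start + size])

        return batches()
```

A document added after the scroll is opened shows up in search() but never in the scroll's batches, which is exactly why scroll does not suit real-time listing.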
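To scroll through results, we issue a search request and set the scroll parameter to how long we want the scroll window to stay open. The scroll expiry time is refreshed every time a scroll request runs, so it only needs to be long enough to process the current batch of results, not all of the documents that match the query. The timeout matters because keeping the scroll window open consumes resources, and we want to free them as soon as they are no longer needed. Setting a timeout lets Elasticsearch release those resources automatically after a period of inactivity.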
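The renewal rule can be modelled with a small lease object. A hypothetical sketch: parse_keep_alive handles only a subset of Elasticsearch time units (ms, s, m, h, d), ScrollLease is not an Elasticsearch API, and the clock value is passed in explicitly so time can be simulated.

```python
import re

# Subset of Elasticsearch time units, in seconds (assumption: only these are needed).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_keep_alive(value):
    """Convert a keep-alive string such as "1m" or "30s" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if not match:
        raise ValueError(f"unsupported keep-alive: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


class ScrollLease:
    """Toy model: every scroll request pushes the expiry forward by keep_alive."""

    def __init__(self, keep_alive, now):
        self.expires_at = now + parse_keep_alive(keep_alive)

    def renew(self, keep_alive, now):
        if now > self.expires_at:
            # The context is already gone; a real cluster would reject the request.
            raise RuntimeError("scroll context expired")
        self.expires_at = now + parse_keep_alive(keep_alive)

    def alive(self, now):
        return now <= self.expires_at
```

In this model a 1m lease renewed at second 50 is still alive at second 100, even though the total work exceeds one minute: the keep-alive only has to cover the gap between two batches.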
So, that's all. In a follow-up post I will share Java code that wraps scroll.
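Author: 暂7师师长常乃超
Source: CSDN
Original: https://blog.csdn.net/zzh920625/article/details/84556548
Copyright notice: This is an original article by the blogger; please include a link to the post when reposting!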