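By default, Elasticsearch paginates with from + size. When the data set is small, or when from and size are small, this is quite efficient. With deep pagination, however, it becomes very inefficient, and under heavy concurrency it can even bring down the entire Elasticsearch cluster.
So how does from + size pagination actually work?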
Elasticsearch is distributed, and documents are spread roughly evenly across the shards. Take a query with from=990 and size=10: Elasticsearch first fetches the top 1000 documents (from + size) on every shard. The Coordinating Node then gathers the results from all shards, sorts them, and keeps the overall top 1000. Finally, it discards the first 990 of those and returns the remaining 10 documents to the client.
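The merge step described above can be modelled in a few lines of Python. This is only a rough sketch, not how Elasticsearch is implemented: each shard is represented by a plain list of sort values, and the names `shard_top_hits` and `from_size_search` are made up for illustration. It shows that the coordinating node has to hold shards * (from + size) entries just to return `size` hits.

```python
import heapq
import random


def shard_top_hits(shard_docs, from_, size):
    """Each shard returns its own top (from + size) sort values."""
    return heapq.nsmallest(from_ + size, shard_docs)


def from_size_search(shards, from_, size):
    """Model of the coordinating node: collect, sort globally, then slice."""
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_top_hits(shard_docs, from_, size))
    buffered = len(candidates)  # entries held in memory before slicing
    merged = sorted(candidates)
    return merged[from_:from_ + size], buffered


# Hypothetical data: 15000 unique sort values spread at random over 3 shards.
random.seed(0)
values = list(range(15000))
random.shuffle(values)
shards = [values[i::3] for i in range(3)]

page, buffered = from_size_search(shards, from_=990, size=10)
print(page)      # the 10 values ranked 990..999 overall
print(buffered)  # 3 * (990 + 10) = 3000 entries merged to return 10 hits
```

Raising `from_` makes `buffered` grow linearly with it on every shard, which is the cost that deep pagination runs into.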
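Now suppose the index has 3 shards and we request from=100000, size=10. This is a typical case of deep pagination. Following the mechanism described above, every shard has to produce 100010 documents, so the Coordinating Node ends up merging 3 * 100010 = 300030 documents, which consumes a large amount of memory. An occasional query like this may be tolerable, but under high concurrency it can exhaust memory and bring down the whole Elasticsearch cluster. To limit this memory overhead, Elasticsearch applies a default limit (the index setting index.max_result_window, 10000 by default): from + size can only reach the first 10000 documents. So what should we do if the product really needs deep pagination? This is where search_after comes in.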
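search_after avoids the drawbacks of scroll by providing a live cursor, which makes it suitable for real-time requests and high-concurrency scenarios.
A field that holds a unique value for every document should be used as the tiebreaker of the sort specification. Otherwise, the order of documents with identical sort values is undefined. The recommended approach is to use the _id field or a business ID that is guaranteed to be unique for each document.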
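To see why the tiebreaker matters, here is a small in-memory model of cursor-based paging. It is a sketch under simplifying assumptions, not Elasticsearch code: the documents are invented, and `search_after_page` and `paginate` are hypothetical helpers that return the documents whose sort key is strictly greater than the cursor.

```python
def search_after_page(docs, sort_key, size, after=None):
    """Return up to `size` docs whose sort key is strictly greater than `after`."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        ordered = [d for d in ordered if sort_key(d) > after]
    return ordered[:size]


def paginate(docs, sort_key, size):
    """Walk all pages, using the last hit's sort key as the next cursor."""
    seen, after = [], None
    while True:
        page = search_after_page(docs, sort_key, size, after)
        if not page:
            return seen
        seen.extend(d["order_id"] for d in page)
        after = sort_key(page[-1])


# Hypothetical documents: orders 2 and 3 share the same order_date.
docs = [
    {"order_id": 1, "order_date": "2021-03-11"},
    {"order_id": 2, "order_date": "2021-03-12"},
    {"order_id": 3, "order_date": "2021-03-12"},
    {"order_id": 4, "order_date": "2021-03-13"},
]


def date_only(d):
    return (d["order_date"],)


def date_then_id(d):
    return (d["order_date"], d["order_id"])


print(paginate(docs, date_only, size=2))     # [1, 2, 4]: order 3 is skipped
print(paginate(docs, date_then_id, size=2))  # [1, 2, 3, 4]
```

With the date alone as the sort key, the page boundary falls between two documents that share a date, and the second one is never returned. Adding the unique order_id as a second sort key makes every cursor position unambiguous.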
POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "sort": [
    { "order_date": { "order": "desc" } },
    { "order_id": "asc" }
  ]
}
Response:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 135,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "KEtz4XcBG0yVqdDwEOv7",
"_score" : null,
"_source" : {
"category" : [
"Women's Shoes",
"Women's Clothing"
],
"currency" : "EUR",
"customer_first_name" : "Brigitte",
"customer_full_name" : "Brigitte Cross",
"customer_gender" : "FEMALE",
"customer_id" : 12,
"customer_last_name" : "Cross",
"customer_phone" : "",
"day_of_week" : "Saturday",
"day_of_week_i" : 5,
"email" : "[email protected]",
"manufacturer" : [
"Tigress Enterprises",
"Pyramidustries"
],
"order_date" : "2021-03-13T23:22:34+00:00",
"order_id" : 592088,
"products" : [
{
"base_price" : 36.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Tigress Enterprises",
"tax_amount" : 0,
"product_id" : 14810,
"category" : "Women's Shoes",
"sku" : "ZO0026600266",
"taxless_price" : 36.99,
"unit_discount_amount" : 0,
"min_price" : 19.23,
"_id" : "sold_product_592088_14810",
"discount_amount" : 0,
"created_on" : "2016-12-31T23:22:34+00:00",
"product_name" : "Ankle boots - black",
"price" : 36.99,
"taxful_price" : 36.99,
"base_unit_price" : 36.99
},
{
"base_price" : 54.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Pyramidustries",
"tax_amount" : 0,
"product_id" : 9131,
"category" : "Women's Clothing",
"sku" : "ZO0184501845",
"taxless_price" : 54.99,
"unit_discount_amount" : 0,
"min_price" : 25.3,
"_id" : "sold_product_592088_9131",
"discount_amount" : 0,
"created_on" : "2016-12-31T23:22:34+00:00",
"product_name" : "Light jacket - peacoat",
"price" : 54.99,
"taxful_price" : 54.99,
"base_unit_price" : 54.99
}
],
"sku" : [
"ZO0026600266",
"ZO0184501845"
],
"taxful_total_price" : 91.98,
"taxless_total_price" : 91.98,
"total_quantity" : 2,
"total_unique_products" : 2,
"type" : "order",
"user" : "brigitte",
"geoip" : {
"country_iso_code" : "US",
"location" : {
"lon" : -74,
"lat" : 40.8
},
"region_name" : "New York",
"continent_name" : "North America",
"city_name" : "New York"
},
"event" : {
"dataset" : "sample_ecommerce"
}
},
"sort" : [
1615677754000,
"592088"
]
}
]
}
}
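The request above returns an array of sort values for every document. These sort values can be passed in the search_after parameter to fetch the next page of results. For example, we can take the sort values of the last document and pass them to search_after: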
POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "search_after": [
    1615677754000,
    "592088"
  ],
  "sort": [
    { "order_date": { "order": "desc" } },
    { "order_id": "asc" }
  ]
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 135,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "w0ty4XcBG0yVqdDw29iP",
"_score" : null,
"_source" : {
"category" : [
"Women's Clothing"
],
"currency" : "EUR",
"customer_first_name" : "Brigitte",
"customer_full_name" : "Brigitte Meyer",
"customer_gender" : "FEMALE",
"customer_id" : 12,
"customer_last_name" : "Meyer",
"customer_phone" : "",
"day_of_week" : "Saturday",
"day_of_week_i" : 5,
"email" : "[email protected]",
"manufacturer" : [
"Spherecords",
"Tigress Enterprises"
],
"order_date" : "2021-03-13T16:06:14+00:00",
"order_id" : 591709,
"products" : [
{
"base_price" : 7.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Spherecords",
"tax_amount" : 0,
"product_id" : 20734,
"category" : "Women's Clothing",
"sku" : "ZO0638206382",
"taxless_price" : 7.99,
"unit_discount_amount" : 0,
"min_price" : 3.6,
"_id" : "sold_product_591709_20734",
"discount_amount" : 0,
"created_on" : "2016-12-31T16:06:14+00:00",
"product_name" : "Basic T-shirt - dark blue",
"price" : 7.99,
"taxful_price" : 7.99,
"base_unit_price" : 7.99
},
{
"base_price" : 32.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Tigress Enterprises",
"tax_amount" : 0,
"product_id" : 7539,
"category" : "Women's Clothing",
"sku" : "ZO0038800388",
"taxless_price" : 32.99,
"unit_discount_amount" : 0,
"min_price" : 17.48,
"_id" : "sold_product_591709_7539",
"discount_amount" : 0,
"created_on" : "2016-12-31T16:06:14+00:00",
"product_name" : "Summer dress - scarab",
"price" : 32.99,
"taxful_price" : 32.99,
"base_unit_price" : 32.99
}
],
"sku" : [
"ZO0638206382",
"ZO0038800388"
],
"taxful_total_price" : 40.98,
"taxless_total_price" : 40.98,
"total_quantity" : 2,
"total_unique_products" : 2,
"type" : "order",
"user" : "brigitte",
"geoip" : {
"country_iso_code" : "US",
"location" : {
"lon" : -74,
"lat" : 40.8
},
"region_name" : "New York",
"continent_name" : "North America",
"city_name" : "New York"
},
"event" : {
"dataset" : "sample_ecommerce"
}
},
"sort" : [
1615651574000,
"591709"
]
}
]
}
}
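The two requests above can be generalised into a loop that keeps feeding the last hit's sort values back into search_after until a page comes back empty. The sketch below is one possible way to write it in Python. The `search` argument is a hypothetical callable (for example, a thin wrapper around whichever HTTP or Elasticsearch client is in use) that takes a request body and returns the parsed JSON response. `fake_search` and its third hit are invented stand-ins so that the example runs without a cluster. The first two sort arrays are the ones shown in the responses above.

```python
import copy


def iter_search_after(search, body):
    """Yield every hit, requesting pages until one comes back empty.

    `search` is a hypothetical callable: request body (dict) in,
    parsed _search response (dict) out.
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        body["search_after"] = hits[-1]["sort"]


# Stand-in for Elasticsearch: hits already in the requested sort order.
# The third entry is made up for illustration.
ORDERED_HITS = [
    {"_id": "a", "sort": [1615677754000, "592088"]},
    {"_id": "b", "sort": [1615651574000, "591709"]},
    {"_id": "c", "sort": [1615600000000, "591000"]},
]


def fake_search(body):
    """Return the `size` hits that follow the search_after cursor, if any."""
    after = body.get("search_after")
    start = 0
    if after is not None:
        start = [h["sort"] for h in ORDERED_HITS].index(after) + 1
    return {"hits": {"hits": ORDERED_HITS[start:start + body["size"]]}}


request = {
    "size": 1,
    "query": {"match": {"customer_first_name": "Brigitte"}},
    "sort": [{"order_date": {"order": "desc"}}, {"order_id": "asc"}],
}

ids = [hit["_id"] for hit in iter_search_after(fake_search, request)]
print(ids)  # ['a', 'b', 'c']
```

Because the cursor is carried entirely in the request body, each page is an ordinary search request and the client keeps the only paging state.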