Solving Deep Pagination in ElasticSearch with Search After

    • 1. Common Pagination in ElasticSearch
    • 2. The Deep Pagination Problem in ElasticSearch
    • 3. Solving Deep Pagination in ElasticSearch
    • 4. Caveats When Using search_after in ElasticSearch

1. Common Pagination in ElasticSearch

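By default, ElasticSearch paginates with from + size. When the data set is small, or when from and size are small, this is quite efficient. For deep pagination, however, it becomes very inefficient, and under high concurrency it can even bring down the whole ElasticSearch cluster.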

So how does from + size pagination actually work?

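ElasticSearch is distributed by design, and data is spread roughly evenly across its shards. For a query with from=990 and size=10, ElasticSearch first fetches the top 1,000 documents from every shard. The coordinating node then merges all of these results, sorts them to select the global top 1,000, and finally returns the last 10 of those 1,000 documents to the client. Now imagine from=100000 and size=10 on an index with three shards: every shard must produce its top 100,010 documents, and the coordinating node has to merge 3 × 100,010 = 300,030 documents, which consumes a large amount of memory.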
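The following Python sketch simulates the from + size flow described above on in-memory lists: each inner list stands in for one shard, and plain numbers stand in for scores. It only illustrates the mechanism and is not Elasticsearch code; the function names are hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


def coordinator_load(num_shards: int, from_: int, size: int) -> int:
    """Hits the coordinating node must merge: every shard sends from + size."""
    return num_shards * (from_ + size)


def from_size_page(shards: List[List[float]], from_: int, size: int) -> Tuple[List[float], int]:
    """Simulate from + size over several shards, sorting by score descending.

    Returns the requested page and how many hits the coordinating node merged.
    """
    window = from_ + size
    # Shard phase: each shard sorts its own hits and keeps only its top `window`.
    per_shard = [sorted(hits, reverse=True)[:window] for hits in shards]
    merged_count = sum(len(hits) for hits in per_shard)
    # Coordinating phase: merge the sorted shard lists, keep the global top
    # `window`, then discard the first `from_` hits.
    merged = heapq.merge(*per_shard, reverse=True)
    top = list(itertools.islice(merged, window))
    return top[from_:], merged_count
```

For the example in the text, coordinator_load(3, 100000, 10) returns 300030: the merge work grows linearly with from, no matter how small size is.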

2. The Deep Pagination Problem in ElasticSearch

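Suppose the index has 3 shards and we query with from=100000 and size=10. This is a typical deep-pagination request. As explained above, every shard has to return 100,010 documents and the coordinating node has to merge 3 × 100,010 of them. An occasional query like this may be tolerable, but under heavy concurrency it can exhaust memory and bring down the entire ElasticSearch cluster. To guard against this memory cost, ElasticSearch by default only lets from + size reach 10,000 (controlled by the index.max_result_window index setting), so only the first 10,000 documents are reachable this way. What should you do if the product really needs deep pagination? This is where search_after comes in.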
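As a minimal sketch, a client can check this limit before sending a request. The helper below is hypothetical; it assumes only the default of 10,000 and the rule that from + size may not exceed index.max_result_window. Raising that setting is possible, but it brings back exactly the memory cost described above.

```python
# Default value of the index.max_result_window index setting.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def fits_result_window(from_: int, size: int,
                       max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Client-side pre-check mirroring the server rule from + size <= max_result_window.

    Returns False for requests that Elasticsearch would reject, so the caller
    can switch to search_after instead of sending them.
    """
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return from_ + size <= max_result_window
```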

3. Solving Deep Pagination in ElasticSearch

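search_after uses the sort values of the last hit on the previous page as a live cursor. This avoids the drawbacks of scroll, which keeps a search context open on the cluster and is not intended for real-time user requests. search_after can therefore be used for real-time requests and in high-concurrency scenarios.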

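A field that holds a unique value for every document should be used as the tiebreaker in the sort specification. Otherwise, documents with identical sort values have an undefined order, and paging through them may skip or repeat documents. The recommended approach is to use the _id field or a business ID, either of which guarantees a unique value per document (newer Elasticsearch versions discourage sorting on _id, so a business ID such as order_id below is usually the safer choice).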
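To see why the tiebreaker matters, the sketch below models two sort specifications as Python key functions over hypothetical order documents, with order_date as epoch milliseconds. With order_date alone, two orders placed in the same millisecond share a key, so their relative order (and therefore which page each one lands on) is undefined; adding order_id makes every key unique.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def date_only_key(doc: Doc) -> tuple:
    """Models sort: [{"order_date": "desc"}]; there is no tiebreaker."""
    return (-doc["order_date"],)


def date_then_id_key(doc: Doc) -> tuple:
    """Models sort: [{"order_date": "desc"}, {"order_id": "asc"}]; order_id breaks ties."""
    return (-doc["order_date"], doc["order_id"])


def is_total_order(docs: List[Doc], key: Callable[[Doc], tuple]) -> bool:
    """True if no two documents share a sort key, so their order is fully defined."""
    keys = [key(d) for d in docs]
    return len(keys) == len(set(keys))
```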

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "sort": [
    { "order_date": {"order": "desc"}},
    {"order_id": "asc"}
  ]      
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "KEtz4XcBG0yVqdDwEOv7",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Shoes",
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Cross",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Cross",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "[email protected]",
          "manufacturer" : [
            "Tigress Enterprises",
            "Pyramidustries"
          ],
          "order_date" : "2021-03-13T23:22:34+00:00",
          "order_id" : 592088,
          "products" : [
            {
              "base_price" : 36.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 14810,
              "category" : "Women's Shoes",
              "sku" : "ZO0026600266",
              "taxless_price" : 36.99,
              "unit_discount_amount" : 0,
              "min_price" : 19.23,
              "_id" : "sold_product_592088_14810",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Ankle boots - black",
              "price" : 36.99,
              "taxful_price" : 36.99,
              "base_unit_price" : 36.99
            },
            {
              "base_price" : 54.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Pyramidustries",
              "tax_amount" : 0,
              "product_id" : 9131,
              "category" : "Women's Clothing",
              "sku" : "ZO0184501845",
              "taxless_price" : 54.99,
              "unit_discount_amount" : 0,
              "min_price" : 25.3,
              "_id" : "sold_product_592088_9131",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T23:22:34+00:00",
              "product_name" : "Light jacket - peacoat",
              "price" : 54.99,
              "taxful_price" : 54.99,
              "base_unit_price" : 54.99
            }
          ],
          "sku" : [
            "ZO0026600266",
            "ZO0184501845"
          ],
          "taxful_total_price" : 91.98,
          "taxless_total_price" : 91.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615677754000,
          "592088"
        ]
      }
    ]
  }
}

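For each document, the request above returns an array containing its sort values. These sort values can be passed to the search_after parameter to fetch the next page of data. For example, we take the sort values of the last document and pass them to search_after: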
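The cursor logic can be sketched in memory as follows. This is an illustration under stated assumptions, not Elasticsearch's implementation: it assumes the sort [{"order_date": "desc"}, {"order_id": "asc"}] from the request above and uses a single list instead of several shards. Only hits that sort strictly after the cursor are returned; in Elasticsearch each shard likewise only needs its top size hits after the cursor, so the cost does not grow with page depth.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Doc = Dict[str, Any]


def sort_values(doc: Doc) -> List[Any]:
    """What would appear in hit["sort"] for
    sort: [{"order_date": "desc"}, {"order_id": "asc"}] (order_date as epoch millis)."""
    return [doc["order_date"], doc["order_id"]]


def comparable(values: Sequence[Any]) -> Tuple[int, str]:
    """Turn a sort array into a tuple that Python orders the same way:
    order_date descending (negated), then order_id ascending."""
    order_date, order_id = values
    return (-order_date, order_id)


def search_after_page(docs: List[Doc], size: int,
                      search_after: Optional[Sequence[Any]] = None) -> List[Doc]:
    """Return the next `size` docs that sort strictly after the `search_after` cursor."""
    ordered = sorted(docs, key=lambda d: comparable(sort_values(d)))
    if search_after is not None:
        cursor = comparable(search_after)
        # Strictly greater: the last hit of the previous page is never returned again.
        ordered = [d for d in ordered if comparable(sort_values(d)) > cursor]
    return ordered[:size]
```

In a toy data set containing the two orders from the responses in this article, passing [1615677754000, "592088"] as search_after with size 1 returns order 591709 next, the same order the second response shows.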

POST /kibana_sample_data_ecommerce/_search
{
  "size": 1,
  "query": {
    "match": {
      "customer_first_name": "Brigitte"
    }
  },
  "search_after": [1615677754000, "592088"],
  "sort": [
    { "order_date": {"order": "desc"}},
    {"order_id": "asc"}
  ]      
}
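A client-side paging loop only has to copy the last hit's sort array into search_after and resend the otherwise unchanged query and sort. In the sketch below, search is a hypothetical callable (for example, a thin wrapper around your HTTP client) that posts the body to /<index>/_search and returns the parsed JSON response; an empty hits array marks the end of the results.

```python
import copy
from typing import Any, Callable, Dict, Iterator, List

Body = Dict[str, Any]
Hit = Dict[str, Any]


def with_search_after(body: Body, last_hit: Hit) -> Body:
    """Return a copy of the request body whose search_after is the last hit's sort array.

    The query and sort are copied unchanged; they must stay identical between pages.
    """
    nxt = copy.deepcopy(body)
    nxt["search_after"] = list(last_hit["sort"])
    return nxt


def iterate_pages(search: Callable[[Body], Dict[str, Any]], body: Body) -> Iterator[List[Hit]]:
    """Yield successive pages of hits until an empty page comes back."""
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield hits
        body = with_search_after(body, hits[-1])
```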

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 135,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce",
        "_type" : "_doc",
        "_id" : "w0ty4XcBG0yVqdDw29iP",
        "_score" : null,
        "_source" : {
          "category" : [
            "Women's Clothing"
          ],
          "currency" : "EUR",
          "customer_first_name" : "Brigitte",
          "customer_full_name" : "Brigitte Meyer",
          "customer_gender" : "FEMALE",
          "customer_id" : 12,
          "customer_last_name" : "Meyer",
          "customer_phone" : "",
          "day_of_week" : "Saturday",
          "day_of_week_i" : 5,
          "email" : "[email protected]",
          "manufacturer" : [
            "Spherecords",
            "Tigress Enterprises"
          ],
          "order_date" : "2021-03-13T16:06:14+00:00",
          "order_id" : 591709,
          "products" : [
            {
              "base_price" : 7.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Spherecords",
              "tax_amount" : 0,
              "product_id" : 20734,
              "category" : "Women's Clothing",
              "sku" : "ZO0638206382",
              "taxless_price" : 7.99,
              "unit_discount_amount" : 0,
              "min_price" : 3.6,
              "_id" : "sold_product_591709_20734",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Basic T-shirt - dark blue",
              "price" : 7.99,
              "taxful_price" : 7.99,
              "base_unit_price" : 7.99
            },
            {
              "base_price" : 32.99,
              "discount_percentage" : 0,
              "quantity" : 1,
              "manufacturer" : "Tigress Enterprises",
              "tax_amount" : 0,
              "product_id" : 7539,
              "category" : "Women's Clothing",
              "sku" : "ZO0038800388",
              "taxless_price" : 32.99,
              "unit_discount_amount" : 0,
              "min_price" : 17.48,
              "_id" : "sold_product_591709_7539",
              "discount_amount" : 0,
              "created_on" : "2016-12-31T16:06:14+00:00",
              "product_name" : "Summer dress - scarab",
              "price" : 32.99,
              "taxful_price" : 32.99,
              "base_unit_price" : 32.99
            }
          ],
          "sku" : [
            "ZO0638206382",
            "ZO0038800388"
          ],
          "taxful_total_price" : 40.98,
          "taxless_total_price" : 40.98,
          "total_quantity" : 2,
          "total_unique_products" : 2,
          "type" : "order",
          "user" : "brigitte",
          "geoip" : {
            "country_iso_code" : "US",
            "location" : {
              "lon" : -74,
              "lat" : 40.8
            },
            "region_name" : "New York",
            "continent_name" : "North America",
            "city_name" : "New York"
          },
          "event" : {
            "dataset" : "sample_ecommerce"
          }
        },
        "sort" : [
          1615651574000,
          "591709"
        ]
      }
    ]
  }
}

4. Caveats When Using search_after in ElasticSearch

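  1. Every search must specify sort, and the combination of sort values must be unique (add _id or a unique business field from the document body as a tiebreaker);
  2. For each subsequent query, use the sort values of the last document from the previous query as the search_after value;
  3. Random page jumps are not supported: you can only move to the next page, or jump within a small range (fetch several pages' worth of results in one query and cache them to serve small-range jumps).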
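For the small-range jumps in item 3, one possible caching approach (an assumption, not something Elasticsearch provides) is to request several pages' worth of hits in a single search_after query and split them locally. The helper below is hypothetical; for each page it also records the sort array of that page's last hit, so paging can continue with search_after beyond the cached range.

```python
from typing import Any, Dict, List

Hit = Dict[str, Any]


def split_into_pages(hits: List[Hit], page_size: int,
                     first_page_no: int) -> Dict[int, Dict[str, Any]]:
    """Split one larger search_after batch into numbered pages for caching.

    Each cached page keeps its hits plus `next_cursor`, the sort array of its
    last hit, which is the search_after value needed to continue after it.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages: Dict[int, Dict[str, Any]] = {}
    for start in range(0, len(hits), page_size):
        chunk = hits[start:start + page_size]
        pages[first_page_no + start // page_size] = {
            "hits": chunk,
            "next_cursor": chunk[-1]["sort"],
        }
    return pages
```

For example, fetching 5 × page_size hits starting at page 1 lets the UI jump freely between pages 1 and 5 from the cache; to go past page 5, send the cached page 5's next_cursor as search_after.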
