ElasticSearch24:初识搜索引擎_分页搜索以及deep paging性能问题图解揭秘

1.使用es实现分页搜索

size,from

test_index/test_type共12条数据

第一页

GET /test_index/test_type/_search?from=0&size=5

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "5",
        "_score": 1,
        "_source": {
          "test_field": "taizhou"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "jiaxing"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "9",
        "_score": 1,
        "_source": {
          "test_field": "zhoushan"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field": "lishui"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 1,
        "_source": {
          "test_field": "jinhua"
        }
      }
    ]
  }
}


第二页

GET /test_index/test_type/_search?from=5&size=5

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "hangzhou"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "ningbo"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "quzhou"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field": "zhejiang"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "huzhou"
        }
      }
    ]
  }
}

第三页

GET /test_index/test_type/_search?from=10&size=5

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 12,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "test_field": "wenzhou"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "test_field": "shaoxing"
        }
      }
    ]
  }
}



2.什么是deep paging问题?为什么会产生这个问题?它的底层原理是什么?

deep paging性能问题,以及底层深度揭秘

什么叫deep paging:简单的说,就是搜索特别深。比如总共60000条数据,每个shard上分了20000条数据,每一页1000条,我们搜索最后一页。

错误的理解:每个shard都要返回最后的10条数据(10001~10010),综合一下返回10条数据。这个理解是错误的。

正确的理解:你的请求可能打到不包含这个index的node上,这个node就是一个coordinate node,那么这个coordinate node就会把搜索请求转发到index所在的三个node上去。比如说我们刚才说的情况,实际上每个shard都要将内部的20000条数据中的第10001~100010条数据拿出来,不是10条,是10010条数据(1~10010),3个shard每个shard都返回10010条数据给coordinate node,coordinate node会收到总共30030条数据,然后在这些数据中排序,然后取到1000页的10条数据。





deep paging的缺点:

搜索的过深的话,就需要在coordinate node上保存大量的数据,还要进行大量数据的排序,排序之后,再取出对应的那一页。所以这个过程,既消耗网络带宽,耗费内存,还耗费cpu。所以deep paging 的性能问题,我们应该尽量避免出现这个deep paging操作。




你可能感兴趣的:(ElasticSearch)