Elasticsearch: How to make a document always rank first in search results

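In many situations we need a particular document to rank first. On an eCommerce site, for example, I may want to promote one or more products so that they always appear ahead of all other documents. On a discussion forum, I may want to keep certain posts pinned to the top. Or on a search site, because some customers pay more, I may want their ads to always occupy the first position on the first page of results. This is known as paid (bid-based) ranking.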

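In my earlier article "Elasticsearch: Using the Pinned query to promote documents (new in the 7.5 release)", I described how to use a Pinned query to promote specific documents by their ids. That is very useful in many cases, but it requires knowing the document ids in advance, and it is only available in the 7.5 release and later.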

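In this article I describe a more general approach: sorting with a script. The drawback of a script is that it has to be evaluated for every document being sorted, so on large data sets it adds computational overhead.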

 

Preparing the data

In today's exercise we use the following data:

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"张三","message":"今儿天气不错啊,出去转转去","uid":"1","city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}, "DOB":"1980-12-01"}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"老刘","message":"出发,下一站云南!","uid":"2", "city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}, "DOB":"1981-12-01"}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"李四","message":"happy birthday!","uid":"3","city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}, "DOB":"1982-12-01"}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"老贾","message":"123,gogogo","uid":"4","city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}, "DOB":"1983-12-01"}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"老王","message":"Happy BirthDay My Friend!","uid":"5","city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}, "DOB":"1984-12-01"}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":"6","city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}, "DOB":"1985-12-01"}

 

Searching the data

First, we want to find all users whose city is Beijing (北京):

GET twitter/_search
{
  "query": {
    "match": {
      "city": "北京"
    }
  }
}

Running the search above gives the following result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 0.48232412,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : "1",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1980-12-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老刘",
          "message" : "出发,下一站云南!",
          "uid" : "2",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          },
          "DOB" : "1981-12-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "李四",
          "message" : "happy birthday!",
          "uid" : "3",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          },
          "DOB" : "1982-12-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老贾",
          "message" : "123,gogogo",
          "uid" : "4",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          },
          "DOB" : "1983-12-01"
        }
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : "5",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          },
          "DOB" : "1984-12-01"
        }
      }
    ]
  }
}

From the result we can see that the document with uid 1 ranks first, even though it has the same score as the other documents.

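Next, we want the documents with uid 2 and 3 to rank higher, so that they appear at the top of the search results. How can we do that? We can use the following approach: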

GET twitter/_search
{
  "query": {
    "match": {
      "city": "北京"
    }
  },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "source": "Boolean.compare(params.ids.contains(doc['uid.keyword'].value), false);",
          "lang": "painless",
          "params": {
            "ids": [
              "2",
              "3"
            ]
          }
        },
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

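In the request above, I use a script to compute a number for each document and sort by it. The ids parameter is an array of uid values, and doc['uid.keyword'] reads the keyword sub-field that dynamic mapping creates for string fields. The script returns 1 for a document whose uid is in the array and 0 otherwise, so sorting by that number in descending order puts the listed documents first, and _score then orders the documents within each group. To promote other documents, add their uid values to this array.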
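To make the ordering logic concrete, here is a minimal Python sketch that models the two sort keys outside Elasticsearch. It is only an illustration: the hits list, pinned_ids, and sort_key are hypothetical names, the scores are copied from the response shown earlier, and Elasticsearch's own tie-breaking between equal sort values is not modeled (ties keep the input order here because Python's sort is stable).

```python
# Simplified model of the two sort keys in the request above.
# The uid and score values are copied from the earlier search response.
hits = [
    {"uid": "1", "score": 0.48232412},
    {"uid": "2", "score": 0.48232412},
    {"uid": "3", "score": 0.48232412},
    {"uid": "4", "score": 0.48232412},
    {"uid": "5", "score": 0.48232412},
]
pinned_ids = ["2", "3"]  # plays the role of params.ids


def sort_key(hit):
    # Mirrors Boolean.compare(params.ids.contains(uid), false):
    # 1.0 when the uid is in the list, 0.0 otherwise.
    pinned = 1.0 if hit["uid"] in pinned_ids else 0.0
    # Both keys are sorted in descending order in the request,
    # so negate them for Python's ascending sort.
    return (-pinned, -hit["score"])


ranked = sorted(hits, key=sort_key)
order = [hit["uid"] for hit in ranked]
print(order)  # ['2', '3', '1', '4', '5']
```

Because the first key only takes the values 1.0 and 0.0, it splits the results into a promoted group and everything else, and the relevance score only matters inside each group.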

The search above returns:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老刘",
          "message" : "出发,下一站云南!",
          "uid" : "2",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区台基厂三条3号",
          "location" : {
            "lat" : "39.904313",
            "lon" : "116.412754"
          },
          "DOB" : "1981-12-01"
        },
        "sort" : [
          1.0,
          0.48232412
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "李四",
          "message" : "happy birthday!",
          "uid" : "3",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市东城区",
          "location" : {
            "lat" : "39.893801",
            "lon" : "116.408986"
          },
          "DOB" : "1982-12-01"
        },
        "sort" : [
          1.0,
          0.48232412
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "张三",
          "message" : "今儿天气不错啊,出去转转去",
          "uid" : "1",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市海淀区",
          "location" : {
            "lat" : "39.970718",
            "lon" : "116.325747"
          },
          "DOB" : "1980-12-01"
        },
        "sort" : [
          0.0,
          0.48232412
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老贾",
          "message" : "123,gogogo",
          "uid" : "4",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区建国门",
          "location" : {
            "lat" : "39.718256",
            "lon" : "116.367910"
          },
          "DOB" : "1983-12-01"
        },
        "sort" : [
          0.0,
          0.48232412
        ]
      },
      {
        "_index" : "twitter",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.48232412,
        "_source" : {
          "user" : "老王",
          "message" : "Happy BirthDay My Friend!",
          "uid" : "5",
          "city" : "北京",
          "province" : "北京",
          "country" : "中国",
          "address" : "中国北京市朝阳区国贸",
          "location" : {
            "lat" : "39.918256",
            "lon" : "116.467910"
          },
          "DOB" : "1984-12-01"
        },
        "sort" : [
          0.0,
          0.48232412
        ]
      }
    ]
  }
}

From the response we can see that the documents with uid 2 and 3 now rank highest: they appear at the very top of the search results.
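If the list of promoted uid values changes from request to request, it may be convenient to generate the request body in code. The following is a rough sketch under that assumption: build_pinned_sort_request is a hypothetical helper, not part of any Elasticsearch client, and it only builds the same JSON body as the script-sort request in this article. Sending it to a cluster is left out.

```python
import json


def build_pinned_sort_request(city, pinned_uids):
    """Build a search body that lists the given uid values first (hypothetical helper)."""
    return {
        "query": {"match": {"city": city}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        # Same Painless source as in the article's request.
                        "source": "Boolean.compare(params.ids.contains(doc['uid.keyword'].value), false);",
                        "params": {"ids": list(pinned_uids)},
                    },
                    "order": "desc",
                }
            },
            # Fall back to relevance within the promoted and non-promoted groups.
            {"_score": {"order": "desc"}},
        ],
    }


body = build_pinned_sort_request("北京", ["2", "3"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Passing the uid values through params rather than splicing them into the script source keeps the script text identical across requests.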
