ES compound queries: the bool and boosting queries_9_1_1

1. The bool query

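The bool query is the default way to combine several leaf or compound query clauses, written as must, should, must_not, or filter clauses. The scores of the matching must and should clauses are added together to give the final score, while must_not and filter clauses are executed in filter context and do not contribute to it.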

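Under the hood, the bool query is implemented by Lucene's BooleanQuery class. It is built from one or more boolean clauses, and each clause is declared with a specific occurrence type.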

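1.1. Clause types in the bool query
No. | Type | Description
1 | must | The clause must appear in matching documents and contributes to the relevance score.
2 | filter | The clause must appear in matching documents. Unlike must, it is executed in filter context: scoring is ignored and the clause is considered for caching.
3 | should | The clause should appear in matching documents.
4 | must_not | The clause must not appear in matching documents. It is executed in filter context, so no relevance score is computed (a score of 0 is returned) and the clause is considered for caching.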

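The bool query takes a "more matches is better" approach: a document that matches more must and should clauses gets a higher score, and its final _score is the sum of the scores of every must and should clause it matches.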
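
The summation rule above can be sketched in a few lines of Python. This is only an illustration: the function name `bool_score` and the per-clause scores are made up, and the real scores are computed by Lucene inside Elasticsearch, not by anything like this helper.

```python
def bool_score(must_scores, should_scores, boost=1.0):
    """Sketch of how a bool query combines clause scores for one document.

    must_scores   -- scores of the must clauses (all of them matched)
    should_scores -- one entry per should clause; None means "did not match"
    filter and must_not clauses run in filter context and add nothing,
    so they are not parameters here.
    """
    matched_should = [s for s in should_scores if s is not None]
    return (sum(must_scores) + sum(matched_should)) * boost


# Hypothetical per-clause scores for a single document.
print(bool_score(must_scores=[1.0], should_scores=[2.5, None]))  # 3.5
```

A document that matches both should clauses would add both scores, which is why matching more clauses ranks it higher.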

// Request
POST bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "gender.keyword": "M"
          }
        }
      ],
      "filter": {
        "term": {
          "state.keyword": "MO"
        }
      },
      "must_not": [
        {
          "range": {
            "age": {
              "gte": 20,
              "lte": 30
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "email": "comcubine.com"
          }
        },
        {
          "match": {
            "address": "Avenue"
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1
    }
  }
}

// Response
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 7.1838775,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "58",
        "_score" : 7.1838775,
        "_source" : {
          "account_number" : 58,
          "balance" : 31697,
          "firstname" : "Marva",
          "lastname" : "Cannon",
          "age" : 40,
          "gender" : "M",
          "address" : "993 Highland Place",
          "employer" : "Comcubine",
          "email" : "[email protected]",
          "city" : "Orviston",
          "state" : "MO"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "286",
        "_score" : 2.2192826,
        "_source" : {
          "account_number" : 286,
          "balance" : 39063,
          "firstname" : "Rosetta",
          "lastname" : "Turner",
          "age" : 35,
          "gender" : "M",
          "address" : "169 Jefferson Avenue",
          "employer" : "Spacewax",
          "email" : "[email protected]",
          "city" : "Stewart",
          "state" : "MO"
        }
      }
    ]
  }
}

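About the minimum_should_match parameter
The minimum_should_match parameter specifies the number or percentage of should clauses that a returned document must match. If a bool query contains at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0.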
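
As a rough sketch of the default rule just described, the two helpers below work out the effective value. The function names are hypothetical, and only plain integers and positive percentages are handled; negative values and combined forms that Elasticsearch also accepts are deliberately left out.

```python
def default_minimum_should_match(n_should, n_must, n_filter):
    """Default described above: 1 only when should clauses stand alone."""
    if n_should >= 1 and n_must == 0 and n_filter == 0:
        return 1
    return 0


def resolve_minimum_should_match(value, n_should):
    """Turn an integer or a positive percentage string into a clause count.

    Simplification: negative values and combined forms are not handled.
    """
    if isinstance(value, str) and value.endswith("%"):
        # A percentage is rounded down to a whole number of clauses.
        return int(value[:-1]) * n_should // 100
    return int(value)


print(default_minimum_should_match(n_should=2, n_must=0, n_filter=0))  # 1
print(default_minimum_should_match(n_should=2, n_must=1, n_filter=1))  # 0
print(resolve_minimum_should_match("75%", n_should=4))                 # 3
```

This is also why the first request in section 1 sets minimum_should_match to 1 explicitly: it has must and filter clauses, so the default would be 0 and the should clauses would only affect the score.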

1.2. Scoring with bool.filter

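Queries placed under the filter element have no effect on scoring; on their own they return a _score of 0. Only the scoring queries that are specified affect the score.
All three examples below return the documents whose state field has the value WA.
1) Every document gets a score of 0, because no scoring query is specified.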

// Request
GET bank/_search
{
  "size": 2, 
  "query": {
    "bool": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}

// Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 0.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 0.0,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}

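2) Every document gets a score of 1.0, because the match_all query in the must clause assigns a score of 1.0 to every document it matches.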

// Request
GET bank/_search
{
  "size": 2, 
  "query": {
    "bool": {
      "must": {
        "match_all":{}
      },
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}

// Response; every score is 1.0
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}

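3) The constant_score query behaves like example 2, except that every matching document gets a score equal to the boost value. With the default boost of 1.0 the result would be identical to example 2; here boost is set to 1.2, so every score is 1.2.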

// Request, with boost set to 1.2
GET bank/_search
{
  "size": 2, 
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      },
      "boost": 1.2
    }
  }
}

// Response; every score is 1.2
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 1.2,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 1.2,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}

1.3. Named queries

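Naming queries lets you see which clauses actually matched each returned document.
Every query or filter clause accepts a _name parameter in its top-level definition.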

// Request, giving each clause a name
GET bank/_search
{
  "size": 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "email": {
              "query": "comcubine.com",
              "_name": "q_n1"
            }
          }
        },
        {
          "match": {
            "address": {
              "query": "Avenue",
              "_name": "q_n2"
            }
          }
        }
      ],
      "filter": {
        "terms": {
          "age": [
            40,
            38
          ],
          "_name": "q_a"
        }
      }
    }
  }
}


// Response; each hit lists the named queries it matched
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 85,
      "relation" : "eq"
    },
    "max_score" : 6.5046196,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "58",
        "_score" : 6.5046196,
        "_source" : {
          "account_number" : 58,
          "balance" : 31697,
          "firstname" : "Marva",
          "lastname" : "Cannon",
          "age" : 40,
          "gender" : "M",
          "address" : "993 Highland Place",
          "employer" : "Comcubine",
          "email" : "[email protected]",
          "city" : "Orviston",
          "state" : "MO"
        },
        "matched_queries" : [
          "q_a",
          "q_n1"
        ]
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "664",
        "_score" : 1.5400248,
        "_source" : {
          "account_number" : 664,
          "balance" : 16163,
          "firstname" : "Hart",
          "lastname" : "Mccormick",
          "age" : 40,
          "gender" : "M",
          "address" : "144 Guider Avenue",
          "employer" : "Dyno",
          "email" : "[email protected]",
          "city" : "Carbonville",
          "state" : "ID"
        },
        "matched_queries" : [
          "q_a",
          "q_n2"
        ]
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "791",
        "_score" : 1.5400248,
        "_source" : {
          "account_number" : 791,
          "balance" : 48249,
          "firstname" : "Janine",
          "lastname" : "Huber",
          "age" : 38,
          "gender" : "F",
          "address" : "348 Porter Avenue",
          "employer" : "Viocular",
          "email" : "[email protected]",
          "city" : "Fivepointville",
          "state" : "MA"
        },
        "matched_queries" : [
          "q_a",
          "q_n2"
        ]
      }
    ]
  }
}

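Each hit in the response carries a matched_queries array listing the named clauses it matched. Tagging queries and filters this way only makes sense inside a bool query.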
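
On the client side, the matched_queries array can be used to group hits by the clause that matched them. The snippet below is a minimal sketch that works on the parsed JSON response as a plain Python dict; the helper name `ids_by_matched_query` is made up, and the sample data is a trimmed copy of the response above with only the fields the helper needs.

```python
from collections import defaultdict


def ids_by_matched_query(response):
    """Map each query name to the _id values of the hits that matched it."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        # Hits without named matches simply have no matched_queries key.
        for name in hit.get("matched_queries", []):
            groups[name].append(hit["_id"])
    return dict(groups)


# Trimmed copy of the response shown above.
response = {"hits": {"hits": [
    {"_id": "58", "matched_queries": ["q_a", "q_n1"]},
    {"_id": "664", "matched_queries": ["q_a", "q_n2"]},
    {"_id": "791", "matched_queries": ["q_a", "q_n2"]},
]}}

print(ids_by_matched_query(response))
# {'q_a': ['58', '664', '791'], 'q_n1': ['58'], 'q_n2': ['664', '791']}
```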

2. The boosting query

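The boosting query returns the documents that match a positive query and lowers the relevance score of those that also match a negative query.
This lets you demote certain documents without excluding them: they still appear in the search results, only with a lower relevance score than a normal match.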

GET bank/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "state.keyword": {
            "value": "DC"
          }
        }
      },
      "negative": {
        "term": {
          "age": {
            "value": 23
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}
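
2.1. Top-level parameters of the boosting query
No. | Parameter | Description
1 | positive | Required, query object. The query you want to run; every returned document must match it.
2 | negative | Required, query object. The query used to lower the relevance score of matching documents.
3 | negative_boost | Required, float. A floating-point number between 0 and 1.0 used to lower the relevance score of documents that match the negative query.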

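If a returned document matches both the positive query and the negative query, the boosting query computes its final relevance score as follows:
1) Take the original relevance score from the positive query.
2) Multiply that score by the negative_boost value to get the final score.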
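
The two steps above amount to a single multiplication. The helper below is only a sketch of that arithmetic; the name `boosting_score` and the sample scores are assumptions for illustration, and Elasticsearch performs the real computation internally.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Final score of a document returned by a boosting query (sketch).

    positive_score   -- score from the positive query (step 1)
    matches_negative -- whether the document also matches the negative query
    negative_boost   -- factor between 0 and 1.0 (step 2)
    """
    if not 0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    if matches_negative:
        return positive_score * negative_boost
    # Documents that do not match the negative query keep their score.
    return positive_score


# Hypothetical positive score of 4.0 with negative_boost 0.2, as in the
# request above: a matching document is demoted, the other is untouched.
demoted = boosting_score(4.0, matches_negative=True, negative_boost=0.2)
unchanged = boosting_score(4.0, matches_negative=False, negative_boost=0.2)
print(demoted < unchanged)  # True
```

With negative_boost at 0.2, a document in DC whose age is 23 keeps only a fifth of its original score, so it sinks in the ranking but still appears in the results.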
