4.6 Single-String Multi-Field Queries: Multi-Match

Three scenarios

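  • Best Fields

    • Fields compete with each other yet are related, e.g. title and body. The score comes from the single best-matching field.
  • Most Fields

    • A common approach for English content: index the main field with the English analyzer, which stems words (synonyms can be added as well) so that more documents match, and index the same text into a sub-field with the Standard analyzer for more precise matching. The other fields act as signals that raise the relevance of matching documents: the more fields that match, the better.
  • Cross Fields

    • For some entities, such as personal names, addresses, or book information, the information is spread over several fields and each field holds only part of the whole. The goal is to find as many of the query words as possible in any of the listed fields.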
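To make the difference between the first two scenarios concrete, here is a minimal Python sketch (not Elasticsearch code) of how per-field scores could be combined. The per-field scores are made-up numbers standing in for what Elasticsearch would compute on each field; the real best_fields type also supports tie_breaker, which this sketch leaves out.

```python
from typing import Dict


def best_fields_score(field_scores: Dict[str, float]) -> float:
    """Best fields: the document's score is that of its best-matching field."""
    return max(field_scores.values(), default=0.0)


def most_fields_score(field_scores: Dict[str, float]) -> float:
    """Most fields: every matching field adds to the score."""
    return sum(field_scores.values())


# Hypothetical per-field scores for two documents.
doc_a = {"title": 1.25, "body": 0.0}   # strong match in a single field
doc_b = {"title": 0.75, "body": 0.75}  # weaker match spread over two fields

print(best_fields_score(doc_a), best_fields_score(doc_b))  # 1.25 0.75
print(most_fields_score(doc_a), most_fields_score(doc_b))  # 1.25 1.5
```

With best fields doc_a ranks first; with most fields doc_b overtakes it because two fields match.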

Multi Match Query

  • Best Fields is the default type and does not have to be specified

  • Parameters such as minimum_should_match are passed through to the generated per-field queries

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}
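The minimum_should_match value above is a percentage. Elasticsearch documents that a percentage is converted to a number of optional clauses and rounded down. A minimal Python sketch of that arithmetic, handling only positive integers and positive percentages (the helper name is hypothetical):

```python
import math


def required_should_matches(spec: str, optional_clauses: int) -> int:
    """Sketch of minimum_should_match for a positive integer or positive
    percentage only; negative values and combinations are not handled."""
    if spec.startswith("-"):
        raise ValueError("negative values are not covered by this sketch")
    if spec.endswith("%"):
        pct = float(spec[:-1])
        # The number computed from a percentage is rounded down.
        return math.floor(optional_clauses * pct / 100)
    return int(spec)


# "Quick pets" yields two terms per field, so "20%" rounds down to 0;
# as in any pure OR query, at least one term must still match.
print(required_should_matches("20%", 2))  # 0
print(required_should_matches("75%", 4))  # 3
```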

A query example

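  • The English analyzer stems words (barking → bark, dogs → dog), which loses tense and plural information and lowers precision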
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }


GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}
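Why both documents match: a rough Python sketch of the two analyzers. `toy_english_tokens` is a hypothetical, deliberately crude stand-in that only strips a trailing "ing" or "s"; the real english analyzer uses a Porter stemmer and stop words. Under it both titles reduce to the same terms, so the shorter doc 1 can score as high as or higher than doc 2, even though only doc 2 contains the exact words "barking dogs".

```python
import re


def standard_tokens(text: str) -> list:
    """Simplified standard analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_english_tokens(text: str) -> list:
    """Crude stemming stand-in: strip a trailing 'ing' or 's'."""
    out = []
    for tok in standard_tokens(text):
        if tok.endswith("ing") and len(tok) > 5:
            tok = tok[:-3]
        elif tok.endswith("s") and len(tok) > 3:
            tok = tok[:-1]
        out.append(tok)
    return out


docs = {1: "My dog barks", 2: "I see a lot of barking dogs on the road "}
query = "barking dogs"

for name, analyze in [("english (toy)", toy_english_tokens), ("standard", standard_tokens)]:
    q = set(analyze(query))
    # Query terms found in each document under this analysis.
    hits = {doc_id: sorted(q & set(analyze(text))) for doc_id, text in docs.items()}
    print(name, hits)
```

With the toy English analysis both documents contain bark and dog; with standard analysis only doc 2 matches. That exact-form signal is what the title.std sub-field below adds.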

Solving it with most_fields

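  • Use the broadly matching field title to include as many documents as possible (raising recall), while using the field title.std as a signal that pushes the more relevant documents to the top of the results.

  • Each field's contribution to the final score can be controlled with a custom boost value. For example, making the title field more important also reduces the influence of the other signal fields.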

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ]
        }
    }
}
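A minimal sketch of how most_fields with a boost adds up per-field scores. The scores are hypothetical numbers chosen only to show how title^10 can let the title field outweigh title.std; they are not what Elasticsearch returns for these two documents.

```python
def most_fields_with_boost(field_scores, boosts):
    """most_fields: sum of each field's score times its boost (default 1)."""
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical scores: doc 1 matches only via stemming on title,
# doc 2 also matches the exact forms in title.std.
doc1 = {"title": 0.5, "title.std": 0.0}
doc2 = {"title": 0.375, "title.std": 1.0}

print(most_fields_with_boost(doc1, {}), most_fields_with_boost(doc2, {}))  # 0.5 1.375
print(most_fields_with_boost(doc1, {"title": 10}),
      most_fields_with_boost(doc2, {"title": 10}))  # 5.0 4.75
```

Without a boost the title.std signal decides the order; with title^10 the stemmed title field dominates and title.std matters much less.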

Cross-field search

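  • operator cannot be used effectively: with most_fields it is applied to each field separately, so "and" would require every query term to appear in a single field

  • copy_to can solve this, but it needs extra storage space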

PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3Dg"
}


POST address/_search
{
 "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
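A Python sketch of why the most_fields query above cannot simply add "operator": "and", and how copy_to helps. It assumes word tokenization similar to the standard analyzer; the combined token set stands in for a hypothetical copy_to target field.

```python
import re

doc = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3Dg",
}
query_terms = set(re.findall(r"\w+", "Poland Street W1V".lower()))


def tokens(text):
    """Lowercase word tokens, roughly like the standard analyzer."""
    return set(re.findall(r"\w+", text.lower()))


# Field-centric "and": some single field must contain every query term.
field_centric_and = any(query_terms <= tokens(value) for value in doc.values())

# copy_to style: all fields are copied into one combined field at index time,
# so the terms only need to appear somewhere in that combined field.
combined = set().union(*(tokens(value) for value in doc.values()))
copy_to_and = query_terms <= combined

print(field_centric_and, copy_to_and)  # False True
```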

Cross-field search (solved with cross_fields)

POST address/_search
{
 "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "operator": "and", 
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
  • Supports operator

  • One advantage over copy_to is that individual fields can be boosted at search time.
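A simplified Python sketch of the term-centric idea behind cross_fields: each query term is scored once, by its best field after boosting, and the term scores are summed; with operator "and" every term must match in at least one field. The scores and the helper are hypothetical, and real cross_fields also blends term statistics across fields, which is not modeled. In the DSL the boost would go in the field list, e.g. "postcode^2".

```python
from typing import Dict, Optional


def cross_fields_score(term_field_scores: Dict[str, Dict[str, float]],
                       boosts: Dict[str, float],
                       operator: str = "or") -> Optional[float]:
    """Return the summed per-term scores, or None if the document does not match."""
    total = 0.0
    matched_any = False
    for per_field in term_field_scores.values():
        boosted = [score * boosts.get(field, 1.0)
                   for field, score in per_field.items() if score > 0]
        if not boosted:
            if operator == "and":
                return None  # this term matched no field at all
            continue
        matched_any = True
        total += max(boosted)  # each term counts once, via its best field
    return total if matched_any else None


# Hypothetical per-term, per-field scores for "Poland Street W1V".
scores = {
    "poland": {"street": 0.5},
    "street": {"street": 0.25},
    "w1v": {"postcode": 0.75},
}
print(cross_fields_score(scores, {}, "and"))                 # 1.5
print(cross_fields_score(scores, {"postcode": 2.0}, "and"))  # 2.25
```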

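Recap of this section

  • Basic syntax of the multi_match query

  • Query types

  • Best fields / most fields / cross fields

  • Boosting

  • Controlling precision, including most_fields scoring over sub-fields

  • Using operator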

Course demo

POST blogs/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.2
        }
    }
}
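The dis_max query scores a document as the best clause score plus tie_breaker times the scores of the other matching clauses. A small Python sketch of that formula, with hypothetical clause scores:

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best matching clause plus tie_breaker times the other matching clauses."""
    matching = [s for s in clause_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


title_score, body_score = 1.5, 0.5  # hypothetical scores for "Quick pets"
print(dis_max_score([title_score, body_score]))        # 1.5
print(dis_max_score([title_score, body_score], 0.2))   # best + 0.2 * 0.5
print(dis_max_score([title_score, body_score], 1.0))   # 2.0
```

A tie_breaker of 0 gives pure best-field scoring; 1 adds up all matching clauses, much like most_fields; values in between let secondary matches count a little.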

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}



POST books/_search
{
  "query": {
    "multi_match": {
      "query":  "Quick brown fox",
      "fields": [ "*_title" ]
    }
  }
}


POST books/_search
{
  "query": {
    "multi_match": {
      "query":  "Quick brown fox",
      "fields": [ "*_title", "chapter_title^2" ]
    }
  }
}
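Field names in multi_match can contain wildcards and a ^boost suffix. This Python sketch expands such entries against a hypothetical list of mapped fields (book_title, chapter_title, section_title, body) with shell-style matching; it only illustrates the syntax. chapter_title is matched twice here, and how Elasticsearch combines the boosts for such a field is not modeled, so check the multi_match documentation for your version.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def expand_fields(entries: List[str], mapped_fields: List[str]) -> List[Tuple[str, float]]:
    """Expand entries like "*_title" or "chapter_title^2" into (field, boost) pairs.
    Duplicates are kept in order; merging them is out of scope."""
    resolved = []
    for entry in entries:
        pattern, _, boost = entry.partition("^")
        weight = float(boost) if boost else 1.0
        for field in mapped_fields:
            if fnmatchcase(field, pattern):
                resolved.append((field, weight))
    return resolved


mapped = ["book_title", "chapter_title", "section_title", "body"]  # hypothetical mapping
print(expand_fields(["*_title", "chapter_title^2"], mapped))
```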



DELETE /titles
PUT /titles
{
  "settings": {
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }


GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ]
        }
    }
}



PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3Dg"
}


POST address/_search
{
 "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}


POST address/_search
{
 "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "operator": "and", 
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}

Further reading

  • https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-dis-max-query.html
