Elasticsearch Study Notes, Advanced Part (14): Prefix, Wildcard, and Regexp Search in Practice

Prepare the data:

PUT /test_index/_create/1
{
  "test_field": "C3D0-KD345"
}
PUT /test_index/_create/2
{
  "test_field": "C3K5-DFG65"
}
PUT /test_index/_create/3
{
  "test_field": "C4I8-UI365"
}

Prefix search:

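How it works: a prefix query does not compute relevance scores; every match gets the same constant score. The only difference from a prefix filter is that a filter caches its result as a bitset. The query scans the terms of the inverted index to find every term that starts with the prefix, then collects the documents that contain those terms. The shorter the prefix, the more terms and documents have to be processed and the worse the performance, so use the longest prefix you can.
Example: search for documents with the prefix C3. Note that the request uses match_phrase_prefix rather than the term-level prefix query described above. match_phrase_prefix analyzes the input first (so, with the default dynamic mapping, the upper-case C3 matches the lower-cased indexed terms) and it does compute a relevance score, as the _score values in the result show: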

GET /test_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "test_field": "C3"
    }
  }
}

Result:

{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9808292,
        "_source" : {
          "test_field" : "C3D0-KD345"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "test_field" : "C3K5-DFG65"
        }
      }
    ]
  }
}
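
The cost argument in the "How it works" paragraph can be sketched in a few lines of Python. This is a toy model, not how Lucene is implemented. It assumes the default standard analyzer splits each value at the hyphen and lower-cases the tokens, and the helper names (analyze, build_index, prefix_search) are made up for illustration. It mimics the term-level lookup of a prefix query: find every indexed term that starts with the prefix, then union the document IDs of those terms.

```python
import bisect
import re

# Sample documents from the data preparation step.
DOCS = {
    1: "C3D0-KD345",
    2: "C3K5-DFG65",
    3: "C4I8-UI365",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on non-alphanumeric characters and lower-case each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_index(docs):
    """Build a toy inverted index: term -> set of document IDs."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def prefix_search(index, prefix):
    """Return (matching terms, matching doc IDs) for a term-level prefix lookup.

    The prefix is used as-is, without analysis, like a prefix query.
    """
    terms = sorted(index)
    start = bisect.bisect_left(terms, prefix)  # first term >= prefix
    matched = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matched.append(term)
    doc_ids = set()
    for term in matched:
        doc_ids |= index[term]
    return matched, sorted(doc_ids)


INDEX = build_index(DOCS)
print(prefix_search(INDEX, "c3"))  # (['c3d0', 'c3k5'], [1, 2])
print(prefix_search(INDEX, "c"))   # (['c3d0', 'c3k5', 'c4i8'], [1, 2, 3])
print(prefix_search(INDEX, "C3"))  # ([], [])
```

The shorter prefix c visits more terms and documents than c3, which is the reason longer prefixes are cheaper. The last call returns nothing because the toy lookup, like a term-level prefix query, does not lower-case its input, while the indexed terms are lower-cased.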

Wildcard search:

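Wildcard search is similar to prefix search but more powerful. It also has to scan the terms of the inverted index, so its performance is poor as well.
?: matches any single character
*: matches any number of characters, including none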

Example: search for documents that match the wildcard pattern *4?:

GET /test_index/_search
{
  "query": {
    "wildcard": {
      "test_field": {
        "value": "*4?"
      }
    }
  }
}

Output:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "test_field" : "C3D0-KD345"
        }
      }
    ]
  }
}
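
Only document 1 is returned because a wildcard pattern is matched against individual indexed terms and must cover a whole term. The sketch below reproduces this locally with Python's fnmatch module, whose * and ? have the same meaning for this pattern. It assumes the standard analyzer produced lower-cased tokens split at the hyphen. analyze and wildcard_search are illustrative names, not Elasticsearch APIs.

```python
import re
from fnmatch import fnmatchcase

# Sample documents from the data preparation step.
DOCS = {
    1: "C3D0-KD345",
    2: "C3K5-DFG65",
    3: "C4I8-UI365",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on non-alphanumeric characters and lower-case each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def wildcard_search(docs, pattern):
    """Return {doc_id: [matching terms]} for documents with at least one
    term that the whole pattern matches.

    fnmatchcase also gives '[' a special meaning, which the wildcard
    query does not; the patterns used here contain only '*' and '?'.
    """
    hits = {}
    for doc_id, text in docs.items():
        matched = [term for term in analyze(text) if fnmatchcase(term, pattern)]
        if matched:
            hits[doc_id] = matched
    return hits


print(wildcard_search(DOCS, "*4?"))  # {1: ['kd345']}
```

The term kd345 is the only one whose second-to-last character is 4, so only document 1 matches. Every term of every document has to be tested against the pattern, which is where the cost comes from.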

Regexp search:

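The regexp query is more powerful still than the wildcard query, but it also scans the terms of the inverted index, so its performance is very poor as well. The pattern has to match an entire term.
[0-9]: a digit in the given range
[a-z]: a letter in the given range
.: any single character
+: the preceding expression occurs one or more times
*: the preceding expression occurs zero or more times
{n}: n is a non-negative integer; the preceding expression occurs exactly n times
Example: search for documents that match .*[a-z]{3}[0-9]{2}: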

GET /test_index/_search
{
  "query": {
    "regexp": {
      "test_field": {
        "value": ".*[a-z]{3}[0-9]{2}"
      }
    }
  }
}
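
The notes do not include the response for this query. As a rough local check, the sketch below applies the same pattern to the indexed terms with Python's re.fullmatch, because a Lucene regular expression has to match the whole term. It assumes the standard analyzer produced lower-cased tokens split at the hyphen. Python's re syntax differs from Lucene's in general, but this particular pattern is valid in both. analyze and regexp_search are illustrative names, not Elasticsearch APIs.

```python
import re

# Sample documents from the data preparation step.
DOCS = {
    1: "C3D0-KD345",
    2: "C3K5-DFG65",
    3: "C4I8-UI365",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on non-alphanumeric characters and lower-case each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def regexp_search(docs, pattern):
    """Return {doc_id: [matching terms]} for documents with at least one
    term that the pattern matches in full (anchored at both ends)."""
    compiled = re.compile(pattern)
    hits = {}
    for doc_id, text in docs.items():
        matched = [term for term in analyze(text) if compiled.fullmatch(term)]
        if matched:
            hits[doc_id] = matched
    return hits


print(regexp_search(DOCS, ".*[a-z]{3}[0-9]{2}"))  # {2: ['dfg65']}
print(regexp_search(DOCS, "[a-z]{3}"))            # {}
```

Under these assumptions only the term dfg65 ends in three letters followed by two digits, which suggests the query would return document 2. The second call finds nothing because the pattern must cover the entire term, not just a part of it.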