Analyzing why ES search conditions do not take effect

Logstash reads data from a topic on a Kafka cluster, parses out its fields, and writes them to ES. The logstash.conf configuration is as follows:

input {
    kafka {
        bootstrap_servers => "ip1:9094,ip2:9094,ip3:9094"
        auto_offset_reset => "latest"
        group_id => "lo"
        id => "8.0.6"
        client_id => "logstash-1"
        check_crcs => "false"
        topics => ["exception_log"]
        codec => "json"
    }
}

filter {
    json {
        source => "body"
    }
}

output {
    elasticsearch {
        hosts => ["es1:9200","es2:9200","es3:9200"]
        index => "exception-%{+YYYY.MM.dd}"
    }
}

After the write completes, querying the index returns documents like this:

 {
        "_index": "exception-2019.04.30",
        "_type": "doc",
        "_id": "ssvZbGoB9BdDCt57XEyk",
        "_score": 1,
        "_source": {
          "nanos": "0",
          "success": true,
          "msg": "Add TimelineEntry success",
          "@timestamp": "2019-04-30T06:05:44.661Z",
          "errorInfo": "",
          "timestamp": "1556604344651",
          "type": "ShareTimelineImpl",
          "priority": "INFO",
          "@version": "1",
          "userId": 521585010,
          "body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":521585010}""",
          "fields": {
            "_ds_unique_id": "452:65025:3752073:21855ca9:244477118",
            "HOSTNAME": "timeline11.server.163.org"
          },
          "host": ""
        }
      },

Search for documents with userId=526952388:

GET /lofter-exception-2019.04.30/_search
{"query": {
    "bool": {
      "must": [
        {"match": {
          "userId": 526952388
        }}
      ]
    }
  }
}

The search returns 626 hits, and every one has HOSTNAME timeline11.server.163.org:

 "hits": {
    "total": 626,
    "max_score": 1,
    "hits": [
      {
        "_index": "lofter-exception-2019.04.30",
        "_type": "doc",
        "_id": "t5_cbGoBnt41zOK2Ltpc",
        "_score": 1,
        "_source": {
          "nanos": "0",
          "success": true,
          "msg": "Add TimelineEntry success",
          "@timestamp": "2019-04-30T06:08:50.150Z",
          "errorInfo": "",
          "timestamp": "1556604530017",
          "type": "ShareTimelineImpl",
          "priority": "INFO",
          "@version": "1",
          "userId": 526952388,
          "body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":526952388}""",
          "fields": {
            "_ds_unique_id": "452:65025:3752065:c732bf64:243554598",
            "HOSTNAME": "timeline11.server.163.org"
          },
          "host": ""
        }
      },

The goal is to filter on userId, HOSTNAME, and success together.

Set the HOSTNAME condition to timeline111.server.163.org; the expected result is no matches.

GET /exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"fields.HOSTNAME": "timeline111.server.163.org"}},
        {"match": {"userId": 526952388}},
        {"match": {"success": true}}
      ]
    }
  },
  "_source": ["userId", "body", "fields.HOSTNAME"]
}

The actual result still contains 626 hits, so the added condition appears to have no effect.

"hits": {
    "total": 626,
    "max_score": 1.4777467,
    "hits": [
      {
        "_index": "lofter-exception-2019.04.30",
        "_type": "doc",
        "_id": "t5_cbGoBnt41zOK2Ltpc",
        "_score": 1.4777467,
        "_source": {
          "body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":526952388}""",
          "fields": {
            "HOSTNAME": "timeline11.server.163.org"
          },
          "userId": 526952388
        }
      },

An analysis of this problem found online:
[https://stackoverflow.com/questions/23150670/elasticsearch-match-vs-term-query]
After changing the request body accordingly:

GET /lofter-exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"fields.HOSTNAME.keyword": "timeline111.server.163.org"}},
        {"term": {"userId": 526952388}},
        {"term": {"success": true}}
      ]
    }
  },
  "_source": ["userId", "body", "fields.HOSTNAME"]
}

The result now matches the expectation:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

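If a condition appears to have no effect when searching in ES, try the following:
1) Check whether the condition field has a keyword sub-field; if it does, query xxx.keyword.

2) match performs analyzed (tokenized) matching: the query text is split into several terms, and by default a document is a hit if any one of those terms matches; the hits are then scored by Lucene's relevance scoring. term matches the exact, whole value and does not analyze the query text, with one caveat: you need to know whether the target field is analyzed, and string fields are analyzed by default, so an exact term usually has to be run against the keyword sub-field. Choose match or term according to the actual case. See also: https://www.jianshu.com/p/eb30eee13923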
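
Both points can be checked against a real index with two console requests. This is a minimal sketch: it assumes the index is named exception-2019.04.30, as in the Logstash output above, and that fields.HOSTNAME uses the default standard analyzer, which the mapping returned by the first request should confirm or refute.

```
# Show the mapping of the index; look under fields.HOSTNAME for a "keyword" sub-field
GET /exception-2019.04.30/_mapping

# Show the terms the standard analyzer produces for the query text
# (assumption: the field uses the default standard analyzer)
GET /_analyze
{
  "analyzer": "standard",
  "text": "timeline111.server.163.org"
}
```

If the analyzer splits the host name into several terms, and some of them also occur in timeline11.server.163.org, a match query can hit those documents even though the full host names differ.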

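Note
Search operations in ES fall into two kinds: query and filter. A query, as used above, computes a score for every returned document by default and sorts the results by that score. A filter only selects the matching documents; it computes no score, and its results can be cached. So, in terms of performance alone, a filter is faster than a query.
The result set of a filter clause is a simple list of documents, which is quick to compute and easy to keep in memory, needing only 1 bit per document. Combining these cached filter results with later requests is very efficient.
A query clause must not only find the matching documents but also compute the relevance of each one, so queries are generally more expensive than filters, and query results are not cacheable. For details, see:
https://doc.yonyoucloud.com/doc/mastering-elasticsearch/chapter-2/27_README.html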
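
Because the three conditions in this case are exact yes/no checks where the score does not matter, they could also be placed in the filter clause of a bool query. The following is a sketch rather than a tested request; it reuses the field names and values from the term query above and assumes the index name exception-2019.04.30.

```
GET /exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"fields.HOSTNAME.keyword": "timeline111.server.163.org"}},
        {"term": {"userId": 526952388}},
        {"term": {"success": true}}
      ]
    }
  },
  "_source": ["userId", "body", "fields.HOSTNAME"]
}
```

Clauses under filter do not contribute to the score, so the returned hits are not ranked by relevance.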
