Logstash reads data from a topic on a Kafka cluster, parses out the fields, and writes them to ES. The logstash.conf configuration is as follows:
input {
  kafka {
    bootstrap_servers => "ip1:9094,ip2:9094,ip3:9094"
    auto_offset_reset => "latest"
    group_id => "lo"
    id => "8.0.6"
    client_id => "logstash-1"
    check_crcs => "false"
    topics => ["exception_log"]
    # Decode each Kafka message as JSON
    codec => "json"
  }
}
filter {
  # Parse the JSON string held in "body"; with no target set,
  # the parsed keys are added at the top level of the event
  json {
    source => "body"
  }
}
output {
  elasticsearch {
    hosts => ["es1:9200","es2:9200","es3:9200"]
    # One index per day
    index => "exception-%{+YYYY.MM.dd}"
  }
}
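As a rough illustration of what the json filter does to each event, the following Python sketch parses the JSON string in body and copies its keys to the top level, which is why userId and success show up as their own fields in the stored document. The function name apply_json_filter and the trimmed sample event are assumptions for illustration only; this is not Logstash code.

```python
import json


def apply_json_filter(event, source="body"):
    """Roughly mimic a json filter with no target: parse the JSON string
    stored under `source` and copy its keys to the top level of the event."""
    parsed = json.loads(event[source])
    merged = dict(event)   # leave the input event untouched
    merged.update(parsed)  # parsed keys land next to the original ones
    return merged


# Trimmed sample event (hypothetical; only a few fields are shown)
event = {
    "priority": "INFO",
    "body": '{"errorInfo":"","msg":"Add TimelineEntry success",'
            '"success":true,"type":"ShareTimelineImpl","userId":521585010}',
}

result = apply_json_filter(event)
print(result["userId"], result["success"])
```

The original body string is kept alongside the parsed fields, which matches the stored document shown in the query result.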
After the write completes, query the result:
{
"_index": "exception-2019.04.30",
"_type": "doc",
"_id": "ssvZbGoB9BdDCt57XEyk",
"_score": 1,
"_source": {
"nanos": "0",
"success": true,
"msg": "Add TimelineEntry success",
"@timestamp": "2019-04-30T06:05:44.661Z",
"errorInfo": "",
"timestamp": "1556604344651",
"type": "ShareTimelineImpl",
"priority": "INFO",
"@version": "1",
"userId": 521585010,
"body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":521585010}""",
"fields": {
"_ds_unique_id": "452:65025:3752073:21855ca9:244477118",
"HOSTNAME": "timeline11.server.163.org"
},
"host": ""
}
},
Search for documents with userId=526952388:
GET /lofter-exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"userId": 526952388}}
      ]
    }
  }
}
There are 626 hits, and HOSTNAME is timeline11.server.163.org in all of them:
"hits": {
"total": 626,
"max_score": 1,
"hits": [
{
"_index": "lofter-exception-2019.04.30",
"_type": "doc",
"_id": "t5_cbGoBnt41zOK2Ltpc",
"_score": 1,
"_source": {
"nanos": "0",
"success": true,
"msg": "Add TimelineEntry success",
"@timestamp": "2019-04-30T06:08:50.150Z",
"errorInfo": "",
"timestamp": "1556604530017",
"type": "ShareTimelineImpl",
"priority": "INFO",
"@version": "1",
"userId": 526952388,
"body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":526952388}""",
"fields": {
"_ds_unique_id": "452:65025:3752065:c732bf64:243554598",
"HOSTNAME": "timeline11.server.163.org"
},
"host": ""
}
},
Next, I want to filter on userId, HOSTNAME, and success together.
With the HOSTNAME condition set to timeline111.server.163.org, the expected result is empty.
GET /exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"fields.HOSTNAME": "timeline111.server.163.org"}},
        {"match": {"userId": 526952388}},
        {"match": {"success": true}}
      ]
    }
  },
  "_source": ["userId", "body", "fields.HOSTNAME"]
}
The actual result still has 626 hits, so the added conditions appear to have no effect.
"hits": {
"total": 626,
"max_score": 1.4777467,
"hits": [
{
"_index": "lofter-exception-2019.04.30",
"_type": "doc",
"_id": "t5_cbGoBnt41zOK2Ltpc",
"_score": 1.4777467,
"_source": {
"body": """{"errorInfo":"","msg":"Add TimelineEntry success","success":true,"type":"ShareTimelineImpl","userId":526952388}""",
"fields": {
"HOSTNAME": "timeline11.server.163.org"
},
"userId": 526952388
}
},
An analysis of this problem found online:
[https://stackoverflow.com/questions/23150670/elasticsearch-match-vs-term-query]
After modifying the request body:
GET /lofter-exception-2019.04.30/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"fields.HOSTNAME.keyword": "timeline111.server.163.org"}},
        {"term": {"userId": 526952388}},
        {"term": {"success": true}}
      ]
    }
  },
  "_source": ["userId", "body", "fields.HOSTNAME"]
}
The result now matches the expectation:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
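Whether a term query on fields.HOSTNAME.keyword can work depends on the index mapping actually defining a keyword subfield for that field. The Python sketch below checks this on a mapping's properties dict. The sample mapping is hypothetical: it is modelled on the default dynamic mapping for strings (a text field with a keyword multi-field), not copied from the real index, and the helper name has_keyword_subfield is an assumption.

```python
def has_keyword_subfield(properties, dotted_path):
    """Walk a mapping's `properties` dict along a dotted field path and
    report whether the final field defines a `keyword` multi-field."""
    node = None
    for part in dotted_path.split("."):
        node = properties.get(part)
        if node is None:
            return False  # the path does not exist in the mapping
        properties = node.get("properties", {})
    # Multi-fields live under the mapping key "fields". This is unrelated to
    # the document's own object that happens to be named "fields".
    subfields = node.get("fields", {})
    return subfields.get("keyword", {}).get("type") == "keyword"


# Hypothetical mapping fragment, assumed for illustration
sample_properties = {
    "userId": {"type": "long"},
    "success": {"type": "boolean"},
    "fields": {
        "properties": {
            "HOSTNAME": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    },
}

print(has_keyword_subfield(sample_properties, "fields.HOSTNAME"))
print(has_keyword_subfield(sample_properties, "userId"))
```

In practice the properties dict would come from the index's mapping (for example via GET /<index>/_mapping); numeric and boolean fields such as userId and success have no keyword subfield and can be used in a term query directly.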
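If a filter condition seems to have no effect when searching in ES, try the following:
1) Check whether the condition field has a keyword subfield; if it does, query xxx.keyword.
- match is an analyzed match: the query text is split into terms, and by default a document is a hit if any one of those terms matches; the score is then computed by Lucene's scoring system. term is a strict match against the whole value, with one caveat: you need to know whether the target field is "analyzed". In Elasticsearch 5.x and later, dynamically mapped strings are indexed as analyzed text, usually with a non-analyzed .keyword subfield, so a term query against the plain text field compares the whole query string with individual tokens. Choose match or term according to the actual situation. See also: https://www.jianshu.com/p/eb30eee13923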
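The difference can be sketched in a few lines of Python. The analyze function here is a deliberately simplified stand-in for an analyzer (lowercase, then split on anything that is not a letter or digit); it is an assumption for illustration and not the actual Elasticsearch standard analyzer, although it is enough to show why timeline111.server.163.org still matched documents whose HOSTNAME is timeline11.server.163.org.

```python
import re


def analyze(text):
    """Simplified analyzer stand-in: lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match_hits(query, stored_text):
    """match with the default OR behaviour: any shared token is a hit."""
    return bool(set(analyze(query)) & set(analyze(stored_text)))


def term_hits(query, stored_keyword):
    """term against a non-analyzed keyword value: exact comparison."""
    return query == stored_keyword


stored = "timeline11.server.163.org"
query = "timeline111.server.163.org"

print(analyze(query))             # ['timeline111', 'server', '163', 'org']
print(match_hits(query, stored))  # True: 'server', '163', 'org' are shared
print(term_hits(query, stored))   # False: the full strings differ
```

Under this simplification the two hostnames share three of four tokens, so the match clause is satisfied for every document, while the exact comparison used for a keyword value rejects them.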
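Note
Search clauses in ES run in one of two modes: query and filter. A query, as used in the examples above, computes a relevance score for every returned document by default and sorts the results by that score. A filter only selects the documents that satisfy the condition; it does not compute a score, and its result can be cached. From a performance standpoint alone, filters are therefore faster than queries.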
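The result set of a filter is a simple list of matching documents. It is fast to compute and cheap to keep in memory, needing only 1 bit per document. Reusing these cached filter results for subsequent requests is very efficient.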
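A query has to find the matching documents and also compute the relevance of each one, so in general a query is more expensive than a filter, and query results are not cacheable. For a detailed introduction, see: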
https://doc.yonyoucloud.com/doc/mastering-elasticsearch/chapter-2/27_README.html
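To connect this with the earlier search, here is a minimal Python sketch that builds the same three term conditions inside the filter clause of a bool query, so they are applied as pure yes/no conditions without scoring. The helper name build_filter_query is hypothetical; the field names are the ones used in the requests above, and the resulting dict is only the request body, to be sent with whatever client you use.

```python
import json


def build_filter_query(hostname, user_id, success):
    """Build a search body whose conditions all run in filter context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"fields.HOSTNAME.keyword": hostname}},
                    {"term": {"userId": user_id}},
                    {"term": {"success": success}},
                ]
            }
        },
        "_source": ["userId", "body", "fields.HOSTNAME"],
    }


body = build_filter_query("timeline111.server.163.org", 526952388, True)
print(json.dumps(body, indent=2))
```

Since none of the three conditions needs a relevance score, placing them under filter instead of must is a reasonable choice here; the matched set is the same.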