Logstash grok regex syntax rules

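For the past few days I have been working on setting up and using ELK. When I got to Logstash's grok module, it had me stuck for a long time. I searched through a lot of material online, but most of it was incomplete, and much of it only scratched the surface.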

Grok syntax explained

To make this easier to follow, let's start with a concrete example:

2019-06-18T16:21:17.237207+08:00 12350 [Note] Aborted connection 12350 to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.67' (Got timeout reading communication packets)

2019-06-18T16:21:59.761223+08:00 18306 [Note] Aborted connection 18306 to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.65' (Got timeout reading communication packets)

2019-06-18T16:57:45.311404+08:00 18305 [Note] Aborted connection 18305 to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.65' (Got timeout reading communication packets)

The lines above are log entries produced by MySQL.
The grok expression that matches them is:

(?<timestamp>[0-9]+-\d+-\d+T\d+:\d+:\d+\.\d+\+\d+:\d+) %{NUMBER:num} (?<message>\[\w+\]) (?<conn>\w+\s\w+) (?<threadid>\d+) (?<db>\w+\s\w+:) (?<database>'\w+') (?<char>\w+:) (?<username>'\w+') (?<host>\w+:) '%{IP:client}' (?<eoormessages>\([a-zA-Z]+.*\))

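The result is shown below. BASE10NUM, IPV6 and IPV4 are not labels we defined. They are captured by the sub-patterns that the built-in NUMBER and IP patterns are made of. IPV6 is null because the address is IPv4.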

{
  "timestamp": [["2019-06-18T16:57:45.311404+08:00"]],
  "num": [["18305"]],
  "BASE10NUM": [["18305"]],
  "message": [["[Note]"]],
  "conn": [["Aborted connection"]],
  "threadid": [["18305"]],
  "db": [["to db:"]],
  "database": [["'imchat'"]],
  "char": [["user:"]],
  "username": [["'db_im_chenyongyang_201811'"]],
  "host": [["host:"]],
  "client": [["192.168.0.65"]],
  "IPV6": [[null]],
  "IPV4": [["192.168.0.65"]],
  "eoormessages": [["(Got timeout reading communication packets)"]]
}
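To check the expression outside Logstash, here is a minimal Python sketch of the same match against the third log line. Python's `re` writes named groups as `(?P<label>...)` rather than grok's `(?<label>...)`. As an assumption, `%{NUMBER:num}` and `%{IP:client}` are replaced by simplified stand-ins (an integer and a dotted IPv4 address), not grok's real NUMBER and IP definitions. For that reason the BASE10NUM/IPV4/IPV6 sub-captures do not appear here.

```python
import re

# Python translation of the grok expression above.
# Python writes named groups as (?P<label>...) instead of grok's (?<label>...).
# Assumption: %{NUMBER:num} and %{IP:client} are replaced by simplified
# stand-ins (an integer and a dotted IPv4 address), not grok's real patterns.
MYSQL_ABORTED = re.compile(
    r"(?P<timestamp>[0-9]+-\d+-\d+T\d+:\d+:\d+\.\d+\+\d+:\d+) "
    r"(?P<num>\d+) "
    r"(?P<message>\[\w+\]) "
    r"(?P<conn>\w+\s\w+) "
    r"(?P<threadid>\d+) "
    r"(?P<db>\w+\s\w+:) "
    r"(?P<database>'\w+') "
    r"(?P<char>\w+:) "
    r"(?P<username>'\w+') "
    r"(?P<host>\w+:) "
    r"'(?P<client>\d{1,3}(?:\.\d{1,3}){3})' "
    r"(?P<eoormessages>\([a-zA-Z]+.*\))"
)

# Third sample line from the MySQL log above.
line = (
    "2019-06-18T16:57:45.311404+08:00 18305 [Note] Aborted connection 18305 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.65' "
    "(Got timeout reading communication packets)"
)

m = MYSQL_ABORTED.match(line)
if m is None:
    print("no match")
else:
    # Print each label and the text it captured, like the grok result above.
    for label, value in m.groupdict().items():
        print(f"{label}: {value}")
```

Escaping the brackets and parentheses (`\[`, `\(`) is what makes `[Note]` and `(Got ...)` match literally. Without the backslashes they would be read as a character class and a group.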

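grok has two ways of matching:
First:
Matching with a predefined keyword. The syntax is always %{IP:client}, i.e. %{KEYWORD:label}.
In %{IP:client}, IP is the keyword, which is provided by grok. The part after the colon is the label, which you choose yourself.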
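A predefined keyword is essentially a named, reusable regular expression, and the label becomes the name of the captured field. The Python sketch below illustrates that idea by rewriting each `%{KEYWORD:label}` into a named group. The `PATTERNS` table and the `expand` function are hypothetical and deliberately simplified. They are not grok's actual definitions or implementation.

```python
import re

# Hypothetical, simplified pattern table for illustration only; grok's real
# definitions are more general (for example, IP also matches IPv6).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\b\w+\b",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches a reference of the form %{KEYWORD} or %{KEYWORD:label}.
GROK_REF = re.compile(r"%\{(?P<name>\w+)(?::(?P<label>\w+))?\}")


def expand(grok_pattern: str) -> str:
    """Rewrite each %{KEYWORD:label} reference as a Python named group."""
    def repl(ref: re.Match) -> str:
        body = PATTERNS[ref.group("name")]
        label = ref.group("label")
        # With a label the match becomes a named field; without one it is
        # only a non-capturing group.
        return f"(?P<{label}>{body})" if label else f"(?:{body})"
    return GROK_REF.sub(repl, grok_pattern)


regex = re.compile(expand(
    "connection %{INT:threadid} to db: '%{WORD:database}' "
    "user: '%{WORD:username}' host: '%{IPV4:client}'"
))
line = (
    "2019-06-18T16:21:17.237207+08:00 12350 [Note] Aborted connection 12350 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.67' "
    "(Got timeout reading communication packets)"
)
m = regex.search(line)
print(m.groupdict() if m else "no match")
```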
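Second:
Free-form regex matching. The syntax is always (?<labelname>regexp).
In (?<labelname>regexp), labelname is a label name you define yourself, and regexp is the regular expression to match. This is a named capture group in grok's regex syntax.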
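Free-form matching is plain named-capture regex. Python's `re` spells the same construct `(?P<label>regexp)`, so the sketch below uses that spelling. The labels `timestamp` and `reason` are arbitrary choices for this illustration, just as labels in grok are up to you.

```python
import re

line = (
    "2019-06-18T16:21:17.237207+08:00 12350 [Note] Aborted connection 12350 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.67' "
    "(Got timeout reading communication packets)"
)

# Free-form matching: each (?P<label>regexp) is a hand-written regex whose
# match is stored under label. "timestamp" and "reason" are arbitrary labels.
free_form = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T[\d:.]+\+\d{2}:\d{2}) "  # leading timestamp
    r".*"                                                     # skip the middle
    r"\((?P<reason>[^)]*)\)$"                                 # text in the final (...)
)

m = free_form.match(line)
if m:
    print(m.group("timestamp"))
    print(m.group("reason"))
```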

When writing a grok configuration, you can first build and test the expression at http://grokdebug.herokuapp.com/ and then copy it into the configuration file.
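The same kind of check can also be run locally before an expression is pasted into the config file. This sketch reports which sample lines an expression fails to match in full. The function name `failing_lines` and the candidate expression are hypothetical, and the expression is written in Python's regex syntax rather than grok's.

```python
import re


def failing_lines(pattern: str, samples: list[str]) -> list[str]:
    """Return the sample lines that the pattern does not match in full."""
    regex = re.compile(pattern)
    return [s for s in samples if regex.fullmatch(s) is None]


# The three MySQL log lines from the example above.
samples = [
    "2019-06-18T16:21:17.237207+08:00 12350 [Note] Aborted connection 12350 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.67' "
    "(Got timeout reading communication packets)",
    "2019-06-18T16:21:59.761223+08:00 18306 [Note] Aborted connection 18306 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.65' "
    "(Got timeout reading communication packets)",
    "2019-06-18T16:57:45.311404+08:00 18305 [Note] Aborted connection 18305 "
    "to db: 'imchat' user: 'db_im_chenyongyang_201811' host: '192.168.0.65' "
    "(Got timeout reading communication packets)",
]

# Hypothetical candidate expression, already translated to Python syntax.
candidate = (
    r"(?P<timestamp>\S+) (?P<num>\d+) \[Note\] Aborted connection (?P<threadid>\d+) "
    r"to db: '(?P<database>\w+)' user: '(?P<username>\w+)' "
    r"host: '(?P<client>[\d.]+)' \((?P<reason>.*)\)"
)

bad = failing_lines(candidate, samples)
print("all sample lines match" if not bad else bad)
```

Using `fullmatch` rather than `search` makes the check strict. An expression that only matches part of a line is reported instead of silently passing.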
