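grok is a Logstash filter plugin that parses lines of text logs with regular expressions, splitting the message into structured fields before it is stored, which makes searching and aggregating in Kibana easier.
The nginx log format used in this example is defined in the nginx configuration: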
.....
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    #tcp_nopush on;
    keepalive_timeout 65;
    #gzip on;
    include /etc/nginx/conf.d/*.conf;
}
View the log output:
[root@centos6 nginx]# cat /var/log/nginx/access.log
192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /indexfsd HTTP/1.1" 200 1201 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /favicon.ico HTTP/1.1" 200 1320 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /favicon.ico HTTP/1.1" 200 1320 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/base/searchIndexImage.htm?v=1&device=10 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/base/getArticleList.htm?v=1&device=10 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/thirdparthy/getShareUrl.htm?t=1562561715347 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
Logstash ships with a set of predefined regular-expression patterns that we can reuse. They can be found in the following directory:
/usr/local/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns
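The most basic definitions live in the grok-patterns file. Some of those patterns do not fit the fields of our nginx log, however, so we need to define our own and have grok load them through patterns_dir. The entries in grok-patterns are also a useful reference for how a pattern file is written.
I have written a log filter pattern that matches the log format of this nginx server. Readers who are less familiar with regular expressions may want to consult a regular-expression syntax reference first: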
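As a rough illustration of how a pattern file works, the Python sketch below loads a few "NAME regex" entries and expands %{NAME:field} references into named groups. It is only a toy model of the idea, not Logstash's implementation: the definitions are simplified, and REQUESTLINE, load_patterns and expand are hypothetical names made up for this example.

```python
import re

# Tiny stand-in for a pattern file: one "NAME regex" entry per line.
# REQUESTLINE is a hypothetical custom pattern built from two basic ones.
PATTERN_FILE = r"""
WORD \b\w+\b
NOTSPACE \S+
REQUESTLINE %{WORD:request_verb} %{NOTSPACE:request}
"""

def load_patterns(text):
    """Parse 'NAME regex' lines into a dict."""
    patterns = {}
    for line in text.strip().splitlines():
        name, regex = line.split(" ", 1)
        patterns[name] = regex
    return patterns

# Matches %{NAME} and %{NAME:field}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(expr, patterns):
    """Recursively replace each reference with the regex it names."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        inner = expand(patterns[name], patterns)
        # A field name turns into a named capture group.
        return f"(?P<{field}>{inner})" if field else f"(?:{inner})"
    return REF.sub(repl, expr)

patterns = load_patterns(PATTERN_FILE)
regex = expand("%{REQUESTLINE}", patterns)
match = re.match(regex, "GET /indexfsd")
print(match.groupdict())
```

A custom pattern is therefore just another named entry that may refer to the basic ones, which is why adding a file to the pattern directory is enough to make a new name available.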
[root@centos6 patterns]# vim nginx-access
NGINXACCESS %{IP:clientip} - (%{USERNAME:user}|-) \[%{HTTPDATE:timestamp}\] \"%{WORD:request_verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status:int} %{NUMBER:body_sent:int} \"-\" \"%{GREEDYDATA:agent}\" \"-\"
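To check what the pattern is meant to capture without starting Logstash, the following Python sketch approximates NGINXACCESS with a hand-written regular expression and applies it to one of the log lines shown earlier. The sub-expressions are simplified stand-ins, not the exact definitions from logstash-patterns-core, and parse_line is a hypothetical helper name.

```python
import re

# Hand-written approximation of NGINXACCESS. Each named group stands in for
# one %{PATTERN:field}; the sub-expressions are simplified assumptions.
NGINX_ACCESS = re.compile(
    r'(?P<clientip>\d{1,3}(?:\.\d{1,3}){3}) - '
    r'(?P<user>[A-Za-z0-9._-]+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<status>\d+) (?P<body_sent>\d+) '
    r'"-" "(?P<agent>.*)" "-"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = NGINX_ACCESS.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Mirror the :int suffix used in the grok pattern.
    fields["status"] = int(fields["status"])
    fields["body_sent"] = int(fields["body_sent"])
    return fields

sample = ('192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] '
          '"GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"')
print(parse_line(sample))
```

Like the grok pattern, this sketch expects the referer and X-Forwarded-For fields to be a literal "-", so a request that carries a real referer would not match. That is enough for the sample log above, but it is worth keeping in mind for production traffic.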
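A Logstash pipeline has the basic form input >> codec >> filter >> codec >> output, where the codec stages convert the encoding or format of the data as it enters and leaves the pipeline.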
[root@centos6 bin]# vim nginx_access.conf
input {
  file {
    path => "/var/log/nginx/access.log"    # path to the log file
  }
}
filter {
  grok {
    patterns_dir => "/usr/local/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns"    # directory containing the pattern files
    match => { "message" => "%{NGINXACCESS}" }    # pattern applied to each log line
    remove_field => "message"    # drop the original message once it has been parsed
  }
}
output {
  stdout {
    codec => rubydebug    # print events to the screen for debugging
  }
}
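Start Logstash to begin collecting logs, then open a browser and request some pages from nginx:
[root@centos6 bin]# ./logstash -f nginx_access.conf
In the output printed to the screen, the left-hand side shows the field names defined in the filter pattern together with some of Logstash's built-in fields, and the right-hand side shows the values extracted from each log line.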
Once debugging shows no errors, modify the configuration file further so that the output goes to elasticsearch: