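Because my target is an ELK stack architecture, everything below centers on elasticsearch + redis + logstash + kibana.
Learned from the book 《logstash 最佳实践》 (Logstash Best Practices).
Main contents of the pipeline configuration file
The pipeline configuration file defines the job that logstash runs. It has three parts: input, filter (processing), and output.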
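As a minimal sketch of that three-part layout, the pipeline below reads lines from standard input, applies no filtering, and prints each event. The stdin and stdout plugins and the rubydebug codec are chosen only for illustration; they are not part of the setup described in these notes.

```
input {
  # read events typed on the console (illustration only)
  stdin { }
}
filter {
  # processing steps such as grok go here; left empty in this sketch
}
output {
  # pretty-print every event so the parsed fields are visible
  stdout { codec => rubydebug }
}
```

Saved to a file such as test.conf (a hypothetical name), it can be started with bin/logstash -f test.conf.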
input
This section specifies which files or other sources logstash listens on. In my ELK stack architecture there are only two types: file and redis.
input {
  # redis
  redis {
    host => "127.0.0.1"
    port => 6379
    password => "123456"
    key => "logstash-queue"
    data_type => "list"
    db => 0
  }
  # file
  file {
    type => "nginx-access"
    path => "/usr/local/nginx/logs/access.log"
    start_position => "beginning"
    sincedb_path => "/var/log/logstash/sincedb/nginx"
    # lines that do not start with a digit are appended to the previous event
    codec => multiline {
      pattern => "^\d+"
      negate => true
      what => "previous"
    }
  }
}
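note: input.file.codec: if a single log entry can span several lines, entries can be split on the pattern ^\d+. The line breaks inside a merged entry are kept as \n.
filter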
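The most common matching method is grok (regular-expression matching).
The directory logstash-7.4.0/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns holds the predefined grok patterns. They are used like %{IPORHOST:client}.
If none of them fits, you can write your own regular expression. To check that a pattern is correct, use Dev Tools > Grok Debugger in kibana.
You can also verify it at http://grokdebug.herokuapp.com/.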
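When no predefined pattern fits, one option is to define a pattern inline with the grok option pattern_definitions. The sketch below assumes a hypothetical log line of the form "client order-id"; the ORDER_ID pattern name, its regular expression, and the order_id field are made up for illustration.

```
filter {
  grok {
    # hypothetical custom pattern: two capital letters followed by eight digits
    pattern_definitions => {
      "ORDER_ID" => "[A-Z]{2}\d{8}"
    }
    # custom patterns are referenced the same way as predefined ones
    match => {
      "message" => "%{IPORHOST:client} %{ORDER_ID:order_id}"
    }
  }
}
```

Patterns that are reused across several pipelines can instead live in a separate file that is loaded with the patterns_dir option.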
filter {
  if [type] == "nginx-access" {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"
      }
    }
  } else if [type] == "nginx-error" {
    grok {
      # named captures: timestamp, client and upstream (the client and upstream names are assumed)
      match => ["message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
    }
  }
}
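Optimization
- Feed in logs that are already in JSON format
Sending JSON logs directly saves the resources that would otherwise be spent on pattern matching. However, not every piece of software lets you configure its log format, so the benefit is limited.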
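If an application can be configured to write one JSON object per line, the input can decode it directly and the grok step is no longer needed for that log. A minimal sketch, assuming a hypothetical JSON-formatted access log at the path shown:

```
input {
  file {
    type => "nginx-access-json"
    # hypothetical path to a log that contains one JSON object per line
    path => "/usr/local/nginx/logs/access_json.log"
    # decode every line as JSON; the keys become event fields
    codec => "json"
  }
}
```

Lines that are not valid JSON are kept as plain text and tagged _jsonparsefailure, so they can still be found and inspected later.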
output
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    password => "123456"
    key => "logstash-queue"
    data_type => "list"
    db => 4
  }
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
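Setting field types when importing into ES
ES supports full-text indexing, but its default analysis is geared toward English. That does not meet our needs, so we rely on the ik analysis plugin.
FAQ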
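One possible way to control field types from the logstash side is sketched below: mutate converts a value before it is sent, and the elasticsearch output can install an index template that holds the mappings. The template path and template name are hypothetical, and the template file itself is not shown; it is where a text field could be given an ik analyzer such as ik_max_word, assuming the ik plugin is installed on the ES nodes.

```
filter {
  mutate {
    # send pid as a number instead of a string
    convert => { "pid" => "integer" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    # hypothetical template file that defines the field mappings and analyzers
    template => "/etc/logstash/templates/nginx.json"
    template_name => "nginx"
    # replace a template of the same name that already exists in ES
    template_overwrite => true
  }
}
```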
1. How to handle a single record that spans many lines
Use input.codec to merge the lines. Take an nginx error log in the default format as an example.
2019/09/23 10:39:01 [error] 4130#0: *1 FastCGI sent in stderr: "PHP message: PHP Warning: require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0
PHP message: PHP Fatal error: require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me", referrer: "http://tp5.study.me/"
2019/09/23 10:40:14 [error] 4130#0: *7 FastCGI sent in stderr: "PHP message: PHP Warning: require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0
PHP message: PHP Fatal error: require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP 1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me"
As the sample shows, every log entry begins with a date, so we can split the entries on lines that start with a digit.
input {
  stdin {
    codec => multiline {
      pattern => "^\d+"
      negate => true
      what => "previous"
    }
  }
}
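2. By default every event carries a message field that holds the raw, unparsed log line. Once its content has been extracted into fields, there is no need to keep the raw data.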
filter {
  grok {
    # named captures: timestamp, client and upstream (the client and upstream names are assumed)
    match => ["message" , "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:message}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
    overwrite => ["message"]
  }
}
The overwrite option replaces the original message with the captured value. overwrite must be placed inside filter.grok.
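Another way to drop the raw line, sketched below, is the remove_field option that every filter accepts. It is applied only when the filter succeeds, so events that fail to match keep their message for debugging. The access-log pattern is used here only to keep the example short.

```
filter {
  grok {
    match => {
      "message" => "%{COMBINEDAPACHELOG}"
    }
    # delete the raw line once its content has been extracted into fields
    remove_field => ["message"]
  }
}
```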
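3. Every collected event has an @timestamp field, and I would like the time of old log entries to be written into it.
Note: @timestamp is maintained by logstash itself and changing it is not recommended, so I use a timestamp field instead. The difference is that timestamp holds the time extracted from the log line by the pattern.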
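For reference, the date filter is the plugin that parses a time string into a real date value. The sketch below assumes the grok step stored the nginx error time (for example 2019/09/23 10:39:01) in a field named timestamp; the log_time target is a hypothetical field name.

```
filter {
  date {
    # parse the extracted string, e.g. 2019/09/23 10:39:01
    match => ["timestamp", "yyyy/MM/dd HH:mm:ss"]
    # store the parsed date in a separate field (hypothetical name)
    target => "log_time"
  }
}
```

If target is left out, the date filter writes the parsed value to @timestamp, which is the behavior the question asks about.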