Logstash internals and configuration file explained: time, date, time zone, IP, backslash, online grok debuggers, type conversion

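  1. Basic configuration

    Logstash itself cannot form a cluster. After Filebeat connects to Logstash, it automatically polls the Logstash servers for availability and sends data to one that is available.

    The Logstash configuration listens on port 5044 and receives the logs sent by Filebeat. It then filters them with grok, sets a different type for each kind of log, and stores the logs in the Elasticsearch cluster.

    Project logs and nginx logs are configured together. The index name configured for Elasticsearch must not contain uppercase letters (Elasticsearch only accepts lowercase index names), otherwise strange bugs appear.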

    input {
      beats {
        port => "5044"
      }
    }
     
    filter {
     
      date {
          match => ["@timestamp", "yyyy-MM-dd HH:mm:ss"]
      }
      grok {
        match => {
          "source" => "(?<type>([A-Za-z]*-[A-Za-z]*-[A-Za-z]*)|([A-Za-z]*-[A-Za-z]*)|access|error)"
        }
      }
      mutate {
        convert => [ "upstream_response_time", "float" ]
      }
     
    }
     
    output {
      # Each project's logs need their own condition here
      if [type] == "MS-System-OTA"{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
          index => "logstash-ms-system-ota-%{+YYYY.MM.dd}"
        }
      }else if [type] == "access" or [type] == "error"{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
          index => "logstash-nginx-%{+YYYY.MM.dd}"
        }
      }else{
        elasticsearch {
          hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
        }
      }
      stdout {
        codec => rubydebug
      }
    }
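
Because index names have to be lowercase while a type such as MS-System-OTA is not, one option is to derive the index name from a lowercased copy of the type instead of writing one branch per project. The following is only a sketch: the field name [@metadata][index_suffix] is an assumption introduced for illustration, and the hosts are the ones from the configuration above.

```conf
filter {
  # Copy the type into a metadata field (hypothetical name); @metadata is not sent to outputs
  mutate {
    add_field => { "[@metadata][index_suffix]" => "%{type}" }
  }
  # Lowercase in a second mutate: add_field only runs after the first mutate has finished
  mutate {
    lowercase => [ "[@metadata][index_suffix]" ]
  }
}

output {
  elasticsearch {
    hosts => ["172.18.1.152:9200","172.18.1.153:9200","172.18.1.154:9200"]
    # A type of "MS-System-OTA" would give an index starting with logstash-ms-system-ota-
    index => "logstash-%{[@metadata][index_suffix]}-%{+YYYY.MM.dd}"
  }
}
```

Events without a type would keep the literal %{type} in the index name, so in practice a guard such as if [type] { ... } around both blocks is advisable.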
    
  2. Logstash grok-patterns

    Grok is one of the most important Logstash plugins. We use it to parse log lines and extract the data we need.

    USERNAME [a-zA-Z0-9._-]+
    USER %{USERNAME}
    INT (?:[+-]?(?:[0-9]+))
    BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
    NUMBER (?:%{BASE10NUM})
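    BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
    BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

    POSINT \b(?:[1-9][0-9]*)\b
    NONNEGINT \b(?:[0-9]+)\b
    WORD \b\w+\b
    NOTSPACE \S+
    SPACE \s*
    DATA .*?
    GREEDYDATA .*
    QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))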
    UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
    
    # Networking
    MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
    CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
    WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
    COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
    IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
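    IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
    IP (?:%{IPV6}|%{IPV4})
    HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
    HOST %{HOSTNAME}
    IPORHOST (?:%{HOSTNAME}|%{IP})
    HOSTPORT %{IPORHOST}:%{POSINT}

    # paths
    PATH (?:%{UNIXPATH}|%{WINPATH})
    UNIXPATH (?>/(?>[\w_%!$@:.,-]+|\\.)*)+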
    TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
    WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
    URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
    URIHOST %{IPORHOST}(?::%{POSINT:port})?
    # uripath comes loosely from RFC1738, but mostly from what Firefox
    # doesn't turn into %XX
    URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+
    #URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
    URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]*
    URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
    URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
    
    # Months: January, Feb, 3, 03, 12, December
    MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
    MONTHNUM (?:0?[1-9]|1[0-2])
    MONTHNUM2 (?:0[1-9]|1[0-2])
    MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
    
    # Days: Monday, Tue, Thu, etc...
    DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
    
    # Years?
    YEAR (?>\d\d){1,2}
    HOUR (?:2[0123]|[01]?[0-9])
    MINUTE (?:[0-5][0-9])
    # '60' is a leap second in most time standards and thus is valid.
    SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
    TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
    # datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
    DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
    DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
    ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
    ISO8601_SECOND (?:%{SECOND}|60)
    TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
    DATE %{DATE_US}|%{DATE_EU}
    DATESTAMP %{DATE}[- ]%{TIME}
    TZ (?:[PMCE][SD]T|UTC)
    DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
    DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
    DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
    DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
    
    # Syslog Dates: Month Day HH:MM:SS
    SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
    PROG (?:[\w._/%-]+)
    SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
    SYSLOGHOST %{IPORHOST}
    SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
    HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
    
    # Shortcuts
    QS %{QUOTEDSTRING}
    
    # Log formats
    SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
    COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
    COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
    
    # Log Levels
    LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
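
Patterns from this file are referenced in a grok expression as %{PATTERN:field}, and an optional third part (:int or :float) converts the captured value. A minimal sketch, assuming an access log line in the Apache combined format followed by one extra number; the field names come from the COMBINEDAPACHELOG definition above, except took, which is hypothetical:

```conf
filter {
  grok {
    # Reuse a predefined pattern and append one extra numeric capture, stored as a float
    match => { "message" => "%{COMBINEDAPACHELOG} %{NUMBER:took:float}" }
  }
}
```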
    
  3. grok demos for several different kinds of message read from log files
    1. Handling the message of nginx's error.log
    # message:   2018/09/18 16:33:51 [error] 15003#0: *545757 no live upstreams while connecting to upstream, client: 39.108.4.83, server: dev-springboot-admin.tvflnet.com, request: "POST /instances HTTP/1.1", upstream: "http://localhost/instances", host: "dev-springboot-admin.tvflnet.com"
    
    filter {
      # Define the format of the data
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:nginxmessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", upstream: \"%{DATA:upstream}\", host: \"%{DATA:host}\""}
      }
    }
    
    2. Handling the message of nginx's error.log (entry with a referrer)
    # message:    2018/04/19 20:40:27 [error] 4222#0: *53138 open() "/data/local/project/WebSites/AppOTA/theme/js/frame/layer/skin/default/icon.png" failed (2: No such file or directory), client: 218.17.216.171, server: dev-app-ota.tvflnet.com, request: "GET /theme/js/frame/layer/skin/default/icon.png HTTP/1.1", host: "dev-app-ota.tvflnet.com", referrer: "http://dev-app-ota.tvflnet.com/theme/js/frame/layer/skin/layer.css"
    
    filter {
      # Define the format of the data
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:nginxmessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", host: \"%{DATA:host}\", referrer: \"%{DATA:referrer}\""}
      }
    }
    
    3. Handling the message of lua's error.log
    # message:    2018/09/05 18:02:19 [error] 2325#0: *17083157 [lua] PushFinish.lua:38: end push statistics, client: 119.137.53.205, server: dev-system-ota-statistics.tvflnet.com, request: "POST /upgrade/push HTTP/1.1", host: "dev-system-ota-statistics.tvflnet.com"
    
    filter {
      # Define the format of the data
      grok {
        match => { "message" => "%{DATA:timestamp}\ \[%{DATA:level}\] %{DATA:luamessage}\, client: %{DATA:client}\, server: %{DATA:server}\, request: \"%{DATA:request}\", host: \"%{DATA:host}\""}
      }
    }
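
The timestamp captured by the three error.log patterns above is still a plain string such as 2018/09/18 16:33:51. A sketch of how it could be turned into the event time with the date filter, assuming the nginx hosts log in Asia/Shanghai local time (an assumption; the source does not say):

```conf
filter {
  date {
    # Field filled by the grok patterns above, in nginx error.log format
    match => ["timestamp", "yyyy/MM/dd HH:mm:ss"]
    # Assumed time zone of the nginx hosts
    timezone => "Asia/Shanghai"
    # The result goes to @timestamp by default; the string copy is dropped only if parsing succeeds
    remove_field => ["timestamp"]
  }
}
```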
    
    4. Handling the message of the TV-side API logs
    # message:    traceid:[Thread:943-sn:sn-mac:mac] 2018-09-18 11:07:03.525 DEBUG com.flnet.utils.web.log.DogLogAspect 55 - Params-参数(JSON):{"backStr":"{\"groupid\":5}","build":201808310938,"ip":"119.147.146.189","mac":"mac","modelCode":"SHARP_0_50#SHARP#IQIYI#LCD_50SUINFCA_H","sn":"sn","version":"modelCode"}
    
    filter {
      # Define the format of the data
      grok {
        match => { "message" => "traceid:%{DATA:traceid}\[Thread:%{DATA:thread}\-sn:%{DATA:sn}\-mac:%{DATA:mac}\]\ %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}"}
      }
    }
    
    5. Handling the message of the project logs
    # message:    traceid:[] 2018-09-14 02:14:48.209 WARN  de.codecentric.boot.admin.client.registration.ApplicationRegistrator 115 - Failed to register application as Application(name=ta-system-ota, managementUrl=http://TV-DEV-API01:10005/actuator, healthUrl=http://TV-DEV-API01:10005/actuator/health, serviceUrl=http://TV-DEV-API01:10005/, metadata={startup=2018-09-10T10:20:41.812+08:00}) at spring-boot-admin ([https://dev-springboot-admin.tvflnet.com/instances]): I/O error on POST request for "https://dev-springboot-admin.tvflnet.com/instances": connect timed out; nested exception is java.net.SocketTimeoutException: connect timed out. Further attempts are logged on DEBUG level
    
    filter {
      # Define the format of the data
      grok {
        match => { "message" => "traceid:\[%{DATA:traceid}\] %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}"}
      }
    }
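
The two traceid patterns above capture the rest of the line into message, a field that already exists on the event. By default grok then appends, so message ends up as an array holding both the original line and the captured text. If only the captured part is wanted, the overwrite option can be used, sketched here with the project-log pattern:

```conf
filter {
  grok {
    match => { "message" => "traceid:\[%{DATA:traceid}\] %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}" }
    # Replace the existing message field instead of appending to it
    overwrite => ["message"]
  }
}
```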
    
    6. Logs in the format configured in nginx
    # message:     {"@timestamp":"2018-09-20T02:47:00+08:00", "http_host":"system-ota-tvapi.tvflnet.com", "status":"200", "method":"HEAD / HTTP/1.1", "request_body":"-", "url":"/index.html", "host":"172.18.156.12", "clientip":"100.116.222.149", "size":"0", "responsetime":"0.000", "upstreamtime":"-", "upstreamhost":"-", "xff":"140.205.205.25", "referer":"-", "agent":"Go-http-client/1.1"}
    filter {
      # Define the format of the data
      grok {
        match => { "message" =>  "{\"@timestamp\":\"%{TIMESTAMP_ISO8601:timestamp}\", \"http_host\":\"%{DATA:http_host}\", \"status\":\"%{DATA:status}\", \"method\":\"%{DATA:method}\", \"request_body\":\"%{DATA:request_body}\", \"url\":\"%{DATA:url}\", \"host\":\"%{DATA:host}\", \"clientip\":\"%{DATA:clientip}\", \"size\":\"%{DATA:size}\", \"responsetime\":\"%{DATA:responsetime}\", \"upstreamtime\":\"%{DATA:upstreamtime}\", \"upstreamhost\":\"%{DATA:upstreamhost}\", \"xff\":\"%{DATA:xff}\", \"referer\":\"%{DATA:referer}\", \"agent\":\"%{DATA:agent}\"}"
      }
    }
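
Since this nginx log line is already a JSON object, the json filter is a possible alternative to a long grok expression. A sketch under that assumption; the target name nginx is hypothetical and keeps the parsed keys from colliding with existing fields such as host:

```conf
filter {
  json {
    source => "message"
    # Hypothetical target: parsed keys land under [nginx] instead of the event root
    target => "nginx"
  }
  # The json filter tags unparseable lines with _jsonparsefailure by default
  if "_jsonparsefailure" not in [tags] {
    mutate {
      convert => {
        "[nginx][status]" => "integer"
        "[nginx][responsetime]" => "float"
      }
    }
  }
}
```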
    

    When there are several different formats to match, configure multiple grok patterns.
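
One way to do this is to give a single grok filter a list of patterns for the same field: grok tries them in order and stops at the first one that matches (break_on_match defaults to true). A sketch reusing two of the patterns above:

```conf
filter {
  grok {
    # The TV-side API pattern comes first because it is the more specific one;
    # the generic project-log pattern would also match those lines
    match => {
      "message" => [
        "traceid:%{DATA:traceid}\[Thread:%{DATA:thread}\-sn:%{DATA:sn}\-mac:%{DATA:mac}\]\ %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}",
        "traceid:\[%{DATA:traceid}\] %{TIMESTAMP_ISO8601:timestamp}\ %{DATA:level}\ %{GREEDYDATA:message}"
      ]
    }
  }
}
```

Events that match none of the listed patterns are tagged with _grokparsefailure, which makes them easy to find.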
    Logstash start command: nohup ./bin/logstash -f ./config/conf.d/logstash-simple.conf >/dev/null 2>&1 &

  4. Handling dates and times
filter {
  date {
    # With several entries, several different formats can be matched
    match => [ "logdate", "MMM dd yyyy HH:mm:ss","ISO8601" ]
    target => "fieldName1"
    timezone => "Asia/Shanghai"
  }
}

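Options specific to the date plugin:

  • locale

    • Type: string
    • No default value
      Specifies the locale, for example en or en-US. It mainly matters when parsing non-numeric month and day names such as Monday or May. If the date and time are entirely numeric, this value can be ignored.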
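  • match

    • Type: array
    • Default: []
      Parses the given field with the given formats. For example:
    match => ["createtime", "yyyyMMdd","yyyy-MM-dd"]
    

    The first value is the field name and the remaining values are the formats to try; if several formats are possible, list them all.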

  • tag_on_failure

    • Type: array
    • Default: ["_dateparsefailure"]
      If date parsing fails, these values are appended to the tags field.
  • target

    • Type: string
    • Default: @timestamp
    • The name of the field in which the parsed date is stored
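  • timezone

    • Type: string
    • No default value
      Specifies the time zone of the time being parsed. The value is a canonical time zone ID; the valid IDs are those listed on the Joda-Time time zones page.
      Usually this does not need to be set, because the value is taken from the time zone of the current system.
      The time zone set here is not the time zone of the time Logstash finally stores; Logstash always stores times in UTC.
      For example, suppose the time to parse is 20171120:

    With the time zone Asia/Shanghai, the converted time is 2017-11-19T16:00:00.000Z;
    with the time zone Europe/Vienna, the converted time is 2017-11-19T23:00:00.000Z.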
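    Working around the time zone problem

    # Shift @timestamp forward by 8 hours (UTC+8) into a temporary field;
    # note that the stored value is then no longer true UTC
    ruby {
      code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)"
    }
    # Copy the shifted value back into @timestamp
    ruby {
      code => "event.set('@timestamp',event.get('timestamp'))"
    }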
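
The date options described above can be combined in one filter. A sketch, assuming a hypothetical field logdate that holds an Apache-style timestamp such as 18/Sep/2018:16:33:51 +0800; the target field name is illustrative as well. No timezone is set because the string carries its own offset.

```conf
filter {
  date {
    # Try the Apache-style format first, then ISO8601
    match => ["logdate", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601"]
    # Parse English month names (Sep) regardless of the system locale
    locale => "en"
    # Leave @timestamp untouched and store the result in a separate, hypothetical field
    target => "logdate_parsed"
    tag_on_failure => ["_dateparsefailure"]
  }
}

output {
  # Print only the events whose date could not be parsed
  if "_dateparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}
```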
    
  • Converting escape sequences (and other characters)
    mutate {
      gsub => [
        "request_body", "\\x22", '"',
        "request_body", "\\x0A", "\n"
      ]
    }
    
  • JSON handling
    json {
      source => "message"
    }
    
  • Removing fields
    mutate {
      remove_field => [ "message" ]
    }
    
  • Type conversion
    mutate {
      convert => [ "upstream_response_time", "float" ]
    }
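
Field removal and type conversion usually end up together in one clean-up stage. A sketch, assuming the nginx fields status, size and responsetime from the JSON log format shown earlier are present as top-level string fields:

```conf
filter {
  mutate {
    # Hash form: one entry per field, mapped to the target type
    convert => {
      "status" => "integer"
      "size" => "integer"
      "responsetime" => "float"
    }
  }
  # Drop the raw line only when grok parsed it successfully,
  # so that failed events keep their original text for debugging
  if "_grokparsefailure" not in [tags] {
    mutate {
      remove_field => [ "message" ]
    }
  }
}
```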
    

    Elasticsearch field data types

    A single Elasticsearch document can contain fields of several different data types.

  • Core datatypes
    • String datatype: string (replaced by text and keyword since Elasticsearch 5.0)
    • Numeric datatypes: long (64-bit), integer (32-bit), short (16-bit), byte (8-bit), double (64-bit double precision), float (32-bit single precision)
    • Date datatype: date
    • Boolean datatype: boolean
    • Binary datatype: binary
  • Complex datatypes
    • Array datatype: arrays do not need a dedicated type for their elements, for example:
      • Array of strings: [ "one", "two" ]
      • Array of integers: [ 1, 2 ]
      • Array of arrays: [ 1, [ 2, 3 ]] is equivalent to [ 1, 2, 3 ]
      • Array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
    • Object datatype: object, for single JSON objects;
    • Nested datatype: nested, for arrays of JSON objects;
  • Geo datatypes
    • Geo-point datatype: geo_point, for latitude/longitude coordinates;
    • Geo-shape datatype: geo_shape, for complex shapes such as polygons;
  • Specialised datatypes
    • IPv4 datatype: ip, for IPv4 addresses;
    • Completion datatype: completion, which provides auto-complete suggestions;
    • Token count datatype: token_count, which counts the tokens in a string. Tokens removed by a filter still count, so the value is not reduced by filtering.
    • mapper-murmur3: provided by a plugin; computes a murmur3 hash of the field values at index time;
    • Attachment datatype: with the mapper-attachments plugin, attachments such as Microsoft Office formats, Open Document formats, ePub and HTML can be indexed.
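  1. How Logstash handles the backslash '\'
    When mutate's gsub is used on a string in which a backslash has to be kept, parsing fails.
    To keep a backslash, another character must follow it in the replacement, as below:
    mutate {
      gsub => [
        "request_body", "\\x5C\\x22", '\\"'
      ]
    }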
    
  2. Handling IP addresses in Logstash
    geoip {
      source => "clientip"
    }
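
A slightly fuller sketch of the geoip filter, assuming clientip was extracted by an earlier grok or json step. The target name geoip is a common choice rather than something the source specifies:

```conf
filter {
  # Only look up events that actually carry a client address
  if [clientip] {
    geoip {
      source => "clientip"
      # Store the lookup result (country, city, coordinates) under this field
      target => "geoip"
    }
  }
}
```

Addresses that cannot be resolved, such as private ranges, leave the event tagged with _geoip_lookup_failure by default.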
    
  3. Online grok debuggers for Logstash
  • In China: http://grok.qiexun.net/

  • International: http://grokdebug.herokuapp.com/

  • Further reading: https://www.cnblogs.com/iiiiher/p/7919149.html

  • grok regex syntax: https://github.com/kkos/oniguruma/blob/master/doc/RE
