Since the company had no log collection system, I set up an ELK log collection and analysis system a while ago. Querying logs is now much more convenient, and with ELK's handy indexing and analysis features the data can be presented as charts, clear at a glance.
Many write-ups online use an ELK + redis architecture, but since our current traffic is fairly light I did not add a redis layer.
For installing ELK, please refer to my previous article, ELK环境搭建 (setting up the ELK environment).
Once the ELK stack itself is up, what remains is mainly writing the Logstash collection configuration and tuning the performance of the whole system; performance tuning will be covered in a later post.
The meaning of the configuration is explained step by step below, and the complete configuration is given afterwards.
Our nginx logs use a custom format, so Logstash has to parse the message field and store it in ES in structured form. This is done with the grok filter, using a match regular expression written against our own log_format:
log_format main '$remote_addr | $time_local | $request | $uri | '
'$status | $body_bytes_sent | $bytes_sent | $gzip_ratio | $http_referer | '
'"$http_user_agent" | $http_x_forwarded_for | $upstream_addr | $upstream_response_time | $upstream_status | $request_time';
A sample log line in this format (IPs masked):
<IP address> | 29/Nov/2016:10:25:16 +0800 | POST /api HTTP/1.1 | /api | 200 | 108 | 326 | - | - | "UGCLehiGphoneClient/2.9.0 Mozilla/5.0 (Linux; Android 5.0.2; X800 Build/BBXCNOP5500710201S) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/37.0.0.0 Mobile Safari/537.36" | - | <IP address>:<port> | 0.058 | 200 | 0.058
The debugger and the patterns reference on the grok website can help you write the regular expression quickly.
For the log output above, the regular expression I wrote is as follows:
%{IPORHOST:clientip} \| %{HTTPDATE:timestamp} \| (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-) \| %{URIPATH:uripath} \| %{NUMBER:response} \| (?:%{NUMBER:body_bytes_sent}|-) \| (?:%{NUMBER:bytes_sent}|-) \| (?:%{NOTSPACE:gzip_ratio}|-) \| (?:%{QS:http_referer}|-) \| %{QS:agent} \| (?:%{QS:http_x_forwarded_for}|-) \| (%{URIHOST:upstream_addr}|-) \| (%{BASE16FLOAT:upstream_response_time}) \| %{NUMBER:upstream_status} \| (%{BASE16FLOAT:request_time})
The parsing result in the grok debugger is as follows:
{
"clientip": [
[
"10.73.134.29"
]
],
"HOSTNAME": [
[
"10.73.134.29",
"117.121.58.159"
]
],
"IP": [
[
null,
null
]
],
"IPV6": [
[
null,
null
]
],
"IPV4": [
[
null,
null
]
],
"timestamp": [
[
"28/Nov/2016:16:13:07 +0800"
]
],
"MONTHDAY": [
[
"28"
]
],
"MONTH": [
[
"Nov"
]
],
"YEAR": [
[
"2016"
]
],
"TIME": [
[
"16:13:07"
]
],
"HOUR": [
[
"16"
]
],
"MINUTE": [
[
"13"
]
],
"SECOND": [
[
"07"
]
],
"INT": [
[
"+0800"
]
],
"verb": [
[
"POST"
]
],
"request": [
[
"/inner"
]
],
"http_version": [
[
"1.1"
]
],
"BASE10NUM": [
[
"1.1",
"200",
"243",
"461",
"200"
]
],
"uripath": [
[
"/inner"
]
],
"response": [
[
"200"
]
],
"body_bytes_sent": [
[
"243"
]
],
"bytes_sent": [
[
"461"
]
],
"gzip_ratio": [
[
"-"
]
],
"http_referer": [
[
null
]
],
"QUOTEDSTRING": [
[
null,
""-"",
null
]
],
"agent": [
[
""-""
]
],
"http_x_forwarded_for": [
[
null
]
],
"upstream_addr": [
[
"117.121.58.159:8001"
]
],
"IPORHOST": [
[
"117.121.58.159"
]
],
"port": [
[
"8001"
]
],
"upstream_response_time": [
[
"0.046"
]
],
"upstream_status": [
[
"200"
]
],
"request_time": [
[
"0.046"
]
]
}
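Before wiring the pattern into the real pipeline, it is worth sanity-checking it locally. Below is a minimal sketch of a throwaway Logstash config (the file name test_grok.conf and the stdin/stdout plugins are just my own choices for testing, not part of the final setup) that applies the same pattern to lines typed on stdin and prints the parsed event:
input { stdin {} }
filter {
  grok {
    match => [
      "message", "%{IPORHOST:clientip} \| %{HTTPDATE:timestamp} \| (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-) \| %{URIPATH:uripath} \| %{NUMBER:response} \| (?:%{NUMBER:body_bytes_sent}|-) \| (?:%{NUMBER:bytes_sent}|-) \| (?:%{NOTSPACE:gzip_ratio}|-) \| (?:%{QS:http_referer}|-) \| %{QS:agent} \| (?:%{QS:http_x_forwarded_for}|-) \| (%{URIHOST:upstream_addr}|-) \| (%{BASE16FLOAT:upstream_response_time}) \| %{NUMBER:upstream_status} \| (%{BASE16FLOAT:request_time})"
    ]
  }
}
output { stdout { codec => rubydebug } }
Run it with bin/logstash -f test_grok.conf, paste a log line, and check the printed fields; if the pattern does not match, the event will carry a _grokparsefailure tag instead.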
Geo-coordinate analysis - geoip
First download the GeoLite City database into Logstash's etc directory:
cd <your logstash path>/logstash/etc
curl -O "http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz"
gunzip GeoLiteCity.dat.gz
Then all that is needed is a geoip block in the filter section:
#geo-coordinate analysis
geoip {
source => "clientip"
##point this at the unzipped GeoIP database file
database => "<your path>/logstash-2.4.1/etc/GeoLiteCity.dat"
}
If you run into the following error:
No Compatible Fields: The “[nginx-access-]YYYY-MM” index pattern does not contain any of the following field types: geo_point
Cause: Logstash writes the logs to Elasticsearch under an index pattern of the form [nginx-access-]YYYY-MM. In Elasticsearch every field has a type, and certain operations are only possible on fields of the corresponding type. The location field produced by geoip holds longitude and latitude, which we want to use for geo-positioning; to use Elasticsearch's geo queries on it, the field must be mapped with the type geo_point. This error simply means that our geoip location field is not mapped as geo_point.
Solution: Elasticsearch supports predefining settings and mappings for an index via templates (provided your Elasticsearch version supports this API, which practically every version does). In fact there is already a default predefined template that covers this; we only need to use it, and we can inspect it in ES. In short: the index name in the output must start with logstash-.
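If you want to look at that default template (it is the one Logstash's elasticsearch output installs, and it maps geoip.location as geo_point), you can query ES directly. A quick check, assuming the template keeps its default name logstash and ES is reachable on localhost:9200:
curl -XGET 'http://localhost:9200/_template/logstash?pretty'
In the returned mapping you should find "location" with "type": "geo_point" under the geoip properties. With that in mind, the elasticsearch output becomes: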
output {
if [type] == "nginx_lehi_access" { #nginx-access
elasticsearch {
action => "index" #The operation on ES
hosts => [<your ES hosts, as an array of strings>] #ElasticSearch host, can be array.
index => "logstash-nginx_lehi_access" #The index to write data to.
}
}
}
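Once the index name starts with logstash- and new data has been indexed, you can verify that the mapping picked up the right type. A hedged check, again assuming ES on localhost:9200:
curl -XGET 'http://localhost:9200/logstash-nginx_lehi_access/_mapping?pretty'
In the output, geoip.location should now show "type": "geo_point". The complete configuration is as follows: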
input {
file {
type => "nginx_lehi_access"
#path of the log file to watch
path => "<your log path>/access.log"
}
}
filter {
if [type] == "nginx_lehi_access" {
#define the data format
grok {
match => [
"message", "%{IPORHOST:clientip} \| %{HTTPDATE:timestamp} \| (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-) \| %{URIPATH:uripath} \| %{NUMBER:response} \| (?:%{NUMBER:body_bytes_sent}|-) \| (?:%{NUMBER:bytes_sent}|-) \| (?:%{NOTSPACE:gzip_ratio}|-) \| (?:%{QS:http_referer}|-) \| %{QS:user_agent} \| (?:%{QS:http_x_forwarded_for}|-) \| (%{URIHOST:upstream_addr}|-) \| (%{BASE16FLOAT:upstream_response_time}) \| %{NUMBER:upstream_status} \| (%{BASE16FLOAT:request_time})"
]
}
#define the timestamp format
date {
match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
}
#geo-coordinate analysis
geoip {
source => "clientip"
##point this at the unzipped GeoIP database file
database => "<your path>/logstash-2.4.1/etc/GeoLiteCity.dat"
}
#The client User-Agent is handled similarly: UA strings come in many formats, and logstash will parse them automatically and extract the operating system and other related info
#specify which field holds the client device UA
useragent {
source => "user_agent"
target => "userAgent"
}
#urldecode all fields (so Chinese characters display properly)
urldecode {
all_fields => true
}
#Fields that need type conversion; here the numeric fields (status codes, byte counts, request times) are converted before being passed to Elasticsearch. Note: there does not seem to be a double type, only float; I have not dug into it, but double simply does not work.
mutate {
gsub => ["user_agent","[\"]",""] #将user_agent中的 " 换成空
convert => [ "response","integer" ]
convert => [ "body_bytes_sent","integer" ]
convert => [ "bytes_sent","integer" ]
convert => [ "upstream_response_time","float" ]
convert => [ "upstream_status","integer" ]
convert => [ "request_time","float" ]
convert => [ "port","integer" ]
}
}
}
output {
if [type] == "nginx_lehi_access" { #nginx-access
elasticsearch {
action => "index" #The operation on ES
hosts => [<your ES hosts, as an array of strings>] #ElasticSearch host, can be array.
index => "logstash-nginx_lehi_access" #The index to write data to.
}
}
}
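With the complete configuration saved to a file (for example etc/nginx_access.conf, a name I am choosing here), you can validate the syntax and then start Logstash; on Logstash 2.4 that is roughly:
cd <your logstash path>/logstash
bin/logstash -f etc/nginx_access.conf --configtest
nohup bin/logstash -f etc/nginx_access.conf &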
At this point you can look in ES to see what the parsed data actually looks like.
Open http://<your IP>:<PORT>/_plugin/head/, click Data Browsing, find your nginx index, and pick one record to view the raw document that was generated:
{
"_index": "logstash-nginx_lehi_access",
"_type": "nginx_lehi_access",
"_id": "AViqy6DEzXT_yrqr__Ka",
"_version": 1,
"_score": 1,
"_source": {
"message": "116.226.72.255 | 28/Nov/2016:19:57:00 +0800 | POST /api HTTP/1.1 | /api | 200 | 1314 | 1533 | - | - | "Dalvik/2.1.0(Linux;U;Android6.0;LetvX501Build/DBXCNOP5801810092S)" | - | 117.121.58.159:8001 | 0.023 | 200 | 0.023",
"@version": "1",
"@timestamp": "2016-11-28T11:57:00.000Z",
"path": "/letv/logs/nginx/lehi/access.log",
"host": "vm-29-19-pro01-bgp.bj-cn.vpc.letv.cn",
"type": "nginx_lehi_access",
"clientip": "116.226.72.255",
"timestamp": "28/Nov/2016:19:57:00 +0800",
"verb": "POST",
"request": "/api",
"http_version": "1.1",
"uripath": "/api",
"response": 200,
"body_bytes_sent": 1314,
"bytes_sent": 1533,
"gzip_ratio": "-",
"user_agent": "Dalvik/2.1.0 (Linux; U; Android 6.0; Letv X501 Build/DBXCNOP5801810092S)",
"upstream_addr": "117.121.58.159:8001",
"port": 8001,
"upstream_response_time": 0.023,
"upstream_status": 200,
"request_time": 0.023,
"geoip": {
"ip": "116.226.72.255",
"country_code2": "CN",
"country_code3": "CHN",
"country_name": "China",
"continent_code": "AS",
"region_name": "23",
"city_name": "Shanghai",
"latitude": 31.045600000000007,
"longitude": 121.3997,
"timezone": "Asia/Shanghai",
"real_region_name": "Shanghai",
"location": [121.3997,
31.045600000000007]
},
"userAgent": {
"name": "Android",
"os": "Android 6.0",
"os_name": "Android",
"os_major": "6",
"os_minor": "0",
"device": "Letv X501",
"major": "6",
"minor": "0"
}
}
}
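To confirm the geo data is actually usable you can, for example, aggregate requests by city straight from ES. A rough sketch, assuming ES on localhost:9200 and the default logstash template (which adds a not_analyzed .raw sub-field to string fields; adjust the field name if your template differs):
curl -XPOST 'http://localhost:9200/logstash-nginx_lehi_access/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "by_city": { "terms": { "field": "geoip.city_name.raw" } }
  }
}'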
For this part, please refer to the article logstash日志分析的配置和使用 (on designing index templates).
The following problem appeared after the same log4j_to_es.conf was given several file inputs with multiline configured; the multiline configuration should be moved out of the filter and into the input:
file {
type => "whatsliveapi"
#path of the log file to watch
path => "/letv/logs/apps/api/whatslive/api_8001.log"
codec => multiline {
pattern => "^\d{4}-\d{1,2}-\d{1,2}\s\d{1,2}:\d{1,2}:\d{1,2}"
negate => true
what => "previous"
}
}
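To illustrate what the codec does (the log lines below are made up for the example): any line that does not start with a timestamp matching the pattern is appended to the previous event, so a Java stack trace is stored as one message rather than one event per line:
2016-11-28 19:57:01 ERROR ApiController - request failed
java.lang.NullPointerException
    at com.example.Api.handle(Api.java:42)
    at com.example.Server.run(Server.java:88)
All four lines above end up in a single Logstash event.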
Solution: go to Kibana - Settings - Indices, click your index, and then click the refresh fields button.