Review of yesterday's content:
(1) Logstash installation, deployment, and common parameters
    -r:
        Hot-reload the configuration file.
    -t:
        Check the configuration file for syntax errors.
    -f:
        Specify a custom configuration file; if omitted, the pipelines.yml configuration file is read instead.
    -e:
        Specify the configuration as a command-line string, without any configuration file.
(2) Logstash components:
    - input:
        Collects the data.
        - stdin
        - file
        - tcp
        - http
        - beats
        - elasticsearch
        - kafka
    - filter:
        Filters and transforms the data.
    - output:
        Writes the data to its destination.
        - stdout
        - file
        - redis
        - elasticsearch
(3) Logstash multi-branch conditionals (a concrete sketch follows below):
    if CONDITION {
    } else if CONDITION {
    } else if CONDITION {
    } else {
    }
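A minimal illustration of the branch syntax, assuming events carry a custom "type" field (the field name and the plugin chosen per branch are illustrative only):
    filter {
        if [type] == "nginx" {
            grok { match => { "message" => "%{HTTPD_COMMONLOG}" } }
        } else if [type] == "apps" {
            mutate { split => { "message" => "|" } }
        } else {
            mutate { add_tag => [ "unknown-type" ] }
        }
    }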
Q1: What is the difference between starting logstash with no parameters versus with the -e or -f option?
    With no parameters, logstash reads the pipelines.yml configuration file.
    With -e, logstash takes an ad-hoc configuration string, which is handy when there is little configuration; it is generally used for testing. This option makes logstash ignore pipelines.yml.
    With -f, logstash reads the specified configuration file; this is the common form in production. This option also makes logstash ignore pipelines.yml.
    In production, managing the logstash service through pipelines.yml is recommended, as it is comparatively simple.
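The three startup forms side by side (the -e string here is a trivial stdin-to-stdout pipeline for testing):
    # No parameters: pipelines.yml is read.
    logstash
    # -e: inline configuration string; pipelines.yml is ignored.
    logstash -e 'input { stdin {} } output { stdout {} }'
    # -f: configuration file; pipelines.yml is ignored.
    logstash -f config/01-stdin-to-stdout.conf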
Q2: How do we start multiple logstash instances?
    Instance 1:
logstash -rf config/01-stdin-to-stdout.conf
    Instance 2 (every additional instance needs its own --path.data directory, since instances cannot share the data-directory lock, plus its own pipeline id):
logstash -rf config/08-if-demo2.yaml --path.data /tmp/oldboyedu-linux82-logstash --pipeline.id oldboyedu-linux82
Q3: What is the difference between logstash multi-instance and the pipelines approach?
    1) The pipelines approach still starts only one logstash instance, backed by a single JVM underneath, and separates business logic into multiple data-flow pipelines;
    2) Multiple logstash instances start multiple JVMs, which costs considerably more resources.
Pipelines hands-on example:
    1) Write the pipelines file
vim /oldboyedu/softwares/logstash-7.17.5/config/pipelines.yml
...
# Each pipeline's configuration is given either inline (config.string, the -e
# equivalent) or as files (path.config, the -f equivalent).
# The name of the pipeline.
- pipeline.id: oldboyedu-linux82
  # Number of worker threads running the filter and output stages.
  pipeline.workers: 1
  # How many events (EVENTs) the pipeline batches before sending them to FILTERS+OUTPUTS.
  pipeline.batch.size: 100
  # When fewer than pipeline.batch.size events have accumulated, how many milliseconds to wait before sending them to FILTERS+OUTPUTS anyway.
  pipeline.batch.delay: 50
  # Equivalent to the -e parameter.
config.string: "input { generator {} } filter { sleep { time => 1 } } output { stdout { codec => dots } }"
- pipeline.id: oldboyedu-shahe
  # Store the queue data on disk; by default the queue lives in memory.
  queue.type: persisted
  # Equivalent to the -f parameter.
  path.config: "/tmp/logstash/*.config"
# A pipeline-to-pipeline example written as a multi-line config block.
- pipeline.id: oldboyedu-linux
config.string: |
input {
generator {
lines => [
"line 1 ---> 老男孩",
"line 2 ---> 苍老师"
]
# Emit all lines 3 times.
count => 3
}
}
filter {
sleep { time => 3 }
}
output {
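      # send_to targets the "address" declared by a downstream pipeline
      # input (see the oldboyedu-python pipeline below), not a pipeline.id.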
pipeline {
send_to => ["oldboyedu-linux82"]
}
}
- pipeline.id: oldboyedu-python
config.string: |
input {
pipeline {
address => "oldboyedu-linux82"
}
}
filter {
sleep {
time => 2
}
}
output {
stdout {}
}
    2) Start the logstash instance (no -e/-f, so pipelines.yml is read; -r hot-reloads it on change):
logstash -r
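Once it is running, the logstash node API (default port 9600) can confirm which pipelines were loaded; a quick check:
curl -s http://127.0.0.1:9600/_node/pipelines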
Migrating data across clusters with logstash:
cat > config/09-es6-to-es7.conf <<EOF
input {
elasticsearch {
hosts => ["10.0.0.101:19200","10.0.0.102:19200","10.0.0.103:19200"]
index => "teacher"
query => '{ "query": { "match": { "name": "oldboy" } } }'
}
}
output {
# stdout {}
elasticsearch {
hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux82-teacher"
}
}
EOF
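If the migration should also preserve the original document IDs, the elasticsearch input can expose each hit's metadata through its docinfo option; a minimal sketch under that assumption (only the marked lines differ from the config above):
input {
    elasticsearch {
        hosts => ["10.0.0.101:19200","10.0.0.102:19200","10.0.0.103:19200"]
        index => "teacher"
        # Copy each hit's _index/_type/_id into the event metadata.
        docinfo => true
        docinfo_target => "[@metadata][doc]"
    }
}
output {
    elasticsearch {
        hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
        index => "oldboyedu-linux82-teacher"
        # Reuse the source cluster's document IDs.
        document_id => "%{[@metadata][doc][_id]}"
    }
}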
Common logstash filter plugins - GROK
    1) logstash
cat > config/10-beats-grok-es.conf <<EOF
input {
beats {
port => 9999
}
}
filter {
grok {
match => {
"message" => "%{HTTPD_COMMONLOG}"
}
}
}
output {
stdout {}
elasticsearch {
hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux82-nginx"
}
}
EOF
    2) filebeat
cat > config/21-nginx-to-logstash.yaml <<EOF
filebeat.inputs:
- type: log
paths:
- /var/log/nginx/access.log*
output.logstash:
hosts: ["10.0.0.103:9999"]
EOF
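To exercise this end to end, start both sides and generate a little traffic (the paths and the local-nginx assumption are illustrative; adjust to your environment):
logstash -rf config/10-beats-grok-es.conf
filebeat -e -c config/21-nginx-to-logstash.yaml
# On the nginx host, produce a few access-log entries:
curl -s http://127.0.0.1/ >/dev/null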
Grok plugin custom-pattern example:
input {
http {
port => 8888
}
}
filter {
grok {
        # Directory (or directories) of custom pattern files to load. Both
        # relative and absolute paths work, but note the following:
        # 1) The user running logstash must have permission to access the path;
        # 2) A relative path is resolved against the directory logstash is
        #    running in; an absolute path is unaffected by this;
        # 3) The names of the pattern files are arbitrary and have no effect;
        #
        # The client sends test data such as:
        # {
        #     "school": "oldboyedu 2022-08-23 [WARING] 192.168.11.253",
        # }
        #
        # The custom patterns are defined as:
        # OLDBOYEDU_LINUX82_IP \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
        # OLDBOYEDU_LINUX82_DATE [0-9]{4}-\d{2}-\d{2}
        #
patterns_dir => ["./oldboyedu-linux82-patterns"]
match => {
# "school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date}"
# "school" => "%{OLDBOYEDU_LINUX82_IP:linux82-ip}"
# "school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date} \[WARING\] %{OLDBOYEDU_LINUX82_IP:linux82-ip}"
"school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date} .* %{OLDBOYEDU_LINUX82_IP:linux82-ip}"
}
}
}
output {
stdout {}
# elasticsearch {
# hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
# index => "oldboyedu-linux82-nginx"
# }
}
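The patterns directory referenced above can be created from the two patterns quoted in the comments (the file name inside it is arbitrary, per note 3), and the pipeline can then be fed the sample document:
mkdir -p oldboyedu-linux82-patterns
cat > oldboyedu-linux82-patterns/linux82 <<'EOF'
OLDBOYEDU_LINUX82_IP \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
OLDBOYEDU_LINUX82_DATE [0-9]{4}-\d{2}-\d{2}
EOF
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:8888 -d '{"school": "oldboyedu 2022-08-23 [WARING] 192.168.11.253"}'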
Bonus tip: rename "*.yaml" files to "*.conf" (util-linux rename syntax, as shipped on CentOS; the Perl rename found on Debian-family systems expects an expression like 's/\.yaml$/.conf/' instead):
rename .yaml .conf config/*.yaml
Using grok to parse arbitrary text, combined with geoip to resolve IP addresses:
cat > config/12-http-grok_geoip-es.conf <<EOF
input {
http {
port => 8888
}
}
filter {
grok {
patterns_dir => ["./oldboyedu-linux82-patterns"]
match => {
"school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date} .* %{OLDBOYEDU_LINUX82_IP:linux82-ip}"
}
}
geoip {
        # The field holding the IP address; the lookup requires a public IP.
        source => "linux82-ip"
        # Store the geoip result under this field; if unset, it defaults to a "geoip" field.
        target => "oldboyedu-linux82-ip"
}
}
output {
stdout {}
# elasticsearch {
# hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
# index => "oldboyedu-linux82-nginx"
# }
}
EOF
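A quick way to exercise this pipeline. Note that geoip only resolves public addresses, so the private 192.168.x.x sample above would fail the lookup; 8.8.8.8 below is just an arbitrary public IP:
curl -H "Content-Type: application/json" -X POST http://127.0.0.1:8888 -d '{"school": "oldboyedu 2022-08-23 [WARING] 8.8.8.8"}'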
Common options available in every filter plugin (add_field, remove_field, add_tag, remove_tag):
cat > config/12-http-grok_geoip-es.conf <<EOF
input {
http {
port => 8888
}
}
filter {
grok {
patterns_dir => ["./oldboyedu-linux82-patterns"]
match => {
# "school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date} .* %{OLDBOYEDU_LINUX82_IP:linux82-ip}"
"school" => "%{OLDBOYEDU_LINUX82_DATE:linux82-date} .* %{IP:linux82-ip}"
}
}
geoip {
        # The field holding the IP address; the lookup requires a public IP.
        source => "linux82-ip"
        # Store the geoip result under this field; if unset, it defaults to a "geoip" field.
        target => "oldboyedu-linux82-ip"
        # Add fields.
        add_field => {
            "address" => "Oldboy IT Education, Shahe Town, Changping District, Beijing."
        }
        # Remove fields.
        remove_field => [
            "headers",
            "@version"
        ]
        # Add tags.
        add_tag => [
            "Linux82",
            "LOL"
        ]
        # Remove tags.
        remove_tag => [
            "LOL"
        ]
}
}
output {
stdout {}
# elasticsearch {
# hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
# index => "oldboyedu-linux82-nginx"
# }
}
EOF
The date filter plugin:
cat > config/13-nginx-grok_geoip_date-es.conf <<EOF
input {
beats {
port => 9999
}
}
filter {
grok {
remove_field => [
"ecs",
"agent",
"@version",
"tags",
"input"
]
match => {
"message" => "%{HTTPD_COMBINEDLOG}"
}
}
geoip {
source => "clientip"
target => "oldboyedu-linux82-ip"
}
date {
        # The first element names the field to parse the time from, e.g.
        # "21/Aug/2022:17:57:53 +0800"; the second is the parse pattern
        # (dd = day, MMM = abbreviated month name, Z = timezone offset).
        match => [
            "timestamp",
            "dd/MMM/yyyy:HH:mm:ss Z"
        ]
        # Note: if the match pattern does not capture a timezone, set one
        # explicitly so the timestamps shown in kibana are accurate!
        # Setting the wrong zone (e.g. UTC for +0800 logs) skews the data,
        # even though nothing looks off in the console output.
        # timezone => "Asia/Shanghai"
        # timezone => "UTC"
        # Store the parsed result in this field; if unset, the existing
        # "@timestamp" field is overwritten by default, which is usually
        # what you want, so changing it is not recommended!
        # target => "oldboyedu-linux82-date"
}
}
output {
stdout {}
elasticsearch {
hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux82-nginx"
}
}
EOF
Using useragent to analyze the client device type:
cat > config/13-nginx-grok_geoip_date-es.conf <<EOF
input {
beats {
port => 9999
}
}
filter {
    # Parse the raw access-log line first, so the "clientip", "timestamp"
    # and "http_user_agent" fields referenced below actually exist.
    grok {
        match => {
            "message" => "%{HTTPD_COMBINEDLOG}"
        }
    }
    geoip {
source => "clientip"
target => "oldboyedu-linux82-ip"
remove_field => [
"tags",
"input",
"ecs",
"@version",
"agent"
]
}
date {
        # The first element names the field to parse the time from, e.g.
        # "21/Aug/2022:17:57:53 +0800"; the second is the parse pattern
        # (dd = day, MMM = abbreviated month name, Z = timezone offset).
        match => [
            "timestamp",
            "dd/MMM/yyyy:HH:mm:ss Z"
        ]
        # Note: if the match pattern does not capture a timezone, set one
        # explicitly so the timestamps shown in kibana are accurate!
        # Setting the wrong zone (e.g. UTC for +0800 logs) skews the data,
        # even though nothing looks off in the console output.
        # timezone => "Asia/Shanghai"
        # timezone => "UTC"
        # Store the parsed result in this field; if unset, the existing
        # "@timestamp" field is overwritten by default, which is usually
        # what you want, so changing it is not recommended!
        # target => "oldboyedu-linux82-date"
}
useragent {
        # The field whose value is analyzed for the client device type.
        source => "http_user_agent"
        # Store the parsed result in this field; if unset, the parsed fields land at the top level of the event.
target => "oldboyedu-linux82-device"
}
}
output {
stdout {}
elasticsearch {
hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux82-nginx"
}
}
EOF
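Different device types can be simulated by hitting nginx with custom User-Agent headers (run on the nginx host; the UA strings are merely examples):
curl -s -A "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" http://127.0.0.1/ >/dev/null
curl -s -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" http://127.0.0.1/ >/dev/null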
Data preparation for the mutate plugin - a Python script
cat > generate_log.py <<EOF
# -*- coding: UTF-8 -*-
# @author : oldboyedu-linux82
import datetime
import random
import logging
import time
import sys
LOG_FORMAT = "%(levelname)s %(asctime)s [com.oldboyedu.%(module)s] - %(message)s "
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
# Basic configuration of the root logging.Logger instance.
logging.basicConfig(level=logging.INFO, format=LOG_FORMAT, datefmt=DATE_FORMAT, filename=sys.argv[1], filemode='a',)
actions = ["browse page", "comment on item", "add to favorites", "add to cart", "submit order", "use coupon", "claim coupon", "search", "view order", "pay", "empty cart"]
while True:
time.sleep(random.randint(1, 5))
user_id = random.randint(1, 10000)
    # Round the generated float to 2 decimal places.
price = round(random.uniform(15000, 30000),2)
action = random.choice(actions)
svip = random.choice([0,1])
logging.info("DAU|{0}|{1}|{2} |{3}".format(user_id, action,svip,price))
EOF
mkdir -p /oldboyedu/logs    # make sure the log directory exists
nohup python generate_log.py /oldboyedu/logs/apps.log &
Using the mutate plugin to analyze text data:
cat > config/14-apps-mutate-es.conf <<EOF
input {
beats {
port => 9999
}
}
filter {
mutate {
remove_field => [
"ecs",
"tags",
"input",
"agent",
"@version",
"log",
"host"
]
}
    mutate {
        # Split the given field on the given separator.
        split => { "message" => "|" }
        add_field => {
            "user_id" => "%{[message][1]}"
            "verb" => "%{[message][2]}"
            "svip" => "%{[message][3]}"
            "price" => "%{[message][4]}"
        }
    }
}
mutate {
        # Rename a field; here "verb" becomes "action".
        rename => { "verb" => "action" }
        # Strip surrounding whitespace from svip (the generator writes it with a trailing space).
        strip => ["svip"]
remove_field => ["message"]
}
mutate {
        # Convert field types; for booleans, "1"/"0" map to true/false.
convert => {
"user_id" => "integer"
"svip" => "boolean"
"price" => "float"
}
}
}
output {
stdout {}
elasticsearch {
hosts => ["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux82-apps"
}
}
EOF
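Feeding this pipeline requires filebeat to ship the generated apps.log to port 9999; a sketch modeled on the earlier 21-nginx config (the file name 22-apps-to-logstash.yaml is an assumption):
cat > config/22-apps-to-logstash.yaml <<EOF
filebeat.inputs:
- type: log
  paths:
    - /oldboyedu/logs/apps.log*
output.logstash:
  hosts: ["10.0.0.103:9999"]
EOF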
Other filter plugins:
https://www.elastic.co/guide/en/logstash/7.17/filter-plugins.html
Today's content review:
(1) Logstash multi-instance;
(2) Logstash pipelines;
(3) Cross-cluster data migration with logstash;
(4) Common filter plugins:
    - grok:
        Extracts arbitrary text with regular expressions.
    - geoip:
        Resolves the client address to latitude/longitude coordinates, city name, country, and so on.
    - date:
        Parses and normalizes the timestamps of log events.
    - useragent:
        Analyzes the client device type.
    - mutate:
        Transforms events, e.g. renaming fields, converting types, splitting values.
    Others:
        - sleep
        - json
        - drop