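ELK Log Monitoring: Techniques and Practice
Monitoring categories
- Metrics record aggregatable data. Examples: (1) the current depth of a queue can be defined as a gauge that is updated whenever an element is enqueued or dequeued, and the number of HTTP requests can be defined as a counter that is incremented on every new request; (2) the current CPU or memory usage. Prometheus focuses on the Metrics domain.
- Logging records discrete events, such as an application's debug or error messages, and is the basis for diagnosing problems. ELK is built on Logging.
- Tracing records request-scoped information, such as the execution path and latency of a remote method call, and is the go-to tool for investigating performance problems. Common choices include SkyWalking, Pinpoint, and Zipkin.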
ELK monitoring architecture
Official website: https://www.elastic.co/
Practice
Lab environment
Server | IP address | OS | Installed software |
---|---|---|---|
Server01 | 10.0.0.101 | Ubuntu 18.04.5 Server | Test service, Filebeat |
Server02 | 10.0.0.102 | Ubuntu 18.04.5 Server | Logstash, Elasticsearch |
Server03 | 10.0.0.103 | Ubuntu 18.04.5 Server | elasticsearch-head, Kibana, ElastAlert (alerting) |
Installation and configuration
Server 01
Application service
- Maven logging dependencies
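<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-core</artifactId>
    <version>1.2.3</version>
</dependency>
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.2.3</version>
</dependency>
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>4.11</version>
</dependency>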
- logback.xml configuration
Appenders defined in logback.xml. Log files are written to ${logPath4Context} and rolled into gzipped archives under ${logPath4Context}/archive/; unless noted, each appender encodes with the pattern ${logPattern4Context} in UTF-8. One more encoder with the same pattern and charset precedes the file appenders (typically a console appender).
Log file | Archive file name | Archives kept | Notes |
---|---|---|---|
error.log | error.log.%d{yyyyMMdd}.gz | 7 (daily) | ERROR level only |
info.log | info.log.%d{yyyyMMdd}.gz | 7 (daily) | |
time.log | time.log.%d{yyyyMMdd}.gz | 7 (daily) | |
controller.log | controller.log.%d{yyyyMMdd}.gz | 7 (daily) | |
logstash.log | logstash.log.%d{yyyyMMddHH}.gz | 72 (hourly, i.e. 3 days) | JSON via logstash-logback-encoder instead of the pattern; custom field {"app_name":"${loggerAppName}"}; an extra encoder flag set to false |
httpclient.log | httpclient.log.%d{yyyyMMdd}.gz | 7 (daily) | |
jpush.log | jpush.log.%d{yyyyMMdd}.gz | 7 (daily) | |
alert.log | alert.log.%d{yyyyMMdd}.gz | 7 (daily) | |
app_common.log | app_common.log.%d{yyyyMMdd}.gz | 7 (daily) | |
all.log | all.log.%d{yyyyMMdd}.gz | 7 (daily) | |
- Start the service
java -classpath /home/wl/demo-cloud/config:/home/wl/demo-cloud/core/*:/home/wl/demo-cloud/lib/* -server -Xms128m -Xmx128m -XX:+DisableExplicitGC -verbose:gc -Xloggc:/home/wl/demo-cloud/gc.log com.wxhtech.crescent.Application >> /home/wl/demo-cloud/start.log &
- Sample logstash.log entry
{"@timestamp":"2021-08-21T12:47:00.448+00:00","@version":1,"message":"Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6440112d: startup date [Sat Aug 21 12:47:00 UTC 2021]; root of context hierarchy","logger_name":"org.springframework.context.annotation.AnnotationConfigApplicationContext","thread_name":"main","level":"INFO","level_value":20000,"app_name":"demo-cloud"}
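Since the logstash appender writes one flat JSON object per line, fields can be checked with standard tools before Filebeat ships the file. The following is a minimal sketch, assuming flat JSON whose string values contain no escaped double quotes; json_field and count_level are hypothetical helper names, and jq is the more robust choice where it is installed.

```shell
#!/usr/bin/env bash
# json_field KEY (hypothetical helper): read one-line JSON records on stdin and
# print the string value of KEY from each line. Assumes flat JSON whose string
# values contain no escaped double quotes.
json_field() {
  local key="$1"
  sed -n "s/.*\"${key}\":\"\([^\"]*\)\".*/\1/p"
}

# count_level LEVEL (hypothetical helper): count records whose "level" equals LEVEL.
count_level() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the 0, drop the failure.
  json_field level | grep -cx "$1" || true
}
```

For example, `count_level ERROR < /home/wl/demo-cloud/logs/demo-cloud/logstash.log` counts the ERROR entries in the file that Filebeat reads.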
Filebeat
- Download and install
Beats overview: https://www.elastic.co/cn/beats/
Beat types include Filebeat, Metricbeat, Packetbeat, and others.
Filebeat download: https://www.elastic.co/cn/downloads/beats/filebeat#ga-release
tar -zxvf filebeat-7.9.3-linux-x86_64.tar.gz
- Configure filebeat.yml
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/*access.log
  fields:
    log_source: nginx-access
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/wl/demo-cloud/logs/demo-cloud/logstash.log
  fields:
    log_source: demo-cloud
output.logstash:
  # The Logstash hosts
  hosts: ["10.0.0.102:5044"]
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
Filebeat accepts only one enabled output, so the default output.elasticsearch section of filebeat.yml must stay commented out.
- Start
nohup /usr/local/filebeat/filebeat -e -c /usr/local/filebeat/filebeat.yml >/dev/null 2>&1 &
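Because the start command above discards all output, a configuration mistake would go unnoticed. Filebeat can check its configuration and its connection to Logstash first with its own `test config` and `test output` subcommands. A minimal wrapper, assuming the /usr/local/filebeat layout used above; start_filebeat and FILEBEAT_HOME are hypothetical names.

```shell
#!/usr/bin/env bash
# start_filebeat (hypothetical helper): validate, then start Filebeat in the
# background. FILEBEAT_HOME is a hypothetical override for the install dir.
start_filebeat() {
  local home="${FILEBEAT_HOME:-/usr/local/filebeat}"
  local bin="$home/filebeat" cfg="$home/filebeat.yml"

  # Parse filebeat.yml and report syntax or setting errors.
  "$bin" test config -c "$cfg" || return 1
  # Try to connect to the configured output (Logstash at 10.0.0.102:5044).
  "$bin" test output -c "$cfg" || return 1

  nohup "$bin" -e -c "$cfg" >/dev/null 2>&1 &
  echo "filebeat started with PID $!"
}
```

If either test fails, its error is printed to the terminal and Filebeat is not started.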
Server 02
Elasticsearch installation and configuration
- Installation
Overview: https://www.elastic.co/cn/elasticsearch/
Download: https://www.elastic.co/cn/downloads/elasticsearch
# Extract
tar -xzvf elasticsearch-7.9.3-linux-x86_64.tar.gz
# Elasticsearch refuses to start as root, so give ownership to a regular user (wl here)
chown -R wl elasticsearch
# JDK: add the following at the top of /usr/local/elasticsearch/bin/elasticsearch
export JAVA_HOME=/usr/local/jdk11/
export PATH=$JAVA_HOME/bin:$PATH
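- Configure /usr/local/elasticsearch/config/elasticsearch.yml. The two http.cors settings allow the browser-based elasticsearch-head on Server03 to query this node.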
node.name: node-1
network.host: 10.0.0.102
http.port: 9200
cluster.initial_master_nodes: ["node-1"]
xpack.ml.enabled: false
http.cors.enabled: true
http.cors.allow-origin: "*"
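- Configure config/jvm.options
Change 8-13:-XX:+UseConcMarkSweepGC to 8-13:-XX:+UseG1GC (the 8-13: prefix applies the line to JDK 8 through 13, which includes the JDK 11 configured above; CMS has been deprecated since JDK 9).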
- Append the following to the end of /etc/sysctl.conf
vm.max_map_count=655360
Apply the change:
sysctl -p
- Configure /etc/security/limits.conf
* hard nofile 65536
* soft nofile 65536
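With network.host set to a non-loopback address, Elasticsearch enforces its bootstrap checks at startup, including vm.max_map_count of at least 262144 and at least 65535 open file descriptors; the values above satisfy both. A small preflight sketch that compares the current values with those minimums. es_preflight is a hypothetical name, and it optionally takes the two values as arguments so it can be tried without touching the system. The limits.conf change only applies to new login sessions of the user that runs Elasticsearch.

```shell
#!/usr/bin/env bash
# es_preflight [MAP_COUNT] [NOFILE] (hypothetical helper)
# Compare kernel/user limits with Elasticsearch's bootstrap-check minimums.
# Without arguments, read the live values of the current shell.
es_preflight() {
  local map_count="${1:-$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)}"
  local nofile="${2:-$(ulimit -n)}"
  local status=0

  if [ "$map_count" -lt 262144 ]; then
    echo "vm.max_map_count=$map_count is below 262144" >&2
    status=1
  fi
  # "unlimited" always passes the file-descriptor check.
  if [ "$nofile" != "unlimited" ] && [ "$nofile" -lt 65535 ]; then
    echo "open files limit $nofile is below 65535" >&2
    status=1
  fi
  return "$status"
}
```

Running `es_preflight` as user wl before starting Elasticsearch shows which setting still needs attention.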
- Start (as user wl)
./bin/elasticsearch
# Run in the background
nohup /usr/local/elasticsearch/bin/elasticsearch >/dev/null 2>&1 &
Verify by opening http://10.0.0.102:9200
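Besides the root URL, the _cluster/health endpoint reports the cluster status as green, yellow, or red. On this single-node setup yellow is expected: Elasticsearch 7.x creates indices with one replica by default, and a replica is never allocated on the same node as its primary. A minimal sketch that extracts the status from the compact JSON response; cluster_status and es_ready are hypothetical helper names.

```shell
#!/usr/bin/env bash
# cluster_status (hypothetical helper): read a _cluster/health JSON response
# on stdin and print its "status" value.
cluster_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# es_ready STATUS (hypothetical helper): succeed for green or yellow, i.e.
# when all primary shards are allocated.
es_ready() {
  case "$1" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `es_ready "$(curl -s http://10.0.0.102:9200/_cluster/health | cluster_status)" && echo ready`.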
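- Periodic index cleanup: delete_elk_indexs.sh
The script lists all indices, skips .kibana and indices without a date suffix, and deletes every index dated 6 or more days ago; run it daily, for example from cron.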
#!/bin/bash
RETVAL=0
Port=9200
Ip=10.0.0.102
#Time1=$(date +%Y.%m.%d)
#Time2=$(date +%Y.%m.%d -d "-7day")

function delete_indices() {
    echo "delete index pre $1"
    comp_date=`date -d "6 day ago" +"%Y-%m-%d"`
    indexDate=$(echo $1 | sed "s/.*-\([0-9]*\.[0-9]*\.[0-9]*\).*/\1/g" | sed "s/\./-/g")
    date1="$indexDate 00:00:00"
    date2="$comp_date 00:00:00"
    t1=`date -d "$date1" +%s`
    t2=`date -d "$date2" +%s`
    #echo "t1:$t1"
    #echo "t2:$t2"
    #exit
    if [ $t1 -le $t2 ]
    then
        echo "Deleting index {$1}"
        curl -XDELETE http://$Ip:$Port/$1
    fi
}

curl -s -XGET "http://$Ip:$Port/_cat/indices/?v" | grep -Ev "index|.kibana" | awk -F " " '{print $3}' | egrep "[0-9]*\.[0-9]*\.[0-9]*" | sort | while read LINE
do
    delete_indices $LINE
done
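The script converts each index date and the cutoff to epoch seconds before comparing them. A more compact sketch of the same selection step, assuming index names end in -YYYY.MM.DD as produced by the Logstash index patterns: expired_indices is a hypothetical name, and it takes the cutoff as a YYYYMMDD argument so dates compare as plain integers. It only prints candidates, which makes a dry run easy before anything is deleted.

```shell
#!/usr/bin/env bash
# expired_indices CUTOFF (hypothetical helper): read `_cat/indices?v` output
# on stdin and print indices whose -YYYY.MM.DD suffix is on or before CUTOFF
# (YYYYMMDD). Indices without a date suffix (.kibana_1, elastalert_status, ...)
# are skipped.
expired_indices() {
  local cutoff="$1" index day
  # Column 3 is the index name; the header line has "index" in that column.
  awk '$3 != "index" { print $3 }' |
    while read -r index; do
      day=$(printf '%s\n' "$index" |
        sed -n 's/.*-\([0-9]\{4\}\)\.\([0-9]\{2\}\)\.\([0-9]\{2\}\)$/\1\2\3/p')
      [ -n "$day" ] || continue
      if [ "$day" -le "$cutoff" ]; then
        printf '%s\n' "$index"
      fi
    done
}
```

A dry run matching the script's 6-day cutoff could be `curl -s "http://10.0.0.102:9200/_cat/indices?v" | expired_indices "$(date -d '6 days ago' +%Y%m%d)"`; once the list looks right, each name can be passed to `curl -XDELETE`.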
Logstash installation and configuration
- Download and install
Overview: https://www.elastic.co/cn/logstash/
Download: https://www.elastic.co/cn/downloads/logstash
# Extract
tar -zxvf logstash-7.9.3.tar.gz
- Configure /usr/local/logstash/config/logstash.conf
input {
  beats {
    port => 5044
    codec => "json"
  }
}
output {
  if "nginx-access" in [fields][log_source] {
    elasticsearch {
      hosts => ["10.0.0.102:9200"]
      index => "nginx-access-log-%{+YYYY.MM.dd}"
    }
  }
  if "demo-cloud" in [fields][log_source] {
    elasticsearch {
      hosts => ["10.0.0.102:9200"]
      index => "demo-cloud-log-%{+YYYY.MM.dd}"
    }
  }
  stdout { codec => rubydebug }
}
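The output section routes each event by the fields.log_source value that Filebeat attached and writes one index per day. The selection logic can be sketched in shell as follows; index_for is a hypothetical name and the DAY argument stands in for %{+YYYY.MM.dd}. Two details of the real config are mirrored: `"x" in [field]` on a string field is a substring test, and the two if blocks are independent, so an event whose log_source contained both strings would be written to both indices. Logstash fills %{+YYYY.MM.dd} from the event's @timestamp in UTC, so indices roll over at midnight UTC rather than local time.

```shell
#!/usr/bin/env bash
# index_for LOG_SOURCE DAY (hypothetical helper): print the index or indices
# that the logstash.conf output section would write the event to.
index_for() {
  local src="$1" day="$2" matched=1
  # First `if`: "nginx-access" in [fields][log_source] (substring test).
  case "$src" in
    *nginx-access*) echo "nginx-access-log-$day"; matched=0 ;;
  esac
  # Second, independent `if`: both can match the same event.
  case "$src" in
    *demo-cloud*) echo "demo-cloud-log-$day"; matched=0 ;;
  esac
  # No match: the event only reaches the stdout { codec => rubydebug } output.
  return "$matched"
}
```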
- Start
/usr/local/logstash/bin/logstash -f /usr/local/logstash/config/logstash.conf
Server 03
Elasticsearch-head installation and configuration
- Install with Docker
docker pull mobz/elasticsearch-head:5
docker run -d --name es_admin -p 9100:9100 --restart=always mobz/elasticsearch-head:5
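- Patch the source to fix the following error
The elasticsearch-head container can connect to Elasticsearch, but viewing index data returns:
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}
Elasticsearch 6.0 and later enforce strict Content-Type checking on request bodies, so the form-encoded requests sent by head 5 must be switched to JSON: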
```
docker exec -it es_admin bash
cd /usr/src/app/_site
# vendor.js needs editing, but the image ships without vim
# install vim first:
apt-get update
apt-get install vim
vim vendor.js
# in vim, search with:
/application
# and change the match
contentType: "application/x-www-form-urlencoded"
# to
contentType: "application/json"
```
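Installing vim only to change one string can be avoided by copying the file out of the container, patching it with sed, and copying it back. A sketch under the assumptions that the container is named es_admin as in the docker run command above and that the string appears exactly as shown; patch_vendor_js is a hypothetical name.

```shell
#!/usr/bin/env bash
# patch_vendor_js FILE (hypothetical helper): switch elasticsearch-head's
# request Content-Type from form encoding to JSON, editing FILE in place.
patch_vendor_js() {
  local file="$1"
  # Refuse to run if the expected string is absent (e.g. already patched).
  grep -q 'contentType: "application/x-www-form-urlencoded"' "$file" || {
    echo "pattern not found in $file" >&2
    return 1
  }
  sed -i 's#contentType: "application/x-www-form-urlencoded"#contentType: "application/json"#' "$file"
}
```

From the Docker host: `docker cp es_admin:/usr/src/app/_site/vendor.js .`, then `patch_vendor_js vendor.js`, then `docker cp vendor.js es_admin:/usr/src/app/_site/vendor.js`, and reload the page. The change lives in the container's filesystem, so it is lost if the container is removed and recreated from the image.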
- Open http://10.0.0.103:9100
Kibana installation and configuration
- Download and install
Overview: https://www.elastic.co/cn/kibana/
Download: https://www.elastic.co/cn/downloads/kibana
tar -zxvf kibana-7.9.3-linux-x86_64.tar.gz
- Configure config/kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["http://10.0.0.102:9200"]
# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
kibana.index: ".kibana"
# The default application to load.
#kibana.defaultAppId: "home"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "kibana_system"
#elasticsearch.password: "pass"
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files are used to verify the identity of Kibana to Elasticsearch and are required when
# xpack.security.http.ssl.client_authentication in Elasticsearch is set to required.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500
# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000
# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]
# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}
# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000
# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000
# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false
# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
# Enables you to specify a file where Kibana stores log output.
#logging.dest: stdout
# Set the value of this setting to true to suppress all logging output.
#logging.silent: false
# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false
# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false
# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000
# Specifies locale to be used for all localizable strings, dates and number formats.
# Supported languages are the following: English - en , by default , Chinese - zh-CN .
#i18n.locale: "en"
i18n.locale: "en"
xpack.reporting.capture.browser.chromium.disableSandbox: true
xpack.reporting.capture.browser.chromium.proxy.enabled: false
xpack.reporting.enabled: false
- Start
./bin/kibana --allow-root
- Open http://10.0.0.103:5601
Error alerting
ElastAlert
- Installation
Repository: https://github.com/Yelp/elastalert.git
Documentation: https://elastalert.readthedocs.io/en/latest/index.html
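- Build a Docker image
Dockerfile (placed in the root of the cloned elastalert repository, since it copies requirements*.txt, setup.py and the elastalert package, next to a sources.list file for the apt mirror):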
FROM ubuntu:18.04
RUN rm /etc/apt/sources.list
COPY sources.list /etc/apt/sources.list
RUN apt-get update && apt-get -y install build-essential python3.6 python3.6-dev python3-pip libssl-dev git locales
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
ENV TZ Etc/UTC
WORKDIR /home/elastalert
ADD requirements*.txt ./
RUN pip3 install -r requirements.txt -i https://pypi.douban.com/simple/
RUN pip3 install "setuptools>=11.3" -i https://pypi.douban.com/simple/
ADD setup.py ./
ADD elastalert ./elastalert
RUN python3 setup.py install
#RUN elastalert-create-index
#RUN echo "Asia/Shanghai" > /etc/timezone
CMD ["sh", "-c", "elastalert-create-index && elastalert --verbose"]
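- docker-compose (the volume maps ./cloth_alert on the host onto the container's working directory, so config.yaml and the rules folder live in ./cloth_alert)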
version: '2'
services:
  elastalert:
    build:
      context: .
    restart: always
    container_name: elastalert
    working_dir: /home/elastalert
    volumes:
      - ./cloth_alert:/home/elastalert/
- config.yaml
# This is the folder that contains the rule yaml files
# Any .yaml file will be loaded as a rule
rules_folder: ./rules
# How often ElastAlert will query Elasticsearch
# The unit can be anything from weeks to seconds
run_every:
  #minutes: 1
  seconds: 10
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
buffer_time:
  minutes: 1
# The Elasticsearch hostname for metadata writeback
# Note that every rule can have its own Elasticsearch host
es_host: 10.0.0.102
# The Elasticsearch port
es_port: 9200
# The AWS region to use. Set this when using AWS-managed elasticsearch
#aws_region: us-east-1
# The AWS profile to use. Use this if you are using an aws-cli profile.
# See http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
# for details
#profile: test
# Optional URL prefix for Elasticsearch
#es_url_prefix: elasticsearch
# Connect with TLS to Elasticsearch
#use_ssl: True
# Verify TLS certificates
#verify_certs: True
# GET request with body is the default option for Elasticsearch.
# If it fails for some reason, you can pass 'GET', 'POST' or 'source'.
# See http://elasticsearch-py.readthedocs.io/en/master/connection.html?highlight=send_get_body_as#transport
# for details
#es_send_get_body_as: GET
# Option basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# Use SSL authentication with client certificates client_cert must be
# a pem file containing both cert and key for client
#verify_certs: True
#ca_certs: /path/to/cacert.pem
#client_cert: /path/to/client_cert.pem
#client_key: /path/to/client_key.key
# The index on es_host which is used for metadata storage
# This can be a unmapped index, but it is recommended that you run
# elastalert-create-index to set a mapping
writeback_index: elastalert_status
writeback_alias: elastalert_alerts
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 10
# Custom logging configuration
# If you want to setup your own logging configuration to log into
# files as well or to Logstash and/or modify log levels, use
# the configuration below and adjust to your needs.
# Note: if you run ElastAlert with --verbose/--debug, the log level of
# the "elastalert" logger is changed to INFO, if not already INFO/DEBUG.
#logging:
#  version: 1
#  incremental: false
#  disable_existing_loggers: false
#  formatters:
#    logline:
#      format: '%(asctime)s %(levelname)+8s %(name)+20s %(message)s'
#
#  handlers:
#    console:
#      class: logging.StreamHandler
#      formatter: logline
#      level: DEBUG
#      stream: ext://sys.stderr
#
#    file:
#      class : logging.FileHandler
#      formatter: logline
#      level: DEBUG
#      filename: elastalert.log
#
#  loggers:
#    elastalert:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    elasticsearch.trace:
#      level: WARN
#      handlers: []
#      propagate: true
#
#    '':  # root logger
#      level: WARN
#      handlers:
#        - console
#        - file
#      propagate: false
- Rule configuration (any .yaml file in the rules folder)
# Alert when the rate of events exceeds a threshold
# (Optional)
# Elasticsearch host
# es_host: elasticsearch.example.com
# (Optional)
# Elasticsearch port
# es_port: 14900
# (OptionaL) Connect with SSL to Elasticsearch
#use_ssl: True
# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: Service alert
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur with timeframe time
type: frequency
# (Required)
# Index to search, wildcard supported
index: demo-cloud-log-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 3
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 1
# (Required)
# A list of Elasticsearch filters used for find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
#- term:
#    level: "ERROR"
- query:
    query_string:
      query: "level: ERROR"
alert_text: " Cloud service alert\n Service: {}\n Server: test server\n Occurrences: {}\n Error message: {}\n "
alert_text_type: alert_text_only
alert_text_args:
  - app_name
  - num_hits
  - message
# (Required)
# The alert is use when a match is found
#alert:
#- "dingtalk"
#- "qywechat"
#- "email"
# (required, email specific)
# a list of email addresses to send alerts to
alert:
- "elastalert_modules.qywechat_alert.QyWechatAlerter"
qywechat_webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=a42a0c33-410c-4600-a7d0-1e94bd6d0004"
qywechat_msgtype: "text"
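The alerter class elastalert_modules.qywechat_alert.QyWechatAlerter is a custom module, not part of ElastAlert itself, and its code is not shown here. What such an alerter ultimately does is POST a JSON text message to the WeCom (WeChat Work) group-robot webhook given in qywechat_webhook. The sketch below sends the same kind of message with curl, which is handy for checking the webhook key independently of ElastAlert; wecom_payload and send_wecom are hypothetical helpers, and the payload follows the robot's plain-text format {"msgtype":"text","text":{"content":...}}.

```shell
#!/usr/bin/env bash
# wecom_payload TEXT (hypothetical helper): build a WeCom group-robot "text"
# message as JSON. Escapes backslashes, double quotes and newlines only;
# other control characters are not handled by this sketch.
wecom_payload() {
  local escaped
  escaped=$(printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }')   # join lines with a literal \n
  printf '{"msgtype":"text","text":{"content":"%s"}}' "$escaped"
}

# send_wecom WEBHOOK_URL TEXT (hypothetical helper): POST the message.
send_wecom() {
  curl -sS -H 'Content-Type: application/json' \
    -d "$(wecom_payload "$2")" "$1"
}
```

For example, `send_wecom "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=<key>" "ElastAlert test"`, where `<key>` is the webhook key; a reply with errcode 0 means the robot accepted the message.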
- Rule types
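Each rule type also has options of its own. ElastAlert ships with the following rule types:
- any: alerts on every match;
- blacklist: the value of the compare_key field matches any entry in the blacklist array;
- whitelist: the value of the compare_key field matches none of the entries in the whitelist array;
- change: for the same query_key, the value of the compare_key field changes within timeframe;
- frequency: for the same query_key, at least num_events filtered events occur within timeframe;
- spike: for the same query_key, the event counts of two consecutive timeframe windows differ by a ratio exceeding spike_height. spike_type sets the direction (up, down, or both); threshold_ref sets a minimum count for the previous window and threshold_cur a minimum for the current window, and no alert fires if a window falls below its minimum;
- flatline: the number of events within timeframe falls below threshold;
- new_term: a value appears in fields that was not among the top terms_size (default 50) values seen during the preceding terms_window_size (default 30 days);
- cardinality: for the same query_key, the number of distinct cardinality_field values within timeframe exceeds max_cardinality or falls below min_cardinality.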
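The frequency type used by the rule above (num_events: 3, timeframe: 1 minute) amounts to a sliding-window count over the timestamps of matching events. A simplified sketch of that check in awk, assuming sorted epoch-second timestamps on stdin and a single query_key value; frequency_alert is a hypothetical name and this is not ElastAlert's implementation, which additionally groups events by query_key and throttles repeat alerts with realert.

```shell
#!/usr/bin/env bash
# frequency_alert NUM_EVENTS TIMEFRAME_SECONDS (hypothetical helper)
# Reads sorted epoch-second timestamps of matching events, one per line, and
# succeeds (printing the triggering timestamp) as soon as NUM_EVENTS of them
# fall within TIMEFRAME_SECONDS of each other; fails if that never happens.
frequency_alert() {
  awk -v n="$1" -v tf="$2" '
    BEGIN { start = 1; found = 0 }
    {
      t[NR] = $1 + 0
      # Advance the window start until the window spans at most tf seconds.
      while (t[NR] - t[start] > tf) start++
      if (NR - start + 1 >= n) { print "alert at " $1; found = 1; exit }
    }
    END { if (!found) exit 1 }'
}
```

With the rule's values, `frequency_alert 3 60` fires on timestamps 0, 50 and 59, but not on 0, 40 and 100, where no 60-second window holds three events.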
- Start
docker-compose up -d
Trigger a test alert:
http://10.0.0.101:20001/test/error
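With num_events: 3 and timeframe: 1 minute, a single request triggers the alert only if the endpoint logs at least three ERROR events by itself. A sketch that fires several requests in quick succession, assuming /test/error writes one ERROR entry per call; trigger_errors is a hypothetical helper. Expect some delay for Filebeat, Logstash and ElastAlert's 10-second run_every before the WeCom message arrives.

```shell
#!/usr/bin/env bash
# trigger_errors URL [COUNT] (hypothetical helper): call the error endpoint
# COUNT times (default 3) so the frequency rule sees enough matches.
trigger_errors() {
  local url="$1" count="${2:-3}" i
  for ((i = 1; i <= count; i++)); do
    # -o /dev/null discards the body; print the HTTP status of each call.
    curl -s -o /dev/null -w "request $i: HTTP %{http_code}\n" "$url"
  done
}
```

For example: `trigger_errors http://10.0.0.101:20001/test/error 3`.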