1. Search engine overview:
Indexing component: acquire data --> build documents --> analyze documents --> index documents (inverted index)
Search component: user search interface --> build query (convert the user's input into a processable query object) --> run the search query --> render results
Indexing component: Lucene
Search components: Solr, ElasticSearch
ElasticSearch:
A distributed, RESTful search and analytics engine capable of addressing a growing number
of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can
discover the expected and uncover the unexpected.
Logstash:
Logstash is an open-source, server-side data processing pipeline that ingests data from
multiple sources simultaneously, transforms it, and then sends it to your favorite "stash"
(ours, of course, is Elasticsearch).
Beats:
Lightweight shippers for collecting and forwarding logs and metrics; they consume far fewer resources than Logstash.
There are six official Beats:
Filebeat: collects log data.
Metricbeat: collects metrics, mainly for monitoring system and software performance.
Packetbeat: captures network packets and analyzes protocols to monitor request/response-style communication between systems; it can gather a lot of information that is hard to collect by conventional means.
Winlogbeat: collects Windows event logs.
Auditbeat: collects Linux audit framework data and monitors file integrity.
Heartbeat: checks connectivity between systems, e.g. uptime monitoring over ICMP, TCP, and HTTP.
Kibana:
Visualizes the data stored in Elasticsearch and the results of searches against it.
2. Installation
Hosts run CentOS 7 with ELK 6.x packages; the ELK components and Beats should be kept at matching versions.
Hosts:
Elasticsearch is installed on all three machines, forming a cluster:
10.10.10.1 Elasticsearch (master), Kibana
10.10.10.2 Elasticsearch (data-node1), Logstash
10.10.10.3 Elasticsearch (data-node2), Beats
3. ElasticSearch program environment:
Installation:
rpm -ivh elasticsearch-x.x.x.rpm
Configuration files:
/etc/elasticsearch/elasticsearch.yml    main configuration file
/etc/elasticsearch/jvm.options          JVM parameter configuration
/etc/elasticsearch/log4j2.properties    logging configuration
Program files:
/usr/share/elasticsearch/bin/elasticsearch
/usr/share/elasticsearch/bin/elasticsearch-keystore
Ports:
Search (REST) service: 9200/tcp
Cluster transport: 9300/tcp
Edit the configuration file /etc/elasticsearch/elasticsearch.yml:
cluster.name: myelk    # cluster name; must be identical on every node
node.name: master-node    # this node's name; must be unique within the cluster
path.data: /data/els/data    # data storage location
path.logs: /data/els/logs    # log storage location
network.host: 0.0.0.0    # listen address; 0.0.0.0 listens on all interfaces
http.port: 9200    # port exposed for HTTP
discovery.zen.ping.unicast.hosts: ["node1", "node2", "node3"]    # initial list of hosts a starting node probes to discover the cluster
Start elasticsearch:
systemctl start elasticsearch
Check the process and listening ports:
ps aux | grep elasticsearch
netstat -lntp
RESTful API:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
<BODY>: request body in JSON format
<VERB>: GET, POST, PUT, DELETE
<PATH>: /index_name/type/Document_ID/
Special PATHs: /_cat, /_search, /_cluster
/_search: search all indices and types;
/INDEX_NAME/_search: search a single, specified index;
/INDEX1,INDEX2/_search: search several specified indices;
/s*/_search: search all indices whose names start with s;
/INDEX_NAME/TYPE_NAME/_search: search the specified type of the specified index;
curl -XGET 'http://10.1.0.67:9200/_cluster/health?pretty=true'
curl -XGET 'http://10.1.0.67:9200/_cluster/stats?pretty=true'
curl -XGET 'http://10.1.0.67:9200/_cat/nodes?pretty'
curl -XGET 'http://10.1.0.67:9200/_cat/health?pretty'
curl -XGET 'http://10.1.0.67:9200/_cat/indices?v'
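A quick way to exercise the <VERB>/<PATH>/<BODY> pattern above is basic document CRUD; a minimal sketch (the students index, its fields, and document ID 1 are made up for illustration):
curl -XPUT 'http://10.1.0.67:9200/students/major/1?pretty' -H 'Content-Type: application/json' -d '{"name": "Guo Jing", "age": 25}'
curl -XGET 'http://10.1.0.67:9200/students/major/1?pretty'
curl -XGET 'http://10.1.0.67:9200/students/_search?q=name:guo&pretty'
curl -XDELETE 'http://10.1.0.67:9200/students/major/1?pretty'
Note that Elasticsearch 6.x rejects request bodies without an explicit Content-Type header.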
6. Logstash
Install Logstash the same way as above.
Logstash configuration:
The main configuration file is logstash.yml:
path.data: /var/lib/logstash
http.host: "10.10.10.2"
http.port: 9600
path.logs: /var/log/logstash
Pipelines that process specific log files are configured in the /etc/logstash/conf.d directory, in files ending in .conf, following the skeleton below (validation commands follow the skeleton):
input {    # inputs
...
}
filter {    # filters
...
}
output {    # outputs
...
}
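Before starting the service, a pipeline file can be syntax-checked and then run in the foreground; a minimal sketch, assuming the file is /etc/logstash/conf.d/test.conf:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf --config.test_and_exit
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf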
Simple example configurations:
Example 1:
input {
stdin {}
}
output {
stdout {
codec => rubydebug    # print events to the current screen
}
}
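Running this pipeline and typing a line on stdin produces output roughly like the following (the timestamp and hostname are illustrative):
{
"@version" => "1",
"host" => "node2",
"@timestamp" => 2018-04-18T02:00:00.000Z,
"message" => "hello elk"
}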
Example 2: read data from a file, filter it through the grok filter plugin, and write it to standard output:
input {
file {
path => ["/var/log/httpd/access_log"]
start_position => "beginning"    # read the file from the beginning
}
}
filter {
grok {
match => {
"message" => "%{COMBINEDAPACHELOG}"
}
remove_field => "message"
}
}
output {
stdout {
codec => rubydebug
}
}
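With this filter, a combined-format access-log line is decomposed into named fields; roughly (the log line is made up for illustration):
# input:  1.2.3.4 - - [18/Apr/2018:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/7.29.0"
# yields: clientip => "1.2.3.4", verb => "GET", request => "/index.html",
#         response => "200", bytes => "1024", plus timestamp, referrer, agent, ...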
Example 3: the date filter plugin:
filter {
grok {
match => { "message" => "%{HTTPD_COMBINEDLOG}"
}
remove_field => "message"
}
date {
match => ["timestamp","dd/MMM/YYYY:H:m:s Z"]
remove_field => "timestamp"
}
}
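The date filter parses the string that grok stored in the timestamp field (using the given Joda-time pattern) into the event's @timestamp field, so events are indexed by the time they were logged rather than the time Logstash processed them.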
Example 4: the mutate filter plugin:
filter {
grok {
match => {
"message" => "%{HTTPD_COMBINEDLOG}"
}
}
date {
match => ["timestamp","dd/MMM/YYYY:H:m:s Z"]
}
mutate {
rename => {
"agent" => "user_agent"
}
}
}
Example 5: the geoip plugin:
filter {
grok {
match => { "message" => "%{HTTPD_COMBINEDLOG}"
}
}
date {
match => ["timestamp","dd/MMM/YYYY:H:m:s Z"]
}
mutate {
rename => { "agent" => "user_agent"
}
}
geoip {
source => "clientip"
target => "geoip"
database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
}
}
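The geoip filter looks up the IP in the clientip field against the MaxMind database and adds fields under geoip, including geoip.location (longitude/latitude), which Kibana can plot on a map. The GeoLite2-City.mmdb database is not bundled with Logstash; it has to be downloaded from MaxMind and placed at the configured path.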
Example 6: using Redis
(1) Load data from Redis:
input {
redis {
batch_count => 1
data_type => "list"
key => "logstash-list"
host => "192.168.0.2"
port => 6379
threads => 5
}
}
(2) Store data into Redis:
output {
redis {
data_type => "channel"
key => "logstash-%{+yyyy.MM.dd}"
}
}
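To verify that events are queuing up on the Redis side for the list-based input above, you can inspect the key directly (host, port, and key taken from that example):
redis-cli -h 192.168.0.2 -p 6379 LLEN logstash-list
redis-cli -h 192.168.0.2 -p 6379 LRANGE logstash-list 0 0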
Example 7: write data to the Elasticsearch cluster:
output {
elasticsearch {
hosts => ["http://node1:9200/","http://node2:9200/","http://node3:9200/"]
user => "ec18487808b6908009d3"
password => "efcec6a1e0"
index => "logstash-%{+YYYY.MM.dd}"
document_type => "apache_logs"
}
}
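Note that user and password are only required when security (X-Pack) is enabled on the cluster, and that document_type is deprecated for Elasticsearch 6.x, where an index may hold only a single mapping type.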
Example 8: a combined example with geoip enabled:
input {
beats {
port => 5044
}
}
filter {
grok {
match => {
"message" => "%{COMBINEDAPACHELOG}"
}
remove_field => "message"
}
geoip {
source => "clientip"
target => "geoip"
database => "/etc/logstash/GeoLite2-City.mmdb"
}
}
output {
elasticsearch {
hosts => ["http://172.16.0.67:9200","http://172.16.0.68:9200","http://172.16.0.69:9200"]
index => "logstash-%{+YYYY.MM.dd}"
action => "index"
document_type => "apache_logs"
}
}
grok:
%{SYNTAX:SEMANTIC}
SYNTAX: the name of a predefined pattern;
SEMANTIC: the key name assigned to the text the pattern matches;
1.2.3.4 GET /logo.jpg 203 0.12
%{IP:clientip} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
{ clientip: 1.2.3.4, method: GET, request: /logo.jpg, bytes: 203, duration: 0.12}
%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-)" %{HOST:domain} %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} "(%{WORD:x_forword}|-)" (%{URIHOST:upstream_host}|-) %{NUMBER:upstream_response} (%{WORD:upstream_cache_status}|-) %{QS:upstream_content_type} (%{BASE16FLOAT:upstream_response_time}) > (%{BASE16FLOAT:request_time})
"message" => "%{IPORHOST:clientip} \[%{HTTPDATE:time}\] \"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:http_status_code} %{NUMBER:bytes} \"(?\S+)\" \"(?\S+)\" \"(?\S+)\""
filter {
grok {
match => {
"message" => "%{IPORHOST:clientip} \[%{HTTPDATE:time}\] \"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:http_status_code} %{NUMBER:bytes} \"(?<referrer>\S+)\" \"(?<agent>\S+)\" \"(?<x_forwarded_for>\S+)\""
}
remove_field => "message"
}
}
Nested field references: the dotted field name
nginx.remote.ip
is written in Logstash's bracket notation as
[nginx][remote][ip]
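A minimal sketch of using such a reference in a conditional (the IP value is illustrative):
filter {
if [nginx][remote][ip] == "10.0.0.1" {
drop {}    # discard events from this address
}
}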
filter {
grok {
match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]} \"%{DATA:[nginx][access][referrer]}\" \"%{DATA:[nginx][access][agent]}\""] }
remove_field => "message"
}
date {
match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
remove_field => "[nginx][access][time]"
}
useragent {
source => "[nginx][access][agent]"
target => "[nginx][access][user_agent]"
remove_field => "[nginx][access][agent]"
}
geoip {
source => "[nginx][access][remote_ip]"
target => "geoip"
database => "/etc/logstash/GeoLite2-City.mmdb"
}
}
output {
elasticsearch {
hosts => ["node1:9200","node2:9200","node3:9200"]
index => "logstash-ngxaccesslog-%{+YYYY.MM.dd}"
}
}
Notes:
1. The index name written to must begin with "logstash-" for the default Logstash index template to map geoip.location to the "geo_point" type;
2. The geoip filter's target should be set to "geoip" so its fields match what that template expects;
Besides structuring logs into JSON with the grok filter plugin, you can also configure the service itself to emit JSON-formatted logs directly (see the Nginx example in section 8);
Example: structuring the nginx access log with grok:
filter {
grok {
match => {
"message" => "%{HTTPD_COMBINEDLOG} \"%{DATA:realclient}\""
}
remove_field => "message"
}
date {
match => ["timestamp","dd/MMM/YYYY:H:m:s Z"]
remove_field => "timestamp"
}
}
Example: structuring the tomcat access log with grok:
filter {
grok {
match => {
"message" => "%{HTTPD_COMMONLOG}"
}
remove_field => "message"
}
date {
match => ["timestamp","dd/MMM/YYYY:H:m:s Z"]
remove_field => "timestamp"
}
}
8. Emitting Nginx logs as JSON:
log_format json '{"@timestamp":"$time_iso8601",'
'"@source":"$server_addr",'
'"@nginx_fields":{'
'"client":"$remote_addr",'
'"size":$body_bytes_sent,'
'"responsetime":"$request_time",'
'"upstreamtime":"$upstream_response_time",'
'"upstreamaddr":"$upstream_addr",'
'"request_method":"$request_method",'
'"domain":"$host",'
'"url":"$uri",'
'"http_user_agent":"$http_user_agent",'
'"status":$status,'
'"x_forwarded_for":"$http_x_forwarded_for"'
'}'
'}';
access_log logs/access.log json;
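With the service already emitting JSON, grok is unnecessary on the Logstash side; a minimal sketch of reading this file directly (the path is illustrative):
input {
file {
path => ["/usr/local/nginx/logs/access.log"]
codec => "json"    # parse each line as a JSON event
}
}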
Conditionals
Sometimes you only want to filter or output an event under certain conditions. For that, you can use a conditional.
Conditionals in Logstash look and act the same way they do in programming languages. Conditionals support if, else if and else statements and can be nested.
The conditional syntax is:
if EXPRESSION {
...
} else if EXPRESSION {
...
} else {
...
}
What’s an expression? Comparison tests, boolean logic, and so on!
You can use the following comparison operators:
equality: ==, !=, <, >, <=, >=
regexp: =~, !~ (checks a pattern on the right against a string value on the left)
inclusion: in, not in
The supported boolean operators are:
and, or, nand, xor
The supported unary operators are:
!
Expressions can be long and complex. Expressions can contain other expressions, you can negate expressions with !, and you can group them with parentheses (...).
filter {
if [type] == "tomcat-accesslog" {
grok {}    # tomcat-specific grok pattern goes here
}
if [type] == "httpd-accesslog" {
grok {}    # httpd-specific grok pattern goes here
}
}
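The [type] field tested above is typically set by the input; a minimal sketch (paths and type values are illustrative):
input {
file {
path => ["/var/log/tomcat/access_log"]
type => "tomcat-accesslog"    # tag events from this file
}
file {
path => ["/var/log/httpd/access_log"]
type => "httpd-accesslog"
}
}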