- Introduction to search engines
- Using Elasticsearch
- Using Logstash
- Using Filebeat
- Using Kibana
- Elastic Stack end-to-end examples
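I. Introduction to Search Engines
(1) Main parts of a search engine:
Indexing component: acquire data --> build documents --> analyze documents --> index documents (inverted index)
Search component: user search interface --> build query (convert what the user typed into a query object that can be processed) --> run query --> render results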
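The indexing and search pipeline above can be illustrated with a toy inverted index. This is only a minimal sketch: the `analyze`, `build_inverted_index`, and `search` helpers are hypothetical names made up for illustration, and the tokenizer is far simpler than a real Lucene analyzer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Split text into lowercase terms (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query term."""
    terms = analyze(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


docs = {
    1: "Elasticsearch in Action",
    2: "Redis Essentials",
    3: "Puppet 4.10 Beginner's Guide",
}
index = build_inverted_index(docs)
print(search(index, "elasticsearch"))  # [1]
```

Looking up a term is a single dictionary access, which is why the inverted index is built once at indexing time rather than scanning every document at query time.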
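(2) Mainstream open-source search engine software
Indexing components: Lucene, Solr, Elasticsearch
Lucene: a class library that provides index-building functionality
Solr: a complete indexing component built on top of Lucene
Elasticsearch: a distributed indexing component, also built on top of Lucene
Search component: Kibana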
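(3) Components of the Elastic Stack
The Elastic Stack contains a series of tools; the ones mainly used today are Elasticsearch, Logstash, Beats, and Kibana
Elasticsearch: the core tool of the Elastic Stack, the indexing component
Logstash: extracts, processes, and outputs data; it is very resource-intensive, and the data extraction step has now been taken over by Beats
Beats: a lightweight data collection platform with many sub-tools: Filebeat, Metricbeat, Packetbeat, Winlogbeat, Heartbeat
Kibana: the search component; it provides a visual interface that accepts search commands and displays search results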
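II. Using Elasticsearch
(1) Core components of ES
Elasticsearch cluster: implements distributed storage through the shard mechanism
Cluster status:
green: all primary shards and all replica shards are present
yellow: some shard copies are missing, but every shard still has at least one copy (primary or replica); a surviving replica is promoted to primary, so all primaries remain active
red: shards are missing and, for at least one shard, both the primary and its replicas are lost
Core components of Lucene:
Index: analogous to a database
Type: analogous to a table
Document: analogous to a row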
Program environment of Elasticsearch 5:
Configuration files:
/etc/elasticsearch/elasticsearch.yml: main configuration file
/etc/elasticsearch/jvm.options: JVM configuration file
/etc/elasticsearch/log4j2.properties: logging configuration file
Unit file: elasticsearch.service
Program files:
/usr/share/elasticsearch/bin/elasticsearch
/usr/share/elasticsearch/bin/elasticsearch-keystore
/usr/share/elasticsearch/bin/elasticsearch-plugin: plugin management program
Search service: 9200/tcp
Cluster service: 9300/tcp
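How an ES cluster works:
Node discovery by multicast or unicast: 9300/tcp
Key factor: the cluster name (cluster.name); nodes configured with the same cluster name join the same cluster
All nodes elect one master node, which manages the state of the whole cluster (green/yellow/red) and how the shards are distributed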
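The green/yellow/red states described above can be expressed as a small classification function. This is a simplified model, not how Elasticsearch computes health internally: each shard is reduced to a hypothetical `(active_copies, expected_copies)` pair, where the copies include the primary and its replicas.

```python
def cluster_status(shards):
    """Classify cluster health from (active_copies, expected_copies) pairs.

    red:    at least one shard has no active copy at all
    yellow: every shard has a copy, but some copies are missing
    green:  every expected copy of every shard is active
    """
    if any(active == 0 for active, _ in shards):
        return "red"
    if any(active < expected for active, expected in shards):
        return "yellow"
    return "green"


# Two shards, each expected to have one primary plus one replica.
print(cluster_status([(2, 2), (2, 2)]))  # green
print(cluster_status([(1, 2), (2, 2)]))  # yellow
print(cluster_status([(0, 2), (2, 2)]))  # red
```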
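(2) RESTful API
Elasticsearch provides a RESTful API, so it can be driven over HTTP
Syntax:
    curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
- <BODY>: the request body, in JSON format
- <VERB>: GET (retrieve), POST (modify), PUT (create), DELETE (delete)
- <PATH>: /index_name/type/Document_ID/
- Special paths: /_cat, /_search, /_cluster
- Create a document: -XPUT -d '{"key1": "value1", "key2": value, ...}'
- /_search: search all indices and types
- /INDEX_NAME/_search: search a single named index
- /INDEX1,INDEX2/_search: search several named indices
- /s*/_search: search all indices whose names start with s
- /INDEX_NAME/TYPE_NAME/_search: search a given type within a single named index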
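The search path rules listed above follow one pattern: an optional comma-separated index list, an optional type, then `_search`. The helper below is a hypothetical illustration of that pattern (the name `search_path` is not part of any Elasticsearch client); it only builds the path string and sends no request.

```python
def search_path(indices=None, doc_type=None):
    """Build a _search path from optional index names and an optional type."""
    parts = []
    if indices:
        # Several indices are joined with commas; wildcards such as "s*" pass through.
        parts.append(",".join(indices))
        if doc_type:
            # A type only makes sense when at least one index is given.
            parts.append(doc_type)
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())                      # /_search
print(search_path(["INDEX1", "INDEX2"]))  # /INDEX1,INDEX2/_search
print(search_path(["books"], "IT"))       # /books/IT/_search
```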
Usage examples:
    curl -XGET 'http://192.168.136.230:9200/_cluster/health?pretty=true'
    curl -XGET 'http://192.168.136.230:9200/_cluster/stats?pretty=true'
    curl -XGET 'http://192.168.136.230:9200/_cat/nodes?pretty'
    curl -XGET 'http://192.168.136.230:9200/_cat/health?pretty'
(3) Experiment 1: Configuring and managing an Elasticsearch cluster
Environment:
Three nodes: node0.hellopeiyang.com, node1.hellopeiyang.com, node3.hellopeiyang.com
Step 1: Preparation
    ntpdate 172.18.0.1    # synchronize the clock
    vim /etc/hosts        # cluster nodes must resolve each other's hostnames; this experiment uses the hosts file
        192.168.136.230 node0 node0.hellopeiyang.com
        192.168.136.130 node1 node1.hellopeiyang.com
        192.168.136.132 node3 node3.hellopeiyang.com
Step 2: Install and start the Elasticsearch service
    yum install java-1.8.0-openjdk
    rpm -ivh elasticsearch-5.5.3.rpm
    mkdir /data/els/{logs,data} -pv
    chown -R elasticsearch:elasticsearch /data/els/
    vim /etc/elasticsearch/elasticsearch.yml
        cluster.name: myels                # cluster name, identical on every node
        node.name: node0                   # node name, different on every node
        path.data: /data/els/data          # data directory
        path.logs: /data/els/logs          # log directory
        network.host: 192.168.136.230      # listening IP
        http.port: 9200                    # listening port
        # identical on every node; lists the hostnames of all nodes
        discovery.zen.ping.unicast.hosts: ["node0", "node1", "node3"]
        # minimum number of nodes required to elect a master
        discovery.zen.minimum_master_nodes: 2
    vim /etc/elasticsearch/jvm.options
        # Elasticsearch is memory-hungry, so the heap is usually enlarged
        # -Xms: initial JVM heap size; -Xmx: maximum JVM heap size
        -Xms1g
        -Xmx1g
    systemctl start elasticsearch.service
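The value 2 for discovery.zen.minimum_master_nodes matches the usual majority rule for three master-eligible nodes: more than half of them must agree, which prevents two halves of a partitioned cluster from each electing their own master. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def minimum_master_nodes(master_eligible):
    """Return the majority quorum for a number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
# 1 -> 1
# 2 -> 2
# 3 -> 2
# 5 -> 3
```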
Step 3: Verify that the Elasticsearch node and the cluster are running
    # check that the node is running
    curl -XGET 'http://192.168.136.230:9200/'
    # check that the cluster is running
    curl -XGET 'http://192.168.136.230:9200/_cat/nodes?pretty'
Step 4: Add, delete, and query data in the Elasticsearch cluster
    # create documents 1, 2, and 3 in index books, type IT
    curl -XPUT 'http://192.168.136.230:9200/books/IT/1?pretty' -d '{
        "name": "Elasticsearch in Action",
        "date": "Dec 3, 2015",
        "author": "Radu Gheorghe and Matthew Lee Hinman"
    }'
    curl -XPUT 'http://192.168.136.230:9200/books/IT/2?pretty' -d '{
        "name": "Redis Essentials",
        "date": "Sep 8, 2015",
        "author": "Maxwell Dayvson Da Silva and Hugo Lopes Tavares"
    }'
    # the apostrophe is written as '\'' so that it does not end the single-quoted shell string
    curl -XPUT 'http://192.168.136.230:9200/books/IT/3?pretty' -d '{
        "name": "Puppet 4.10 Beginner'\''s Guide",
        "date": "May 31, 2017",
        "author": "John Arundel"
    }'
    # delete document 3 from index books, type IT
    curl -XDELETE 'http://192.168.136.230:9200/books/IT/3?pretty'
    # find documents in index books, type IT that contain the keyword elasticsearch
    curl -XGET 'http://192.168.136.230:9200/books/IT/_search?q=elasticsearch&pretty'
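Step 5: Install elasticsearch-head
elasticsearch-head: an Elasticsearch plugin, hosted on GitHub, for managing the cluster from a browser
    yum install git npm -y
    git clone https://github.com/mobz/elasticsearch-head.git
    cd elasticsearch-head/
    npm install
    # edit the Elasticsearch configuration on the node
    vim /etc/elasticsearch/elasticsearch.yml
        # add the following two lines so that head can connect to the node
        http.cors.enabled: true
        http.cors.allow-origin: "*"
    systemctl restart elasticsearch.service
    npm run start &
Enter the address of the node to connect to, and the basic state of that node's cluster is shown. In each node's row, boxes with a bold border are primary shards and boxes without a bold border are replica shards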
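III. Using Logstash
(1) Installing Logstash
Install the Java runtime (JRE):
    yum install java-1.8.0-openjdk -y
Download and install the Logstash package:
    rpm -ivh logstash-5.5.3.rpm
Logstash installation paths:
Configuration directory: /etc/logstash/conf.d/
Executable directory: /usr/share/logstash/bin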
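(2) Configuration file format
    input {     # where the data comes from; required
        ...
    }
    filter {    # how the data is filtered and transformed; commonly set
        ...
    }
    output {    # where the data is sent; required
        ...
    }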
(3) Experiment 2: Basic use of Logstash
Experiment 2-1: Read data from standard input, process it, and write it to standard output
    vim /etc/logstash/conf.d/test.conf
        input {
            stdin {}
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf    # check the configuration syntax
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf       # run with this configuration
Experiment 2-2: Read data from the httpd access log, use the grok plugin to split each log entry into fields, and write the result to standard output
    # install and configure the httpd service
    yum install httpd
    echo "hello index file" > /var/www/html/index.html
    echo "hello test file" > /var/www/html/test.html
    systemctl start httpd
    # edit the Logstash configuration file
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
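What grok does with %{HTTPD_COMBINEDLOG} is essentially a regular-expression match with named captures. The sketch below imitates that in plain Python. The regular expression is a simplified approximation written for this example, not the actual grok pattern definition, and the field names are modeled on the ones grok produces (clientip, timestamp, agent, and so on).

```python
import re

# Simplified stand-in for the combined access log pattern (assumption, not grok's own regex).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None when the line does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


line = ('120.120.120.120 - - [14/Dec/2017:16:42:56 +0800] '
        '"GET / HTTP/1.1" 200 18 "-" "curl/7.29.0"')
event = parse_line(line)
print(event["clientip"], event["verb"], event["response"], event["agent"])
# 120.120.120.120 GET 200 curl/7.29.0
```

A line that does not match returns None here; Logstash instead keeps the event and tags it with _grokparsefailure.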
Experiment 2-3: Use the date plugin to normalize the timestamp
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
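The date filter parses the text in the timestamp field and stores the result as the event time in UTC. The equivalent conversion in Python looks roughly like this; the strptime format is the Python counterpart of the "dd/MMM/YYYY:H:m:s Z" pattern, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_access_timestamp(value):
    """Parse an access-log timestamp and convert it to UTC."""
    parsed = datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")
    return parsed.astimezone(timezone.utc)


print(parse_access_timestamp("14/Dec/2017:16:42:56 +0800").isoformat())
# 2017-12-14T08:42:56+00:00
```

Note that %b depends on the process locale; the sketch assumes the default English month abbreviations.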
Experiment 2-4: Use the mutate plugin to rename a key, changing "agent" to "user_agent"
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
Experiment 2-5: Use the geoip plugin to look up the latitude and longitude of an IP address
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                # GeoLite2-City.mmdb is downloaded from the MaxMind website; it maps IP addresses to geographic information
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
Experiment 2-6: Collect data from a Redis database
    yum install redis
    vim /etc/redis.conf
        bind 0.0.0.0
    systemctl start redis
    # push a test entry onto the list key mylog (data_type => "list" reads from a Redis list)
    redis-cli RPUSH mylog 15.15.15.15
    vim /etc/logstash/conf.d/test.conf
        input {
            redis {
                host => "192.168.136.230"
                port => "6379"
                key => "mylog"
                data_type => "list"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            stdout {
                codec => rubydebug
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
Experiment 2-7: Write the Logstash output to a Redis database
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            redis {
                data_type => "channel"
                key => "logstash-%{+yyyy.MM.dd}"
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
Experiment 2-8: Send the Logstash output to the Elasticsearch cluster from Experiment 1
    vim /etc/logstash/conf.d/test.conf
        input {
            file {
                path => ["/var/log/httpd/access_log"]
                start_position => "beginning"
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            elasticsearch {
                hosts => ["http://192.168.136.230:9200", "http://192.168.136.130:9200"]
                document_type => "httpd-accesslog"
                index => "logstash-%{+yyyy.MM.dd}"
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/test.conf
The received data can be seen in elasticsearch-head
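The index setting "logstash-%{+yyyy.MM.dd}" produces one index per day, named after the date of the event time. As far as I know that date is taken in UTC, so an event logged shortly after midnight local time can land in the previous day's index. A minimal sketch of the naming rule, with a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def daily_index(event_time, prefix="logstash-"):
    """Return the daily index name for a timezone-aware event time (date taken in UTC)."""
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")


cst = timezone(timedelta(hours=8))
print(daily_index(datetime(2017, 12, 14, 16, 42, 56, tzinfo=cst)))  # logstash-2017.12.14
print(daily_index(datetime(2017, 12, 15, 2, 0, 0, tzinfo=cst)))     # logstash-2017.12.14
```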
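IV. Using Filebeat
(1) The Beats platform
- Beats platform: a collection of single-purpose data shippers. Once installed they act as lightweight agents that send data from hundreds or thousands of machines to Logstash or Elasticsearch
- Filebeat: a lightweight log shipper for forwarding and centralizing logs and files
(2) Filebeat file layout
- /etc/filebeat/filebeat.yml: configuration file
- /etc/filebeat/filebeat.full.yml: configuration file template
- /lib/systemd/system/filebeat.service: unit file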
(3) Experiment 3: Using Filebeat
Experiment 3-1: Filebeat collects data and sends it to Logstash, which transforms it and sends it to Elasticsearch
Environment: the environment left by Experiment 2-8
It contains three Elasticsearch node hosts and one Logstash host, plus one additional Filebeat host
- Step 1: Configure the Filebeat host
    rpm -ivh filebeat-5.5.3-x86_64.rpm
    vim /etc/filebeat/filebeat.yml
        filebeat.prospectors:
        - input_type: log
          paths:
            - /var/log/httpd/access_log*      # log files to monitor
        output.logstash:
          hosts: ["192.168.136.230:5044"]     # IP and port of the Logstash server
    systemctl start filebeat.service
- Step 2: Configure the Logstash host
    vim /etc/logstash/conf.d/test.conf
        input {
            beats {
                port => 5044
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            elasticsearch {
                hosts => ["http://192.168.136.230:9200", "http://192.168.136.130:9200"]
                document_type => "httpd-accesslog"
                index => "logstash-%{+yyyy.MM.dd}"
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    systemctl start logstash.service
- Step 3: Test
    echo '120.120.120.120 - - [14/Dec/2017:16:42:56 +0800] "GET / HTTP/1.1" 200 18 "-" "curl/7.29.0"' >> /var/log/httpd/access_log
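Experiment 3-2: Filebeat collects data and sends it to Redis, Redis hands it to Logstash, and Logstash transforms it and sends it to Elasticsearch
Environment: the environment left by Experiment 3-1
It contains three Elasticsearch node hosts, one Logstash host, and one Filebeat host, plus one additional Redis host
- Step 1: Change the Filebeat host configuration
    vim /etc/filebeat/filebeat.yml
        filebeat.prospectors:
        - input_type: log
          paths:
            - /var/log/httpd/access_log*      # log files to monitor
        output.redis:
          enabled: true
          hosts: ["192.168.136.240"]          # address of the Redis server
          port: 6379
          key: httpd-accesslog                # must match the key in the Logstash configuration
          db: 0
          datatype: list
    systemctl restart filebeat.service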
- Step 2: Configure the Redis host
    yum install redis
    vim /etc/redis.conf
        bind 0.0.0.0
    systemctl start redis.service
- Step 3: Configure the Logstash host
    vim /etc/logstash/conf.d/test.conf
        input {
            redis {
                host => '192.168.136.240'
                port => '6379'
                key => 'httpd-accesslog'      # must match the key in the Filebeat configuration
                data_type => 'list'
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
            mutate {
                rename => { "agent" => "user_agent" }
            }
            geoip {
                source => "clientip"
                target => geoip
                database => "/etc/logstash/maxmind/GeoLite2-City.mmdb"
            }
        }
        output {
            elasticsearch {
                hosts => ["http://192.168.136.230:9200", "http://192.168.136.130:9200"]
                document_type => "httpd-accesslog"
                index => "logstash-%{+yyyy.MM.dd}"
            }
        }
    /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/test.conf
    systemctl restart logstash.service
- Step 4: Test
    echo '135.136.137.138 - - [14/Dec/2017:16:42:56 +0800] "GET / HTTP/1.1" 200 18 "-" "curl/7.29.0"' >> /var/log/httpd/access_log
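In Experiment 3-2 the Redis list acts as a buffer between Filebeat and Logstash: the producer appends entries to one end of the list stored under the shared key and the consumer removes them from the other end, so entries are processed in arrival order and can pile up safely while Logstash is restarting. The sketch below models that behaviour in memory. It does not talk to Redis; the class and its method names only mirror the Redis commands RPUSH and LPOP, on the assumption that the two tools use the list this way.

```python
from collections import deque


class ListQueue:
    """In-memory stand-in for a single Redis list key used as a FIFO queue."""

    def __init__(self):
        self._items = deque()

    def rpush(self, item):
        """Append an item to the tail and return the new length (producer side)."""
        self._items.append(item)
        return len(self._items)

    def lpop(self):
        """Remove and return the head item, or None when empty (consumer side)."""
        return self._items.popleft() if self._items else None


queue = ListQueue()               # plays the role of the key httpd-accesslog
queue.rpush("log line 1")         # shipper writes while the consumer is down
queue.rpush("log line 2")
print(queue.lpop())               # log line 1
print(queue.lpop())               # log line 2
print(queue.lpop())               # None
```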
V. Using Kibana
- Kibana: visualizes the data stored in Elasticsearch
(1) Kibana file layout
- /etc/kibana/kibana.yml: configuration file
- /etc/systemd/system/kibana.service: unit file
(2) Experiment 4: Using Kibana
Experiment 4: Use Kibana to visualize the data in Elasticsearch
Environment: the environment left by Experiment 3-2
It contains three Elasticsearch node hosts, one Logstash host, one Filebeat host, and one Redis host, plus one additional Kibana host
Step 1: Configure Kibana
    rpm -ivh kibana-5.5.3-x86_64.rpm
    vim /etc/kibana/kibana.yml
        server.port: 5601                                   # listening port
        server.host: "0.0.0.0"                              # listening IP
        server.basePath: ""
        server.name: "node3.hellopeiyang.com"
        elasticsearch.url: "http://192.168.136.230:9200"    # IP address and port of the Elasticsearch host
    systemctl start kibana.service
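Step 2: Open port 5601 of the Kibana host in a web browser to reach the initial setup page
The page asks for an index name. After filling it in, click Create to enter the management console: the main functions are listed on the left, the "Discover" function is selected, searches are typed into the input box at the top, and the results are shown below
The "Visualize" function on the left of the console can build statistical charts, such as a pie chart
The "Visualize" function can also build a map of where visitors come from
The "Dashboard" function on the left of the console places several charts side by side on one monitoring page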
VI. Elastic Stack End-to-End Examples
(1) Example 1:
Goal: use Filebeat, Logstash, Elasticsearch, Kibana, and related tools to collect, process, store, and visualize Tomcat log data
Environment: three Elasticsearch node hosts, one Logstash host, one Filebeat host, one Redis host, and one Kibana host
Step 1: Configure the Filebeat host
    vim /etc/filebeat/filebeat.yml
        filebeat.prospectors:
        - input_type: log
          paths:
            - /var/log/tomcat/*access_log*    # path of the Tomcat logs to monitor
          document_type: tomcat-accesslog
        output.redis:
          enabled: true
          hosts: ["192.168.136.131"]
          port: 6379
          key: tomcat-accesslog               # name of the key stored in Redis
          db: 0
          datatype: list
    systemctl start filebeat.service
Step 2: Configure the Redis server
    vim /etc/redis.conf
        bind 0.0.0.0
    systemctl start redis.service
Step 3: Configure the Logstash server
    vim /etc/logstash/conf.d/tomcat.conf
        input {
            redis {
                host => '192.168.136.131'
                port => '6379'
                key => 'tomcat-accesslog'     # same key name that Filebeat writes to in Redis
                data_type => 'list'
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMMONLOG}" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
        }
        output {
            elasticsearch {
                hosts => ["http://192.168.136.230:9200", "http://192.168.136.130:9200"]
                document_type => "tomcat-accesslog"
                index => "logstash-%{+yyyy.MM.dd}"
            }
        }
    systemctl start logstash.service
Step 4: Configure the Elasticsearch cluster
    mkdir /data/els/{data,logs} -pv
    chown -R elasticsearch:elasticsearch /data/els
    vim /etc/elasticsearch/elasticsearch.yml
        cluster.name: myels
        node.name: node0
        path.data: /data/els/data
        path.logs: /data/els/logs
        network.host: 192.168.136.230
        http.port: 9200
        discovery.zen.ping.unicast.hosts: ["node0", "node1", "node3"]
        discovery.zen.minimum_master_nodes: 2
        http.cors.enabled: true
        http.cors.allow-origin: "*"
    systemctl start elasticsearch
Step 5: Start elasticsearch-head
    npm run start &
The web management page shows that the corresponding index has been created in the cluster
Step 6: Configure Kibana
    vim /etc/kibana/kibana.yml
        server.port: 5601
        server.host: "0.0.0.0"
        server.basePath: ""
        server.name: "node3.hellopeiyang.com"
        elasticsearch.url: "http://192.168.136.230:9200"
    systemctl start kibana.service
The Kibana management page also shows the parsed Tomcat log statistics
(2) Example 2:
Goal: use Filebeat, Logstash, Elasticsearch, Kibana, and related tools to collect, process, store, and visualize Nginx log data
Environment: three Elasticsearch node hosts, one Logstash host, one Filebeat host, one Redis host, and one Kibana host
Step 1: Configure the Filebeat host
    vim /etc/filebeat/filebeat.yml
        filebeat.prospectors:
        - input_type: log
          paths:
            - /var/log/nginx/access.log*      # path of the Nginx logs to monitor
          document_type: nginx-accesslog
        output.redis:
          enabled: true
          hosts: ["192.168.136.131"]
          port: 6379
          key: nginx-accesslog                # name of the key stored in Redis
          db: 0
          datatype: list
    systemctl start filebeat.service
Step 2: Configure the Redis server
    vim /etc/redis.conf
        bind 0.0.0.0
    systemctl start redis.service
Step 3: Configure the Logstash server
When the grok plugin is used in a filter and no predefined pattern matches the whole line, the pattern can be extended with custom pieces:
for example, \"%{DATA:realclient}\", where the part before the colon is the pattern (data format) name and the part after the colon is the field name given to the matched data
    vim /etc/logstash/conf.d/nginx.conf
        input {
            redis {
                host => '192.168.136.131'
                port => '6379'
                key => 'nginx-accesslog'      # same key name that Filebeat writes to in Redis
                data_type => 'list'
            }
        }
        filter {
            grok {
                match => { "message" => "%{HTTPD_COMBINEDLOG} \"%{DATA:realclient}\"" }
                remove_field => "message"
            }
            date {
                match => ["timestamp", "dd/MMM/YYYY:H:m:s Z"]
                remove_field => "timestamp"
            }
        }
        output {
            elasticsearch {
                hosts => ["http://192.168.136.230:9200", "http://192.168.136.130:9200"]
                document_type => "nginx-accesslog"
                index => "logstash-%{+yyyy.MM.dd}"
            }
        }
    systemctl start logstash.service
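The effect of appending \"%{DATA:realclient}\" to the combined log pattern can be pictured as adding one more named capture group at the end of a regular expression. The sketch below is a simplified Python approximation written for this example. It is not the real grok definition, and the sample log line with its trailing quoted address is made up for illustration.

```python
import re

# Simplified combined-log expression (assumption) plus one extra quoted field.
NGINX_LOG = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    r' "(?P<realclient>.*?)"'   # plays the role of \"%{DATA:realclient}\"
)

# Hypothetical Nginx log line whose format appends the real client address in quotes.
line = ('192.168.136.1 - - [14/Dec/2017:16:42:56 +0800] '
        '"GET / HTTP/1.1" 200 18 "-" "curl/7.29.0" "10.1.1.1"')
fields = NGINX_LOG.match(line).groupdict()
print(fields["clientip"], fields["realclient"])  # 192.168.136.1 10.1.1.1
```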
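Step 4: Configure the Elasticsearch cluster, exactly as in Step 4 of Example 1
Step 5: Start elasticsearch-head
    npm run start &
The documents in the index show the fields split out by the predefined pattern as well as the custom field
Step 6: Configure Kibana, exactly as in Step 6 of Example 1
The Kibana management page shows the parsed Nginx log statistics, and the custom field is visible there as well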