I. Deploying Zabbix
1. Configure the master node
- Prepare the LAMP environment and the Zabbix yum repository
# yum install httpd php mariadb-server -y
# vim /etc/my.cnf
[mysqld]
log-bin=master-log
innodb_file_per_table=ON
skip_name_resolve=ON
# systemctl start mariadb
# systemctl enable mariadb
# vim /etc/yum.repos.d/zabbix.repo
[zabbix]
name=zabbix
baseurl=https://mirrors.tuna.tsinghua.edu.cn/zabbix/zabbix/3.4/rhel/7/x86_64/
gpgcheck=0
[non-supported]
name=non-supported
baseurl=https://mirrors.tuna.tsinghua.edu.cn/zabbix/non-supported/rhel/7/x86_64/
gpgcheck=0
- Install and configure Zabbix
# yum install zabbix-server-mysql zabbix-web-mysql zabbix-agent -y
# mysql
MariaDB [(none)]> create database zabbix character set utf8 collate utf8_bin;
MariaDB [(none)]> grant all privileges on zabbix.* to zabbix@localhost identified by 'zbxpass';
MariaDB [(none)]> grant all privileges on zabbix.* to [email protected] identified by 'zbxpass';
MariaDB [(none)]> grant all privileges on zabbix.* to zabbix@'192.168.0.%' identified by 'zbxpass';
MariaDB [(none)]> quit
# zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -h192.168.0.8 -pzbxpass zabbix
# vim /etc/zabbix/zabbix_server.conf
DBHost=192.168.0.8
DBPassword=zbxpass
# systemctl start zabbix-server
# systemctl enable zabbix-server
# vim /etc/httpd/conf.d/zabbix.conf
php_value date.timezone Asia/Shanghai
# systemctl start httpd
# systemctl enable httpd
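Open http://192.168.0.8/zabbix/ in a browser; the default username/password is Admin/zabbix
2. Configure the monitored node
- Configure the Zabbix yum repository, same as on the master node
- Install zabbix-agent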
# yum install zabbix-agent zabbix-sender
- Configure the agent parameters
# vim /etc/zabbix/zabbix_agentd.conf
#IP address of the Zabbix master; using a hostname is recommended
Server=192.168.0.8
ServerActive=192.168.0.8
Hostname=node01.zabbix.com
- Start the agent
# systemctl start zabbix-agent
# systemctl enable zabbix-agent
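To confirm the three agent settings above, they can be read back from the file. This is a minimal sketch and not part of the original procedure: the helper name conf_get is hypothetical, and it assumes the Key=Value format with # comment lines used by zabbix_agentd.conf.

```bash
# conf_get FILE KEY: print the value of the last uncommented KEY=VALUE line.
# Hypothetical helper for illustration; it is not shipped with Zabbix.
conf_get() {
    local file=$1 key=$2
    # Skip comment lines, match "KEY=" at the start of the line,
    # and keep everything after the "=".
    awk -v k="$key" '
        /^[[:space:]]*#/ { next }
        index($0, k "=") == 1 { val = substr($0, length(k) + 2) }
        END { if (val != "") print val }
    ' "$file"
}

# Demo on a sample file that mirrors the settings from this section.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Server=127.0.0.1
Server=192.168.0.8
ServerActive=192.168.0.8
Hostname=node01.zabbix.com
EOF

for key in Server ServerActive Hostname; do
    printf '%s = %s\n' "$key" "$(conf_get "$sample" "$key")"
done
```

On a real node, point it at /etc/zabbix/zabbix_agentd.conf instead of the sample file.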
II. Monitoring Systems
1. Core capabilities
Data collection:
- ssh/telnet
- SNMP
- IPMI
- JMX
- agent
Data storage:
- SQL
- NoSQL
- rrd
Visualization:
- grafana
Alerting:
2、zabbix
- zabbix server
- zabbix database(MySQL)
- zabbix web gui(LAMP)
- zabbix proxy
- zabbix agent
3. Basic monitoring terminology
Host -- Host group
Item -- Application
Trigger: a threshold; crossing it raises a trigger event
Action: conditions and operations
III. Basic Zabbix Monitoring Workflow (all steps below are done in the web GUI)
1. Add a host and a host group
Configuration -- Hosts -- Create host -- Add
Host name: node01.zabbix.com
Visible name: node01
New group: MyServers
Agent interfaces:
IP address: 192.168.0.9
Port: 10050
2. Create an item
Configuration -- Hosts -- Items -- Create item -- Add
Item:
Name: inbound packets
Type: Zabbix agent
Key: net.if.in[eth0,packets]
Host interface: 192.168.0.9:10050
Type of information: Numeric(unsigned) #unsigned integer
Units: packets/second
Update interval: 5s
History storage period: 90d #keep history data for 90 days
Trend storage period: 365d #keep trend data for 365 days
Show value: As is #value mapping (none applied)
New application: net traffic
Populates host inventory field: None #whether the value fills a host inventory field
Preprocessing: #data preprocessing
Preprocessing steps:
Name: Change per second #compute the per-second rate of change
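The "Change per second" preprocessing step turns an ever-growing counter into a rate: (value - previous value) / (time - previous time). The sketch below reproduces that arithmetic; the function name and the two sample readings are made up for illustration.

```bash
# change_per_second PREV_VALUE PREV_TIME VALUE TIME
# Prints (VALUE - PREV_VALUE) / (TIME - PREV_TIME); times are in seconds.
change_per_second() {
    awk -v pv="$1" -v pt="$2" -v v="$3" -v t="$4" 'BEGIN {
        if (t <= pt) exit 1          # no elapsed time, nothing to compute
        print (v - pv) / (t - pt)
    }'
}

# Two hypothetical readings of net.if.in[eth0,packets], taken 5 seconds apart.
change_per_second 120000 1000 120750 1005   # prints 150
```

With the 5s update interval used here, a counter that grew by 750 packets is stored as 150 packets/second.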
3. Clone the item
Configuration -- Hosts -- Items -- inbound packets -- Clone -- Add
Name: inbound bytes
Key: net.if.in[eth0,bytes]
Units: Bps
Name: outbound packets
Key: net.if.out[eth0,packets]
Units: packets/second
Name: outbound bytes
Key: net.if.out[eth0,bytes]
Units: Bps
4. Create a trigger
Configuration -- Hosts -- Triggers -- Create trigger -- Add
Name: inbound packets too fast
Expression: {node01.zabbix.com:net.if.in[eth0,packets].last(#1)}>100
Add:
item: node01: inbound packets
Function: last()-Last(most recent) T value
Last of(T): 1 Count
Result > 100
OK event generation: Expression #how recovery (OK) events are generated
PROBLEM event generation mode: Single #report the problem event only once
OK event closes: All problems #close the problem events on recovery
5. Create an action
action: event driven; an event triggers the action
conditions
operations
OK -> PROBLEM: operations
PROBLEM -> OK: recovery operations
acknowledgement operations
remote command
send message
1. Install nginx on node01
# yum install nginx -y
# systemctl start nginx
2. Add an item for nginx
Configuration -- Hosts -- Items -- Create item -- Add
Name: nginx service state
Key: net.tcp.port[192.168.0.9,80]
Update interval: 5s
Show value: Service state
New application: nginx status
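The net.tcp.port key returns 1 when a TCP connection to the port succeeds and 0 when it does not, which the "Service state" value map displays as Up or Down. A rough stand-in for that check, assuming bash with /dev/tcp support and the coreutils timeout command (the function name is hypothetical):

```bash
# tcp_port_state HOST PORT: print 1 if a TCP connection succeeds, 0 otherwise.
# Mirrors the 1/0 values of net.tcp.port; relies on bash's /dev/tcp feature.
tcp_port_state() {
    local host=$1 port=$2
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}

# Example: check the nginx port on node01 (address taken from this section).
# tcp_port_state 192.168.0.9 80
```

Running it by hand from the master is a quick way to tell a stopped service apart from a misconfigured item.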
3. Define a trigger
Configuration -- Hosts -- Triggers -- Create trigger -- Add
Name: nginx down
Severity: High
Expression: {node01.zabbix.com:net.tcp.port[192.168.0.9,80].last(#3)}=0
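Note how last() reads its #N argument: last(#3) selects the third most recent value only, not the last three values together. The expression above therefore compares that single value with 0. A small sketch of the selection rule, with a made-up value history:

```bash
# last_nth N VALUE...: values are listed oldest first; print the Nth most
# recent one, which is how last(#N) picks its value.
last_nth() {
    local n=$1; shift
    local count=$#
    (( n >= 1 && n <= count )) || return 1   # not enough history yet
    local idx=$(( count - n + 1 ))
    printf '%s\n' "${!idx}"
}

# Hypothetical history of net.tcp.port[192.168.0.9,80], oldest to newest.
values=(1 1 0 1 1)
last_nth 1 "${values[@]}"   # most recent value, prints 1
last_nth 3 "${values[@]}"   # third most recent value, prints 0
```

In this history the port is up again, yet last(#3)=0 is still true. If the intent is "down for three checks in a row", a function that looks at a range of values, such as max(#3), may be a better fit.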
4. Define an action with event source Triggers
Configuration -- Actions -- Create action (note: select Triggers as the event source) -- Add
Action:
Name: nginx service
Type of calculation: And #both conditions below must be met
Conditions:
Trigger = node01: nginx down
Maintenance status not in maintenance #only outside maintenance periods
Operations:
New:
Steps: 1-1
Operation type: Remote command
Recovery operations:
Target list: Current host
Type: Custom script
Execute on: Zabbix agent
Commands: sudo /usr/bin/systemctl restart nginx.service
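Acknowledgement operations:
5. Running remote commands through the agent requires granting the zabbix user sudo privileges and enabling remote commands in the agent configuration file; configure node01 as follows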
[root@node01 ~]# visudo
root ALL=(ALL) ALL
zabbix ALL=(ALL) NOPASSWD: ALL
[root@node01 ~]# vim /etc/zabbix/zabbix_agentd.conf
EnableRemoteCommands=1
[root@node01 ~]# systemctl restart zabbix-agent
6. Create a media type to send alerts by email
1. Install mailx on the master node; it is used to send mail from scripts
[root@master ~]# yum install mailx -y
2. Add the media type
Administration -- Media types -- Create media type -- Add
Media type:
Name: local email
Type: Email
SMTP server: localhost
SMTP server port: 25
SMTP helo: localhost
SMTP email: zabbix@localhost
Connection security: None
Authentication: None
Options:
Concurrent sessions: Unlimited
3. Add a media entry for the Admin user
Administration -- Users -- Admin -- Media -- Add -- Update
Type: local email
Send to: dongfei@localhost
4. Add an alert escalation step to the nginx service action
Configuration -- Actions -- nginx service -- Operations -- New -- Add -- Update
Steps: 2-2
Operation type: Send message
Send to Users: Admin (Zabbix Administrator)
Send only to: local email
Recovery operations: #send an email after recovery
Send to Users: Admin (Zabbix Administrator)
Send only to: local email
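5. Test: change the nginx listening port to 8080 and kill the nginx process, then watch the monitoring data; on the master node, switch to the dongfei user and read the alert with the mail command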
IV. Macros -- predefined text substitution patterns
Zabbix macros can be defined at three levels
- global
- template
- host
1. Built-in macros
Syntax: {MACRO_NAME}
Reference: https://www.zabbix.com/documentation/3.4/manual/appendix/macros/supported_by_location
2. User macros
Syntax: {$MACRO_NAME}
Global macros: Administration -- General -- Macros (drop-down list on the right) -- Add -- Update
Host macros: Configuration -- Hosts -- node01 -- Macros -- Add -- Update
Template macros: Configuration -- Templates -- Template OS Linux -- Macros -- Add -- Update
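When the same user macro is defined at more than one level, Zabbix checks the host level first, then the linked templates, then the global level. The sketch below models only that lookup order. It needs bash 4+ for associative arrays, and every macro name and value is made up for illustration except {$SNMP_COMMUNITY}, which is a stock global macro.

```bash
# Macro tables for one host, its template, and the global level.
declare -A host_macros=( ['{$NGINX_PORT}']=8080 )
declare -A template_macros=( ['{$NGINX_PORT}']=80 ['{$LOAD_MAX}']=5 )
declare -A global_macros=( ['{$LOAD_MAX}']=10 ['{$SNMP_COMMUNITY}']=public )

# resolve_macro NAME: print the value from the most specific level defining it.
resolve_macro() {
    local name=$1
    if [[ -n ${host_macros[$name]+x} ]]; then
        printf '%s\n' "${host_macros[$name]}"
    elif [[ -n ${template_macros[$name]+x} ]]; then
        printf '%s\n' "${template_macros[$name]}"
    elif [[ -n ${global_macros[$name]+x} ]]; then
        printf '%s\n' "${global_macros[$name]}"
    else
        return 1    # undefined at every level
    fi
}

resolve_macro '{$NGINX_PORT}'      # host level wins over template: 8080
resolve_macro '{$LOAD_MAX}'        # template level wins over global: 5
resolve_macro '{$SNMP_COMMUNITY}'  # defined only globally: public
```

This is why a template can ship a default such as a port or threshold while a single host overrides it without touching the template.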
V. Templates
1. Link a template to a host: Configuration -- Hosts -- node01 -- Templates
2. Create a custom template: Configuration -- Templates -- Create template
Template name: my template
Visible name: template for os linux
New group: my template
3. Import a template: Configuration -- Templates -- Import
4. Download a template from https://share.zabbix.com/ and follow the link to the project's GitHub site
# yum install git -y
# git clone https://github.com/cuimingkun/zbx_tem_redis.git
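# sz zbx_tem_redis/redis_templates_for_zbx_3.4.xml #transfer to the local Windows machine, then import it as a Zabbix template through the web GUI
# scp zbx_tem_redis/userparameter_redis_lld_plus.conf node01:/etc/zabbix/zabbix_agentd.d/ #the custom-key configuration file must be placed on the agent
VI. Custom Keys
1. Defining a key directly
1. Define it on the agent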
[root@node01 ~]# vim /etc/zabbix/zabbix_agentd.d/test.conf
UserParameter=memory.used,/usr/bin/free | /usr/bin/awk '/^Mem/{print $3}'
UserParameter=memory.shm,/usr/bin/free | /usr/bin/awk '/^Mem/{print $5}'
[root@node01 ~]# systemctl restart zabbix-agent.service
2. Test from the master
[root@master ~]# yum install zabbix-get -y
[root@master ~]# zabbix_get -s node01 -p 10050 -k "memory.used"
181076 #the value returned
[root@master ~]# zabbix_get -s node01 -p 10050 -k "memory.shm"
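2. Passing parameters to a key
[root@node01 ~]# vim /etc/zabbix/zabbix_agentd.d/memory.conf
#the awk field reference $2 must be escaped as $$2 so the agent does not treat it as a key parameter
UserParameter=memory.usage[*],/usr/bin/awk '/^$1/{print $$2}' /proc/meminfo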
[root@node01 ~]# systemctl restart zabbix-agent.service
[root@master ~]# zabbix_get -s node01 -p 10050 -k "memory.usage[MemFree]"
[root@master ~]# zabbix_get -s node01 -p 10050 -k "memory.usage[Shmem]"
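To see what the agent actually runs for memory.usage[MemFree], the substitution can be imitated by hand: $1 becomes the first key parameter and $$ collapses to a literal $. This is a simplified sketch that handles only $1 (the agent supports $1 through $9), and the function name is hypothetical.

```bash
# expand_userparameter TEMPLATE ARG: imitate the agent's substitution for a
# flexible UserParameter command. Simplified: only $1 and $$ are handled.
expand_userparameter() {
    local template=$1 arg=$2
    local marker=$'\001' dollar='$' double='$$' first='$1'
    template=${template//"$double"/$marker}   # protect escaped dollars
    template=${template//"$first"/$arg}       # substitute the key parameter
    template=${template//"$marker"/$dollar}   # $$ becomes a literal $
    printf '%s\n' "$template"
}

# The command part of the UserParameter line from this section.
template=$(cat <<'EOF'
/usr/bin/awk '/^$1/{print $$2}' /proc/meminfo
EOF
)

expand_userparameter "$template" MemFree
# prints: /usr/bin/awk '/^MemFree/{print $2}' /proc/meminfo
```

Without the doubled dollar sign, $2 would be replaced by the (empty) second key parameter and awk would print the whole line.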
3. Create items on the host
Name: memory MemFree
Key: memory.usage[MemFree] #pass the parameter MemFree to the key to get the amount of free memory
New application: memory stats
Name: memory Buffers
Key: memory.usage[Buffers]
VII. Discovery (network discovery)
1. Create a discovery rule
Configuration -- Discovery -- Create discovery rule -- Add
Name: My Net 1
IP range: 192.168.0.1-20
Update interval: 30s #for testing only; scan every 30 seconds
Checks: Zabbix agent "system.uname"
Device uniqueness criteria: Zabbix agent "system.uname"
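The IP range 192.168.0.1-20 is shorthand for twenty addresses, each of which is probed with the system.uname check. A small sketch that expands this one notation (last octet range only; the function name is hypothetical):

```bash
# expand_ip_range RANGE: expand "A.B.C.X-Y" into one address per line.
# Only a range in the last octet is handled in this sketch.
expand_ip_range() {
    local range=$1
    local prefix=${range%.*}     # e.g. "192.168.0"
    local tail=${range##*.}      # e.g. "1-20"
    local first=${tail%-*} last=${tail#*-}
    local i
    for (( i = first; i <= last; i++ )); do
        printf '%s.%d\n' "$prefix" "$i"
    done
}

expand_ip_range 192.168.0.1-20 | head -n 3
```

Piping the full list into a ping or zabbix_get loop is a handy way to preview which hosts the rule should find.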
2. Install the agent on node02
[root@node02 ~]# yum install zabbix-agent zabbix-sender -y
[root@node02 ~]# vim /etc/zabbix/zabbix_agentd.conf
Server=master.zabbix.com
ServerActive=master.zabbix.com
Hostname=node02.zabbix.com
[root@node02 ~]# systemctl start zabbix-agent.service
[root@node02 ~]# systemctl enable zabbix-agent.service
3. Add a discovery action
Configuration -- Actions -- Event source(Discovery) -- Create action -- Add
Action:
Name: Add My Net Hosts
Type of calculation: And
Conditions:
A Discovery rule = My Net 1
B Discovery status = Discovered
Operations:
Operations:
Send message to users: Admin (Zabbix Administrator) via local email
Add host
Link to templates: Template OS Linux
VIII. Active Checks (passive checks are the default)
Basic agent configuration:
- ServerActive=master.zabbix.com
- Hostname=node02.zabbix.com
- HostnameItem=system.hostname
1. Sending data with active checks
Configuration -- Hosts -- Items -- Create item -- Add
Item:
Name: net traffic in bytes
Type: Zabbix agent (active) #the agent pushes data to zabbix_server on its own initiative
Units: Bps
Applications: net traffic
Preprocessing:
Preprocessing steps:
Change per second
2. Sending data with zabbix_sender
Configuration -- Hosts -- Items -- Create item -- Add
Name: test sender metric
Type: Zabbix trapper
Key: test.sender.metric
New application: sender data
Send the data from node02
[root@node02 ~]# zabbix_sender -z master.zabbix.com -s "node02.zabbix.com" -k "test.sender.metric" -o "875"
[root@node02 ~]# zabbix_sender -z master.zabbix.com -s "node02.zabbix.com" -k "test.sender.metric" -o "`free -m |awk '/^Mem/{print $3}'`"
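When several values need to go out at once, zabbix_sender can read them from a file given with -i, one whitespace-delimited "hostname key value" entry per line. The sketch below only builds such a file; the helper name and the sample values are made up, and the actual send is left commented out because it needs a reachable server.

```bash
# sender_line HOST KEY VALUE: format one entry for zabbix_sender's input file.
sender_line() {
    local host=$1 key=$2 value=$3
    printf '%s %s %s\n' "$host" "$key" "$value"
}

# Build a small batch for the trapper item defined above (sample values).
batch=$(mktemp)
for value in 875 880 910; do
    sender_line node02.zabbix.com test.sender.metric "$value"
done > "$batch"
cat "$batch"

# Send the whole batch over one connection:
# zabbix_sender -z master.zabbix.com -i "$batch"
```

The host name in each line must match the Host name configured in the web GUI, just as with -s on the command line.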
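IX. Web Monitoring
Monitors the download speed, page response time, and response code for a given site
web.test.in[Scenario,Step,bps]: download speed
web.test.time[Scenario,Step]: response time
web.test.rspcode[Scenario,Step]: response code
Create a web scenario: Configuration -- Hosts -- Web -- Create web scenario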
Scenario:
Name: node02 web ui
New application: node02 web ui performance
Update interval: 10s
Agent: Chrome 38.0(Linux)
Steps: 1: home page 15s http://192.168.0.10/index.html 200
Add:
Name: home page
URL: http://192.168.0.10/index.html
Retrieve only headers: √
Required status codes: 200
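The three web.test.* values can be approximated from the command line with curl, which helps when a scenario step fails and it is unclear why. This is a sketch under assumptions: curl is installed, the function names are hypothetical, and the status check handles only a plain comma-separated list (Zabbix also accepts ranges such as 210-299).

```bash
# probe_url URL: print "<http_code> <time_total> <speed_download>" for one
# request, roughly matching web.test.rspcode, web.test.time and web.test.in.
probe_url() {
    curl -s -o /dev/null --max-time 15 \
         -w '%{http_code} %{time_total} %{speed_download}\n' "$1"
}

# status_allowed CODE LIST: succeed if CODE appears in the comma-separated LIST.
status_allowed() {
    local code=$1 list=$2 item
    local IFS=,
    for item in $list; do
        [[ $item == "$code" ]] && return 0
    done
    return 1
}

# Example against the step defined above (requires network access to node02):
# result=$(probe_url http://192.168.0.10/index.html)
# status_allowed "${result%% *}" 200 && echo "step ok"
```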
X. SNMP Monitoring
SNMP: Simple Network Management Protocol
- agent/manager
- Net-SNMP
- net-snmp-utils
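The three SNMP versions
- v1
- v2c: the community name acts as the password; the default is public
- v3: supports authentication and encrypted transport
MIB: Management Information Base; OID: Object Identifier
1. Set up SNMP support for Zabbix
Install on the monitored host
# yum install net-snmp net-snmp-utils -y #net-snmp-utils provides the test tools
Configure and start the service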
# vim /etc/snmp/snmpd.conf
#view systemview included .1.3.6.1.2.1.1
#view systemview included .1.3.6.1.2.1.25.1.1
view systemview included .1.3.6.1
# systemctl start snmpd
# systemctl enable snmpd
Test on the local host
# snmptranslate -Tp .1.3.6.1.2.1 |more
# snmpget -v 2c -c public 192.168.0.10 .1.3.6.1.2.1.1.1.0 #get the system description
# snmpwalk -v 2c -c public 192.168.0.10 .1.3.6.1.2.1.25.4.2.1.2 #get the process list
2. Configure monitoring in Zabbix
Configuration -- Hosts -- Create Host -- Add
Host name: node02
Visible name: node02
New group: my linux servers
SNMP interfaces: 192.168.0.10 DNS 161
-- Item -- Create Item -- Add
Item:
Name: net traffic in bytes
Type: SNMPv2 agent
Key: net.if.in.bytes
SNMP OID: .1.3.6.1.2.1.2.2.1.10.2
SNMP community: public
Units: Bps
Update interval: 5s
New application: net traffic
Preprocessing:
Preprocessing steps: Change per second
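The trailing .2 in the SNMP OID above is the interface index (ifIndex); .1.3.6.1.2.1.2.2.1.10 is ifInOctets and .1.3.6.1.2.1.2.2.1.2 is ifDescr in the standard IF-MIB. One way to find the index for a given interface name is to walk ifDescr and pick the matching row. The sketch below parses sample output in snmpwalk's usual format. The sample lines are illustrative and were not captured from the hosts in this document.

```bash
# ifindex_for NAME: read `snmpwalk ... ifDescr` output on stdin and print the
# ifIndex of the interface whose description equals NAME.
ifindex_for() {
    awk -v name="$1" '
        $NF == name {
            n = split($1, parts, ".")   # "IF-MIB::ifDescr.2" -> index is the last part
            print parts[n]
        }'
}

# Illustrative snmpwalk output for IF-MIB::ifDescr.
sample='IF-MIB::ifDescr.1 = STRING: lo
IF-MIB::ifDescr.2 = STRING: eth0'

ifindex=$(printf '%s\n' "$sample" | ifindex_for eth0)
echo ".1.3.6.1.2.1.2.2.1.10.$ifindex"    # ifInOctets OID for eth0

# Against a live host:
# snmpwalk -v 2c -c public 192.168.0.10 .1.3.6.1.2.1.2.2.1.2 | ifindex_for eth0
```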
XI. JMX Monitoring
JMX: Java Management Extensions
1. Install and configure Tomcat on node02
[root@node02 ~]# yum install java-1.8.0-openjdk-devel tomcat tomcat-admin-webapps tomcat-webapps tomcat-docs-webapps -y
[root@node02 ~]# vim /etc/tomcat/tomcat.conf #add the following setting
CATALINA_OPTS="-Djava.rmi.server.hostname=192.168.0.10 -Djavax.management.builder.initial= -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
[root@node02 ~]# systemctl start tomcat
[root@node02 ~]# ss -tnl |grep 12345
LISTEN 0 50 :::12345
[root@node02 ~]# systemctl enable tomcat
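The grep 12345 check above also matches any port that merely contains those digits. A stricter check compares the port field exactly. This sketch reads ss -tnl output from stdin; the function name is hypothetical and the sample output is illustrative.

```bash
# port_is_listening PORT: read `ss -tnl` output on stdin; succeed only if a
# LISTEN line has a local address ending in ":PORT" exactly.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")    # local address is the 4th column
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Illustrative ss -tnl output.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      50     :::12345           :::*
LISTEN 0      100    :::8080            :::*'

printf '%s\n' "$sample" | port_is_listening 12345 && echo "JMX port is listening"

# On node02: ss -tnl | port_is_listening 12345
```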
2. Install and configure zabbix-java-gateway on the Zabbix server (if a large number of JVMs need to be monitored, install the Java gateway on a dedicated server)
[root@master ~]# yum install zabbix-java-gateway -y
[root@master ~]# vim /etc/zabbix/zabbix_java_gateway.conf
LISTEN_PORT=10052
START_POLLERS=5
[root@master ~]# systemctl start zabbix-java-gateway
[root@master ~]# ss -tnl |grep 10052
LISTEN 0 50 :::10052
[root@master ~]# vim /etc/zabbix/zabbix_server.conf
JavaGateway=192.168.0.8
JavaGatewayPort=10052
StartJavaPollers=5
[root@master ~]# systemctl restart zabbix-server
3. Configure monitoring in the Zabbix web GUI
Configuration -- Hosts -- Create Host -- Add
JMX interfaces: 192.168.0.10 12345
Linked templates: Template App Apache Tomcat JMX
XII. Distributed Monitoring with Zabbix
1. Configure zabbix_proxy (192.168.0.11)
[root@zabbix_proxy ~]# yum install mariadb-server zabbix-proxy-mysql zabbix-get zabbix-agent zabbix-sender -y
[root@zabbix_proxy ~]# vim /etc/my.cnf
[mysqld]
skip_name_resolve=0
[root@zabbix_proxy ~]# systemctl start mariadb
[root@zabbix_proxy ~]# systemctl enable mariadb
[root@zabbix_proxy ~]# mysql
MariaDB [(none)]> CREATE DATABASE zbxproxy character set utf8 collate utf8_bin;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON zbxproxy.* TO zabbix@localhost IDENTIFIED BY 'zbxpass';
[root@zabbix_proxy ~]# zcat /usr/share/doc/zabbix-proxy-mysql-3.4.13/schema.sql.gz |mysql -uzabbix -pzbxpass zbxproxy
[root@zabbix_proxy ~]# vim /etc/zabbix/zabbix_proxy.conf
Server=192.168.0.8
#note: this hostname must be resolvable
Hostname=zabbix_proxy
ListenPort=10051
DBName=zbxproxy
DBUser=zabbix
DBPassword=zbxpass
HeartbeatFrequency=20
ConfigFrequency=10
DataSenderFrequency=1
[root@zabbix_proxy ~]# systemctl start zabbix-proxy.service
[root@zabbix_proxy ~]# systemctl enable zabbix-proxy.service
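2. Configure the proxy in the Zabbix web GUI
Administration -- Proxies -- Create proxy -- Add
Proxy name: zabbix_proxy #this name must be resolvable and must match Hostname in zabbix_proxy.conf
3. Add a monitored host behind the proxy; note that the agent on that host must allow the proxy to monitor it (list the proxy in the agent's Server setting)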
Configuration -- Hosts -- Create host -- Add
Host:
Host name: master.dongfei.tech
Visible name: k8s_master
New group: my linux servers
Agent interfaces: 192.168.0.12 10050
Monitored by proxy: zabbix_proxy
Templates:
Linked templates: Template OS Linux
4. Network discovery through the proxy
Configuration -- Discovery -- Create discovery rule -- Add
Name: My Net 2
Discovery by proxy: zabbix_proxy
IP range: 192.168.0.1-20
Update interval: 1h
Checks: Zabbix agent "system.uname"
Device uniqueness criteria: Zabbix agent "system.uname"