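Monitoring:
Sensors:

    Data collection  --> Data storage  --> Data presentation
    Alerting: collected data exceeds a threshold
        Time-series data

    Open-source monitoring tools:

    SNMP: Simple Network Management Protocol
    How SNMP works:
        The NMS polls the agent for data
        The agent reports data to the NMS
        The NMS asks the agent to change its configuration

    SNMP components:
        MIB: Management Information Base
        SMI: Structure of Management Information, the notation used to describe MIB objects
        SNMP protocol

    SNMP versions:
        v1, v2, v3
        v2c: community-based, NMS --> agent
        v3: authentication, encryption, decryption

    Linux: the net-snmp package

    Operations the NMS can initiate:
        Get, GetNext, Set

    Agent:
        Response (reply to an NMS request), Trap (unsolicited report to the NMS)

    UDP ports:
        agent: 161 (receives requests)
        NMS: 162 (receives traps)

    OID: object identifier

    agent: the agent process running on the managed device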
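
To make the GetNext operation and OIDs more concrete, here is a minimal sketch in Python. It does not talk to a real agent: the `mib` dictionary and its values are made-up sample data, and the helper names (`parse_oid`, `get_next`, `walk`) are hypothetical. It only illustrates the idea that GetNext returns the first object whose OID sorts after the requested one, which is how an NMS walks a subtree.

```python
from bisect import bisect_right


def parse_oid(text):
    """Turn a dotted OID string such as '1.3.6.1.2.1.1.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def get_next(mib, oid):
    """Return the (oid, value) pair that follows `oid` in lexicographic order, or None."""
    keys = sorted(mib)
    index = bisect_right(keys, oid)
    if index == len(keys):
        return None  # nothing left after this OID
    return keys[index], mib[keys[index]]


def walk(mib, root):
    """Repeat GetNext starting at `root` until the returned OID leaves the subtree."""
    results = []
    current = root
    while True:
        item = get_next(mib, current)
        if item is None or item[0][:len(root)] != root:
            break
        results.append(item)
        current = item[0]
    return results


# Made-up agent data: three objects under 1.3.6.1.2.1.1 and one under 1.3.6.1.2.1.2
mib = {
    parse_oid("1.3.6.1.2.1.1.1.0"): "Linux node2",
    parse_oid("1.3.6.1.2.1.1.3.0"): 12345,
    parse_oid("1.3.6.1.2.1.1.5.0"): "node2.smoke.com",
    parse_oid("1.3.6.1.2.1.2.1.0"): 2,
}
system_group = walk(mib, parse_oid("1.3.6.1.2.1.1"))
for oid, value in system_group:
    print(".".join(str(n) for n in oid), "=", value)
```

Python compares tuples element by element, which happens to match OID ordering, so no custom comparison is needed in this sketch.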

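Distributed monitoring
Well-known open-source monitoring tools: zabbix, zenoss, opennms, cacti, nagios (icinga), ganglia

How monitoring is implemented:
agent
ssh
SNMP
IPMI

The Intelligent Platform Management Interface (IPMI) began as an industry standard for peripheral devices in Intel-architecture enterprise systems. IPMI is also an open, free standard, so users can adopt it without paying any additional fee.

zabbix: a monitoring tool with its own dedicated agent
Monitored hosts:
Linux, Windows, FreeBSD
Network devices:
SNMP, SSH (not supported by every device)

What can be monitored:
Devices/software
Devices: servers, routers, switches, I/O systems
Software: OS, network, applications
Incidents (occasional minor failures):
host down, service unavailable, host unreachable
Critical failures:
Host performance metrics:
Trends: time-series data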

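Data storage:
cacti: rrd (round robin database)
zabbix: mysql, pgsql

Components in the zabbix architecture:
zabbix-server: written in C
zabbix-agent: written in C, runs on the monitored OS
zabbix-web: GUI used to configure zabbix and display data
zabbix-proxy: dedicated component for distributed monitoring environments
zabbix-database: MySQL, PGSQL (PostgreSQL), Oracle, DB2, SQLite

Data produced by zabbix falls into four main categories:
Configuration data
History data: about 50 bytes per value
Trend data: about 128 bytes per record
Event data: about 130 bytes per event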
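
The per-record sizes above can be turned into a rough database sizing estimate. The sketch below is a back-of-the-envelope calculation, not an official formula: the function name and the example workload (3000 items polled every 60 seconds, one event per second) are assumptions chosen for illustration, and it assumes one trend record per item per hour.

```python
def estimate_storage_bytes(items, interval_s, history_days, trend_days,
                           events_per_second, event_days,
                           history_bytes=50, trend_bytes=128, event_bytes=130):
    """Rough size of history, trend and event data, using the per-record sizes listed above."""
    seconds_per_day = 24 * 3600
    values_per_second = items / interval_s
    history = history_days * values_per_second * seconds_per_day * history_bytes
    # Assumption: one trend record per item per hour
    trends = trend_days * items * 24 * trend_bytes
    events = event_days * events_per_second * seconds_per_day * event_bytes
    return {
        "history": history,
        "trends": trends,
        "events": events,
        "total": history + trends + events,
    }


# Hypothetical installation: 3000 items polled every 60 s,
# 90 days of history, 1 year of trends and events, 1 event per second
estimate = estimate_storage_bytes(items=3000, interval_s=60, history_days=90,
                                  trend_days=365, events_per_second=1, event_days=365)
for name, size in estimate.items():
    print(f"{name}: {size / 1024 ** 3:.1f} GiB")
```

Configuration data is left out because its size is fixed and small compared with the other three categories.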

Automated monitoring:
[Image 1: Zabbix Monitoring (Part 1)]

What to monitor?
    Devices/Software
        Servers, routers, switches, I/O systems, etc.
        Operating systems, networks, applications, etc.
    Incidents
        DB down, replication stopped, server not reachable, etc.
    Critical Events
        Disk more than n% full or less than m GB free,
        replication lagging by more than n seconds, data node down,
        100% CPU utilization, etc.
        Alert, immediate intervention, fire fighting
    Trends (includes time!)
        Graphs
        How long does it take until
            my disk is full?
            my Index Memory is filled up?
        When does it happen?
            Peak? Backup?
        How often does it happen? Does it happen periodically?
            Once a day? Always on Sunday night?
        How does it correlate with other information?
            I/O problems during our backup window?
        Reading the patterns!
            This can help us find the root cause of problems
Basic solutions:
    top, vmstat, iostat, mytop, innotop, SHOW GLOBAL STATUS, SHOW INNODB STATUS
    CLI, no graphs, no long-term information, but good for ad hoc analysis
Graphical solutions
    Nagios (Opsview, Icinga), Cacti, Zabbix
    Typically NOT specialised in DB monitoring
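
The trend question "How long does it take until my disk is full?" is the kind of thing stored time-series data can answer. As a minimal sketch, assuming disk usage grows roughly linearly, a least-squares line through recent samples can be extrapolated to the disk capacity. The sample values and function names below are invented for illustration.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_full(samples, capacity):
    """Extrapolate (hour, used) samples to `capacity`; None if usage is not growing."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    full_at = (capacity - intercept) / slope
    last_hour = samples[-1][0]
    return max(full_at - last_hour, 0.0)


# Invented samples: (hour, GB used) on a 100 GB disk
samples = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
remaining = hours_until_full(samples, capacity=100.0)
print(f"disk full in about {remaining:.0f} hours")
```

Real usage is rarely this regular, so a result like this is better treated as an early warning than as a prediction.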

How is Zabbix progressing?
[Image 2: Zabbix Monitoring (Part 1)]

Why use a monitoring solution?
[Image 3: Zabbix Monitoring (Part 1)]

What are the functionalities of a monitoring system (MS)?
Data gathering
    Gathered using various methods, including SNMP, native agents, IPMI and others
Alerting
    Gathered data can be compared to thresholds and alerts sent out using different channels like e-mail or SMS
Data storage
    Once we have gathered the data it doesn't make sense to throw it away, so we will often want to store it for later analysis
Visualisation
    Humans are better at distinguishing visualised data, especially when there are huge amounts of data

What is Zabbix?
The Enterprise-class Monitoring Solution
[Image 4: Zabbix Monitoring (Part 1)]

Why choose Zabbix?
Zabbix is an enterprise level monitoring software
Scales up to 100,000 monitored devices
Distributed monitoring
Supports virtually all platforms and methods of monitoring
True Open Source, no proprietary add-ons, and no "professional" or "enterprise" versions
Estimated number of users is more than 40 000, but could be several times greater

Which platforms does Zabbix support?
[Image 5: Zabbix Monitoring (Part 1)]

Various Monitoring Functions
[Image 6: Zabbix Monitoring (Part 1)]

What can be monitored on the Web?
Response time
Download speed
Response code
Availability of certain content
Complex web scenarios with login and logout capability
Support for HTTP and HTTPS
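
The checks in this list (response time, response code, presence of certain content) can be sketched outside Zabbix as follows. This is only an illustration of the idea and not how Zabbix implements web monitoring; `fetch` and `evaluate_step` are hypothetical helper names, and the default limits are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request


def fetch(url, timeout=5.0):
    """Fetch `url` and return (status, body, elapsed_seconds).

    Connection failures (urllib.error.URLError) are left to propagate to the caller.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers still carry a status code and a body worth checking
        status = error.code
        body = error.read().decode("utf-8", errors="replace")
    return status, body, time.monotonic() - started


def evaluate_step(status, body, elapsed_s, expected_status=200,
                  required_text=None, max_seconds=5.0):
    """Return the list of failed checks for one step; an empty list means it passed."""
    problems = []
    if status != expected_status:
        problems.append(f"status {status}, expected {expected_status}")
    if required_text is not None and required_text not in body:
        problems.append(f"required text {required_text!r} not found")
    if elapsed_s > max_seconds:
        problems.append(f"response took {elapsed_s:.2f}s, limit {max_seconds:.2f}s")
    return problems


# Evaluating a canned response, so the example runs without network access
print(evaluate_step(200, "<title>Zabbix</title>", 0.2, required_text="Zabbix"))
```

Against a live site the two helpers combine as `evaluate_step(*fetch(url), required_text="...")`, with the URL and the required text chosen for the page being checked.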

How do you get notified?
[Image 7: Zabbix Monitoring (Part 1)]

Zabbix Proxy
[Image 8: Zabbix Monitoring (Part 1)]

Zabbix architecture
[Image 9: Zabbix Monitoring (Part 1)]

Zabbix architecture
[Image 10: Zabbix Monitoring (Part 1)]

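Common Zabbix terminology
Host: a networked device to be monitored, identified by IP address or DNS name;
Host group: a logical container for hosts; it can hold both hosts and templates, but the hosts and templates inside one group are not linked to each other; host groups are typically used when assigning monitoring permissions to users or user groups;
Item: the data for one specific metric, collected from a monitored object; items are the core of data collection in zabbix, and without items there is no data; for a given monitored object, each item is identified by its "key";
Trigger: an expression that evaluates whether the data received for a particular item of a monitored object is within an acceptable range, that is, within a threshold; when the received value exceeds the threshold the trigger state changes from "OK" to "Problem", and when the value returns to the acceptable range the state changes back from "Problem" to "OK";
Event: an occurrence that deserves attention, such as a trigger changing state, or the auto-registration of a new or returning agent;
Action: a predefined way of handling a particular event, made up of operations (such as sending a notification) and conditions (when the operations are executed);
Escalation: a custom scenario for sending alerts or executing remote commands, for example sending an alert every 5 minutes, 5 times in total;
Media: the means or channel used to deliver notifications, such as Email, Jabber or SMS;
Notification: a message about an event, sent to a user through the selected media;
Remote command: a predefined command that is executed automatically on a monitored host when a certain condition is met;
Template: a preset collection of entities for quickly setting up a monitored host, usually containing items, triggers, graphs, screens, applications and low-level discovery rules; a template can be linked directly to an individual host;
Application: a group of items;
Web scenario: one or more HTTP requests used to check the availability of a web site;
Frontend: the Zabbix web interface;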
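
The relationship between items, triggers, events and escalations can be illustrated with a small simulation. This is a simplified sketch, not Zabbix's trigger expression engine: it assumes a single "value greater than threshold" rule, and the function names and sample values are made up.

```python
def trigger_events(values, threshold):
    """Yield an event each time the trigger state flips between "OK" and "Problem"."""
    state = "OK"
    events = []
    for index, value in enumerate(values):
        new_state = "Problem" if value > threshold else "OK"
        if new_state != state:
            # An event records the state change, not every collected value
            events.append((index, state, new_state))
            state = new_state
    return events


def escalation_schedule(start_s, step_s, repeats):
    """Times (in seconds) at which an escalation would send its alerts."""
    return [start_s + i * step_s for i in range(repeats)]


# Made-up item values, e.g. CPU load samples, checked against a threshold of 5
events = trigger_events([1, 2, 6, 7, 3, 8], threshold=5)
for index, old, new in events:
    print(f"sample {index}: {old} -> {new}")

# "Every 5 minutes, 5 times in total", as in the escalation example above
print(escalation_schedule(start_s=0, step_s=300, repeats=5))
```

Only three events come out of six samples here, which is the point of the trigger and event split: actions react to state changes rather than to raw data.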

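Zabbix logical architecture
[Image 11: Zabbix Monitoring (Part 1)]

Zabbix Server Processes
[Image 12: Zabbix Monitoring (Part 1)]

Requirements
Hardware Examples
[Image 13: Zabbix Monitoring (Part 1)]

Software - DBMS
[Image 14: Zabbix Monitoring (Part 1)]

Software - Frontend
[Image 15: Zabbix Monitoring (Part 1)]

Software - Server
[Image 16: Zabbix Monitoring (Part 1)]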

Install Zabbix
    Create zabbix user
    Untar source tarball
    Create zabbix database and populate it
        A MySQL (PostgreSQL, ...) installation is needed...
    ./configure ; make ; make install
        Some packages may be missing...
        Does not take too long (< 10 min)
    Create configuration file for zabbix server
        (misc/conf/zabbix_server.conf)
    Start the zabbix server

Install the Zabbix web interface
    Apache/PHP is required
    Copy PHP files to $DocumentRoot/zabbix
    http://localhost/zabbix
    Change php.ini
        Default settings are by far not enough!
        date.timezone = Asia/Shanghai
    Restart webserver
    Finish configuration
    Login with Admin/zabbix
Lab environment:
Host name: node1.smoke.com master
OS: CentOS 6.5
Kernel version: 2.6.32-504.el6.x86_64
NIC 1: vmnet0 172.16.100.7
NIC 2: vmnet8 dhcp
Host name: node2.smoke.com
OS: CentOS 6.5
Kernel version: 2.6.32-504.el6.x86_64
NIC 1: vmnet0 172.16.100.8
NIC 2: vmnet8 dhcp
Host name: node3.smoke.com
OS: Windows XP
NIC 1: 172.16.100.9

System configuration:
node1:zabbix-server

[root@node1 ~]# hostname
node1.smoke.com
[root@node1 ~]# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:99:d9:9e brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.7/24 brd 172.16.100.255 scope global eth0
    inet6 fe80::20c:29ff:fe99:d99e/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:99:d9:a8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.243.145/24 brd 192.168.243.255 scope global eth1
    inet6 fe80::20c:29ff:fe99:d9a8/64 scope link
       valid_lft forever preferred_lft forever
[root@node1 ~]# ip route show
172.16.100.0/24 dev eth0  proto kernel  scope link  src 172.16.100.7
192.168.243.0/24 dev eth1  proto kernel  scope link  src 192.168.243.145
169.254.0.0/16 dev eth0  scope link  metric 1002
169.254.0.0/16 dev eth1  scope link  metric 1003
default via 192.168.243.2 dev eth1
[root@node1 ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null
[root@node1 ~]# vim /etc/hosts
172.16.100.7   node1.smoke.com node1
172.16.100.8   node2.smoke.com node2
172.16.100.9   node3.smoke.com node3

node2:linux-agent

[root@node2 ~]# hostname
node2.smoke.com
[root@node2 ~]# ip addr show
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:d6:6a:92 brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.8/24 brd 172.16.100.255 scope global eth0
    inet6 fe80::20c:29ff:fed6:6a92/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:d6:6a:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.243.146/24 brd 192.168.243.255 scope global eth1
    inet6 fe80::20c:29ff:fed6:6a9c/64 scope link
       valid_lft forever preferred_lft forever
[root@node2 ~]# ip route show
172.16.100.0/24 dev eth0  proto kernel  scope link  src 172.16.100.8
192.168.243.0/24 dev eth1  proto kernel  scope link  src 192.168.243.146
169.254.0.0/16 dev eth0  scope link  metric 1002
169.254.0.0/16 dev eth1  scope link  metric 1003
default via 192.168.243.2 dev eth1
[root@node2 ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null
[root@node2 ~]# vim /etc/hosts
172.16.100.7   node1.smoke.com node1
172.16.100.8   node2.smoke.com node2
172.16.100.9   node3.smoke.com node3

node3:windows-agent

Install MariaDB:
node1:zabbix-server

[root@node1 ~]# tar xf cmake-2.8.8.tar.gz
[root@node1 ~]# cd cmake-2.8.8
[root@node1 cmake-2.8.8]# ./bootstrap
[root@node1 cmake-2.8.8]# make && make install
[root@node1 cmake-2.8.8]# cd
[root@node1 ~]# groupadd -g 306 -r mysql
[root@node1 ~]# useradd -u 306 -g mysql -r -s /sbin/nologin mysql
[root@node1 ~]# mkdir /mydata/data -pv
[root@node1 ~]# yum -y install readline-devel zlib-devel openssl-devel
[root@node1 ~]# tar xf mariadb-10.0.10.tar.gz
[root@node1 ~]# cd mariadb-10.0.10
[root@node1 mariadb-10.0.10]# cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
> -DMYSQL_DATADIR=/mydata/data \
> -DWITH_INNOBASE_STORAGE_ENGINE=1 \
> -DWITH_ARCHIVE_STORAGE_ENGINE=1 \
> -DWITH_BLACKHOLE_STORAGE_ENGINE=1 \
> -DWITH_READLINE=1 \
> -DWITH_SSL=system \
> -DWITH_ZLIB=system \
> -DWITH_LIBWRAP=0 \
> -DMYSQL_UNIX_ADDR=/tmp/mysql.sock \
> -DDEFAULT_CHARSET=utf8 \
> -DDEFAULT_COLLATION=utf8_general_ci
[root@node1 mariadb-10.0.10]# make && make install
[root@node1 mariadb-10.0.10]# cd /usr/local/mysql/
[root@node1 mysql]# chgrp mysql ./*
[root@node1 mysql]# chown mysql:mysql /mydata/data
[root@node1 mysql]# scripts/mysql_install_db --user=mysql --datadir=/mydata/data
[root@node1 mysql]# cp support-files/mysql.server /etc/rc.d/init.d/mysqld
[root@node1 mysql]# chmod +x /etc/rc.d/init.d/mysqld
[root@node1 mysql]# chkconfig --add mysqld
[root@node1 mysql]# mv /etc/my.cnf /etc/my.cnf.bak
[root@node1 mysql]# cp support-files/my-large.cnf /etc/my.cnf
[root@node1 mysql]# vim /etc/my.cnf
log-bin=/mydata/binlogs/master-bin
innodb_file_per_table = ON
[root@node1 mysql]# mkdir -pv /mydata/binlogs/
[root@node1 mysql]# chown -R mysql.mysql /mydata/binlogs/
[root@node1 mysql]# service mysqld start
[root@node1 mysql]# vim /etc/profile.d/mysqld.sh
export PATH=/usr/local/mysql/bin:$PATH
[root@node1 mysql]# . /etc/profile.d/mysqld.sh
[root@node1 mysql]# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 10.0.10-MariaDB-log Source distribution

Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use mysql
Database changed
MariaDB [mysql]> SELECT user,host,password FROM user;
+------+-----------------+----------+
| user | host            | password |
+------+-----------------+----------+
| root | localhost       |          |
| root | node1.smoke.com |          |
| root | 127.0.0.1       |          |
| root | ::1             |          |
|      | localhost       |          |
|      | node1.smoke.com |          |
+------+-----------------+----------+
6 rows in set (0.00 sec)

MariaDB [mysql]> DROP USER ""@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [mysql]> DROP USER ""@'node1.smoke.com';
Query OK, 0 rows affected (0.00 sec)

MariaDB [mysql]> \q
Bye

Install zabbix:
node1:zabbix-server

[root@node1 mysql]# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 6
Server version: 10.0.10-MariaDB-log Source distribution

Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE zabbix CHARACTER SET utf8;
Query OK, 1 row affected (0.01 sec)

MariaDB [(none)]> GRANT ALL on zabbix.* TO 'zbxuser'@'172.16.%.%' IDENTIFIED BY 'zbxpass';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> GRANT ALL on zabbix.* TO 'zbxuser'@'node1.smoke.com' IDENTIFIED BY 'zbxpass';
Query OK, 0 rows affected (0.01 sec)

MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> \q
Bye
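
The GRANT statements above limit zbxuser to clients whose host matches '172.16.%.%' or 'node1.smoke.com'. In MySQL/MariaDB account names, `%` matches any sequence of characters and `_` matches a single character. The sketch below mimics that matching in Python to show which client addresses would qualify; it is an approximation for illustration (the helper names are hypothetical), and the server's actual rules for choosing between overlapping accounts are not modelled.

```python
import re


def host_pattern_to_regex(pattern):
    """Translate an account host pattern ('%' and '_' wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def host_matches(pattern, host):
    """True if the whole client host string matches the account host pattern."""
    return host_pattern_to_regex(pattern).fullmatch(host) is not None


for client in ("172.16.100.7", "172.16.100.8", "192.168.243.145"):
    print(client, host_matches("172.16.%.%", client))
```

In this lab the second network (192.168.243.0/24) is therefore not allowed to log in as zbxuser, which fits its role as the outbound DHCP interface.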
[root@node1 mysql]# cd /root/zabbix-2.4/
[root@node1 zabbix-2.4]# ls
zabbix-2.4.0-1.el6.x86_64.rpm               zabbix-proxy-pgsql-2.4.0-1.el6.x86_64.rpm    zabbix-web-2.4.0-1.el6.noarch.rpm
zabbix-agent-2.4.0-1.el6.x86_64.rpm         zabbix-proxy-sqlite3-2.4.0-1.el6.x86_64.rpm  zabbix-web-japanese-2.4.0-1.el6.noarch.rpm
zabbix-get-2.4.0-1.el6.x86_64.rpm           zabbix-sender-2.4.0-1.el6.x86_64.rpm         zabbix-web-mysql-2.4.0-1.el6.noarch.rpm
zabbix-java-gateway-2.4.0-1.el6.x86_64.rpm  zabbix-server-2.4.0-1.el6.x86_64.rpm         zabbix-web-pgsql-2.4.0-1.el6.noarch.rpm
zabbix-proxy-2.4.0-1.el6.x86_64.rpm         zabbix-server-mysql-2.4.0-1.el6.x86_64.rpm
zabbix-proxy-mysql-2.4.0-1.el6.x86_64.rpm   zabbix-server-pgsql-2.4.0-1.el6.x86_64.rpm
[root@node1 zabbix-2.4]# yum -y install zabbix-server-2.4.0-1.el6.x86_64.rpm zabbix-server-mysql-2.4.0-1.el6.x86_64.rpm zabbix-get-2.4.0-1.el6.x86_64.rpm zabbix-2.4.0-1.el6.x86_64.rpm zabbix-web-2.4.0-1.el6.noarch.rpm zabbix-web-mysql-2.4.0-1.el6.noarch.rpm zabbix-agent-2.4.0-1.el6.x86_64.rpm zabbix-sender-2.4.0-1.el6.x86_64.rpm

The installation fails with dependency errors:

Error: Package: zabbix-server-mysql-2.4.0-1.el6.x86_64 (/zabbix-server-mysql-2.4.0-1.el6.x86_64)
           Requires: libiksemel.so.3()(64bit)
Error: Package: zabbix-server-2.4.0-1.el6.x86_64 (/zabbix-server-2.4.0-1.el6.x86_64)
           Requires: iksemel
Error: Package: zabbix-server-2.4.0-1.el6.x86_64 (/zabbix-server-2.4.0-1.el6.x86_64)
           Requires: fping
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

[root@node1 ~]# ls *.rpm
fping-2.4b2-10.el6.x86_64.rpm  iksemel-1.4-2.el6.x86_64.rpm  iksemel-devel-1.4-2.el6.x86_64.rpm  libiksemel3-1.4-2_2.el6.x86_64.rpm
[root@node1 ~]# yum localinstall  iksemel-1.4-2.el6.x86_64.rpm iksemel-devel-1.4-2.el6.x86_64.rpm fping-2.4b2-10.el6.x86_64.rpm

Configure zabbix:
node1:zabbix-server

[root@node1 ~]# service httpd start
[root@node1 ~]# cd /usr/share/doc/zabbix-server-mysql-2.4.0/create/
[root@node1 create]# ls
data.sql  images.sql  schema.sql
[root@node1 create]# mysql zabbix < schema.sql    # import order matters: schema first, then images, then data
[root@node1 create]# mysql zabbix < images.sql
[root@node1 create]# mysql zabbix < data.sql
[root@node1 ~]# vim /etc/zabbix/zabbix_server.conf
DBHost=172.16.100.7
DBUser=zbxuser
DBPassword=zbxpass
DBSocket=/tmp/mysql.sock
[root@node1 ~]# service zabbix-server start
[root@node1 ~]# ss -tnl
State       Recv-Q Send-Q                            Local Address:Port                              Peer Address:Port
LISTEN      0      128                                          :::22                                          :::*
LISTEN      0      128                                           *:22                                           *:*
LISTEN      0      128                                          :::10051                                       :::*
LISTEN      0      128                                           *:10051                                        *:*
LISTEN      0      128                                           *:3306                                         *:*
LISTEN      0      128                                          :::80                                          :::*
[root@node1 ~]# vim /etc/php.ini
date.timezone = Asia/Chongqing
[root@node1 ~]# service httpd restart
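
A quick way to double-check the edits to zabbix_server.conf is to parse the file and confirm that the database settings are present, then probe the listening ports seen in the `ss -tnl` output. The sketch below assumes the simple `Key=Value` layout with `#` comments shown above; `parse_conf` and `port_is_open` are hypothetical helpers, and the sample text stands in for the real file.

```python
import socket


def parse_conf(text):
    """Parse Key=Value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Stand-in for the lines edited in /etc/zabbix/zabbix_server.conf
sample = """\
# database connection
DBHost=172.16.100.7
DBUser=zbxuser
DBPassword=zbxpass
DBSocket=/tmp/mysql.sock
"""
settings = parse_conf(sample)
missing = [key for key in ("DBHost", "DBUser", "DBPassword") if key not in settings]
print("missing settings:", missing or "none")
```

On the server itself, `port_is_open("172.16.100.7", 10051)` and `port_is_open("172.16.100.7", 3306)` would confirm that zabbix-server and MariaDB are reachable on the ports listed by `ss -tnl`.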

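Configure zabbix-web:
In a browser on the Windows host, open 172.16.100.7/zabbix;
[Image 17: Zabbix Monitoring (Part 1)]

Click Next; once all checks pass, click Next again and configure the database connection: database host 172.16.100.7, database user zbxuser, password zbxpass; then click Next;
[Image 18: Zabbix Monitoring (Part 1)]

Set the zabbix server host address to 172.16.100.7 and the name to node1.smoke.com, then click Next through to Finish; the browser is redirected to the zabbix login page;
[Image 19: Zabbix Monitoring (Part 1)]

The default account is Admin with password zabbix;
[Image 20: Zabbix Monitoring (Part 1)]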

zabbix-server monitoring itself:
node1:zabbix-server

[root@node1 ~]# vim /etc/zabbix/zabbix_agentd.conf
Server=127.0.0.1,172.16.100.7
ServerActive=127.0.0.1,172.16.100.7
Hostname=node1.smoke.com
[root@node1 ~]# service zabbix-agent start
[root@node1 ~]# ss -tnl
State       Recv-Q Send-Q                            Local Address:Port                              Peer Address:Port
LISTEN      0      128                                          :::22                                          :::*
LISTEN      0      128                                           *:22                                           *:*
LISTEN      0      128                                          :::10050                                       :::*
LISTEN      0      128                                           *:10050                                        *:*
LISTEN      0      128                                          :::10051                                       :::*
LISTEN      0      128                                           *:10051                                        *:*
LISTEN      0      128                                           *:3306                                         *:*
LISTEN      0      128                                          :::80                                          :::*

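In zabbix-web, enable monitoring of node1.smoke.com: click Configuration--Hosts; zabbix-server already has itself defined as a host by default, so click the Disabled status link to enable monitoring of the local host;
[Image 21: Zabbix Monitoring (Part 1)]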

zabbix-server monitoring the linux-agent:
node2:linux-agent

[root@node2 ~]# cd zabbix-2.4/
[root@node2 zabbix-2.4]# yum install zabbix-2.4.0-1.el6.x86_64.rpm zabbix-agent-2.4.0-1.el6.x86_64.rpm zabbix-sender-2.4.0-1.el6.x86_64.rpm
[root@node2 zabbix-2.4]# vim /etc/zabbix/zabbix_agentd.conf
Server=172.16.100.7
ServerActive=172.16.100.7
Hostname=node2.smoke.com
[root@node2 zabbix-2.4]# service zabbix-agent start
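
Once the agent is listening on port 10050, the server polls it over a small binary framing. As far as I understand the Zabbix wire format, a message consists of the literal `ZBXD`, a flag byte `0x01`, an 8-byte little-endian payload length, and then the payload itself. The sketch below only builds and parses such frames; whether a request has to carry this header depends on the agent version, so treat that part as an assumption. `agent.ping` is used as the example item key.

```python
import struct

HEADER = b"ZBXD\x01"  # protocol signature plus flag byte


def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with the header and its length (8 bytes, little-endian)."""
    return HEADER + struct.pack("<Q", len(payload)) + payload


def unpack_message(data: bytes) -> bytes:
    """Validate the header and return the payload of one framed message."""
    if data[:5] != HEADER:
        raise ValueError("not a ZBXD message")
    (length,) = struct.unpack("<Q", data[5:13])
    payload = data[13:13 + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return payload


frame = pack_message(b"agent.ping")
print(frame)
print(unpack_message(frame))
```

On node1, the zabbix-get package installed earlier covers the same ground from the command line, querying an item key on the agent at 172.16.100.8.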

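In zabbix-web, add the linux-agent host: click Configuration--Hosts--Create host, fill in the form, and click Add;
Host: host settings
Host name: the host name, 172.16.100.8;
Visible name: node2.smoke.com
New group: create a new host group
Agent interfaces: IP address 172.16.100.8, Connect to set to IP, Port left at the default 10050
SNMP interfaces: monitor through SNMP
JMX interfaces: monitor Java applications;
IPMI interfaces: monitor server hardware;
Monitored by proxy: whether to use a proxy, no proxy
Templates: templates;
IPMI: hardware monitoring
Macros: macros, i.e. variables;
Host inventory: asset inventory
[Image 22: Zabbix Monitoring (Part 1)]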
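
The same host could also be created through the Zabbix JSON-RPC API instead of the web form. The sketch below only builds the request body for the `host.create` method and does not send it. The auth token, group ID and template ID are placeholders (real values would come from `user.login`, `hostgroup.get` and `template.get`), and the exact set of required fields may differ between Zabbix versions, so treat this as an outline under those assumptions.

```python
import json


def host_create_request(auth_token, host, visible_name, ip,
                        group_id, template_id, request_id=1):
    """Build a JSON-RPC body for host.create with one agent interface."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host,            # "Host name" field in the form
            "name": visible_name,    # "Visible name" field in the form
            "interfaces": [{
                "type": 1,           # 1 = agent interface
                "main": 1,
                "useip": 1,          # connect by IP rather than DNS
                "ip": ip,
                "dns": "",
                "port": "10050",
            }],
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],
        },
        "auth": auth_token,
        "id": request_id,
    }


# Placeholder token and IDs, for illustration only
request = host_create_request(
    auth_token="PLACEHOLDER_TOKEN",
    host="172.16.100.8",
    visible_name="node2.smoke.com",
    ip="172.16.100.8",
    group_id="2",
    template_id="10001",
)
print(json.dumps(request, indent=2))
```

The body would be POSTed as `application/json` to the frontend's `api_jsonrpc.php` endpoint.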

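Click Monitoring--Graphs; set Group to all and Host to all, and for Graph select Zabbix data gathering process busy %;
[Image 23: Zabbix Monitoring (Part 1)]

Click Monitoring--Screens, where several graphs can be combined on a single custom page;
[Image 24: Zabbix Monitoring (Part 1)]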