运维监控系统之Open-Falcon

运维监控系统之Open-Falcon

一、Open-Falcon介绍

open-falcon是一款用golang和python写的监控系统,由小米启动这个项目。

1、监控系统,可以从运营级别(基本配置即可),以及应用级别(二次开发,通过端口进行日志上报),对服务器、操作系统、中间件、应用进行全面的监控,及报警,对我们的系统正常运行的作用非常重要。

2、基础监控

CPU、Load、内存、磁盘、IO、网络相关、内核参数、ss 统计输出、端口采集、核心服务的进程存活信息采集、关键业务进程资源消耗、NTP offset采集、DNS解析采集,这些指标,都是open-falcon的agent组件直接支持的。

Linux运维基础采集项:http://book.open-falcon.org/zh/faq/linux-metrics.html

对于这些基础监控选项全部理解透彻的时刻,也就是对Linux运行原理及命令进阶的时刻。

3、第三方监控

术业有专攻,运行在OS上的应用甚多,Open-Falcon的开发团队不可能把所有的第三方应用的监控全部做完,这个就需要开源社区提供更多的插件,当前对于很多常用的第三方应用都有相关插件了。

4、JVM监控

对于Java作为主要开发语言的大多数公司,对于JVM的监控不可或缺。

每个JVM应用的参数,比如GC、类加载、JVM内存、进程、线程,都可以上报给Falcon,而这些参数的获得,都可以通过MxBeans实现。

使用 Java 平台管理 bean:http://www.ibm.com/developerworks/cn/java/j-mxbeans/

5、业务应用监控

对于业务需要监控的接口,比如响应时间等。可以根据业务的需要,上报相关数据到Falcon,并通过Falcon查看结果。

官方网址:http://open-falcon.org/

中文文档:https://book.open-falcon.org/zh_0_2/

中英文档:https://book.open-falcon.org

软件下载:https://github.com/open-falcon/falcon-plus/releases

二、Open-Falcon编写的整个脑洞历程

Open-Falcon编写的整个脑洞历程

三、环境准备

1.系统环境

[root@open-falcon-server ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core)

2.系统优化

#安装下载软件
yum install wget -y
 
#更换aliyun源
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
 
#下载epel源
yum install epel-release.noarch -y
rpm -Uvh http://mirrors.aliyun.com/epel/epel-release-latest-7.noarch.rpm
yum clean all
yum makecache
 
#下载常用软件
yum install git telnet net-tools tree nmap sysstat lrzsz dos2unix tcpdump ntpdate -y
 
#配置时间同步
ntpdate cn.pool.ntp.org
 
#更改主机名
hostnamectl set-hostname open-falcon-server
hostname open-falcon-server
 
#开启缓存
sed -i 's#keepcache=0#keepcache=1#g' /etc/yum.conf 
grep keepcache /etc/yum.conf
 
#关闭selinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
 
#关闭防火墙
systemctl stop firewalld.service
systemctl disable firewalld.service

3.软件环境准备

(1)redis准备

#安装 redis
yum install redis -y 

#redis常用命令

redis-server           redis 服务端

redis-cli           redis 命令行客户端

redis-benchmark        redis 性能测试工具

redis-check-aof       AOF文件修复工具

redis-check-dump       RDB文件修复工具

redis-sentinel       Sentinel 服务端 

#启动redis
[root@open-falcon-server ~]# redis-server &
[1] 1662
[root@open-falcon-server ~]# 1662:C 27 Jul 14:44:56.463 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1662:M 27 Jul 14:44:56.464 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.2.10 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1662
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1662:M 27 Jul 14:44:56.464 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1662:M 27 Jul 14:44:56.464 # Server started, Redis version 3.2.10
1662:M 27 Jul 14:44:56.464 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1662:M 27 Jul 14:44:56.464 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1662:M 27 Jul 14:44:56.464 * The server is now ready to accept connections on port 6379

(2)mysql准备

#安装mysql
yum install mariadb mariadb-server -y

#启动mysql
systemctl start mariadb
systemctl enable mariadb

#登录数据库测试
[root@open-falcon-server ~]# mysql -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 4
Server version: 5.5.56-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> exit
Bye

#检查服务
[root@open-falcon-server ~]# netstat -lntp|egrep "3306|6379"
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      1978/mysqld         
tcp        0      0 0.0.0.0:6379            0.0.0.0:*               LISTEN      1662/redis-server * 
tcp6       0      0 :::6379                 :::*                    LISTEN      1662/redis-server * 

#初始化MySQL表结构
cd /tmp/ && git clone https://github.com/open-falcon/falcon-plus.git
cd /tmp/falcon-plus/scripts/mysql/db_schema/
mysql -h 127.0.0.1 -u root -p < 1_uic-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 2_portal-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 3_dashboard-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 4_graph-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 5_alarms-db-schema.sql
rm -rf /tmp/falcon-plus/

#设置数据库密码
mysqladmin -uroot password "123456"

#检查导入的数据库
[root@open-falcon-server ~]# mysql -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 5.5.56-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| alarms             |
| dashboard          |
| falcon_portal      |
| graph              |
| mysql              |
| performance_schema |
| test               |
| uic                |
+--------------------+
9 rows in set (0.00 sec)

MariaDB [(none)]> exit
Bye

(3)Go安装

#安装go语言开发包
yum install golang -y

#检查版本
[root@open-falcon-server ~]# go version
go version go1.9.4 linux/amd64

#查看Go安装路径
[root@open-falcon-server ~]# find / -name go
/etc/alternatives/go
/var/lib/alternatives/go
/usr/bin/go
/usr/lib/golang/src/cmd/go   #需要这个路径
/usr/lib/golang/src/go
/usr/lib/golang/bin/go
/usr/lib/golang/pkg/linux_amd64/cmd/go
/usr/lib/golang/pkg/linux_amd64/go

四、Open-Falcon后端

#创建工作目录
export FALCON_HOME=/home/work
export WORKSPACE=$FALCON_HOME/open-falcon
mkdir -p $WORKSPACE

#下载解压二进制包
wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.1/open-falcon-v0.2.1.tar.gz
tar xf open-falcon-v0.2.1.tar.gz -C $WORKSPACE

#查看解压结果
[root@open-falcon-server ~]# cd $WORKSPACE
[root@open-falcon-server open-falcon]# ll
总用量 3896
drwxrwxr-x 7 501 501      67 8月  15 2017 agent
drwxrwxr-x 5 501 501      40 8月  15 2017 aggregator
drwxrwxr-x 5 501 501      40 8月  15 2017 alarm
drwxrwxr-x 6 501 501      51 8月  15 2017 api
drwxrwxr-x 5 501 501      40 8月  15 2017 gateway
drwxrwxr-x 6 501 501      51 8月  15 2017 graph
drwxrwxr-x 5 501 501      40 8月  15 2017 hbs
drwxrwxr-x 5 501 501      40 8月  15 2017 judge
drwxrwxr-x 5 501 501      40 8月  15 2017 nodata
-rwxrwxr-x 1 501 501 3987469 8月  15 2017 open-falcon
lrwxrwxrwx 1 501 501      16 8月  15 2017 plugins -> ./agent/plugins/
lrwxrwxrwx 1 501 501      15 8月  15 2017 public -> ./agent/public/
drwxrwxr-x 5 501 501      40 8月  15 2017 transfer
模块 文件所在路径
aggregator /home/work/aggregator/config/cfg.json
graph /home/work/graph/config/cfg.json
hbs /home/work/hbs/config/cfg.json
nodata /home/work/nodata/config/cfg.json
api /home/work/api/config/cfg.json
alarm /home/work/alarm/config/cfg.json
#修改配置文件
sed -i 's#root:@tcp(127.0.0.1:3306)#root:123456@tcp(127.0.0.1:3306)#g' `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"`
cat `find ./ -type f -name "cfg.json"|egrep "alarm|api|nodata|hbs|graph|aggregator"` |grep 'root:123456@tcp(127.0.0.1:3306)'

#启动后端模块
[root@open-falcon-server open-falcon]# cd /home/work/open-falcon
[root@open-falcon-server open-falcon]# ./open-falcon start
[falcon-graph] 5583
[falcon-hbs] 5592
[falcon-judge] 5600
[falcon-transfer] 5606
[falcon-nodata] 5613
[falcon-aggregator] 5620
[falcon-agent] 5628
[falcon-gateway] 5635
[falcon-api] 5641
[falcon-alarm] 5653

#检查服务启动状态
[root@open-falcon-server open-falcon]# ./open-falcon check
        falcon-graph         UP            5583 
          falcon-hbs         UP            5592 
        falcon-judge         UP            5600 
     falcon-transfer         UP            5606 
       falcon-nodata         UP            5613 
   falcon-aggregator         UP            5620 
        falcon-agent         UP            5628 
      falcon-gateway         UP            5635 
          falcon-api         UP            5641 
        falcon-alarm         UP            5653

#更多命令行工具用法
# ./open-falcon [start|stop|restart|check|monitor|reload] module
./open-falcon start agent
 
./open-falcon check
        falcon-graph         UP           53007
          falcon-hbs         UP           53014
        falcon-judge         UP           53020
     falcon-transfer         UP           53026
       falcon-nodata         UP           53032
   falcon-aggregator         UP           53038
        falcon-agent         UP           53044
      falcon-gateway         UP           53050
          falcon-api         UP           53056
        falcon-alarm         UP           53063
 
#For debugging , You can check $WorkDir/$moduleName/log/logs/xxx.log
至此后端部署完成。

#其他用法
重载配置(备注:修改vi cfg.json配置文件后,可以用下面命令重载配置)

curl 127.0.0.1:1988/config/reload

五、Open-Falcon前端

#创建工作目录
export HOME=/home/work
export WORKSPACE=$HOME/open-falcon
mkdir -p $WORKSPACE
cd $WORKSPACE

#克隆前端组件代码
git clone https://github.com/open-falcon/dashboard.git

#安装依赖包
yum install -y python-virtualenv
yum install -y python-devel
yum install -y openldap-devel
yum install -y mysql-devel
yum groupinstall "Development tools" -y

#下载ez_setup.py
cd ~
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python ez_setup.py --insecure

#下载安装pip
wget https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
tar xf pip-9.0.1.tar.gz 
cd pip-9.0.1
python setup.py install

#解决pip安装慢
mkdir -p ~/.pip
echo '[global]' >>~/.pip/pip.conf
echo 'index-url = https://pypi.tuna.tsinghua.edu.cn/simple' >>~/.pip/pip.conf

#测试是否可用
[root@open-falcon-server ~]# cd /home/work/open-falcon/dashboard
[root@open-falcon-server dashboard]# pip -V
pip 9.0.1 from /usr/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg (python 2.7)
[root@open-falcon-server dashboard]# pip

Usage:   
  pip  [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to WARNING, ERROR, and CRITICAL logging levels).
  --log                 Path to a verbose appending log.
  --proxy              Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries          Maximum number of retries each connection should attempt (default 5 times).
  --timeout              Set the socket timeout (default 15 seconds).
  --exists-action     Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup, (a)bort.
  --trusted-host    Mark this host as trusted, even though it does not have valid or any HTTPS.
  --cert                Path to alternate CA bundle.
  --client-cert         Path to SSL client certificate, a single file containing the private key and the certificate in PEM format.
  --cache-dir            Store the cache data in .
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for download. Implied with --no-index.

#查看需要安装模块
[root@open-falcon-server dashboard]# cat pip_requirements.txt 
Flask==0.10.1
Flask-Babel==0.9
Jinja2==2.7.2
Werkzeug==0.9.4
gunicorn==19.5.0
python-dateutil==2.2
requests==2.3.0
mysql-python
python-ldap

#安装模块
pip install -r pip_requirements.txt

#修改配置文件

配置说明:

dashboard的配置文件为: 'rrd/config.py',根据实际情况修改:

# API_ADDR 表示后端api组件的地址
API_ADDR = "http://127.0.0.1:8080/api/v1"
 
# 根据实际情况,修改PORTAL_DB_*, 默认用户名为root,默认密码为""
# 根据实际情况,修改ALARM_DB_*, 默认用户名为root,默认密码为""

配置修改:

cp rrd/config.py{,.bak}
vim rrd/config.py

修改内容:

# Falcon+ API
API_ADDR = os.environ.get("API_ADDR","http://10.0.0.100:8080/api/v1")

# portal database
# TODO: read from api instead of db
PORTAL_DB_HOST = os.environ.get("PORTAL_DB_HOST","10.0.0.100")
PORTAL_DB_PORT = int(os.environ.get("PORTAL_DB_PORT",3306))
PORTAL_DB_USER = os.environ.get("PORTAL_DB_USER","root")
PORTAL_DB_PASS = os.environ.get("PORTAL_DB_PASS","123456")
PORTAL_DB_NAME = os.environ.get("PORTAL_DB_NAME","falcon_portal")

# alarm database
# TODO: read from api instead of db
ALARM_DB_HOST = os.environ.get("ALARM_DB_HOST","10.0.0.100")
ALARM_DB_PORT = int(os.environ.get("ALARM_DB_PORT",3306))
ALARM_DB_USER = os.environ.get("ALARM_DB_USER","root")
ALARM_DB_PASS = os.environ.get("ALARM_DB_PASS","123456")
ALARM_DB_NAME = os.environ.get("ALARM_DB_NAME","alarms")

#启动服务
[root@open-falcon-server dashboard]# virtualenv ./env
New python executable in /home/work/open-falcon/dashboard/env/bin/python
Installing setuptools, pip, wheel...done.
[root@open-falcon-server dashboard]# source env/bin/activate
(env) [root@open-falcon-server dashboard]# ./control start
falcon-dashboard started..., pid=20814
(env) [root@open-falcon-server dashboard]# ./control tail
[2018-07-27 16:37:02 +0000] [20814] [INFO] Starting gunicorn 19.5.0
[2018-07-27 16:37:02 +0000] [20814] [INFO] Listening at: http://0.0.0.0:8081 (20814)
[2018-07-27 16:37:02 +0000] [20814] [INFO] Using worker: sync
[2018-07-27 16:37:02 +0000] [20819] [INFO] Booting worker with pid: 20819
[2018-07-27 16:37:02 +0000] [20820] [INFO] Booting worker with pid: 20820
[2018-07-27 16:37:02 +0000] [20821] [INFO] Booting worker with pid: 20821
[2018-07-27 16:37:02 +0000] [20826] [INFO] Booting worker with pid: 20826

^C
(env) [root@open-falcon-server dashboard]# deactivate

六、访问网站

http://10.0.0.100:8081

运维监控系统之Open-Falcon_第1张图片
访问网站
#dashbord用户管理
dashbord没有默认创建任何账号包括管理账号,需要你通过页面进行注册账号。
想拥有管理全局的超级管理员账号,需要手动注册用户名为root的账号(第一个帐号名称为root的用户会被自动设置为超级管理员)。
超级管理员可以给普通用户分配权限管理。

小提示:注册账号能够被任何打开dashboard页面的人注册,所以当给相关的人注册完账号后,需要去关闭注册账号功能。只需要去修改api组件的配置文件cfg.json,将signup_disable配置项修改为true,重启api即可。当需要给人开账号的时候,再将配置选项改回去,用完再关掉即可。
运维监控系统之Open-Falcon_第2张图片
首页

七、Open-Falcon客户端

#服务端操作
[root@open-falcon-server ~]# cd /home/work/open-falcon
[root@open-falcon-server open-falcon]# scp -r agent [email protected]:/home/
[root@open-falcon-server open-falcon]# scp -r open-falcon [email protected]:/home/

#客户端操作
[root@open-falcon-client ~]# mkdir -p /home/work/open-falcon
[root@open-falcon-client ~]# mkdir -p /home/work/open-falcon
[root@open-falcon-client ~]# mv /home/open-falcon /home/agent /home/work/open-falcon
[root@open-falcon-client ~]# cd /home/work/open-falcon
[root@open-falcon-client open-falcon]# vim agent/config/cfg.json

修改内容:

{
    "debug": true,  # 控制一些debug信息的输出,生产环境通常设置为false
    "hostname": "", # agent采集了数据发给transfer,endpoint就设置为了hostname,默认通过`hostname`获取,如果配置中配置了hostname,就用配置中的
    "ip": "", # agent与hbs心跳的时候会把自己的ip地址发给hbs,agent会自动探测本机ip,如果不想让agent自动探测,可以手工修改该配置
    "plugin": {
        "enabled": false, # 默认不开启插件机制
        "dir": "./plugin",  # 把放置插件脚本的git repo clone到这个目录
        "git": "https://github.com/open-falcon/plugin.git", # 放置插件脚本的git repo地址
        "logs": "./logs" # 插件执行的log,如果插件执行有问题,可以去这个目录看log
    },
    "heartbeat": {
        "enabled": true,  # 此处enabled要设置为true
        "addr": "10.0.0.100:6030", # hbs的地址,端口是hbs的rpc端口
        "interval": 60, # 心跳周期,单位是秒
        "timeout": 1000 # 连接hbs的超时时间,单位是毫秒
    },
    "transfer": {
        "enabled": true, 
        "addrs": [
            "10.0.0.100:18433"
        ],  # transfer的地址,端口是transfer的rpc端口, 可以支持写多个transfer的地址,agent会保证HA
        "interval": 60, # 采集周期,单位是秒,即agent一分钟采集一次数据发给transfer
        "timeout": 1000 # 连接transfer的超时时间,单位是毫秒
    },
    "http": {
        "enabled": true,  # 是否要监听http端口
        "listen": ":1988",
        "backdoor": false
    },
    "collector": {
        "ifacePrefix": ["eth", "em"], # 默认配置只会采集网卡名称前缀是eth、em的网卡流量,配置为空就会采集所有的,lo的也会采集。可以从/proc/net/dev看到各个网卡的流量信息
        "mountPoint": []
    },
    "default_tags": {
    },
    "ignore": {  # 默认采集了200多个metric,可以通过ignore设置为不采集
        "cpu.busy": true,
        "df.bytes.free": true,
        "df.bytes.total": true,
        "df.bytes.used": true,
        "df.bytes.used.percent": true,
        "df.inodes.total": true,
        "df.inodes.free": true,
        "df.inodes.used": true,
        "df.inodes.used.percent": true,
        "mem.memtotal": true,
        "mem.memused": true,
        "mem.memused.percent": true,
        "mem.memfree": true,
        "mem.swaptotal": true,
        "mem.swapused": true,
        "mem.swapfree": true
    }
}

#启动服务
./open-falcon start agent  启动进程
./open-falcon stop agent  停止进程
./open-falcon monitor agent  查看日志

看var目录下的log是否正常,或者浏览器访问其1988端口。另外agent提供了一个--check参数,可以检查agent是否可以正常跑在当前机器上

cd /home/work/open-falcon/agent/bin/
./falcon-agent --check

进入监控界面查看:

运维监控系统之Open-Falcon_第3张图片
监控界面

八、参考文档

## Open-Falcon
# 运维监控系统之Open-Falcon
https://www.cnblogs.com/nulige/p/7741580.html

# open-falcon安装使用监控树莓派
https://yq.aliyun.com/articles/437196

# 小米运维架构服务监控Open-Falcon
https://blog.csdn.net/qq_27384769/article/details/79234270

# 架构师的成长之路-博客-导图
https://github.com/csy512889371/learnDoc

# Open-Falcon编写的整个脑洞历程
http://mp.weixin.qq.com/s?__biz=MjM5OTcxMzE0MQ==&mid=400225178&idx=1&sn=c98609a9b66f84549e41cd421b4df74d

你可能感兴趣的:(运维监控系统之Open-Falcon)