最近机房刚上了一批机器(有100台左右),需要使用Nagios对这一批机器进行监控。领导要求两天时间完成所有主机的监控。从原来的经验来看,两天时间肯定完成不了。那怎么办?按照之前的想法,肯定是在nagios配置文件逐一添加每台客户端的监控信息,工作量巨大。突然,想到一个想法,是否可以通过脚本来实现批量对主机进行监控,也就是运维自动化。
写脚本,最重要的就是思路。思路压倒一切,经过思考最终决定就这么做了。先贴出来一张网路拓扑图:
整个过程可以分为三部分。
cmdb端:主要用来实现对数据的收集,采用两个API,一个是提供给客户机的API。用于将客户端的数据上传的cmdb服务器;另外一个API是nagios通过此API可以得到要监控主机的信息,然后对该信息进行整理,做成nagios监控模板。
Client端:通过Python脚本收集本机器要监控的软硬件信息,然后通过cmdb端提供的API接口将数据上传到cmdb端的数据库。
Nagios端:通过cmdb端提供的API接口实现对cmdb收集到的信息进行抓取,然后将数据写入到模板,最后copy到naigos指定的objects目录,最终实现对客户机的监控。
这三部分最重要的应该是CMDB端。接下来通过安装django和编写API接口实现cmdb可以正常工作。可以将cmdb端分为三个步骤来完成:
安装django
配置django
编写API接口
首先来进行安装django:
在安装django之前首先应该安装python(版本建议2.7.)
1
2
3
4
5
6
|
1.
下载django软件包
可以到django官方网站下载最新django软件包(https:
/
/
www.djangoproject.com).
2.
解压缩并安装软件包
tar
-
zxvf Django
-
1.5
.
1.tar
.gz
cd Django
-
1.5
.
1
python setup.py install
|
创建项目和应用:
1
2
3
4
|
1.
创建一个项目
python startproject simplecmdb
2.
创建一个应用
python startapp hostinfo
|
配置django:
1.修改setting.py
DATABASES = {'ENGIN':'django.db.backends.sqlite','name':path.join('CMDB.db')} #使用的数据库及数据库名
INSTALLED_APPS =(hostinfoINSTALLED_APPS = ('hostinfo')
INSTALLED_APPS = ('hostinfo') #应用的名称
2.修改urls.py
url(r'^api/gethost\.json$','hostinfo.views.gethosts'), #nagios客户端访问API接口地址
url(r'^api/clooect$','hostinfo.views.collect'), #客户端访问API进行上传数据的API
url(r'^admin/',include(admin.site.urls)), #django后台管理登入url
from django.contrib import admin
admin.autodiscover()
3.修改项目hostinfo下的views.py
代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
|
# Create your views here.
#包含以下模块
from django.shortcuts import render_to_response
from django.http import HttpResponse
from models import Host, HostGroup
#包含json模块
try
:
import json
except ImportError,e:
import simplejson
as
json
#用来接收客户端服务器发送过来的数据
def collect(request):
req = request
if
req.POST:
vendor = req.POST.get(
'Product_Name'
)
sn = req.POST.get(
'Serial_Number'
)
product = req.POST.get(
'Manufacturer'
)
cpu_model = req.POST.get(
'Model_Name'
)
cpu_num = req.POST.get(
'Cpu_Cores'
)
cpu_vendor = req.POST.get(
'Vendor_Id'
)
memory_part_number = req.POST.get(
'Part_Number'
)
memory_manufacturer = req.POST.get(
'Manufacturer'
)
memory_size = req.POST.get(
'Size'
)
device_model = req.POST.get(
'Device_Model'
)
device_version = req.POST.get(
'Firmware_Version'
)
device_sn = req.POST.get(
'Serial_Number'
)
device_size = req.POST.get(
'User_Capacity'
)
osver = req.POST.get(
'os_version'
)
hostname = req.POST.get(
'os_name'
)
os_release = req.POST.get(
'os_release'
)
ipaddrs = req.POST.get(
'Ipaddr'
)
mac = req.POST.get(
'Device'
)
link = req.POST.get(
'Link'
)
mask = req.POST.get(
'Mask'
)
device = req.POST.get(
'Device'
)
host = Host()
host.hostname = hostname
host.product = product
host.cpu_num = cpu_num
host.cpu_model = cpu_model
host.cpu_vendor = cpu_vendor
host.memory_part_number = memory_part_number
host.memory_manufacturer = memory_manufacturer
host.memory_size = memory_size
host.device_model = device_model
host.device_version = device_version
host.device_sn = device_sn
host.device_size = device_size
host.osver = osver
host.os_release = os_release
host.vendor = vendor
host.sn = sn
host.ipaddr = ipaddrs
host.save() #将客户端传过来的数据通过POST接收,存入数据库
return
HttpResponse(
'OK'
) #如果插入成功,返回
'ok'
else
:
return
HttpResponse(
'no post data'
)
#提供给NAGIOS 的API
def gethosts(req):
d = []
hostgroups = HostGroup.objects.all()
for
hg in hostgroups:
ret_hg = {
'hostgroup'
:hg.name,
'members'
:[]}
members = hg.members.all()
for
h in members:
ret_h = {
'hostname'
:h.hostname, #API接口返回的数据
'ipaddr'
:h.ipaddr
}
ret_hg[
'members'
].append(ret_h)
d.append(ret_hg)
ret = {
'status'
:0,
'data'
:d,
'message'
:
'ok'
}
return
HttpResponse(json.dumps(ret))
|
4.修改model.py 文件
代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
|
from django.db import models
# Create your models here.
#插入数据库的Host表,主要存储客户端主机的信息
class
Host(models.Model):
""
"store host information"
""
vendor = models.CharField(max_length=30,null=True)
sn = models.CharField(max_length=30,null=True)
product = models.CharField(max_length=30,null=True)
cpu_model = models.CharField(max_length=50,null=True)
cpu_num = models.CharField(max_length=2,null=True)
cpu_vendor = models.CharField(max_length=30,null=True)
memory_part_number = models.CharField(max_length=30,null=True)
memory_manufacturer = models.CharField(max_length=30,null=True)
memory_size = models.CharField(max_length=20,null=True)
device_model = models.CharField(max_length=30,null=True)
device_version = models.CharField(max_length=30,null=True)
device_sn = models.CharField(max_length=30,null=True)
device_size = models.CharField(max_length=30,null=True)
osver = models.CharField(max_length=30,null=True)
hostname = models.CharField(max_length=30,null=True)
os_release = models.CharField(max_length=30,null=True)
ipaddr = models.IPAddressField(max_length=15)
def __unicode__(self):
return
self.hostname
#主机组表,用来对主机进行分组
class
HostGroup(models.Model):
name = models.CharField(max_length=30)
members = models.ManyToManyField(Host)
|
5.修改admin.py文件
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
|
#from models import Host, IPaddr
from models import Host, HostGroup
from django.contrib import admin
#设置在django在admin后天显示的名称
class
HostAdmin(admin.ModelAdmin):
list_display = [
'vendor'
,
'sn'
,
'product'
,
'cpu_model'
,
'cpu_num'
,
'cpu_vendor'
,
'memory_part_number'
,
'memory_manufacturer'
,
'memory_size'
,
'device_model'
,
'device_version'
,
'device_sn'
,
'device_size'
,
'osver'
,
'hostname'
,
'os_release'
]
#在django后台amdin显示的组名称
class
HostGroupAdmin(admin.ModelAdmin):
list_display = [
'name'
,]
#将如上两个类的数据展示到django的后台
admin.site.register(HostGroup,HostGroupAdmin)
admin.site.register(Host, HostAdmin)
|
6.创建数据库
python manager.py syncdb #创建数据库
7.启动应用
python manager.py runserver 0.0.0.0:8000
8.测试
http://132.96.77.12:8000/admin
通过上图可以看到,django已经配置成功。
接下来可以在客户端编写收集主机信息的脚本了,主要抓取cpu、内存、硬盘、服务器型号、服务器sn、ip地址、主机名称、操作系统版本等信息,共7个脚本:
1.cpu抓取脚本:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
|
#!/usr/local/src/python/bin/python
#-*- coding:utf-8 -*-
from subprocess import PIPE,Popen
import re
def getCpuInfo():
p = Popen([
'cat'
,
'/proc/cpuinfo'
],shell=False,stdout=PIPE)
stdout, stderr = p.communicate()
return
stdout.strip()
def parserCpuInfo(cpudata):
pd = {}
model_name = re.compile(r
'.*model name\s+:\s(.*)'
)
vendor_id = re.compile(r
'vendor_id\s+:(.*)'
)
cpu_cores = re.compile(r
'cpu cores\s+:\s([\d]+)'
)
lines = [line
for
line in cpudata.split(
'\n'
)]
for
line in lines:
model = re.match(model_name,line)
vendor = re.match(vendor_id,line)
cores = re.match(cpu_cores,line)
if
model:
pd[
'Model_Name'
] = model.groups()[0].strip()
if
vendor:
pd[
'Vendor_Id'
] = vendor.groups()[0].strip()
if
cores:
pd[
'Cpu_Cores'
] = cores.groups()[0]
else
:
pd[
'Cpu_Cores'
] = int(
'1'
)
return
pd
if
__name__ ==
'__main__'
:
cpudata = getCpuInfo()
print
parserCpuInfo(cpudata)
|
2.硬盘抓取脚本:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
|
#!/usr/local/src/python/bin/python
#-*- coding:utf-8 -*-
from subprocess import PIPE,Popen
import re
def getDiskInfo():
disk_dev = re.compile(r
'Disk\s/dev/[a-z]{3}'
)
disk_name = re.compile(r
'/dev/[a-z]{3}'
)
p = Popen([
'fdisk'
,
'-l'
],shell=False,stdout=PIPE)
stdout, stderr = p.communicate()
for
i in stdout.split(
'\n'
):
disk = re.match(disk_dev,i)
if
disk:
dk = re.search(disk_name,disk.group()).group()
n = Popen(
'smartctl -i %s'
% dk,shell=True,stdout=PIPE)
stdout, stderr = n.communicate()
return
stdout.strip()
def parserDiskInfo(diskdata):
ld = []
pd = {}
device_model = re.compile(r
'(Device Model):(\s+.*)'
)
serial_number = re.compile(r
'(Serial Number):(\s+[\d\w]{1,30})'
)
firmware_version = re.compile(r
'(Firmware Version):(\s+[\w]{1,20})'
)
user_capacity = re.compile(r
'(User Capacity):(\s+[\d\w, ]{1,50})'
)
for
line in diskdata.split(
'\n'
):
serial = re.search(serial_number,line)
device = re.search(device_model,line)
firmware = re.search(firmware_version,line)
user = re.search(user_capacity,line)
if
device:
pd[
'Device_Model'
] = device.groups()[1].strip()
if
serial:
pd[
'Serial_Number'
] = serial.groups()[1].strip()
if
firmware:
pd[
'Firmware_Version'
] = firmware.groups()[1].strip()
if
user:
pd[
'User_Capacity'
] = user.groups()[1].strip()
return
pd
if
__name__ ==
'__main__'
:
diskdata = getDiskInfo()
print
parserDiskInfo(diskdata)
|
3.内存抓取脚本:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
#!/usr/local/src/python/bin/python
#-*- coding:utf-8 -*-
from subprocess import PIPE,Popen
import re
import sys
def getMemInfo():
p = Popen([
'dmidecode'
],shell=False,stdout=PIPE)
stdout, stderr = p.communicate()
return
stdout.strip()
def parserMemInfo(memdata):
line_in = False
mem_str =
''
pd = {}
fd = {}
for
line in memdata.split(
'\n'
):
if
line.startswith(
'Memory Device'
)
and
line.endswith(
'Memory Device'
):
line_in = True
mem_str+=
'\n'
continue
if
line.startswith(
'\t'
)
and
line_in:
mem_str+=line
else
:
line_in = False
for
i in mem_str.split(
'\n'
)[1:]:
lines = i.replace(
'\t'
,
'\n'
).strip()
for
ln in lines.split(
'\n'
):
k, v = [i
for
i in ln.split(
':'
)]
pd[k.strip()] = v.strip()
if
pd[
'Size'
] !=
'No Module Installed'
:
mem_info =
'Size:%s ; Part_Number:%s ; Manufacturer:%s'
% (pd[
'Size'
],pd[
'Part Number'
],pd[
'Manufacturer'
])
for
line in mem_info.split(
'\n'
):
for
word in line.split(
';'
):
k, v = [i.strip()
for
i in word.split(
':'
)]
fd[k] = v.strip()
yield fd
if
__name__ ==
'__main__'
:
memdata = getMemInfo()
for
i in parserMemInfo(memdata):
print
i
|
4.抓取服务器信息脚本:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
|
#!/usr/local/src/python/bin/python
# -*- coding:utf-8 -*-
from subprocess import PIPE,Popen
import urllib, urllib2
def getDMI():
p = Popen(
'dmidecode'
,shell=True,stdout=PIPE)
stdout, stderr = p.communicate()
return
stdout
def parserDMI(dmidata):
pd = {}
fd = {}
line_in = False
for
line in dmidata.split(
'\n'
):
if
line.startswith(
'System Information'
):
line_in = True
continue
if
line.startswith(
'\t'
)
and
line_in:
k, v = [i.strip()
for
i in line.split(
':'
)]
pd[k] = v
else
:
line_in = False
name =
"Manufacturer:%s ; Serial_Number:%s ; Product_Name:%s"
% (pd[
'Manufacturer'
],pd[
'Serial Number'
],pd[
'Product Name'
])
for
i in name.split(
';'
):
k, v = [j.strip()
for
j in i.split(
':'
)]
fd[k] = v
return
fd
if
__name__ ==
'__main__'
:
dmidata = getDMI()
postdata = parserDMI(dmidata)
print
postdata
|
5.抓取主机信息
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
#!/usr/local/src/python/bin/python
#-*- coding:utf-8 -*-
import platform
def getHostInfo():
pd ={}
version = platform.dist()
os_name = platform.node()
os_release = platform.release()
os_version =
'%s %s'
% (version[0],version[1])
pd[
'os_name'
] = os_name
pd[
'os_release'
] = os_release
pd[
'os_version'
] = os_version
return
pd
if
__name__ ==
'__main__'
:
print
getHostInfo()
|
6.抓取ip地址:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
|
#!/usr/local/src/python/bin/python
#-*- coding:utf-8 -*-
from subprocess import PIPE,Popen
import re
def getIpaddr():
p = Popen([
'ifconfig'
],shell=False,stdout=PIPE)
stdout, stderr = p.communicate()
return
stdout.strip()
def parserIpaddr(ipdata):
device = re.compile(r
'(eth\d)'
)
ipaddr = re.compile(r
'(inet addr:[\d.]{7,15})'
)
mac = re.compile(r
'(HWaddr\s[0-9A-Fa-f:]{17})'
)
link = re.compile(r
'(Link encap:[\w]{3,14})'
)
mask = re.compile(r
'(Mask:[\d.]{9,15})'
)
for
lines in ipdata.split(
'\n\n'
):
pd = {}
eth_device = re.search(device,lines)
inet_ip = re.search(ipaddr,lines)
hw = re.search(mac,lines)
link_encap = re.search(link,lines)
_mask = re.search(mask,lines)
if
eth_device:
if
eth_device:
Device = eth_device.groups()[0]
if
inet_ip:
Ipaddr = inet_ip.groups()[0].split(
':'
)[1]
if
hw:
Mac = hw.groups()[0].split()[1]
if
link_encap:
Link = link_encap.groups()[0].split(
':'
)[1]
if
_mask:
Mask = _mask.groups()[0].split(
':'
)[1]
pd[
'Device'
] = Device
pd[
'Ipaddr'
] = Ipaddr
pd[
'Mac'
] = Mac
pd[
'Link'
] = Link
pd[
'Mask'
] = Mask
yield pd
if
__name__ ==
'__main__'
:
ipdata = getIpaddr()
for
i in parserIpaddr(ipdata):
print
i
|
7.对这些信息进行合并,并通过API形式将数据发送给cmdb端
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
#!/usr/local/src/python/bin/python
import urllib, urllib2
from cpuinfo import *
from diskinfo import *
from meminfo import *
from product import *
from hostinfo import *
from ipaddress import *
def getHostTotal():
ld = []
cpuinfo = parserCpuInfo(getCpuInfo())
diskinfo = parserDiskInfo(getDiskInfo())
for
i in parserMemInfo(getMemInfo()):
meminfo = i
productinfo = parserDMI(getDMI())
hostinfo = getHostInfo()
ipaddr = parserIpaddr(getIpaddr())
for
i in ipaddr:
ip = i
for
k in cpuinfo.iteritems():
ld.append(k)
for
i in diskinfo.iteritems():
ld.append(i)
for
j in meminfo.iteritems():
ld.append(j)
for
v in productinfo.iteritems():
ld.append(v)
for
x in hostinfo.iteritems():
ld.append(x)
for
y in ip.iteritems():
ld.append(y)
return
ld
def parserHostTotal(hostdata):
pg = {}
for
i in hostdata:
pg[i[0]] = i[1]
return
pg
def urlPost(postdata):
data = urllib.urlencode(postdata)
req = urllib2.Request(
'http://132.96.77.12:8000/api/collect'
,data)
response = urllib2.urlopen(req)
return
response.read()
if
__name__ ==
'__main__'
:
hostdata = getHostTotal()
postdata = parserHostTotal(hostdata)
print
urlPost(postdata)
|
到目前为止,cmdb系统已经可以将所有客户端的主机信息写入到数据库,并且可以通过nagios端的API接口直接调到数据:
http://132.96.77.12:8000/api/gethosts.json
通过图可以看到,已经成功调用到API接口的数据。
接下来可以在nagios端进行调用API接口的数据,对数据进行格式化。并写入文件。
1.nagios脚本
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
|
#!/opt/data/py/bin/python
#!-*- coding:utf-8 -*-
import urllib, urllib2
import json
import os
import shutil
CURR_DIR = os.path.abspath(os.path.dirname(
__file__
))
HOST_CONF_DIR = os.path.join(CURR_DIR,
'hosts'
)
HOST_TMP =
""
"define host {
use
linux-server
host_name %(hostname)s
check_command check-host-alive
alias %(hostname)s
address %(ipaddr)s
contact_groups admins
}
""
"
def getHosts():
url =
'http://132.96.77.12:8000/api/gethosts.json'
return
json.loads(urllib2.urlopen(url).read())
def initDir():
if
not os.path.exists(HOST_CONF_DIR):
os.
mkdir
(HOST_CONF_DIR)
def writeFile(f,s):
with open(f,
'w'
)
as
fd:
fd.write(s)
def genNagiosHost(hostdata):
initDir()
conf = os.path.join(HOST_CONF_DIR,
'hosts.cfg'
)
hostconf =
""
for
hg in hostdata:
for
h in hg[
'members'
]:
hostconf+=HOST_TMP %h
writeFile(conf,hostconf)
return
"ok"
def main():
result = getHosts()
if
result[
'status'
] == 0:
print
genNagiosHost(result[
'data'
])
else
:
print
'Error: %s'
% result[
'message'
]
if
os.path.exists(os.path.join(HOST_CONF_DIR,
'hosts.cfg'
)):
os.
chdir
(HOST_CONF_DIR)
shutil.copyfile(
'hosts.cfg'
,
'/etc/nagios/objects/hosts.cfg'
)
if
__name__ ==
"__main__"
:
main()
|
现在已经生成nagios主机的配置文件,并copy到nagios/objects目录下hosts.cfg。接下来可以测试是否nagios配置有问题,如果没有问题,就可以启动nagios服务
[root@yetcomm-v2 bin]# ./nagios -v /etc/nagios/nagios.cfg
通过测试,nagios没有发生错误或警告信息,现在可以启动nagios服务:
[root@yetcomm-v2 bin]# service nagios restart
最后,可以通过浏览器查看nagios的监控界面:
通过上图,可以看到已经将一台主机加入到监控组。由于是生产环境,所有只能拿测试服务器进行测试。其实测试环境和生产环境的代码完全一致。