Prometheus容器状态监控

目录 ==>

测试开发系列文章

准备工作不重要,不感兴趣可直接从第二节开始

准备工作

代码见于 https://gitee.com/bbjg001/stress_locust

在Locust压力测试中,为了测试启动了一个简单的http服务。为了使CPU、内存的使用效果更明显,这里在其基础上做了一点改动,改变原来读数据库的方式为读写文件。

#!/usr/local/bin/python
import numpy as np
from socketserver import ThreadingMixIn
from http.server import HTTPServer
from http.server import SimpleHTTPRequestHandler
from sys import argv
import logging
from .data_generate import getEmployee

def dataOperateFile(eid):
    with open('/data/work_data/data.data') as f:
        lines = f.read().splitlines()
    ee = []
    for line in lines:
        ei = line.split(' ')
        if eid==str(ei[0]):
            ee = ei
            break
    if ee:
        lines.remove(line)
        lines.append(getEmployee(id=eid))
        with open('/data/work_data/data.data', 'w') as f:
            f.write('\n'.join(lines))
        return {'name': ee[1], 'age': ee[2], 'sex': '女' if ee[3] == '0' else '男'}
    return 'no this employee by eid={}'.format(eid)

class ThreadingServer(ThreadingMixIn, HTTPServer):
    pass

class RequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/plain;charset=utf-8')
        try:
            eid = int(self.path[1:])
        except:
            eid = -1
        response = dataOperateFile(eid)
        logging.info('request of {} by eid={}'.format(response, eid))
        self.end_headers()
        self.wfile.write(str(response).encode())

def run(server_class=ThreadingServer, handler_class=RequestHandler, port=8888):
    server_address = ('0.0.0.0', port)
    httpd = server_class(server_address, handler_class)
    logging.info('server start in http://{}:{}'.format(*server_address))
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()

def main():
    logging.basicConfig(level=logging.INFO, format='%(levelname)-8s %(asctime)s: %(message)s',
                        datefmt='%m-%d %H:%M')
    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()

if __name__ == '__main__':
    main()

这次通过起一个容器来部署这个http服务,Dockerfile文件如下。

FROM python:3.9

ARG wd=/data/work_data

WORKDIR $wd
ENV WORKDIR $wd
ENV SERVERMOD file

RUN mkdir -p $wd

COPY . ${wd}/

RUN pip install pipreqs && \
    pip install -r requirements.txt

ENV TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

ENTRYPOINT ["python"]
# 构建镜像
docker build -t iserver -f Dockerfile .
# 构建容器,8088端口给http服务,9100给node_exporter
docker run -tid --name tserver -p 8088:8088 -p 9100:9100 --net host --cpus 2 -m 2G iserver
# 进入容器
docker exec -it tserver bash	

# 构建数据文件
python data_generate.py
# 启动http服务
python testserver.py 8088

Prometheus

https://prometheus.io/

开始:https://prometheus.io/docs/prometheus/latest/getting_started/

是一个中间节点,可以接收各种监控器穿上来的监控数据

Prometheus容器状态监控_第1张图片

压缩包安装

可以在这里拿到压缩包的地址,下载并解压

wget https://github.com/prometheus/prometheus/releases/download/v2.32.0-rc.1/prometheus-2.32.0-rc.1.linux-amd64.tar.gz
tar -zxf prometheus-2.32.0-rc.1.linux-amd64.tar.gz
# 鉴于这个名字太冗长了,重命名下
mv prometheus-2.32.0-rc.1.linux-amd64 prometheus
cd prometheus
ls
# prometheus			# 这个是启动文件
# prometheus.yml	# 这个是启动依赖的配置文件

prometheus.yml(部分)

其中的 localhost:9090 是prometheus的启动端口

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

启动它

./prometheus --config.file=prometheus.yml

最后一条日志显示已经启动

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-GGSQxVqa-1642353046174)(/Users/darcyzhang/Library/Application%20Support/typora-user-images/image-20211211001856185.png)]

此时即可通过前端访问prometheus的管理监控界面

http://192.168.5.217:9090/			# 请换成自己的ip

docker安装

作为另外的安装方式,通过上面的方式安装了不必要再装一次

构造配置文件,会映射给docker容器

vim prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
# 启动它
docker run  -d \
	--name prometheus \
  -p 9090:9090 \
  -v /data/zy/work_data/moni_dev/configs/prometheus.yml:/etc/prometheus/prometheus.yml  \
  prom/prometheus

exporter

个人理解Prometheus是一个中间组件,真正需要采集数据的是exporter。prometheus提供了一系列的exporter供选择。这里使用node_exporter做演示。

找到node_export,拿到链接,去把它下载到容器上启动

Prometheus容器状态监控_第2张图片

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -zxf node_exporter-1.3.1.linux-amd64.tar.gz
mv node_exporter-1.3.1.linux-amd64 node_exporter		# 没啥,就是换个简单的名字
cd node_exporter
# 启动它
./node_exporter

image-20211211232424001

日志显示成功启动,默认监听9100端口

./node_exporter --help		# 查看参数信息
# 其中设置端口的参数
--web.listen-address=":9100"         Address on which to expose metrics and web interface.

下面可以访问这个地址 http://192.168.5.217:9100/metrics查看node_exporter采集到的信息

Prometheus容器状态监控_第3张图片

这只有采集到的数据,下面配置Prometheus来采集这些数据

修改prometheus文件夹下的prometheus.yml文件,在最下面添加监听node exporter,注意缩进,job_name与上一个job_name同级。

scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  # 这里是新添加的,采集node exporter监控数据
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.5.217:9100']		# 注意改掉localhost,这个地址要在起Prometheus的机器上能访问到

修改完成后重启prometheus

Grafana

官网 https://grafana.com/

下载 https://grafana.com/grafana/download

前端展示面板,展示各种监控指标

安装

这里我直接通过docker启动了

docker run -d \
  --name grafana \
  -p 3000:3000 \
  -v /data/zy/work_data/moni_dev/data/grafana:/var/lib/grafana \
  --privileged \
  grafana/grafana 

可以通过对应的端口进行访问,登录的默认用户名和密码都是 admin

Prometheus容器状态监控_第4张图片

添加数据源Prometheus

Prometheus容器状态监控_第5张图片

选择Prometheus

Prometheus容器状态监控_第6张图片

下滑到最下方 Save & test

Prometheus容器状态监控_第7张图片

选择Prometheus

Prometheus容器状态监控_第8张图片

视自己需求选择导入的仪表盘

Prometheus容器状态监控_第9张图片

导入仪表盘样式

grana也提供了很多仪表盘样式,这里选择下载比较多的Node Exporter Full做演示。

Prometheus容器状态监控_第10张图片

点击左侧Create->Import,

Prometheus容器状态监控_第11张图片

在弹出界面填入仪表盘id 1860,点击Load。选择Prometheus,点击Import

Prometheus容器状态监控_第12张图片

导入后即可弹出仪表盘,再次进入可以从Dashboards->Browser进入。(其中发生变动的节点是启动了Locust压力测试中的压测)

Prometheus容器状态监控_第13张图片

如果要增加对其他机器的监控,按exporter部分重复操作即可(在对应机器上启动exporter,在prometheus.yml配置文件中配置这个exporter)

参考

Grafana监控系统之Prometheus+Grafana监控系统搭建

https://prometheus.io/docs/introduction/overview/

你可能感兴趣的:(测试开发,容器,运维,监控,Prometheus,Grafana)