Installing a Prometheus + Grafana + Alertmanager monitoring platform with docker-compose

version: "3"

networks:
    monitor:
        driver: bridge

services:
    prometheus:
        image: prom/prometheus
        container_name: prometheus
        restart: always
        user: "id"             # change to the host user's UID
        volumes:
            - ./prometheusConfig:/etc/prometheus
            - ./data/prometheus:/prometheus
        ports:
            - "9090:9090"
        hostname: prometheus
        networks:
            - monitor
        command:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
            - "--storage.tsdb.retention.time=3d"
            - "--web.enable-lifecycle"

    alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        hostname: alertmanager
        restart: always
        user: "id"
        ports:
            - '9093:9093'
        volumes:
            - ./data/alertmanager:/alertmanager/data
            - ./data/alertmanager.yml:/alertmanager.yml
        command:
            - "--config.file=/alertmanager.yml"
        networks:
            - monitor

    grafana:
        image: grafana/grafana
        container_name: grafana
        restart: always
        user: "id"
        ports:
            - "3000:3000"
        volumes:
            - ./data/grafana:/var/lib/grafana
        networks:
            - monitor

If the following problem occurs, look up the host user's ID (for example with id -u) and set it as the user value in docker-compose.yml:

[Image 1: screenshot of the error]

If a file or directory cannot be found, create it on the host at the path given in docker-compose.yml.
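
As a concrete illustration of the user setting, here is a sketch of how the prometheus service might look once the placeholder is filled in. The value 1000:1000 is an assumption (a common UID:GID for the first regular user on Linux); use whatever id -u and id -g print on your host, and apply the same value to the alertmanager and grafana services.

```yaml
services:
    prometheus:
        image: prom/prometheus
        # Assumed UID:GID; replace with the output of `id -u` and `id -g` on the host
        user: "1000:1000"
        volumes:
            # These host directories must exist and be writable by the user above
            - ./prometheusConfig:/etc/prometheus
            - ./data/prometheus:/prometheus
```

The mounted host directories (prometheusConfig, data/prometheus, data/alertmanager, data/grafana) and the data/alertmanager.yml file should exist before the first start, since the containers run as an unprivileged user.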

 

2. prometheus.yml configuration

scrape_configs:
    - job_name: 'xingneng'
      scrape_interval: 10s
      static_configs:
         - targets: ['192.168.0.47:9100','192.168.0.74:9100'] # IPs of the machines running node_exporter
    # additional jobs follow the same format
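
The scrape section on its own does not connect Prometheus to Alertmanager. A fuller prometheus.yml might look like the sketch below. The global intervals and the rules directory are assumptions; the alertmanager:9093 target relies on the alertmanager hostname from the compose file resolving over the shared monitor network.

```yaml
global:
  scrape_interval: 15s       # assumed default for jobs that do not set their own
  evaluation_interval: 15s   # assumed interval for evaluating alerting rules

alerting:
  alertmanagers:
    - static_configs:
        # hostname and port of the alertmanager service in docker-compose.yml
        - targets: ['alertmanager:9093']

rule_files:
  # hypothetical location; maps to ./prometheusConfig/rules/ on the host
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'xingneng'
    scrape_interval: 10s
    static_configs:
      - targets: ['192.168.0.47:9100', '192.168.0.74:9100']
```

Because the compose file passes --web.enable-lifecycle, an edited configuration can be reloaded without restarting the container by sending an HTTP POST to /-/reload on port 9090.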

3. Install and run node_exporter on the monitored machines
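
One possible way to do this is to run node_exporter as a container on each monitored machine with its own small compose file, following the pattern in the node_exporter README. Treat this as a sketch: running it in Docker at all is an assumption (the binary can equally be run directly on the host), and the service and container names are arbitrary.

```yaml
version: "3"

services:
    node_exporter:
        image: prom/node-exporter
        container_name: node_exporter
        restart: always
        # Host networking and PID namespace so the exporter reports host metrics;
        # it then listens on port 9100 of the machine itself
        network_mode: host
        pid: host
        volumes:
            # Read-only view of the host filesystem for filesystem metrics
            - '/:/host:ro,rslave'
        command:
            - '--path.rootfs=/host'
```

With host networking the exporter is reachable at the machine's own IP on port 9100, which matches the targets listed in prometheus.yml.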

4. Grafana UI configuration

Create a data source


Import the dashboard template with ID 8919
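
The data source step can also be declared in a provisioning file instead of being clicked through in the UI. The sketch below assumes an extra volume, not present in the compose file above, that mounts this file into /etc/grafana/provisioning/datasources/ in the grafana container; the data source name is arbitrary. The URL relies on the prometheus hostname resolving over the shared monitor network, and it is the same URL you would enter in the UI form.

```yaml
apiVersion: 1

datasources:
  - name: Prometheus              # arbitrary display name
    type: prometheus
    access: proxy                 # Grafana's backend makes the requests
    url: http://prometheus:9090   # prometheus service from docker-compose.yml
    isDefault: true
```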

5. alertmanager.yml configuration

https://www.cnblogs.com/gschain/p/11697200.html

DingTalk alerts
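
Alertmanager has no built-in DingTalk receiver, so DingTalk alerts are usually delivered through a webhook bridge. The fragment below is a sketch that assumes a separate prometheus-webhook-dingtalk service reachable as dingtalk on port 8060 with a target named webhook1; the host name, port, and target name are all assumptions to check against that project's documentation.

```yaml
receivers:
  - name: dingtalk
    webhook_configs:
      # Assumed URL of a prometheus-webhook-dingtalk instance
      - url: 'http://dingtalk:8060/dingtalk/webhook1/send'
        send_resolved: true   # also notify when the alert clears
```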

Email alerts. Reference: cnblogs.com/Me1onRind/p/12000103.html

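# alertmanager.yml
global:
  # Email settings; each of these can also be set per receiver
  smtp_from: 'sender@example.org'            # placeholder address
  smtp_smarthost: 'smtp.example.org:587'
  smtp_auth_username: 'sender@example.org'   # placeholder address
  smtp_auth_password: 'password'
  smtp_require_tls: false   # whether to use TLS; note that the default is true

  # Calling an API (webhook) on alert: omitted for now

route:   # Routing tree; an alert uses this node's settings only if no child route matches. Routing is driven by the labels set on the alerting rules
  receiver: default        # receiver, defined below
  group_by: ['severity']   # labels used for grouping; use ['...'] to group by all labels

  group_interval: 1m    # interval between notifications for the same group; alerts that fire within the interval are batched into one email
  repeat_interval: 20m  # how often a notification for a still-firing alert is resent. If rule c fires after the email for a and b has been sent, it counts as a new alert, and a, b, c are sent together after 1m

  routes:   # child routes; each entry is itself a route structure
    # The root route may not carry matchers or continue, so they go on child routes
    - match:      # exact match
        severity: notice
      continue: true   # whether to keep matching the following sibling routes after this one matches; note that the default is false

    - match_re:   # regular-expression match
        severity: warning|error

receivers:
  - name: default
    email_configs:   # email settings for this receiver
      - to: 'receiver@example.org'   # placeholder address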
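
For the routing above to have any effect, Prometheus needs alerting rules that attach the label the routes match on. Here is a minimal sketch of a rule file; the group name, alert name, duration, and file location are assumptions, and the file has to be listed under rule_files in prometheus.yml. The label key must be spelled exactly as it is in the Alertmanager route.

```yaml
# Hypothetical file: ./prometheusConfig/rules/node.yml
groups:
  - name: node-alerts
    rules:
      - alert: InstanceDown
        # up is 0 when Prometheus fails to scrape a target
        expr: up{job="xingneng"} == 0
        for: 1m
        labels:
          severity: error   # matched by the routes in alertmanager.yml
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```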

 
