A Simple Introduction to Prometheus

The Prometheus monitoring system

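My company is building a big data platform, and at our consultant's request we recently switched our monitoring over to Prometheus.
Prometheus website: https://prometheus.io/
My understanding (not necessarily correct): a Prometheus setup has three parts: prometheus (the server), exporters (the agents), and alertmanager (alerting).
At its core, prometheus is a time series database: it scrapes and stores data, and we pull out what we need with the query language it defines (PromQL). An exporter is essentially a plain web page served over HTTP; the page is refreshed continuously and exposes the current metric values. alertmanager is the alerting endpoint: it receives the alerts that prometheus pushes to it and sends notifications according to rules we define.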
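To make the "query the database" part a little more concrete, here is a rough Python sketch of how a PromQL expression could be sent to the server's HTTP API (`/api/v1/query`) and how the JSON answer could be read. The base URL `http://localhost:9090` and the sample response are my own assumptions for illustration; the values are hand-written, not real measurements.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, expr: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def extract_values(response_body: str) -> dict:
    """Map each 'instance' label to its sample value in an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


# Hand-written sample response (illustrative values only).
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "1.1.11.27:9100"},
                "value": [1500000000, "0.42"],
            },
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url("http://localhost:9090", "node_load1 >= 0.8"))
    print(extract_values(SAMPLE))
```

In practice you would fetch the URL with any HTTP client and pass the body to `extract_values`; the sketch leaves the network call out so it can run without a server.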
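As mentioned, prometheus is at its core a database, so if you want to visualize the data you need to pair it with grafana, which can produce very nice dashboards. I will cover that part in the next post.
The strength of prometheus is that it is a service-oriented alerting system: different services have different exporters, each exposing what matters for that service. I am new to it myself and have not tried the other exporters yet, so if you are curious, have a look at the official website.
Below are some simple configurations, with comments on the important settings. They are enough to stand up a basic prometheus monitoring system that watches some basic host information.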

Server side

Deployment

cd /usr/local/
wget http://1.1.17.28/software/linux/prometheus/prometheus-1.7.1.linux-amd64.tar.gz
tar  -zxvf prometheus-1.7.1.linux-amd64.tar.gz
cd prometheus-1.7.1.linux-amd64
nohup  ./prometheus   &
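# Start at boot. cd first so the relative prometheus.yml and data/ paths resolve,
# and background the process so rc.local does not block.
echo "cd /usr/local/prometheus-1.7.1.linux-amd64 && nohup ./prometheus &" >> /etc/rc.local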

Main configuration file: prometheus.yml

[root@prometheus local]# cat  prometheus-1.7.1.linux-amd64/prometheus.yml
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'
# Alert rule files
rule_files:
  - 'prometheus.rules'

scrape_configs:
# Prometheus monitoring itself (optional)
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']
# node_exporter targets: scrape basic host metrics (CPU, memory, and so on). You can create one job per service and attach labels.
  - job_name:       'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:

      - targets: ['1.1.17.28:9100']
        labels:
          severity: 'all'
          group: 'tool'
          hostname: 'yum-server'

      - targets: ['1.1.11.27:9100']
        labels: 
          severity: 'all'
          group: 'dev'
          hostname: 'app1'

      - targets: ['1.1.11.28:9100']
        labels: 
          severity: 'all'
          group: 'dev'
          hostname: 'app2'    
      - targets: ['1.1.11.15:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop1'
      - targets: ['1.1.11.16:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop2'
      - targets: ['1.1.11.17:9100']
        labels:
          severity: 'all'
          group: 'hadoop'
          hostname: 'hadoop3'

      - targets: ['1.1.10.12:9100']
        labels:
          severity: 'all'
          group: 'db_anl'
          hostname: 'DB_ETL'
# Alertmanager configuration
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "1.1.17.17:9093"
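
As far as I understand it, every sample scraped from a target ends up carrying the job name, the target address as `instance`, and whatever extra labels were attached in `static_configs`. The Python sketch below only illustrates that merge; `target_labels` is a hypothetical helper of mine, not part of Prometheus, and it ignores relabeling entirely.

```python
def target_labels(job_name, static_configs):
    """Return the label set each target's samples would carry (simplified).

    Assumption: labels set in static_configs take precedence over the
    defaults derived from the job name and the target address.
    """
    result = []
    for cfg in static_configs:
        for address in cfg["targets"]:
            labels = {"job": job_name, "instance": address}
            labels.update(cfg.get("labels", {}))
            result.append(labels)
    return result


if __name__ == "__main__":
    node_job = [
        {
            "targets": ["1.1.17.28:9100"],
            "labels": {"severity": "all", "group": "tool", "hostname": "yum-server"},
        },
    ]
    for labels in target_labels("node", node_job):
        print(labels)
```

These are the labels you can then filter on in queries and in alertmanager routes, for example `group="hadoop"`.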

Alert rules: prometheus.rules

# CPU load alert
ALERT cpu_overload
  IF node_load1 >= 0.8
  FOR 3m
  LABELS { severity = "all" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} load1 at or above 0.8 for 3 minutes",
    description = "{{ $labels.instance }} of job {{ $labels.job }} load1 (1-minute load average) at or above 0.8 for 3 minutes.",
  }


# Memory alert
ALERT memory_overload
  IF (node_memory_MemTotal-node_memory_MemFree)/node_memory_MemTotal >= 0.8
  FOR 3m
  LABELS { severity = "all" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory_load over 80% for 3 minutes",
    description = "{{ $labels.instance }} of job {{ $labels.job }} memory_load over 80% for 3 minutes.",
  }
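
The IF / FOR pair in the rules above is worth a small illustration: as I understand it, an alert whose condition becomes true is first "pending", and only turns "firing" once the condition has held at every evaluation for the whole FOR duration. The Python sketch below simulates that under my own simplifying assumptions (one series, fixed evaluation timestamps); `alert_states` and `memory_used_ratio` are hypothetical helpers, not Prometheus code.

```python
def memory_used_ratio(mem_total: float, mem_free: float) -> float:
    """Same arithmetic as the memory rule: (MemTotal - MemFree) / MemTotal."""
    return (mem_total - mem_free) / mem_total


def alert_states(samples, threshold, for_seconds):
    """Return 'inactive', 'pending' or 'firing' for each rule evaluation.

    samples: list of (timestamp_seconds, value), one entry per evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for ts, value in samples:
        if value >= threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


if __name__ == "__main__":
    # One evaluation per minute; the load crosses 0.8 at t=60s.
    load1 = [(0, 0.5), (60, 0.9), (120, 0.9), (180, 0.9), (240, 0.9), (300, 0.3)]
    print(alert_states(load1, threshold=0.8, for_seconds=180))
    print(memory_used_ratio(1000, 100))
```

With these made-up samples the alert stays pending for three evaluations and fires at t=240s, three minutes after the threshold was first crossed. A short spike that drops back below 0.8 never gets past pending.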

node_exporter deployment

node_exporter only exposes metrics over a static web page, so once it is installed and started it needs no configuration.

cd /usr/local/
wget http://1.1.17.28/software/linux/prometheus/node_exporter-0.14.0.linux-amd64.tar.gz
tar -zxvf  node_exporter-0.14.0.linux-amd64.tar.gz 
cd node_exporter-0.14.0.linux-amd64
nohup ./node_exporter &
# Start at boot (backgrounded so rc.local does not block)
echo "nohup /usr/local/node_exporter-0.14.0.linux-amd64/node_exporter &"  >>  /etc/rc.local
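
To show what "exposing metrics over a web page" amounts to, here is a toy exporter in Python. It is only a sketch of the idea, not how node_exporter is implemented: the metric name `demo_load1` and port 9200 are made up for the example, and `os.getloadavg()` works on Unix-like systems only.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(load1: float) -> str:
    """Render one gauge in the plain-text format that prometheus scrapes."""
    return (
        "# HELP demo_load1 One-minute load average (hypothetical demo metric).\n"
        "# TYPE demo_load1 gauge\n"
        f"demo_load1 {load1}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metric values on /metrics, recomputed per request."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(os.getloadavg()[0]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Block forever, answering scrapes on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    # Print what a scrape would return; call serve() to actually listen.
    print(render_metrics(0.42), end="")
```

If you call `serve()` and add the host with port 9200 as another target under `static_configs`, prometheus should be able to scrape it the same way it scrapes node_exporter on port 9100.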

alertmanager

Deployment

cd /usr/local
wget http://1.1.17.28/software/linux/prometheus/alertmanager-0.8.0.linux-amd64.tar.gz
tar -zxvf   alertmanager-0.8.0.linux-amd64.tar.gz
cd alertmanager-0.8.0.linux-amd64
nohup   ./alertmanager   & 
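# Start at boot. cd first so the relative alertmanager.yml path resolves,
# and background the process so rc.local does not block.
echo "cd /usr/local/alertmanager-0.8.0.linux-amd64 && nohup ./alertmanager &" >> /etc/rc.local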

Alert notification configuration file

Only email notifications are configured.

[root@prometheus local]# cat alertmanager-0.8.0.linux-amd64/alertmanager.yml
global:
  smtp_smarthost: 'smtp.xxx.com:25'
  resolve_timeout: 5m
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: '123123123'
  smtp_require_tls: false
#templates: 
#- '/usr/local/alertmanager-0.8.0.linux-amd64/alert_templates/123.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h 
  receiver: 'hwj'
  routes:
 # - match_re:
 #     service: ^(foo1|foo2|baz)$
 #   receiver: hwj
 #   routes:
  - match:
      severity: 'all' 
    receiver: 'hwj'
receivers:
- name: 'hwj'
  email_configs:
  - to: '[email protected]'
    send_resolved: true
  - to: '[email protected]'
    send_resolved: true
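
The `route` section is a tree: as I understand it, an alert walks down into the first child route whose `match` labels all equal the alert's labels, and falls back to the parent's receiver when no child matches. The Python sketch below mimics just that part (no `match_re`, no `continue`, no grouping). The receivers `default` and `dba-team` and the extra `group` route are hypothetical, added only so the two branches give different answers.

```python
def pick_receiver(route, labels, inherited=None):
    """Return the receiver name for an alert's labels (simplified routing)."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(name) == value for name, value in wanted.items()):
            # First matching child wins; descend into it.
            return pick_receiver(child, labels, receiver)
    return receiver


# Hypothetical tree: the severity route mirrors the config above,
# the group route and both extra receiver names are invented.
ROUTE = {
    "receiver": "default",
    "routes": [
        {"match": {"group": "db_anl"}, "receiver": "dba-team"},
        {"match": {"severity": "all"}, "receiver": "hwj"},
    ],
}

if __name__ == "__main__":
    print(pick_receiver(ROUTE, {"severity": "all", "group": "db_anl"}))
    print(pick_receiver(ROUTE, {"severity": "all", "group": "dev"}))
    print(pick_receiver(ROUTE, {"alertname": "something_else"}))
```

In the real config above, both the top-level route and the only child route point at `hwj`, so every alert ends up at the same two mailboxes either way.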
