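Prometheus Installation and Deployment

Introduction to Prometheus

Prometheus was inspired by Google's Borgmon monitoring system (much as Kubernetes evolved from Google's Borg system). Former Google engineers began developing it as open-source software at SoundCloud in 2012, and an early version was released publicly in early 2015. Prometheus has the following characteristics: it is easy to manage, it monitors the internal state of services, it has a powerful data model, and all collected monitoring data is stored as metrics in its built-in time series database (TSDB). Recent versions of Grafana also provide full Prometheus support, so Grafana can be used to build more polished monitoring charts.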

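Prometheus Architecture

1. Prometheus ecosystem components
Prometheus Server: the main server, which collects and stores time series data
Client libraries: instrument application code so that the monitored application exposes its own metrics
Pushgateway: a gateway that accepts pushed metrics from short-lived jobs
Exporters: components that expose metrics from specific third-party systems, for example HAProxy, StatsD, and Graphite
Alertmanager: the component dedicated to handling alerts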

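2. Understanding the architecture
Prometheus Server: contains the storage engine and the compute engine.
Retrieval: the scraping component, which actively pulls metric data from the Pushgateway and from exporters.
Service discovery: dynamically discovers the targets to monitor.
TSDB: the core component for storing and querying data.
HTTP server: exposes HTTP services to external clients.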

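3. Collection layer
The collection layer handles two kinds of jobs: short-lived jobs and long-lived jobs.
Short-lived jobs: push their metrics to the Pushgateway through its API when they exit.
Long-lived jobs: the Retrieval component pulls data directly from the job or from an exporter.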
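
To make the short-lived case concrete, here is a minimal sketch of a batch job pushing one sample to the Pushgateway over HTTP with curl. The helper names (pushgateway_url, metric_line, push_metric) and the job and metric names in the usage comment are hypothetical; the address is the Pushgateway address used later in this guide.

```shell
# Build the Pushgateway URL for a job (the grouping key is just "job" here).
pushgateway_url() {   # usage: pushgateway_url <host:port> <job>
  printf 'http://%s/metrics/job/%s\n' "$1" "$2"
}

# Render one sample in the Prometheus text exposition format: "<name> <value>".
metric_line() {       # usage: metric_line <name> <value>
  printf '%s %s\n' "$1" "$2"
}

# Push a single sample; curl reads the request body from stdin (@-).
push_metric() {       # usage: push_metric <host:port> <job> <name> <value>
  metric_line "$3" "$4" | curl -fsS --data-binary @- "$(pushgateway_url "$1" "$2")"
}

# Example call at the end of a batch job (hypothetical job and metric names):
# push_metric 192.168.255.101:9091 nightly_backup backup_last_success_timestamp_seconds "$(date +%s)"
```

Prometheus then scrapes the pushed value from the Pushgateway like any other target, so the job itself does not have to stay alive until the next scrape.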

4. Application layer
The application layer has two main parts: Alertmanager and data visualization.

Cluster Planning

IP                Services                                                      Hostname
192.168.255.101   Prometheus Server, Pushgateway, Alertmanager, Node Exporter   node01
192.168.255.102   Node Exporter                                                 node02
192.168.255.103   Node Exporter                                                 node03

Installing Prometheus

1. Download the package
[root@node01 ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.29.1/prometheus-2.29.1.linux-amd64.tar.gz

2. Extract the archive
[root@node01 ~]# tar -zxf prometheus-2.29.1.linux-amd64.tar.gz -C /usr/local/cluster/

3. Create a symbolic link
[root@node01 ~]# ln -s /usr/local/cluster/prometheus-2.29.1.linux-amd64/ /usr/local/cluster/prometheus

4. Edit the configuration file (the jobs below go under scrape_configs)
[root@node01 ~]# vim /usr/local/cluster/prometheus/prometheus.yml 

  - job_name: "prometheus"
    static_configs:
      - targets: ["192.168.255.101:9090"]

  - job_name: "pushgateway"
    static_configs:
      - targets: ["192.168.255.101:9091"]
        labels:
        instance: pushgateway

  - job_name: "node exporter"
    static_configs:
      - targets: ["192.168.255.101:9100","192.168.255.102:9100","192.168.255.103:9100"]
      

Installing Pushgateway

1. Download the package
[root@node01 ~]# wget https://github.com/prometheus/pushgateway/releases/download/v1.4.1/pushgateway-1.4.1.linux-amd64.tar.gz

2. Extract the archive
[root@node01 ~]# tar -zxf pushgateway-1.4.1.linux-amd64.tar.gz -C /usr/local/cluster/

3. Create a symbolic link
[root@node01 ~]# ln -s /usr/local/cluster/pushgateway-1.4.1.linux-amd64/ /usr/local/cluster/pushgateway

Installing Alertmanager

[root@node01 ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
[root@node01 ~]# tar -zxf alertmanager-0.23.0.linux-amd64.tar.gz -C /usr/local/cluster/
[root@node01 ~]# ln -s /usr/local/cluster/alertmanager-0.23.0.linux-amd64/ /usr/local/cluster/alertmanager
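
The download, extract, and symlink steps are the same for every component, so they can be driven from a single component name and version, which keeps the URL, the tarball name, and the link target from drifting apart. The following is only a sketch and not part of the original procedure: release_url, release_dir, and install_release are hypothetical helper names, and the URL pattern is simply the one the wget commands in this guide follow.

```shell
# Print the GitHub release URL for a Prometheus project component.
release_url() {       # usage: release_url <component> <version>
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$1" "$2" "$1" "$2"
}

# Print the directory name the tarball unpacks to.
release_dir() {       # usage: release_dir <component> <version>
  printf '%s-%s.linux-amd64\n' "$1" "$2"
}

# Download into the current directory, extract under <prefix>, and point a
# version-independent symlink at the extracted directory.
install_release() {   # usage: install_release <component> <version> [prefix]
  local component=$1 version=$2 prefix=${3:-/usr/local/cluster}
  local dir
  dir=$(release_dir "$component" "$version")
  wget "$(release_url "$component" "$version")" &&
    tar -zxf "${dir}.tar.gz" -C "$prefix" &&
    ln -sfn "${prefix}/${dir}" "${prefix}/${component}"
}

# Example: install_release alertmanager 0.23.0
```

Because the symlink is created with -sfn, rerunning the helper with a newer version repoints /usr/local/cluster/<component> without touching the older directory.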

Installing Node Exporter

Install it on every node in the cluster.

1. Download the package
[root@node01 ~]# wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz

2. Extract the archive
[root@node01 ~]# tar -zxf node_exporter-1.2.2.linux-amd64.tar.gz -C /usr/local/cluster/

3. Create a symbolic link
[root@node01 ~]# ln -s /usr/local/cluster/node_exporter-1.2.2.linux-amd64/ /usr/local/cluster/node_exporter

4. Start the service
[root@node01 ~]# nohup /usr/local/cluster/node_exporter/node_exporter > /usr/local/cluster/node_exporter/node_exporter.log 2>&1 &

5. Manage the service with systemd (stop the instance started with nohup in step 4 first, because both would bind port 9100)
[root@node01 ~]# vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://github.com/prometheus/node_exporter
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/cluster/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

[root@node01 ~]# systemctl daemon-reload
[root@node01 ~]# systemctl start node_exporter.service
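
The same unit layout could be reused for the components that this guide starts with nohup. As a rough sketch, the hypothetical render_unit helper below prints a unit file with the same fields as the node_exporter unit above; the service name and command line are whatever the caller passes in.

```shell
# Print a systemd unit that mirrors the node_exporter unit above.
render_unit() {       # usage: render_unit <name> <command> [args...]
  local name=$1
  shift
  cat <<EOF
[Unit]
Description=${name}
After=network.target

[Service]
Type=simple
User=root
ExecStart=$*
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example (writes a unit for Pushgateway, then reloads systemd):
# render_unit pushgateway /usr/local/cluster/pushgateway/pushgateway --web.listen-address=:9091 \
#   > /usr/lib/systemd/system/pushgateway.service && systemctl daemon-reload
```

One thing to watch if this is applied to Prometheus Server: it stores its TSDB in a data directory relative to the working directory by default, so under systemd it is safer to pass --storage.tsdb.path explicitly on the command line.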

Node Exporter

[Figures 1-3: Node Exporter screenshots]

Starting Prometheus Server

1. Run Prometheus Server in the background
[root@node01 ~]# nohup /usr/local/cluster/prometheus/prometheus --config.file=/usr/local/cluster/prometheus/prometheus.yml > /usr/local/cluster/prometheus/prometheus.log 2>&1 &
[3] 106868

2. The startup fails: no prometheus process is running
[root@node01 ~]# ps -ef | grep prometheus
root     130614  62112  0 11:59 pts/0    00:00:00 grep --color=auto prometheus
[root@node01 ~]# netstat -anp | grep 106868

3. Check the log
[root@node01 ~]# more /usr/local/cluster/prometheus/prometheus.log 
nohup: ignoring input
level=error ts=2023-07-22T03:57:46.081Z caller=main.go:350 msg="Error loading config (--config.file=/usr
/local/cluster/prometheus/prometheus.yml)" err="parsing YAML file /usr/local/cluster/prometheus/promethe
us.yml: yaml: unmarshal errors:\n  line 31: field instance not found in type struct { Targets []string \
"yaml:\\\"targets\\\"\"; Labels model.LabelSet \"yaml:\\\"labels\\\"\" }"

4. Line 31 of the configuration file (instance: pushgateway) is mis-indented; it must be nested one level under labels
[root@node01 ~]# vim /usr/local/cluster/prometheus/prometheus.yml 
  - job_name: "pushgateway"
    static_configs:
      - targets: ["192.168.255.101:9091"]
        labels:
          instance: pushgateway
               
5. Start again after fixing the configuration file
[root@node01 ~]# nohup /usr/local/cluster/prometheus/prometheus --config.file=/usr/local/cluster/prometheus/prometheus.yml > /usr/local/cluster/prometheus/prometheus.log 2>&1 &
[1] 66642

6. Check the process
[root@node01 ~]# ps -ef | grep prometheus
root      66642  87617  0 02:04 pts/2    00:00:00 /usr/local/cluster/prometheus/prometheus --config.file=/usr/local/cluster/prometheus/prometheus.yml
root      75184  87617  0 02:05 pts/2    00:00:00 grep --color=auto prometheus

7. Check the log for "Server is ready to receive web requests"
[root@node01 ~]# tail -100f /usr/local/cluster/prometheus/prometheus.log 
nohup: ignoring input
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:390 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:428 msg="Starting Prometheus" version="(version=2.29.1, branch=HEAD, revision=dcb07e8eac34b5ea37cd229545000b857f1c1637)"
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:433 build_context="(go=go1.16.7, user=root@364730518a4e, date=20210811-14:48:27)"
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:434 host_details="(Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 node01 (none))"
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:435 fd_limits="(soft=1024, hard=4096)"
level=info ts=2023-07-22T18:04:30.186Z caller=main.go:436 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2023-07-22T18:04:30.192Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2023-07-22T18:04:30.197Z caller=main.go:812 msg="Starting TSDB ..."
level=info ts=2023-07-22T18:04:30.201Z caller=head.go:815 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2023-07-22T18:04:30.201Z caller=head.go:829 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=18.856µs
level=info ts=2023-07-22T18:04:30.201Z caller=head.go:835 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2023-07-22T18:04:30.202Z caller=head.go:892 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2023-07-22T18:04:30.202Z caller=head.go:898 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=34.866µs wal_replay_duration=903.49µs total_replay_duration=976.097µs
level=info ts=2023-07-22T18:04:30.203Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2023-07-22T18:04:30.204Z caller=main.go:839 fs_type=XFS_SUPER_MAGIC
level=info ts=2023-07-22T18:04:30.204Z caller=main.go:842 msg="TSDB started"
level=info ts=2023-07-22T18:04:30.204Z caller=main.go:969 msg="Loading configuration file" filename=/usr/local/cluster/prometheus/prometheus.yml
level=info ts=2023-07-22T18:04:30.216Z caller=main.go:1006 msg="Completed loading of configuration file" filename=/usr/local/cluster/prometheus/prometheus.yml totalDuration=12.121714ms db_storage=902ns remote_storage=4.468µs web_handler=341ns query_engine=2.685µs scrape=11.038657ms scrape_sd=124.555µs notify=31.409µs notify_sd=14.867µs rules=3.366µs
level=info ts=2023-07-22T18:04:30.216Z caller=main.go:784 msg="Server is ready to receive web requests."
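
The YAML error in step 2 only surfaced in the log after a failed start. One way to catch this kind of mistake earlier is promtool, which ships in the same tarball as the prometheus binary. The sketch below assumes the install path used in this guide; check_config is a hypothetical wrapper name.

```shell
# Validate prometheus.yml with promtool before (re)starting the server.
# Returns 2 if promtool is missing, otherwise promtool's own exit status.
check_config() {      # usage: check_config [prometheus home directory]
  local home=${1:-/usr/local/cluster/prometheus}
  local promtool="${home}/promtool"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found at ${promtool}" >&2
    return 2
  fi
  "$promtool" check config "${home}/prometheus.yml"
}

# Example: only start the server when the configuration parses.
# check_config && nohup /usr/local/cluster/prometheus/prometheus \
#   --config.file=/usr/local/cluster/prometheus/prometheus.yml \
#   > /usr/local/cluster/prometheus/prometheus.log 2>&1 &
```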

Prometheus Web UI

http://192.168.255.101:9090/
Three Node Exporter instances and Prometheus itself are currently up.

[Figures 4-5: Prometheus web UI screenshots]

Starting Pushgateway

1. Start with nohup
[root@node01 ~]# nohup /usr/local/cluster/pushgateway/pushgateway --web.listen-address=":9091" > /usr/local/cluster/pushgateway/pushgateway.log 2>&1 &
[3] 94973

2. Check the process
[root@node01 ~]# ps -ef | grep pushgateway
root      94973  87617  0 02:20 pts/2    00:00:00 /usr/local/cluster/pushgateway/pushgateway --web.listen-address=:9091
root      96236  87617  0 02:20 pts/2    00:00:00 grep --color=auto pushgateway

3. Check the log
[root@node01 ~]# tail -100f /usr/local/cluster/pushgateway/pushgateway.log 
nohup: ignoring input
level=info ts=2023-07-22T18:20:39.894Z caller=main.go:85 msg="starting pushgateway" version="(version=1.4.1, branch=HEAD, revision=6fa509bbf4f082ab8455057aafbb5403bd6e37a5)"
level=info ts=2023-07-22T18:20:39.894Z caller=main.go:86 build_context="(go=go1.16.4, user=root@da864be5f3f0, date=20210528-14:30:10)"
level=info ts=2023-07-22T18:20:39.896Z caller=main.go:139 listen_address=:9091
level=info ts=2023-07-22T18:20:39.901Z caller=tls_config.go:191 msg="TLS is disabled." http2=false

[Figure 6: Pushgateway screenshot]

Starting Alertmanager

1. Start with nohup
[root@node01 ~]# nohup /usr/local/cluster/alertmanager/alertmanager --config.file=/usr/local/cluster/alertmanager/alertmanager.yml > /usr/local/cluster/alertmanager/alertmanager.log 2>&1 &
[5] 128248

2. Check the process
[root@node01 ~]# ps -ef | grep alertmanager
root     128248  87617  1 02:23 pts/2    00:00:00 /usr/local/cluster/alertmanager/alertmanager --config.file=/usr/local/cluster/alertmanager/alertmanager.yml
root     129880  87617  0 02:23 pts/2    00:00:00 grep --color=auto alertmanager

3. Check the log
[root@node01 ~]# tail -100f /usr/local/cluster/alertmanager/alertmanager.log 
nohup: ignoring input
level=info ts=2023-07-22T18:23:49.256Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2023-07-22T18:23:49.256Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
level=info ts=2023-07-22T18:23:49.262Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=192.168.255.101 port=9094
level=info ts=2023-07-22T18:23:49.270Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2023-07-22T18:23:49.331Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/usr/local/cluster/alertmanager/alertmanager.yml
level=info ts=2023-07-22T18:23:49.331Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/usr/local/cluster/alertmanager/alertmanager.yml
level=info ts=2023-07-22T18:23:49.334Z caller=main.go:518 msg=Listening address=:9093
level=info ts=2023-07-22T18:23:49.334Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
level=info ts=2023-07-22T18:23:51.270Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000230156s
level=info ts=2023-07-22T18:23:59.277Z caller=cluster.go:688 component=cluster msg="gossip settled; proceeding" elapsed=10.006653682s
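
Instead of reading each log, the three services on node01 can also be probed over HTTP. The sketch below assumes the /-/ready endpoint that Prometheus, Pushgateway, and Alertmanager each expose, and the ports seen in the logs above (9090, 9091, 9093); ready_urls and check_ready are hypothetical helper names.

```shell
# Print the readiness URL of each service started on the given host.
ready_urls() {        # usage: ready_urls <host>
  local host=$1 port
  for port in 9090 9091 9093; do   # Prometheus, Pushgateway, Alertmanager
    printf 'http://%s:%s/-/ready\n' "$host" "$port"
  done
}

# Probe every readiness URL and report OK or FAIL per service.
check_ready() {       # usage: check_ready <host>
  ready_urls "$1" | while read -r url; do
    if curl -fsS -o /dev/null --max-time 3 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}

# Example: check_ready 192.168.255.101
```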
