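Background:

Prometheus has no built-in clustering, so a single instance is a single point of failure. The Prometheus project does, however, describe the following approaches:

1. HA: run two Prometheus instances that scrape the same targets, and put a load balancer in front of them.

2. HA plus remote storage: Prometheus forwards the samples it collects through Remote Write to a remote store such as InfluxDB, which solves durable storage.

3. Federation: partition by function so that different shards scrape different data, and a global node stores the combined result.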

Thanos architecture

Thanos has two operating modes, Sidecar and Receiver; Sidecar is the more commonly used. The Thanos components are as follows:

query

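1. Wraps the Prometheus Query API and supports PromQL.

2. Exposes the query service and also implements the Store API. It can query data from four kinds of endpoint:

recording-rule data from rule nodes, native Prometheus data from sidecar nodes, object storage data proxied by the store gateway, and data collected by receivers.

3. Stateless, so it can be scaled horizontally for high availability.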

sidecar

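1. A sidecar for Prometheus that implements the Store API and serves metric queries from the query component over gRPC.

2. Runs in the same pod as Prometheus and shares its lifecycle.

3. Prometheus cuts a block to disk every two hours; the sidecar's shipper then uploads that block to object storage.

4. With this approach, if the Prometheus node is lost, up to the most recent two hours of data (the part not yet uploaded) are lost with it.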

receiver

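1. Built on top of Prometheus remote write.

2. Prometheus pushes data to the receiver in real time.

3. Receivers can be deployed in a distributed fashion and use consistent hashing. (Open question: when several Prometheus instances scrape the same data and push it to receivers with external_labels configured, can the receivers use consistent hashing to deduplicate it?)

4. With this approach the data stays complete even after one Prometheus instance goes down. At the time of writing, Thanos had not yet released a reasonably stable version of it.

5. The receiver also writes data to object storage. (Open question: when, and how often?)

6. While remote-writing, Prometheus still writes data to its local disk.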
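To make the consistent-hashing idea in item 3 concrete, here is a minimal sketch of a hash ring that maps a series (its label set) to a receiver. It is a generic illustration, not Thanos's actual hashring code; the `HashRing` class, the receiver names and the number of virtual nodes are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative only, not Thanos's implementation)."""

    def __init__(self, nodes, vnodes=64):
        # Place several virtual points per node on the ring to even out load.
        self._points = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, labels):
        # Hash the sorted label set so the same series always maps to the same node.
        key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._points[idx][1]


# Hypothetical receiver names and series labels.
ring = HashRing(["receiver-0", "receiver-1", "receiver-2"])
series = {"__name__": "up", "job": "node", "region": "IDC01", "replica": "0"}
print(ring.node_for(series))
```

In this sketch the replica label is part of the hashed key, so the same metric pushed by two replicas counts as two distinct series; the ring only decides where each series lands and does not merge them.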

store gateway

1. The only entry point through which query reads data in object storage.

2. Implements the Store API.

3. Caches the object storage index to speed up queries.

Compactor

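1. Runs as a singleton.

2. Compacts the data in object storage.

3. Downsamples the data in object storage. As I understand it, this reorganizes the data into blocks with a longer sampling interval and uploads them to object storage, which speeds up queries.

4. The official recommendation is 100 GB of disk as scratch space for processing.

5. Uploads deletion-mark.json to mark blocks in object storage for deletion. Three important flags: --retention.resolution-raw, --retention.resolution-5m, --retention.resolution-1h.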
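The downsampling in item 3 can be pictured with a small sketch: raw samples are grouped into fixed windows and only a few aggregates per window are kept. This is a simplified illustration under my own assumptions (the `downsample` function and the four aggregates chosen here), not the compactor's actual block format.

```python
from collections import defaultdict


def downsample(samples, resolution_s=300):
    """Group (timestamp_s, value) samples into fixed windows and keep aggregates."""
    windows = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        windows[ts - ts % resolution_s].append(value)
    return {
        start: {
            "count": len(vals),
            "sum": sum(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for start, vals in sorted(windows.items())
    }


# A 15s scrape interval over 10 minutes gives 40 raw samples.
raw = [(t, float(t // 15)) for t in range(0, 600, 15)]
for start, agg in downsample(raw).items():
    print(start, agg)
```

With a 5-minute resolution the 40 raw samples collapse into two windows, which is why long-range queries over downsampled blocks touch far less data.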

rule

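1. Similar to Prometheus rules: it precomputes user-defined metrics from a configuration file and sends alerts to Alertmanager.

2. Data produced by rule aggregation is also uploaded to object storage.

3. The rule component gets its source data by querying Prometheus through thanos query, so rule and thanos query end up querying each other.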

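Components and configuration

Prometheus

Thanos is non-intrusive; it is only a layer on top, so you still need to deploy Prometheus yourself. That is not covered again here: I assume you already have a standalone Prometheus running, either as a pod or on a host, depending on your environment. Ours runs outside the k8s cluster, so it is deployed on a host. This Prometheus collects monitoring data for region A.

1. Startup flags: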

exec ./prometheus --config.file=/apps/conf/prometheus/prometheus.yml --storage.tsdb.path=/apps/dbdat/prometheus --storage.tsdb.max-block-duration=2h --storage.tsdb.min-block-duration=2h --storage.tsdb.wal-compression --storage.tsdb.retention.time=12h --web.enable-lifecycle &

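--web.enable-lifecycle must be enabled so that the configuration can be hot-reloaded. Local retention is kept short (12h in the command above). Prometheus produces a block every 2 hours by default, and Thanos uploads each block to object storage.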

2. Prometheus configuration file

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    region: 'IDC01'
    replica: '0'

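external_labels must be declared to identify the region. If you run multiple replicas, also declare a replica identifier, for example 0, 1 and 2 for three replicas that scrape exactly the same data. The other two Prometheus instances can then run at the same time, differing only in their replica value.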
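With --web.enable-lifecycle turned on, Prometheus accepts an HTTP POST to /-/reload to re-read its configuration, which is handy after editing external_labels. The sketch below builds that request with the standard library; the helper names and the default base URL are assumptions for illustration.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url="http://localhost:9090", timeout=5):
    # Returns the HTTP status code; urllib raises HTTPError on an error response.
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


# Example (needs a running Prometheus started with --web.enable-lifecycle):
# reload_prometheus("http://localhost:9090")
```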

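Deploying the sidecar component

1. The sidecar runs alongside the Prometheus server in the same pod (or, as here, on the same host). It has two jobs:

It implements Thanos's Store API on top of Prometheus's Remote Read API. This lets the Query component, described later, treat a Prometheus server as just another source of time series data without talking to the Prometheus API directly (this is the sidecar's intermediary role).

Optionally, each time Prometheus produces a TSDB block (every 2 hours), the sidecar uploads it to an object storage bucket. This lets the Prometheus server run with a short retention while the historical data stays durable and queryable through object storage.

This does not make Prometheus fully stateless, of course: if it crashes and restarts, you may lose about 2 hours of metrics. Running Prometheus with multiple replicas reduces the risk to that 2h of data.

2. Startup flags: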

exec bin/thanos sidecar --tsdb.path "/apps/dbdat/prometheus" --prometheus.url "http://localhost:9090" --http-address 0.0.0.0:19191 --grpc-address 0.0.0.0:19091 --objstore.config-file etc/config.yaml --shipper.upload-compacted &

3. Configure object storage

cat etc/config.yaml

type: S3
config:
  bucket: thanos
  endpoint: 10.65.6.1:19000
  access_key: thanos
  insecure: true
  signature_version2: false
  secret_key: thanos2021
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 1m30s
    response_header_timeout: 2m
    insecure_skip_verify: false
    tls_handshake_timeout: 10s
    expect_continue_timeout: 1s
    max_idle_conns: 100
    max_idle_conns_per_host: 100
    max_conns_per_host: 0

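Compaction: the official documentation says that when using the sidecar, Prometheus's --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration must both be set to 2h. Only when the two values are equal is local compaction in Prometheus disabled. These two flags are not shown by prometheus --help, and the Prometheus authors have said they are intended for development and testing and are not recommended for users to change. Thanos requires compaction to be off because Prometheus by default keeps merging its 2h blocks into progressively larger ones; if that is left on, a block may be in the middle of being compacted just as Thanos is about to upload it, and the upload fails.

You need not worry, though: the sidecar checks these two flags at startup and fails to start if they are not suitable.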
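As a pre-flight illustration of the rule above, the following sketch parses a Prometheus command line and reports whether the two block-duration flags are both present and equal. It is a hypothetical helper of my own, not what the sidecar runs internally, and it compares the values as plain strings (so "2h" and "120m" would not count as equal).

```python
import shlex


def compaction_disabled(command_line):
    """Return True when min and max block duration are both set and equal."""
    flags = {}
    for token in shlex.split(command_line):
        # Only handles the --name=value form used in this article.
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    min_d = flags.get("storage.tsdb.min-block-duration")
    max_d = flags.get("storage.tsdb.max-block-duration")
    return min_d is not None and min_d == max_d


cmd = (
    "./prometheus --config.file=/apps/conf/prometheus/prometheus.yml "
    "--storage.tsdb.max-block-duration=2h --storage.tsdb.min-block-duration=2h"
)
print(compaction_disabled(cmd))
```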

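Deploying the store gateway component

If you configured object storage with objstore.config-file in the sidecar, your data is uploaded to the bucket periodically and only the most recent data is kept locally. So how do you query older data? Once the data is no longer under Prometheus's control, how do you get it back from the bucket and offer exactly the same query experience?

1. Store gateway: it mainly talks to object storage and reads the data persisted there. Like the sidecar, the store gateway implements the Store API, so the query component can fetch historical data from it.

2. Startup command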

exec bin/thanos store --data-dir /apps/dbdat/thanos/store --objstore.config-file etc/config.yaml --http-address 0.0.0.0:39191 --grpc-address 0.0.0.0:39090 --index-cache-size=250MB --sync-block-duration=5m --min-time=-2w --max-time=-1h &

The store gateway has to pull a large amount of historical data over the network and load it into memory, so it consumes a lot of CPU and memory.
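The --min-time=-2w and --max-time=-1h flags in the command above restrict the store gateway to a time window relative to now, which is one way to limit how much it loads. The sketch below shows one plausible reading of that window: a block is kept when its time range overlaps it. The function names and the overlap rule are my assumptions for illustration; check the Thanos documentation for the exact semantics.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def relative_time(expr, now):
    """Resolve a relative duration such as '-2w' or '-1h' against now."""
    match = re.fullmatch(r"(-?\d+)([smhdw])", expr)
    if match is None:
        raise ValueError(f"unsupported duration: {expr!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now + timedelta(**{_UNITS[unit]: amount})


def block_in_window(block_min, block_max, now, min_time="-2w", max_time="-1h"):
    # Keep a block when its range overlaps [now + min_time, now + max_time].
    start, end = relative_time(min_time, now), relative_time(max_time, now)
    return block_max >= start and block_min <= end


now = datetime(2021, 6, 15, 12, 0, tzinfo=timezone.utc)
recent = (now - timedelta(hours=3), now - timedelta(hours=1))
print(block_in_window(*recent, now))
```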

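Deploying the query component

1. The Query component (also called the querier) implements Prometheus's HTTP v1 API, so you can query data across the Thanos cluster with PromQL just as you would from the Prometheus graph page.

In short, the sidecar exposes the Store API, and Query gathers data from multiple Store API endpoints, evaluates the query, and returns the result. Query is completely stateless and can be scaled horizontally.

2. Startup command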

exec bin/thanos query --http-address "0.0.0.0:29090" --grpc-address "0.0.0.0:29091" --query.replica-label "replica" --store "10.65.4.2:19091" --store "10.65.6.9:19091" --store "10.65.6.9:39090" --store "127.0.0.1:39090" &
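The --query.replica-label flag names the label that distinguishes replicas (the replica external label configured earlier), so that Query can treat series differing only in that label as one. The sketch below illustrates the idea in a deliberately simplified form; Thanos's real deduplication algorithm is more involved, and the `deduplicate` function and its data layout are assumptions for the example.

```python
def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in the replica label.

    Each series is a pair: (labels_dict, [(timestamp, value), ...]).
    """
    merged = {}
    for labels, samples in series_list:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        bucket = merged.setdefault(key, {})
        for ts, value in samples:
            # Keep the first value seen for a timestamp; later replicas fill gaps.
            bucket.setdefault(ts, value)
    return [(dict(key), sorted(points.items())) for key, points in merged.items()]


# Replica 0 missed the scrape at t=15; replica 1 has it.
a = ({"__name__": "up", "region": "IDC01", "replica": "0"}, [(0, 1.0), (30, 1.0)])
b = ({"__name__": "up", "region": "IDC01", "replica": "1"}, [(0, 1.0), (15, 1.0), (30, 1.0)])
for labels, points in deduplicate([a, b]):
    print(labels, points)
```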

Deploying the compactor

Startup command

exec bin/thanos compact --data-dir /apps/dbdat/thanos/compact --http-address 0.0.0.0:19192 --objstore.config-file etc/config.yaml &

Thanos UI

Open the Thanos UI to verify that queries return data: http://10.65.4.2:29090
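Because Query exposes the Prometheus HTTP v1 API, the same check can be scripted. The sketch below only builds the URL for an instant query against the address above; the `instant_query_url` helper is hypothetical, and the `dedup` parameter is the Thanos Query option for turning deduplication on or off per request.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql, dedup=True):
    """Build a URL for the Prometheus-compatible instant query endpoint."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


url = instant_query_url("http://10.65.4.2:29090", 'up{region="IDC01"}')
print(url)
```

Fetching that URL with any HTTP client should return the usual Prometheus JSON response if the components are wired up correctly.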

Thanos as a Prometheus high-availability solution