weixin_33943347

Prometheus

转载自：https://songjiayang.gitbooks.io/prometheus/content/pushgateway/why.html

Prometheus 实战

v0.1.0

在过去一年左右时间里，我们使用 Prometheus 完成了对几个机房的基础和业务监控，大大提高了服务质量以及 oncall 水平，在此特别感谢 Promethues 这样优秀的开源软件。

当初选择 Prometheus 并不是偶然，因为：

Prometheus 是按照 Google SRE 运维之道的理念构建的，具有实用性和前瞻性。
Prometheus 社区非常活跃，基本稳定在 1个月1个版本的迭代速度，从 2016 年 v1.01 开始接触使用以来，到目前发布的 v1.8.2 以及最新最新的 v2.1 ，你会发现 Prometheus 一直在进步、在优化。
Go 语言开发，性能不错，安装部署简单，多平台部署兼容性好。
丰富的数据收集客户端，官方提供了各种常用 exporter。
丰富强大的查询能力。

+

Prometheus 作为监控后起之秀，虽然还有做的不够好的地方，但是不妨碍我们使用和喜爱它。根据我们长期的使用经验来看，它足以满足大多数场景需求，只不过对于新东西，往往需要花费更多力气才能发挥它的最大能力而已。

本书主要根据个人过去一年多的使用经验总结而成，内容主要包括 Prometheus 基本知识、进阶、实战以及常见问题列表等方面，希望对大家有所帮助。

本开源书籍既适用于具备基础 Linux 知识的运维初学者，也可供渴望理解 Prometheus 原理和实现细节的高级用户参考，同时也希望书中给出的实践案例在实际部署监控中对大家有所帮助。

你准备好了吗？接下来就让我们一起开始这段神奇旅行吧！

一、什么是 Prometheus

Prometheus 是由 SoundCloud 开源监控告警解决方案，从 2012 年开始编写代码，再到 2015 年 github 上开源以来，已经吸引了 9k+ 关注，以及很多大公司的使用；2016 年 Prometheus 成为继 k8s 后，第二名 CNCF(Cloud Native Computing Foundation) 成员。

作为新一代开源解决方案，很多理念与 Google SRE 运维之道不谋而合。

主要功能

多维数据模型（时序由 metric 名字和 k/v 的 labels 构成）。
灵活的查询语句（PromQL）。
无依赖存储，支持 local 和 remote 不同模型。
采用 http 协议，使用 pull 模式，拉取数据，简单易懂。
监控目标，可以采用服务发现或静态配置的方式。
支持多种统计数据模型，图形化友好。

核心组件

Prometheus Server，主要用于抓取数据和存储时序数据，另外还提供查询和 Alert Rule 配置管理。
client libraries，用于对接 Prometheus Server, 可以查询和上报数据。
push gateway ，用于批量，短期的监控数据的汇总节点，主要用于业务数据汇报等。
各种汇报数据的 exporters ，例如汇报机器数据的 node_exporter, 汇报 MongoDB 信息的 MongoDB exporter 等等。
用于告警通知管理的 alertmanager 。

基础架构

一图胜千言，先来张官方的架构图

从这个架构图，也可以看出 Prometheus 的主要模块包含， Server, Exporters, Pushgateway, PromQL, Alertmanager, WebUI 等。

它大致使用逻辑是这样：

Prometheus server 定期从静态配置的 targets 或者服务发现的 targets 拉取数据。
当新拉取的数据大于配置内存缓存区的时候，Prometheus 会将数据持久化到磁盘（如果使用 remote storage 将持久化到云端）。
Prometheus 可以配置 rules，然后定时查询数据，当条件触发的时候，会将 alert 推送到配置的 Alertmanager。
Alertmanager 收到警告的时候，可以根据配置，聚合，去重，降噪，最后发送警告。
可以使用 API， Prometheus Console 或者 Grafana 查询和聚合数据。

注意

Prometheus 的数据是基于时序的 float64 的值，如果你的数据值有更多类型，无法满足。
Prometheus 不适合做审计计费，因为它的数据是按一定时间采集的，关注的更多是系统的运行瞬时状态以及趋势，即使有少量数据没有采集也能容忍，但是审计计费需要记录每个请求，并且数据长期存储，这个 Prometheus 无法满足，可能需要采用专门的审计系统。

二、为什么选择 Prometheus

在前言中，简单介绍了我们选择 Prometheus 的理由，以及使用后给我们带来的好处。

在这里主要和其他监控方案对比，方便大家更好的了解 Prometheus。

Prometheus vs Zabbix

Zabbix 使用的是 C 和 PHP, Prometheus 使用 Golang, 整体而言 Prometheus 运行速度更快一点。
Zabbix 属于传统主机监控，主要用于物理主机，交换机，网络等监控，Prometheus 不仅适用主机监控，还适用于 Cloud, SaaS, Openstack，Container 监控。
Zabbix 在传统主机监控方面，有更丰富的插件。
Zabbix 可以在 WebGui 中配置很多事情，但是 Prometheus 需要手动修改文件配置。

Prometheus vs Graphite

Graphite 功能较少，它专注于两件事，存储时序数据，可视化数据，其他功能需要安装相关插件，而 Prometheus 属于一站式，提供告警和趋势分析的常见功能，它提供更强的数据存储和查询能力。
在水平扩展方案以及数据存储周期上，Graphite 做的更好。

Prometheus vs InfluxDB

InfluxDB 是一个开源的时序数据库，主要用于存储数据，如果想搭建监控告警系统，需要依赖其他系统。
InfluxDB 在存储水平扩展以及高可用方面做的更好, 毕竟核心是数据库。

Prometheus vs OpenTSDB

OpenTSDB 是一个分布式时序数据库，它依赖 Hadoop 和 HBase，能存储更长久数据，如果你系统已经运行了 Hadoop 和 HBase, 它是个不错的选择。
如果想搭建监控告警系统，OpenTSDB 需要依赖其他系统。

Prometheus vs Nagios

Nagios 数据不支持自定义 Labels, 不支持查询，告警也不支持去噪，分组, 没有数据存储，如果想查询历史状态，需要安装插件。
Nagios 是上世纪 90 年代的监控系统，比较适合小集群或静态系统的监控，显然 Nagios 太古老了，很多特性都没有，相比之下Prometheus 要优秀很多。

Prometheus vs Sensu

Sensu 广义上讲是 Nagios 的升级版本，它解决了很多 Nagios 的问题，如果你对 Nagios 很熟悉，使用 Sensu 是个不错的选择。
Sensu 依赖 RabbitMQ 和 Redis，数据存储上扩展性更好。

总结

Prometheus 属于一站式监控告警平台，依赖少，功能齐全。
Prometheus 支持对云或容器的监控，其他系统主要对主机监控。
Prometheus 数据查询语句表现力更强大，内置更强大的统计函数。
Prometheus 在数据存储扩展性以及持久性上没有 InfluxDB，OpenTSDB，Sensu 好。

三、安装Prometheus

本章将介绍 Prometheus 两种安装方式: 传统二进制包安装和 Docker 安装方式。

1、二进制包安装

我们可以到 Prometheus 二进制安装包下载页面，根据自己的操作系统选择下载对应的安装包。下面我们将以 ubuntu server 作为演示。

环境准备

linux amd64 (ubuntu server)
prometheus 1.6.2

下载 Prometheus Server

创建下载目录,以便安装过后清理掉

mkdir ~/Download
cd ~/Download

使用 wget 下载 Prometheus 的安装包

wget https://github.com/prometheus/prometheus/releases/download/v1.6.2/prometheus-1.6.2.linux-amd64.tar.gz

创建 Prometheus 目录，用于存放所有 Prometheus 相关的运行服务

mkdir ~/Prometheus
cd ~/Prometheus

使用 tar 解压缩 prometheus-1.6.2.linux-amd64.tar.gz

tar -xvzf ~/Download/prometheus-1.6.2.linux-amd64.tar.gz
cd prometheus-1.6.2.linux-amd64

当解压缩成功后，可以运行 version 检查运行环境是否正常

./prometheus version

如果你看到类似输出，表示你已安装成功:

prometheus, version 1.6.2 (branch: master, revision: xxxx)
  build user:       xxxx
  build date:       xxxx
  go version:       go1.8.1

启动 Prometheus Server

./prometheus

如果 prometheus 正常启动，你将看到如下信息：

INFO[0000] Starting prometheus (version=1.6.2, branch=master, revision=b38e977fd8cc2a0d13f47e7f0e17b82d1a908a9a)  source=main.go:88
INFO[0000] Build context (go=go1.8.1, user=root@c99d9d650cf4, date=20170511-13:03:00)  source=main.go:89
INFO[0000] Loading configuration file prometheus.yml     source=main.go:251
INFO[0000] Loading series map and head chunks...         source=storage.go:421
INFO[0000] 0 series loaded.                              source=storage.go:432
INFO[0000] Starting target manager...                    source=targetmanager.go:61
INFO[0000] Listening on :9090                            source=web.go:259

通过启动日志，可以看到 Prometheus Server 默认端口是 9090。

当 Prometheus 启动后，你可以通过浏览器来访问 http://IP:9090，将看到如下页面

在默认配置中，我们已经添加了 Prometheus Server 的监控，所以我们现在可以使用 PromQL （Prometheus Query Language）来查看，比如：

总结

可以看出 Prometheus 二进制安装非常方便，没有依赖，自带查询 web 界面。
在生产环境中，我们可以将 Prometheus 添加到 init 配置里，或者使用 supervisord 作为服务自启动。

2、Docker 安装

首先确保你已安装了最新版本的 Docker, 如果没有安装请点击这里。

下面我将以 Mac 版本的 Docker 作为演示。

安装

Docker 镜像地址 Quay.io

执行命令安装:

$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus

如果安装成功你可以访问 127.0.0.1:9090 查看到该页面:

Docker 管理 prometheus

运行 docker ps 查看所有服务:

CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                      NAMES
e9ebc2435387        quay.io/prometheus/prometheus   "/bin/prometheus -..."   26 minutes ago      Up 26 minutes       127.0.0.1:9090->9090/tcp   prometheus

运行 docker start prometheus 启动服务

运行 docker stats prometheus 查看 prometheus 状态

运行 docker stop prometheus 停止服务

四、Prometheus基础概念

本章将介绍 Prometheus 一些基础概念，包括：

数据模型
四种 Metric Type
作业与实例

1、数据模型

Prometheus 存储的是时序数据, 即按照相同时序(相同的名字和标签)，以时间维度存储连续的数据的集合。

时序索引

时序(time series) 是由名字(Metric)，以及一组 key/value 标签定义的，具有相同的名字以及标签属于相同时序。

时序的名字由 ASCII 字符，数字，下划线，以及冒号组成，它必须满足正则表达式 [a-zA-Z_:][a-zA-Z0-9_:]*, 其名字应该具有语义化，一般表示一个可以度量的指标，例如: http_requests_total, 可以表示 http 请求的总数。

时序的标签可以使 Prometheus 的数据更加丰富，能够区分具体不同的实例，例如 http_requests_total{method="POST"} 可以表示所有 http 中的 POST 请求。

标签名称由 ASCII 字符，数字，以及下划线组成，其中 __ 开头属于 Prometheus 保留，标签的值可以是任何 Unicode 字符，支持中文。

时序样本

按照某个时序以时间维度采集的数据，称之为样本，其值包含：

一个 float64 值
一个毫秒级的 unix 时间戳

格式

Prometheus 时序格式与 OpenTSDB 相似：

{=, ...}

其中包含时序名字以及时序的标签。

2、时序 4 种类型

Prometheus 时序数据分为 Counter, Gauge, Histogram, Summary 四种类型。

Counter

Counter 表示收集的数据是按照某个趋势（增加／减少）一直变化的，我们往往用它记录服务请求总量、错误总数等。

例如 Prometheus server 中 http_requests_total, 表示 Prometheus 处理的 http 请求总数，我们可以使用 delta, 很容易得到任意区间数据的增量，这个会在 PromQL 一节中细讲。

Gauge

Gauge 表示搜集的数据是一个瞬时的值，与时间没有关系，可以任意变高变低，往往可以用来记录内存使用率、磁盘使用率等。

例如 Prometheus server 中 go_goroutines, 表示 Prometheus 当前 goroutines 的数量。

Histogram

Histogram 由 _bucket{le=""}，_bucket{le="+Inf"}, _sum，_count 组成，主要用于表示一段时间范围内对数据进行采样（通常是请求持续时间或响应大小），并能够对其指定区间以及总数进行统计，通常它采集的数据展示为直方图。

例如 Prometheus server 中 prometheus_local_storage_series_chunks_persisted, 表示 Prometheus 中每个时序需要存储的 chunks 数量，我们可以用它计算待持久化的数据的分位数。

Summary

Summary 和 Histogram 类似，由 {quantile="<φ>"}，_sum，_count 组成，主要用于表示一段时间内数据采样结果（通常是请求持续时间或响应大小），它直接存储了 quantile 数据，而不是根据统计区间计算出来的。

例如 Prometheus server 中 prometheus_target_interval_length_seconds。

Histogram vs Summary

都包含 _sum，_count
Histogram 需要通过 _bucket 计算 quantile, 而 Summary 直接存储了 quantile 的值。

3、作业和实例

Prometheus 中，将任意一个独立的数据源（target）称之为实例（instance）。包含相同类型的实例的集合称之为作业（job）。如下是一个含有四个重复实例的作业：

- job: api-server
    - instance 1: 1.2.3.4:5670
    - instance 2: 1.2.3.4:5671
    - instance 3: 5.6.7.8:5670
    - instance 4: 5.6.7.8:5671

自生成标签和时序

Prometheus 在采集数据的同时，会自动在时序的基础上添加标签，作为数据源（target）的标识，以便区分：

job: The configured job name that the target belongs to.
instance: The : part of the target's URL that was scraped.

如果其中任一标签已经在此前采集的数据中存在，那么将会根据 honor_labels 设置选项来决定新标签。详见官网解释： scrape configuration documentation

对每一个实例而言，Prometheus 按照以下时序来存储所采集的数据样本：

up{job="", instance=""}: 1 表示该实例正常工作
up{job="", instance=""}: 0 表示该实例故障

scrape_duration_seconds{job="", instance=""} 表示拉取数据的时间间隔

scrape_samples_post_metric_relabeling{job="", instance=""} 表示采用重定义标签（relabeling）操作后仍然剩余的样本数

scrape_samples_scraped{job="", instance=""}  表示从该数据源获取的样本数

其中 up 时序可以有效应用于监控该实例是否正常工作。

五、PromQL

本章将介绍 PromQL 基本用法、示例，并将其与 SQL 进行比较。

1、PromQL 基本使用

PromQL (Prometheus Query Language) 是 Prometheus 自己开发的数据查询 DSL 语言，语言表现力非常丰富，内置函数很多，在日常数据可视化以及rule 告警中都会使用到它。

在页面 http://localhost:9090/graph 中，输入下面的查询语句，查看结果，例如：

http_requests_total{code="200"}

字符串和数字

字符串: 在查询语句中，字符串往往作为查询条件 labels 的值，和 Golang 字符串语法一致，可以使用 "", '', 或者 `` , 格式如：

"this is a string"
'these are unescaped: \n \\ \t'
`these are not unescaped: \n ' " \t`

正数，浮点数: 表达式中可以使用正数或浮点数，例如：

3
-2.4

查询结果类型

PromQL 查询结果主要有 3 种类型：

瞬时数据 (Instant vector): 包含一组时序，每个时序只有一个点，例如：http_requests_total
区间数据 (Range vector): 包含一组时序，每个时序有多个点，例如：http_requests_total[5m]
纯量数据 (Scalar): 纯量只有一个数字，没有时序，例如：count(http_requests_total)

查询条件

Prometheus 存储的是时序数据，而它的时序是由名字和一组标签构成的，其实名字也可以写出标签的形式，例如 http_requests_total 等价于 {name="http_requests_total"}。

一个简单的查询相当于是对各种标签的筛选，例如：

http_requests_total{code="200"} // 表示查询名字为 http_requests_total，code 为 "200" 的数据

查询条件支持正则匹配，例如：

http_requests_total{code!="200"}  // 表示查询 code 不为 "200" 的数据
http_requests_total{code=～"2.."} // 表示查询 code 为 "2xx" 的数据
http_requests_total{code!～"2.."} // 表示查询 code 不为 "2xx" 的数据

操作符

Prometheus 查询语句中，支持常见的各种表达式操作符，例如

算术运算符:

支持的算术运算符有 +，-，*，/，%，^, 例如 http_requests_total * 2 表示将 http_requests_total 所有数据 double 一倍。

比较运算符:

支持的比较运算符有 ==，!=，>，<，>=，<=, 例如 http_requests_total > 100 表示 http_requests_total 结果中大于 100 的数据。

逻辑运算符:

支持的逻辑运算符有 and，or，unless, 例如 http_requests_total == 5 or http_requests_total == 2 表示 http_requests_total 结果中等于 5 或者 2 的数据。

聚合运算符:

支持的聚合运算符有 sum，min，max，avg，stddev，stdvar，count，count_values，bottomk，topk，quantile，, 例如 max(http_requests_total) 表示 http_requests_total 结果中最大的数据。

注意，和四则运算类型，Prometheus 的运算符也有优先级，它们遵从（^）> (*, /, %) > (+, -) > (==, !=, <=, <, >=, >) > (and, unless) > (or) 的原则。

内置函数

Prometheus 内置不少函数，方便查询以及数据格式化，例如将结果由浮点数转为整数的 floor 和 ceil，

floor(avg(http_requests_total{code="200"}))
ceil(avg(http_requests_total{code="200"}))

查看 http_requests_total 5分钟内，平均每秒数据

rate(http_requests_total[5m])

更多请参见详情。

2、与 SQL 对比

下面将以 Prometheus server 收集的 http_requests_total 时序数据为例子展开对比。

MySQL 数据准备

mysql>
# 创建数据库
create database prometheus_practice;
use prometheus_practice;

# 创建 http_requests_total 表
CREATE TABLE http_requests_total (
    code VARCHAR(256),
    handler VARCHAR(256),
    instance VARCHAR(256),
    job VARCHAR(256),
    method VARCHAR(256),
    created_at DOUBLE NOT NULL,
    value DOUBLE NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE http_requests_total ADD INDEX created_at_index (created_at);

# 初始化数据
# time at 2017/5/22 14:45:27
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query_range", "localhost:9090", "prometheus", "get", 1495435527, 3);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query_range", "localhost:9090", "prometheus", "get", 1495435527, 5);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "prometheus", "localhost:9090", "prometheus", "get", 1495435527, 6418);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "static", "localhost:9090", "prometheus", "get", 1495435527, 9);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("304", "static", "localhost:9090", "prometheus", "get", 1495435527, 19);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query", "localhost:9090", "prometheus", "get", 1495435527, 87);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query", "localhost:9090", "prometheus", "get", 1495435527, 26);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "graph", "localhost:9090", "prometheus", "get", 1495435527, 7);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "label_values", "localhost:9090", "prometheus", "get", 1495435527, 7);

# time at 2017/5/22 14:48:27
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query_range", "localhost:9090", "prometheus", "get", 1495435707, 3);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query_range", "localhost:9090", "prometheus", "get", 1495435707, 5);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "prometheus", "localhost:9090", "prometheus", "get", 1495435707, 6418);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "static", "localhost:9090", "prometheus", "get", 1495435707, 9);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("304", "static", "localhost:9090", "prometheus", "get", 1495435707, 19);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query", "localhost:9090", "prometheus", "get", 1495435707, 87);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query", "localhost:9090", "prometheus", "get", 1495435707, 26);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "graph", "localhost:9090", "prometheus", "get", 1495435707, 7);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "label_values", "localhost:9090", "prometheus", "get", 1495435707, 7);

数据初始完成后，通过查询可以看到如下数据：

mysql>
mysql> select * from http_requests_total;
+------+--------------+----------------+------------+--------+------------+-------+
| code | handler      | instance       | job        | method | created_at | value |
+------+--------------+----------------+------------+--------+------------+-------+
| 200  | query_range  | localhost:9090 | prometheus | get    | 1495435527 |     3 |
| 400  | query_range  | localhost:9090 | prometheus | get    | 1495435527 |     5 |
| 200  | prometheus   | localhost:9090 | prometheus | get    | 1495435527 |  6418 |
| 200  | static       | localhost:9090 | prometheus | get    | 1495435527 |     9 |
| 304  | static       | localhost:9090 | prometheus | get    | 1495435527 |    19 |
| 200  | query        | localhost:9090 | prometheus | get    | 1495435527 |    87 |
| 400  | query        | localhost:9090 | prometheus | get    | 1495435527 |    26 |
| 200  | graph        | localhost:9090 | prometheus | get    | 1495435527 |     7 |
| 200  | label_values | localhost:9090 | prometheus | get    | 1495435527 |     7 |
| 200  | query_range  | localhost:9090 | prometheus | get    | 1495435707 |     3 |
| 400  | query_range  | localhost:9090 | prometheus | get    | 1495435707 |     5 |
| 200  | prometheus   | localhost:9090 | prometheus | get    | 1495435707 |  6418 |
| 200  | static       | localhost:9090 | prometheus | get    | 1495435707 |     9 |
| 304  | static       | localhost:9090 | prometheus | get    | 1495435707 |    19 |
| 200  | query        | localhost:9090 | prometheus | get    | 1495435707 |    87 |
| 400  | query        | localhost:9090 | prometheus | get    | 1495435707 |    26 |
| 200  | graph        | localhost:9090 | prometheus | get    | 1495435707 |     7 |
| 200  | label_values | localhost:9090 | prometheus | get    | 1495435707 |     7 |
+------+--------------+----------------+------------+--------+------------+-------+
18 rows in set (0.00 sec)

基本查询对比

假设当前时间为 2017/5/22 14:48:30

查询当前所有数据

// PromQL
http_requests_total

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

我们查询 MySQL 数据的时候，需要将当前时间向前推一定间隔，比如这里的 10s (Prometheus 数据抓取间隔)，这样才能确保查询到数据，而 PromQL 自动帮我们实现了这个逻辑。

条件查询

// PromQL
http_requests_total{code="200", handler="query"}

// MySQL
SELECT * from http_requests_total WHERE code="200" AND handler="query" AND created_at BETWEEN 1495435700 AND 1495435710;

模糊查询: code 为 2xx 的数据

// PromQL
http_requests_total{code~="2xx"}

// MySQL
SELECT * from http_requests_total WHERE code LIKE "%2%" AND created_at BETWEEN 1495435700 AND 1495435710;

比较查询: value 大于 100 的数据

// PromQL
http_requests_total > 100

// MySQL
SELECT * from http_requests_total WHERE value > 100 AND created_at BETWEEN 1495435700 AND 1495435710;

范围区间查询: 过去 5 分钟数据

// PromQL
http_requests_total[5m]

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435410 AND 1495435710;

聚合, 统计高级查询

count 查询: 统计当前记录总数

// PromQL
count(http_requests_total)

// MySQL
SELECT COUNT(*) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

sum 查询: 统计当前数据总值

// PromQL
sum(http_requests_total)

// MySQL
SELECT SUM(value) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

avg 查询: 统计当前数据平均值

// PromQL
avg(http_requests_total)

// MySQL
SELECT AVG(value) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

top 查询: 查询最靠前的 3 个值

// PromQL
topk(3, http_requests_total)

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710 ORDER BY value DESC LIMIT 3;

irate 查询，过去 5 分钟平均每秒数值

// PromQL
irate(http_requests_total[5m])

// MySQL
SELECT code, handler, instance, job, method, SUM(value)/300 AS value from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710  GROUP BY code, handler, instance, job, method;

总结

通过以上一些示例可以看出，在常用查询和统计方面，PromQL 比 MySQL 简单和丰富很多，而且查询性能也高不少。

六、数据可视化

收集到数据只是第一步，如果没有很好做到数据可视化，有时很难发现问题。

本章将介绍使用 Prometheus 自带的 web console 以及 grafana 来查询和展现数据。

Prometheus Web

Prometheus 自带了 Web Console，安装成功后可以访问 http://localhost:9090/graph 页面，用它可以进行任何 PromQL 查询和调试工作，非常方便，例如：

通过上图你不难发现，Prometheus 自带的 Web 界面比较简单，因为它的目的是为了及时查询数据，方便 PromeQL 调试。

它并不是像常见的 Admin Dashboard，在一个页面尽可能展示多的数据，如果你有这方面的需求，不妨试试 Grafana。

Grafana 使用

Grafana 是一套开源的分析监视平台，支持 Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch 等数据源，其 UI 非常漂亮且高度定制化。

这是 Prometheus web console 不具备的，在上一节中我已经说明了选择它的原因。

版本说明

Mac version 4.3.2

安装和运行程序

这里我使用 brew 安装，命令为

brew update
brew install grafana

当安装成功后，你可以使用默认配置启动程序

grafana-server -homepath /usr/local/Cellar/grafana/4.3.2/share/grafana/

如果顺利，你将看到如下日志

INFO[06-11|15:20:14] Starting Grafana                         logger=main version=4.3.2 commit=unknown-dev compiled=2017-06-01T05:47:48+0800
INFO[06-11|15:20:14] Config loaded from                       logger=settings file=/usr/local/Cellar/grafana/4.3.2/share/grafana/conf/defaults.ini
INFO[06-11|15:20:14] Path Home                                logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/
INFO[06-11|15:20:14] Path Data                                logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data
INFO[06-11|15:20:14] Path Logs                                logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data/log
INFO[06-11|15:20:14] Path Plugins                             logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data/plugins
INFO[06-11|15:20:14] Initializing DB                          logger=sqlstore dbtype=sqlite3
INFO[06-11|15:20:14] Starting DB migration                    logger=migrator
INFO[06-11|15:20:14] Executing migration                      logger=migrator id="copy data account to org"
INFO[06-11|15:20:14] Skipping migration condition not fulfilled logger=migrator id="copy data account to org"
INFO[06-11|15:20:14] Executing migration                      logger=migrator id="copy data account_user to org_user"
INFO[06-11|15:20:14] Skipping migration condition not fulfilled logger=migrator id="copy data account_user to org_user"
INFO[06-11|15:20:14] Starting plugin search                   logger=plugins
INFO[06-11|15:20:14] Initializing Alerting                    logger=alerting.engine
INFO[06-11|15:20:14] Initializing CleanUpService              logger=cleanup
INFO[06-11|15:20:14] Initializing Stream Manager
INFO[06-11|15:20:14] Initializing HTTP Server                 logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=

此时，你可以打开页面 http://localhost:3000，访问 Grafana 的 web 界面。

其他平台安装方案，请参考更多安装。

登录并设置 Prometheus 数据源

Grafana 本身支持 Prometheus 数据源，故不需要安装其他插件。

使用默认账号 admin/admin 登录 grafana

在 Dashboard 首页，点击添加数据源

配置 Prometheus 数据源

目前为止，Grafana 已经和 Prometheus 连上了，你将看到这样的 Dashboard

自定义监视画板

由顶部 Manage dashboard -> Settings 进入管理页面

在管理页面中取消 Hide Controls

点击页面底部 + ADD ROW 按钮, 并选择 Graph 类型

点击 Panel Title -> Edit 进入 Panel 编辑页面，并在 Metrics 中的 Metric lookup 选择 go_goroutines

你也可以直接在管理界面中填写 Prometheus 的查询语句，以及修改查询的 step 数值。

当你修改了 Dashboard 后，记得点击顶部的 Save dashboard 按钮，或直接 CTRL+S 保存。

至此，我们自定义的 Panel 已添加完成

我们可以通过拖拽，拉升调节 panel 的位置和尺寸，我们调节的目的是尽量在一个屏幕显示更多信息。

总结

Grafana 是一款非常漂亮，强大的监视分析平台，本身支持了 Prometheus 数据源，所以在做数据和监视可视化的时候，Grafana + Prometheus 是个不错的选择。

配置

Prometheus 启动的时候，可以加载运行参数 -config.file 指定配置文件，默认为 prometheus.yml。

在配置文件中我们可以指定 global, alerting, rule_files, scrape_configs, remote_write, remote_read 等属性。

其代码结构体定义为：

// Config is the top-level configuration for Prometheus's config files.
type Config struct {
    GlobalConfig   GlobalConfig    `yaml:"global"`
    AlertingConfig AlertingConfig  `yaml:"alerting,omitempty"`
    RuleFiles      []string        `yaml:"rule_files,omitempty"`
    ScrapeConfigs  []*ScrapeConfig `yaml:"scrape_configs,omitempty"`

    RemoteWriteConfigs []*RemoteWriteConfig `yaml:"remote_write,omitempty"`
    RemoteReadConfigs  []*RemoteReadConfig  `yaml:"remote_read,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`

    // original is the input from which the config was parsed.
    original string
}

配置文件结构大概为：

global:
  # How frequently to scrape targets by default.
  [ scrape_interval:  | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout:  | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval:  | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ :  ... ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ -  ... ]

# A list of scrape configurations.
scrape_configs:
  [ -  ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ -  ... ]
  alertmanagers:
    [ -  ... ]

# Settings related to the experimental remote write feature.
remote_write:
  [ -  ... ]

# Settings related to the experimental remote read feature.
remote_read:
  [ -  ... ]

全局配置

global 属于全局的默认配置，它主要包含 4 个属性，

scrape_interval: 拉取 targets 的默认时间间隔。
scrape_timeout: 拉取一个 target 的超时时间。
evaluation_interval: 执行 rules 的时间间隔。
external_labels: 额外的属性，会添加到拉取的数据并存到数据库中。

其代码结构体定义为：

 // GlobalConfig configures values that are used across other configuration
// objects.
type GlobalConfig struct {
    // How frequently to scrape targets by default.
    ScrapeInterval model.Duration `yaml:"scrape_interval,omitempty"`
    // The default timeout when scraping targets.
    ScrapeTimeout model.Duration `yaml:"scrape_timeout,omitempty"`
    // How frequently to evaluate rules by default.
    EvaluationInterval model.Duration `yaml:"evaluation_interval,omitempty"`
    // The labels to add to any timeseries that this Prometheus instance scrapes.
    ExternalLabels model.LabelSet `yaml:"external_labels,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

配置文件结构大概为：

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.
  scrape_timeout: 10s # is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

告警配置

通常我们可以使用运行参数 -alertmanager.xxx 来配置 Alertmanager，但是这样不够灵活，没有办法做到动态更新加载，以及动态定义告警属性。

所以 alerting 配置主要用来解决这个问题，它能够更好的管理 Alertmanager, 主要包含 2 个参数：

alert_relabel_configs: 动态修改 alert 属性的规则配置。
alertmanagers: 用于动态发现 Alertmanager 的配置。

其代码结构体定义为：

// AlertingConfig configures alerting and alertmanager related configs.
type AlertingConfig struct {
    AlertRelabelConfigs []*RelabelConfig      `yaml:"alert_relabel_configs,omitempty"`
    AlertmanagerConfigs []*AlertmanagerConfig `yaml:"alertmanagers,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

配置文件结构大概为：

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ -  ... ]
  alertmanagers:
    [ -  ... ]

其中 alertmanagers 为 alertmanager_config 数组，而 alertmanager_config 的代码结构体为,

// AlertmanagerConfig configures how Alertmanagers can be discovered and communicated with.
type AlertmanagerConfig struct {
    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.

    ServiceDiscoveryConfig ServiceDiscoveryConfig `yaml:",inline"`
    HTTPClientConfig       HTTPClientConfig       `yaml:",inline"`

    // The URL scheme to use when talking to Alertmanagers.
    Scheme string `yaml:"scheme,omitempty"`
    // Path prefix to add in front of the push endpoint path.
    PathPrefix string `yaml:"path_prefix,omitempty"`
    // The timeout used when sending alerts.
    Timeout time.Duration `yaml:"timeout,omitempty"`

    // List of Alertmanager relabel configurations.
    RelabelConfigs []*RelabelConfig `yaml:"relabel_configs,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

配置文件结构大概为:

# Per-target Alertmanager timeout when pushing alerts.
[ timeout:  | default = 10s ]

# Prefix for the HTTP path alerts are pushed to.
[ path_prefix:  | default = / ]

# Configures the protocol scheme used for requests.
[ scheme:  | default = http ]

# Sets the `Authorization` header on every request with the
# configured username and password.
basic_auth:
  [ username:  ]
  [ password:  ]

# Sets the `Authorization` header on every request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token:  ]

# Sets the `Authorization` header on every request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the scrape request's TLS settings.
tls_config:
  [  ]

# Optional proxy URL.
[ proxy_url:  ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ -  ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ -  ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ -  ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ -  ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ -  ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ -  ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ -  ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ -  ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ -  ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ -  ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ -  ... ]

# List of labeled statically configured Alertmanagers.
static_configs:
  [ -  ... ]

# List of Alertmanager relabel configurations.
relabel_configs:
  [ -  ... ]

规则配置

rule_files 主要用于配置 rules 文件，它支持多个文件以及文件目录。

其代码结构定义为：

RuleFiles      []string        `yaml:"rule_files,omitempty"`

配置文件结构大致为：

rule_files:
  - "rules/node.rules"
  - "rules2/*.rules"

数据拉取配置

scrape_configs 主要用于配置拉取数据节点，每一个拉取配置主要包含以下参数：

job_name：任务名称
honor_labels：用于解决拉取数据标签有冲突，当设置为 true, 以拉取数据为准，否则以服务配置为准
params：数据拉取访问时带的请求参数
scrape_interval：拉取时间间隔
scrape_timeout: 拉取超时时间
metrics_path：拉取节点的 metric 路径
scheme：拉取数据访问协议
sample_limit：存储的数据标签个数限制，如果超过限制，该数据将被忽略，不入存储；默认值为0，表示没有限制
relabel_configs：拉取数据重置标签配置
metric_relabel_configs：metric 重置标签配置

其代码结构体定义为：

// ScrapeConfig configures a scraping unit for Prometheus.
type ScrapeConfig struct {
    // The job name to which the job label is set by default.
    JobName string `yaml:"job_name"`
    // Indicator whether the scraped metrics should remain unmodified.
    HonorLabels bool `yaml:"honor_labels,omitempty"`
    // A set of query parameters with which the target is scraped.
    Params url.Values `yaml:"params,omitempty"`
    // How frequently to scrape the targets of this scrape config.
    ScrapeInterval model.Duration `yaml:"scrape_interval,omitempty"`
    // The timeout for scraping targets of this config.
    ScrapeTimeout model.Duration `yaml:"scrape_timeout,omitempty"`
    // The HTTP resource path on which to fetch metrics from targets.
    MetricsPath string `yaml:"metrics_path,omitempty"`
    // The URL scheme with which to fetch metrics from targets.
    Scheme string `yaml:"scheme,omitempty"`
    // More than this many samples post metric-relabelling will cause the scrape to fail.
    SampleLimit uint `yaml:"sample_limit,omitempty"`

    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.

    ServiceDiscoveryConfig ServiceDiscoveryConfig `yaml:",inline"`
    HTTPClientConfig       HTTPClientConfig       `yaml:",inline"`

    // List of target relabel configurations.
    RelabelConfigs []*RelabelConfig `yaml:"relabel_configs,omitempty"`
    // List of metric relabel configurations.
    MetricRelabelConfigs []*RelabelConfig `yaml:"metric_relabel_configs,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

以上配置定义中还包含 ServiceDiscoveryConfig，它的代码定义为：

// ServiceDiscoveryConfig configures lists of different service discovery mechanisms.
type ServiceDiscoveryConfig struct {
    // List of labeled target groups for this job.
    StaticConfigs []*TargetGroup `yaml:"static_configs,omitempty"`
    // List of DNS service discovery configurations.
    DNSSDConfigs []*DNSSDConfig `yaml:"dns_sd_configs,omitempty"`
    // List of file service discovery configurations.
    FileSDConfigs []*FileSDConfig `yaml:"file_sd_configs,omitempty"`
    // List of Consul service discovery configurations.
    ConsulSDConfigs []*ConsulSDConfig `yaml:"consul_sd_configs,omitempty"`
    // List of Serverset service discovery configurations.
    ServersetSDConfigs []*ServersetSDConfig `yaml:"serverset_sd_configs,omitempty"`
    // NerveSDConfigs is a list of Nerve service discovery configurations.
    NerveSDConfigs []*NerveSDConfig `yaml:"nerve_sd_configs,omitempty"`
    // MarathonSDConfigs is a list of Marathon service discovery configurations.
    MarathonSDConfigs []*MarathonSDConfig `yaml:"marathon_sd_configs,omitempty"`
    // List of Kubernetes service discovery configurations.
    KubernetesSDConfigs []*KubernetesSDConfig `yaml:"kubernetes_sd_configs,omitempty"`
    // List of GCE service discovery configurations.
    GCESDConfigs []*GCESDConfig `yaml:"gce_sd_configs,omitempty"`
    // List of EC2 service discovery configurations.
    EC2SDConfigs []*EC2SDConfig `yaml:"ec2_sd_configs,omitempty"`
    // List of OpenStack service discovery configurations.
    OpenstackSDConfigs []*OpenstackSDConfig `yaml:"openstack_sd_configs,omitempty"`
    // List of Azure service discovery configurations.
    AzureSDConfigs []*AzureSDConfig `yaml:"azure_sd_configs,omitempty"`
    // List of Triton service discovery configurations.
    TritonSDConfigs []*TritonSDConfig `yaml:"triton_sd_configs,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

ServiceDiscoveryConfig 主要用于 target 发现，大体分为两类，静态配置和动态发现。

所以，一份完整的 scrape_configs 配置大致为：

# The job name assigned to scraped metrics by default.
job_name: 

# How frequently to scrape targets from this job.
[ scrape_interval:  | default =  ]

# Per-scrape timeout when scraping this job.
[ scrape_timeout:  | default =  ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path:  | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels. This is useful for use cases such as federation, where all labels
# specified in the target should be preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels:  | default = false ]

# Configures the protocol scheme used for requests.
[ scheme:  | default = http ]

# Optional HTTP URL parameters.
params:
  [ : [, ...] ]

# Sets the `Authorization` header on every scrape request with the
# configured username and password.
basic_auth:
  [ username:  ]
  [ password:  ]

# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token:  ]

# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the scrape request's TLS settings.
tls_config:
  [  ]

# Optional proxy URL.
[ proxy_url:  ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ -  ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ -  ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ -  ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ -  ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ -  ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ -  ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ -  ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ -  ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ -  ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ -  ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ -  ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ -  ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ -  ... ]

# List of target relabel configurations.
relabel_configs:
  [ -  ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ -  ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit:  | default = 0 ]

远程可写存储

remote_write 主要用于可写远程存储配置，主要包含以下参数：

url: 访问地址
remote_timeout: 请求超时时间
write_relabel_configs: 标签重置配置, 拉取到的数据，经过重置处理后，发送给远程存储

其代码结构体为：

// RemoteWriteConfig is the configuration for writing to remote storage.
type RemoteWriteConfig struct {
    URL                 *URL             `yaml:"url,omitempty"`
    RemoteTimeout       model.Duration   `yaml:"remote_timeout,omitempty"`
    WriteRelabelConfigs []*RelabelConfig `yaml:"write_relabel_configs,omitempty"`

    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.
    HTTPClientConfig HTTPClientConfig `yaml:",inline"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

一份完整的配置大致为:

# The URL of the endpoint to send samples to.
url: 

# Timeout for requests to the remote write endpoint.
[ remote_timeout:  | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ -  ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
basic_auth:
  [ username:  ]
  [ password:  ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token:  ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [  ]

# Optional proxy URL.
[ proxy_url:  ]

注意： remote_write 属于试验阶段，慎用，因为在以后的版本中可能发生改变。

远程可读存储

remote_read 主要用于可读远程存储配置，主要包含以下参数：

url: 访问地址
remote_timeout: 请求超时时间

其代码结构体为：

// RemoteReadConfig is the configuration for reading from remote storage.
type RemoteReadConfig struct {
    URL           *URL           `yaml:"url,omitempty"`
    RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`

    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.
    HTTPClientConfig HTTPClientConfig `yaml:",inline"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}

一份完整的配置大致为:

# The URL of the endpoint to query from.
url: 

# Timeout for requests to the remote read endpoint.
[ remote_timeout:  | default = 30s ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
basic_auth:
  [ username:  ]
  [ password:  ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token:  ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [  ]

# Optional proxy URL.
[ proxy_url:  ]

注意： remote_read 属于试验阶段，慎用，因为在以后的版本中可能发生改变。

服务发现

在 Prometheus 的配置中，一个最重要的概念就是数据源 target，而数据源的配置主要分为静态配置和动态发现, 大致为以下几类：

static_configs: 静态服务发现
dns_sd_configs: DNS 服务发现
file_sd_configs: 文件服务发现
consul_sd_configs: Consul 服务发现
serverset_sd_configs: Serverset 服务发现
nerve_sd_configs: Nerve 服务发现
marathon_sd_configs: Marathon 服务发现
kubernetes_sd_configs: Kubernetes 服务发现
gce_sd_configs: GCE 服务发现
ec2_sd_configs: EC2 服务发现
openstack_sd_configs: OpenStack 服务发现
azure_sd_configs: Azure 服务发现
triton_sd_configs: Triton 服务发现

它们具体使用以及配置模板，请参考服务发现配置模板。

它们中最重要的，也是使用最广泛的应该是 static_configs, 其实那些动态类型都可以看成是某些通用业务使用静态服务封装的结果。

配置样例

Prometheus 的配置参数比较多，但是个人使用较多的是 global, rules, scrap_configs, statstic_config, rebel_config 等。

我平时使用的配置文件大致为这样：

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, scrape targets every 15 seconds.

rule_files:
  - "rules/node.rules"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['127.0.0.1:9100', '127.0.0.12:9100']

  - job_name: 'mysqld'
    static_configs:
      - targets: ['127.0.0.1:9104']
  - job_name: 'memcached'
    static_configs:
      - targets: ['127.0.0.1:9150']

Exporter

在 Prometheus 中负责数据汇报的程序统一叫做 Exporter, 而不同的 Exporter 负责不同的业务。它们具有统一命名格式，即 xx_exporter, 例如负责主机信息收集的 node_exporter。

Prometheus 社区已经提供了很多 exporter, 详情请参考这里。

文本格式

在讨论 Exporter 之前，有必要先介绍一下 Prometheus 文本数据格式，因为一个 Exporter 本质上就是将收集的数据，转化为对应的文本格式，并提供 http 请求。

Exporter 收集的数据转化的文本内容以行 (\n) 为单位，空行将被忽略, 文本内容最后一行为空行。

注释

文本内容，如果以 # 开头通常表示注释。

以 # HELP 开头表示 metric 帮助说明。
以 # TYPE 开头表示定义 metric 类型，包含 counter, gauge, histogram, summary, 和 untyped 类型。
其他表示一般注释，供阅读使用，将被 Prometheus 忽略。

采样数据

内容如果不以 # 开头，表示采样数据。它通常紧挨着类型定义行，满足以下格式：

metric_name [
  "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
] value [ timestamp ]

下面是一个完整的例子：

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"}    3 1395066363000

# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

# Minimalistic line:
metric_without_timestamp_and_labels 12.47

# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045

# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693

需要特别注意的是，假设采样数据 metric 叫做 x, 如果 x 是 histogram 或 summary 类型必需满足以下条件：

采样数据的总和应表示为 x_sum。
采样数据的总量应表示为 x_count。
summary 类型的采样数据的 quantile 应表示为 x{quantile="y"}。
histogram 类型的采样分区统计数据将表示为 x_bucket{le="y"}。
histogram 类型的采样必须包含 x_bucket{le="+Inf"}, 它的值等于 x_count 的值。
summary 和 historam 中 quantile 和 le 必需按从小到大顺序排列。

Sample Exporter

既然一个 exporter 就是将收集的数据转化为文本格式，并提供 http 请求即可，那很容自己实现一个。

一个简单的 exporter

下面我将用 golang 实现一个简单的 sample_exporter, 其代码大致为：


package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) { fmt.Fprintf(w, exportData) } func main() { http.HandleFunc("/", handler) http.ListenAndServe(":8080", nil) } var exportData string = `# HELP sample_http_requests_total The total number of HTTP requests. # TYPE sample_http_requests_total counter sample_http_requests_total{method="post",code="200"} 1027 1395066363000 sample_http_requests_total{method="post",code="400"} 3 1395066363000 # Escaping in label values: sample_msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9 # Minimalistic line: sample_metric_without_timestamp_and_labels 12.47 # A histogram, which has a pretty complex representation in the text format: # HELP sample_http_request_duration_seconds A histogram of the request duration. # TYPE sample_http_request_duration_seconds histogram sample_http_request_duration_seconds_bucket{le="0.05"} 24054 sample_http_request_duration_seconds_bucket{le="0.1"} 33444 sample_http_request_duration_seconds_bucket{le="0.2"} 100392 sample_http_request_duration_seconds_bucket{le="0.5"} 129389 sample_http_request_duration_seconds_bucket{le="1"} 133988 sample_http_request_duration_seconds_bucket{le="+Inf"} 144320 sample_http_request_duration_seconds_sum 53423 sample_http_request_duration_seconds_count 144320 # Finally a summary, which has a complex representation, too: # HELP sample_rpc_duration_seconds A summary of the RPC duration in seconds. # TYPE sample_rpc_duration_seconds summary sample_rpc_duration_seconds{quantile="0.01"} 3102 sample_rpc_duration_seconds{quantile="0.05"} 3272 sample_rpc_duration_seconds{quantile="0.5"} 4773 sample_rpc_duration_seconds{quantile="0.9"} 9001 sample_rpc_duration_seconds{quantile="0.99"} 76656 sample_rpc_duration_seconds_sum 1.7560473e+07 sample_rpc_duration_seconds_count 2693 `

当运行此程序，你访问 http://localhost:8080/metrics, 将看到这样的页面：

与 Prometheus 集成

我们可以利用 Prometheus 的 static_configs 来收集 sample_exporter 的数据。

打开 prometheus.yml 文件, 在 scrape_configs 中添加如下配置：

- job_name: "sample"
    static_configs:
      - targets: ["127.0.0.1:8080"]

重启加载配置，然后到 Prometheus Console 查询，你会看到 simple_exporter 的数据。

转载于:https://www.cnblogs.com/xibuhaohao/p/10177995.html

你可能感兴趣的:(Prometheus)

Spring Boot Docker容器监控 - 容器化环境监控方案全面指南 Clf丶忆笙 spring boot docker 后端
文章目录一、容器监控基础概念与重要性1.1为什么需要容器监控1.2容器监控与传统监控的区别1.3核心监控指标分类二、SpringBoot与Docker监控基础集成2.1SpringBootActuator基础配置2.2基础Docker监控配置2.3监控数据可视化基础三、高级监控方案实现3.1多维度JVM监控3.2自定义业务指标3.3容器资源限制与监控四、全链路监控方案4.1集成Prometheus
【kafka】在Linux系统中部署配置Kafka的详细用法教程分享景天科技苑 linux基础与进阶 shell脚本编写实战 kafka linux 分布式 kafka安装配置 kafka优化
✨✨欢迎大家来到景天科技苑✨✨养成好习惯，先赞后看哦~作者简介：景天科技苑《头衔》：大厂架构师，华为云开发者社区专家博主，阿里云开发者社区专家博主，CSDN全栈领域优质创作者，掘金优秀博主，51CTO博客专家等。《博客》：Python全栈，PyQt5和Tkinter桌面应用开发，小程序开发，人工智能，js逆向，App逆向，网络系统安全，云原生K8S，Prometheus监控，数据分析，Django
Spring Boot应用监控与管理：Actuator+Prometheus+Grafana终极指南（2025） allenXer Spring Boot 信息可视化 spring boot java
SpringBoot应用监控与管理：Actuator+Prometheus+Grafana终极指南（2025）随着微服务架构的普及，应用监控已成为生产环境的必备能力。本文深入探讨如何通过SpringBootActuator提供深度应用监控，配合Prometheus和Grafana构建完整的企业级监控解决方案。一、监控架构全景图1.1监控技术栈组成1.2核心组件功能对比组件角色关键能力Actuato
全栈运维的“诅咒”与“荣光”：为什么“万金油”工程师是项目成功的隐藏MVP？云原生水神职业发展系统运维运维
大家好，今天，我们来聊一个特殊且至关重要的群体：运维工程师。特别是那些在项目制中，以一己之力扛起一个或多个产品生死的“全能战士”。你是否就是其中一员？你的技能树上点亮了：操作系统、网络协议、mysql与Redis中间件、Docker与K8s容器化、Ansible与Terraform自动化、Go/Python工具开发、Prometheus监控体系、opentelemetry可视化，甚至要负责信息安全
涨薪技术|Prometheus之PromQL操作符川石课堂软件测试 prometheus python 数据库 postman 测试工具 appium 功能测试
使用PromQL除了能够方便的按照查询和过滤时间序列以外，PromQL还支持丰富的操作符，用户可以使用这些操作符对进一步的对事件序列进行二次加工。这些操作符包括：数学运算符，逻辑运算符，布尔运算符等等。01数学运算例如，我们可以通过指标node_memory_free_bytes_total获取当前主机可用的内存空间大小，其样本单位为Bytes。这是如果客户端要求使用MB作为单位响应数据，那只需要
构建企业级大模型运行监控体系：健康度五级指标与实战部署路径全解析
构建企业级大模型运行监控体系：健康度五级指标与实战部署路径全解析关键词：模型运行监控、健康度分级体系、DeepSeek、私有化部署、Prometheus、Grafana、异常检测、推理稳定性、性能观测、可视化大屏摘要：在DeepSeek大模型私有化部署的生产环境中，传统的“是否可用”监控已难以满足对模型稳定性、推理质量与异常风险的精细管理需求。为此，企业必须构建一套基于五级健康度模型的全维监控体系
Pushgateway扩展Prometheus监控 ivwdcwso 运维与云原生 prometheus k8s 云原生
Pushgateway是Prometheus生态系统中的一个重要组件,它允许我们将短期作业或批处理任务的指标推送到Prometheus中。本文将详细介绍如何安装、配置和使用Pushgateway来扩展Prometheus监控。1.Pushgateway简介Pushgateway主要用于解决以下场景:短期作业无法被Prometheus直接抓取批处理任务需要推送指标防火墙后的应用需要主动推送指标它作为
Prometheus系列01-Prometheus的单机版二进制部署 tinychen777 Devops linux 监控程序 centos
作为CNCF中最成功的开源项目之一，Prometheus已经成为了云原生监控的代名词，被广泛应用在Kubernetes和OpenShift等项目中，同时有很多第三方解决方案也会集成Prometheus。随着Kubernetes在容器调度和管理上确定领头羊的地位，Prometheus也成为Kubernetes容器监控的标配。考虑到k8s系统的复杂性和上手难度较高，本文将从最简单最基础的部分开始循序渐
【Prometheus】cAdvisor工作原理介绍码上淘金 prometheus
cAdvisor（ContainerAdvisor）是Google开源的容器监控工具，专注于实时采集和暴露容器级别的资源使用数据。其底层实现基于Linux内核的多项技术，结合高效的事件驱动架构，实现对容器资源的细粒度监控。以下从核心机制、数据采集原理和架构实现三方面详细解析：一、核心依赖技术cAdvisor的监控能力建立在Linux内核提供的底层机制之上：cgroups（控制组）资源隔离与统计：c
【Prometheus】通过tar包部署单机版Prometheus 和 Pushgateway
在ECS（ElasticComputeService）机器上通过tar包部署Prometheus和Pushgateway，并配置Prometheus采集Pushgateway的数据，是一个常见的监控部署任务。以下是详细的步骤说明：环境准备操作系统：Linux（如CentOS、Ubuntu）已安装tar命名已开通ECS实例的相应端口（9090forPrometheus,9091forPushgate
【Java 面试八股学习自用版】MYSQL优化-------定位慢查询以及分析
定位慢查询以及分析导致慢查询的一些原因聚合查询多表查询表数据量过大查询深度分页查询此时的表现为：页面加载过慢接口压测响应时间过长（1s以上）。定位方法（定位哪一条）方法一开源工具调试工具Arthas运维工具prometheusSkywalkingMySql自带慢日志需要在配置文件中开启设置开启以及时间阈值（ps2s）注意：一般在调试阶段开启注意一般结合自己项目说！！！！！分析慢SQL语句的原因聚合
可观测性大脑：Pyroscope+Tempo实现代码级根因定位 Star_Sea_77 云原生可观测性根因分析性能剖析分布式追踪智能运维
可观测性大脑：Pyroscope+Tempo实现代码级根因定位摘要本文针对传统可观测性方案“指标、链路、性能数据割裂”的痛点（某电商故障定位平均耗时3.5小时），提出基于Pyroscope+Tempo的“可观测性大脑”方案。通过Prometheus告警触发性能热点与分布式链路的智能关联，实现从“指标异常”到“代码级根因”的一键定位：Pyroscope生成CPU火焰图锁定耗时代码方法，Tempo追溯
Spring Cloud（微服务部署与监控）白仑色 Spring系列 spring cloud 微服务 spring 微服务部署服务监控健康检查
摘要在微服务架构中，随着服务数量的增长和部署复杂度的提升，如何高效部署、持续监控、快速定位问题并实现自动化运维成为保障系统稳定性的关键。本文将围绕SpringCloud微服务的部署与监控展开，深入讲解：微服务打包与部署方式（JAR/Docker/Kubernetes）如何构建CI/CD流水线服务健康检查与自动恢复机制Prometheus+Grafana实现指标可视化监控ELK实现日志集中管理Sky
container_memory_working_set_bytes` 与 `container_memory_usage_bytes` 的区别强哥之神 prometheus 容器 docker k8s
在Prometheus中，container_memory_working_set_bytes与container_memory_usage_bytes的区别如下：计算方式及包含内容：container_memory_usage_bytes：表示容器当前使用的总内存，包括所有内存，不管这些内存是否最近被访问过，也不管其是否可以被操作系统回收，即它包含了缓存、工作集等所有内存部分。container
Zabbix和Prometheus的区别运维小贺 zabbix prometheus 运维
Zabbix监控平台监控概念对服务的管理，不能仅限于可用性。还需要服务可以安全、稳定、高效地运行。监控的目的：早发现、早治疗。被监控的资源类型：公开数据：对外开放的，不需要认证即可获取的数据私有数据：对外不开放，需要认证、权限才能获得的数据Zabbix是什么？Zabbix是个适用于监控硬件服务器的一款开源的分布式监控方案实施监控的几个方面：数据采集：使用agent（可安装软件的系统上）、SNMP（
半导体FAB中的服务器硬件故障监控与预防全方案：从预警到零宕机实战爱吃青菜的大力水手服务器运维半导体 FAB运维 IT运维
服务器硬件故障监控与预防全方案：从预警到零宕机实战关键词：SMART监控RAID预警IPMI传感器性能基线PrometheusZabbix高可用架构一、硬件故障前的7大预警信号（附关联工具）故障类型关键指标监控工具预警阈值磁盘故障Reallocated_Sector_Countsmartctl+smartd>0立即告警Current_Pending_SectorPrometheus+NodeExp
Istio 深度解析与实战：从原理到应用的全面指南阿贾克斯的黎明 java istio 网络云原生
目录Istio深度解析与实战：从原理到应用的全面指南一、Istio原理深度剖析1.数据平面2.控制平面二、Istio的安装与部署1.环境准备2.安装Istio3.注入Sidecar三、Istio实战应用场景1.流量管理（1）简单路由（2）流量镜像2.安全防护（1）服务间双向认证（2）基于角色的访问控制（RBAC）3.监控与可观测性（1）启用Prometheus和Grafana（2）查看监控指标四、
AI原生应用微服务监控：Prometheus+Grafana实战 AI原生应用开发 AI-native 微服务 prometheus ai
AI原生应用微服务监控：Prometheus+Grafana实战关键词：微服务监控、Prometheus、Grafana、AI应用、指标收集、可视化告警、云原生摘要：本文将深入探讨如何为AI原生应用构建完整的微服务监控系统。我们将从基础概念出发，详细介绍Prometheus的指标收集机制和Grafana的可视化能力，并通过实际案例展示如何搭建完整的监控解决方案。文章包含详细的配置示例、架构图解和最
Python HTTP服务监控：Prometheus与自定义Exporter开发指南
在微服务架构中，HTTP服务的高效监控对保障系统稳定性至关重要。Prometheus作为云原生监控标杆，通过其Pull模型与灵活的指标体系，结合Python开发的自定义Exporter，可实现HTTP服务性能、可用性及业务指标的全面观测。Prometheus监控核心机制Prometheus采用时间序列数据库存储指标数据，每条数据由指标名称（如http_requests_total）、标签（如met
机器学习模型监控警报系统设计：Prometheus+Evidently 实战教程大熊计算机机器学习 prometheus 人工智能
1.系统架构设计：从数据采集到智能告警（1）监控系统核心组件交互图预测请求监控指标告警规则通知渠道预测结果质量报告时序数据模型服务PrometheusExporterPrometheusServerAlertmanager邮件/Slack/WebhookEvidently服务可视化仪表盘图解：系统采用双引擎架构，Prometheus负责基础监控指标采集与告警触发，Evidently执行深度模型分析
Gitea 服务器监控面板的搭建 shengyin714959 笔记最高笔记服务器 gitea 数据库
Prometheus是一个开源的服务监控系统和时序数据库。Grafana是一个可视化的数据分析面板，它可以从Prometheus中查询时序数据，绘制漂亮的数据图表。本文作者在实践中使用Prometheus抓取和存储Gitea服务器的运行数据，并基于Grafana提供的开源数据面板创建了一个自己服务器的Gitea性能监控面板。工作原理为了更清晰地理解Prometheus的工作原理，我在下方列出了Pr
kube-promethesu调整coredns监控 jingleli21 docker linux 运维
K8s集群版本是二进制部署的1.20.4，kube-prometheus对应选择的版本是kube-prometheus-0.8.0Coredns是在安装集群的时候部署的，采用的也是该版本的官方文档，kube-prometheus中也有coredns的监控配置信息，但是在prometheus的监控页面并没有发现coredns的servicemonitor.。所以我们需要一步步的去排查该问题。先看下c
Promtail收集docker容器的日志 jingleli21 docker
什么是Promtail？Promtail是Linux操作系统上的一个服务，它会扫描日志文件，并将它们提取到Loki中。Loki是Grafana的一个日志聚合工具，它类似于Prometheus，但主要用于日志数据。Promtail能够自动发现运行中的Docker容器，并抓取它们的日志。Promtail的工作原理Promtail的工作原理可以简单概括为以下几个步骤：监控日志文件：Promtail不断扫
16.7 Prometheus+Grafana实战：容器化监控与日志聚合一站式解决方案少林码僧 prometheus grafana 人工智能 langchain llama 语言模型机器学习
《Prometheus+Grafana实战：容器化监控与日志聚合一站式解决方案》关键词：容器化监控、日志聚合、Prometheus、Grafana、ELKStack、用户反馈收集容器化监控与日志系统的架构设计在LanguageMentorAgent生产部署中，监控系统需要覆盖以下维度：
prometheus+grafana+MySQL监控甲柒运维监控 prometheus grafana mysql
prometheus+grafana+MySQL监控环境说明操作前提：先去搭建Docker部署prometheus+grafana+...这篇文章的系统Docker部署prometheus+grafana+...的参考文章：Docker部署prometheus+grafana+…-CSDN博客在的节点服务器上搭建MySQL数据库（可以采用直接安装或者docker部署）搭建MySQL数据库的参考文章
Prometheus + Grafana监控方案详解：从入门到实战风偷走了蒲公开发知识 Prometheus Grafana 监控 DevOps Node.js
Prometheus+Grafana监控方案详解：从入门到实战1.引言在现代分布式系统中，监控是保障系统稳定性的关键。Prometheus作为一款开源的监控工具，结合Grafana的可视化能力，能够提供强大的监控解决方案。本文将详细介绍Prometheus+Grafana的监控方案，并通过丰富的代码示例和应用场景帮助读者快速掌握。2.Prometheus基础2.1Prometheus简介Prome
Kylin Linux Advanced Server V10 离线安装 Prometheus + Grafana + node_exporter指南晴空06 操作系统管理工具性能测试 kylin linux prometheus
离线安装Prometheus+Grafana+InfluxDB指南(KylinLinuxAdvancedServerV10)最终结果展示准备工作在一台有互联网连接的机器上下载所有必要的安装包和依赖准备一个USB驱动器或内部网络共享位置来传输文件确保目标服务器有足够的资源运行这些服务下载离线安装包在有网络的机器上下载以下组件：Prometheuswgethttps://github.com/prom
Sentinel：微服务稳定性的守护者未来并未来 sentinel 微服务 java
首先，我们要明确Sentinel在微服务架构中的定位。Sentinel并不是一个全功能的监控或追踪系统（比如Prometheus+Grafana组合或Jaeger/Zipkin），它的核心定位是流量控制（TrafficControl）和熔断降级（CircuitBreaking&Degradation）。简单理解，它的任务就是：管住流量：监控服务接口的访问量，当流量超过设定的阈值时，进行拦截（限流）
OSS监控体系搭建：Prometheus+Grafana实时监控流量、错误码、存储量（开源方案替代云监控自定义视图）大熊计算机 #阿里云 prometheus grafana 开源
1.开源监控方案核心架构设计（1）技术选型对比分析当前主流OSS监控方案可分为三类：云厂商自带监控（如阿里云云监控）开源方案（Prometheus生态）商业APM工具（如Datadog）通过以下维度进行对比：维度云监控自定义视图Prometheus+Grafana商业APM工具数据采集粒度1分钟15秒（可调）10秒存储成本按量收费自控存储周期高额订阅费告警灵活性基础阈值告警支持PromQL复杂逻辑
java全家桶之35: jvm如何调优 leijmdas java
JVM调优指南：提升性能与稳定性JVM调优是Java应用性能优化的关键环节，合理的调优可以显著提高应用吞吐量、降低延迟并减少资源消耗。以下是系统的JVM调优方法和实践：一、调优基础准备监控先行使用工具收集基线数据：jstat-监控GC情况jstack-分析线程堆栈jmap-内存分析VisualVM/Arthas-可视化监控Prometheus+Grafana-生产级监控确定优化目标吞吐量优先（批处
ztree异步加载 3213213333332132 JavaScript Ajax json Web ztree
相信新手用ztree的时候,对异步加载会有些困惑，我开始的时候也是看了API花了些时间才搞定了异步加载，在这里分享给大家。我后台代码生成的是json格式的数据，数据大家按各自的需求生成，这里只给出前端的代码。设置setting，这里只关注async属性的配置 var setting = { //异步加载配置
thirft rpc 具体调用流程 BlueSkator 中间件 rpc thrift
Thrift调用过程中，Thrift客户端和服务器之间主要用到传输层类、协议层类和处理类三个主要的核心类，这三个类的相互协作共同完成rpc的整个调用过程。在调用过程中将按照以下顺序进行协同工作：（1）将客户端程序调用的函数名和参数传递给协议层（TProtocol），协议
异或运算推导, 交换数据 dcj3sjt126com PHP 异或 ^
/* * 5 0101 * 9 1010 * * 5 ^ 5 * 0101 * 0101 * ----- * 0000 * 得出第一个规律: 相同的数进行异或, 结果是0 * * 9 ^ 5 ^ 6 * 1010 * 0101 * ---- * 1111 * * 1111 * 0110 * ---- * 1001
事件源对象周华华 JavaScript
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml&q
MySql配置及相关命令 g21121 mysql
MySQL安装完毕后我们需要对它进行一些设置及性能优化，主要包括字符集设置，启动设置，连接优化，表优化，分区优化等等。一修改MySQL密码及用户
[简单]poi删除excel 2007超链接 53873039oycg Excel
采用解析sheet.xml方式删除超链接，缺点是要打开文件2次,代码如下: public void removeExcel2007AllHyperLink(String filePath) throws Exception { OPCPackage ocPkg = OPCPac
Struts2添加 open flash chart 云端月影
准备以下开源项目： 1. Struts 2.1.6 2. Open Flash Chart 2 Version 2 Lug Wyrm Charmer (28th, July 2009) 3. jofc2，这东西不知道是没做好还是什么意思，好像和ofc2不怎么匹配，最好下源码，有什么问题直接改。 4. log4j 用eclipse新建动态网站，取名OFC2Demo，将Struts2 l
spring包详解 aijuans spring
下载的spring包中文件及各种包众多，在项目中往往只有部分是我们必须的，如果不清楚什么时候需要什么包的话，看看下面就知道了。 aspectj目录下是在Spring框架下使用aspectj的源代码和测试程序文件。Aspectj是java最早的提供AOP的应用框架。 dist 目录下是Spring 的发布包，关于发布包下面会详细进行说明。 docs&nb
网站推广之seo概念 antonyup_2006 算法 Web 应用服务器搜索引擎 Google
持续开发一年多的b2c网站终于在08年10月23日上线了。作为开发人员的我在修改bug的同时，准备了解下网站的推广分析策略。所谓网站推广，目的在于让尽可能多的潜在用户了解并访问网站，通过网站获得有关产品和服务等信息，为最终形成购买决策提供支持。网站推广策略有很多，seo，email，adv
单例模式,sql注入,序列百合不是茶单例模式序列 sql注入预编译
序列在前面写过有关的博客,也有过总结,但是今天在做一个JDBC操作数据库的相关内容时需要使用序列创建一个自增长的字段居然不会了,所以将序列写在本篇的前面 1,序列是一个保存数据连续的增长的一种方式; 序列的创建; CREATE SEQUENCE seq_pro 2 INCREMENT BY 1 -- 每次加几个 3
Mockito单元测试实例 bijian1013 单元测试 mockito
Mockito单元测试实例： public class SettingServiceTest { private List<PersonDTO> personList = new ArrayList<PersonDTO>(); @InjectMocks private SettingPojoService settin
精通Oracle10编程SQL(9)使用游标 bijian1013 oracle 数据库 plsql
/* *使用游标 */ --显示游标 --在显式游标中使用FETCH...INTO语句 DECLARE CURSOR emp_cursor is select ename,sal from emp where deptno=1; v_ename emp.ename%TYPE; v_sal emp.sal%TYPE; begin ope
【Java语言】动态代理 bit1129 java语言
JDK接口动态代理 JDK自带的动态代理通过动态的根据接口生成字节码(实现接口的一个具体类)的方式，为接口的实现类提供代理。被代理的对象和代理对象通过InvocationHandler建立关联 package com.tom; import com.tom.model.User; import com.tom.service.IUserService;
Java通信之URL通信基础白糖_ java jdk webservice 网络协议 ITeye
java对网络通信以及提供了比较全面的jdk支持，java.net包能让程序员直接在程序中实现网络通信。在技术日新月异的现在，我们能通过很多方式实现数据通信，比如webservice、url通信、socket通信等等，今天简单介绍下URL通信。学习准备：建议首先学习java的IO基础知识 URL是统一资源定位器的简写，URL可以访问Internet和www，可以通过url
博弈Java讲义 - Java线程同步 (1) boyitech java 多线程同步锁
在并发编程中经常会碰到多个执行线程共享资源的问题。例如多个线程同时读写文件，共用数据库连接，全局的计数器等。如果不处理好多线程之间的同步问题很容易引起状态不一致或者其他的错误。同步不仅可以阻止一个线程看到对象处于不一致的状态，它还可以保证进入同步方法或者块的每个线程，都看到由同一锁保护的之前所有的修改结果。处理同步的关键就是要正确的识别临界条件（cri
java-给定字符串，删除开始和结尾处的空格，并将中间的多个连续的空格合并成一个。 bylijinnan java
public class DeleteExtraSpace { /** * 题目：给定字符串，删除开始和结尾处的空格，并将中间的多个连续的空格合并成一个。 * 方法1.用已有的String类的trim和replaceAll方法 * 方法2.全部用正则表达式，这个我不熟 * 方法3.“重新发明轮子”，从头遍历一次 */ public static v
An error has occurred.See the log file错误解决！ Kai_Ge MyEclipse
今天早上打开MyEclipse时，自动关闭！弹出An error has occurred.See the log file错误提示！很郁闷昨天启动和关闭还好着！！！打开几次依然报此错误，确定不是眼花了！打开日志文件！找到当日错误文件内容： --------------------------------------------------------------------------
[矿业与工业]修建一个空间矿床开采站要多少钱? comsci
地球上的钛金属矿藏已经接近枯竭........... 我们在冥王星的一颗卫星上面发现一些具有开采价值的矿床..... 那么,现在要编制一个预算,提交给财政部门..
解析Google Map Routes dai_lm google api
为了获得从A点到B点的路劲，经常会使用Google提供的API，例如 [url] http://maps.googleapis.com/maps/api/directions/json?origin=40.7144,-74.0060&destination=47.6063,-122.3204&sensor=false [/url] 从返回的结果上，大致可以了解应该怎么走，但
SQL还有多少“理所应当”？ datamachine sql
转贴存档，原帖地址：http://blog.chinaunix.net/uid-29242841-id-3968998.html、http://blog.chinaunix.net/uid-29242841-id-3971046.html！ ------------------------------------华丽的分割线--------------------------------
Yii使用Ajax验证时，如何设置某些字段不需要验证 dcj3sjt126com Ajax yii
经常像你注册页面,你可能非常希望只需要Ajax去验证用户名和Email,而不需要使用Ajax再去验证密码,默认如果你使用Yii 内置的ajax验证Form,例如: $form=$this->beginWidget('CActiveForm', array( 'id'=>'usuario-form',&
使用git同步网站代码 dcj3sjt126com crontab git
转自:http://ued.ctrip.com/blog/?p=3646?tn=gongxinjun.com 管理一网站，最开始使用的虚拟空间，采用提供商支持的ftp上传网站文件，后换用vps，vps可以自己搭建ftp的，但是懒得搞，直接使用scp传输文件到服务器，现在需要更新文件到服务器，使用scp真的很烦。发现本人就职的公司，采用的git+rsync的方式来管理、同步代码，遂
sql基本操作蕃薯耀 sql sql基本操作 sql常用操作
sql基本操作 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 蕃薯耀 2015年6月1日 17:30:33 星期一 &
Spring4+Hibernate4+Atomikos3.3多数据源事务管理 hanqunfeng Hibernate4
Spring3+后不再对JTOM提供支持，所以可以改用Atomikos管理多数据源事务。Spring2.5+Hibernate3+JTOM参考：http://hanqunfeng.iteye.com/blog/1554251Atomikos官网网站：http://www.atomikos.com/ 一.pom.xml <dependency> <
jquery中两个值得注意的方法one()和trigger()方法 jackyrong trigger
在jquery中，有两个值得注意但容易忽视的方法，分别是one()方法和trigger()方法,这是从国内作者<<jquery权威指南》一书中看到不错的介绍 1） one方法 one方法的功能是让所选定的元素绑定一个仅触发一次的处理函数，格式为 one(type,${data},fn) &nb
拿工资不仅仅是让你写代码的 lampcy 工作面试咨询
这是我对团队每个新进员工说的第一件事情。这句话的意思是，我并不关心你是如何快速完成任务的，哪怕代码很差，只要它像救生艇通气门一样管用就行。这句话也是我最喜欢的座右铭之一。这个说法其实很合理：我们的工作是思考客户提出的问题，然后制定解决方案。思考第一，代码第二，公司请我们的最终目的不是写代码，而是想出解决方案。话粗理不粗。付你薪水不是让你来思考的，也不是让你来写代码的，你的目的是交付产品
架构师之对象操作----------对象的效率复制和判断是否全为空 nannan408 架构师
1.前言。如题。 2.代码。 (1)对象的复制，比spring的beanCopier在大并发下效率要高，利用net.sf.cglib.beans.BeanCopier Src src=new Src(); BeanCopier beanCopier = BeanCopier.create(Src.class, Des.class, false);
ajax 被缓存的解决方案 Rainbow702 JavaScript jquery Ajax cache 缓存
使用jquery的ajax来发送请求进行局部刷新画面，各位可能都做过。今天碰到一个奇怪的现象，就是，同一个ajax请求，在chrome中，不论发送多少次，都可以发送至服务器端，而不会被缓存。但是，换成在IE下的时候，发现，同一个ajax请求，会发生被缓存的情况，只有第一次才会被发送至服务器端，之后的不会再被发送。郁闷。解决方法如下： ① 直接使用 JQuery提供的 “cache”参数，
修改date.toLocaleString()的警告 tntxia String
我们在写程序的时候，经常要查看时间，所以我们经常会用到date.toLocaleString()，但是date.toLocaleString()是一个过时的API，代替的方法如下： package com.tntxia.htmlmaker.util; import java.text.SimpleDateFormat; import java.util.
项目完成后的小总结 xiaomiya js 总结项目
项目完成了，突然想做个总结但是有点无从下手了。做之前对于客户端给的接口很模式。然而定义好了格式要求就如此的愉快了。先说说项目主要实现的功能吧 1，按键精灵 2，获取行情数据 3，各种input输入条件判断 4，发送数据（有json格式和string格式） 5，获取预警条件列表和预警结果列表， 6，排序， 7，预警结果分页获取 8，导出文件（excel，text等） 9，修