Installing Prometheus
1 Download
https://prometheus.io/download/
2 Extract and install
wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz
tar xf prometheus-2.11.1.linux-amd64.tar.gz
mv prometheus-2.11.1.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus
./prometheus --version
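The archive name, the extracted directory name and the download URL all embed the version, and they can easily drift apart when commands are copied between releases. A minimal sketch that derives them from one version string follows. PROM_VERSION, prom_tarball, prom_url and install_prometheus are hypothetical names introduced here. The sketch also assumes the linux-amd64 build and the /usr/local/prometheus layout used above.

```shell
#!/usr/bin/env bash
# Sketch: derive every Prometheus download/install path from one version string.
# PROM_VERSION, prom_tarball, prom_url and install_prometheus are hypothetical names.
set -euo pipefail

PROM_VERSION="${PROM_VERSION:-2.11.1}"

# Archive name for a given version (linux-amd64 build assumed).
prom_tarball() {
  printf 'prometheus-%s.linux-amd64.tar.gz\n' "$1"
}

# GitHub release URL for a given version.
prom_url() {
  printf 'https://github.com/prometheus/prometheus/releases/download/v%s/%s\n' \
    "$1" "$(prom_tarball "$1")"
}

# Download, extract and move into /usr/local/prometheus (needs network access and root).
install_prometheus() {
  local version="$1"
  local tarball dir
  tarball="$(prom_tarball "$version")"
  dir="${tarball%.tar.gz}"   # e.g. prometheus-2.11.1.linux-amd64
  wget "$(prom_url "$version")"
  tar xf "$tarball"
  mv "$dir" /usr/local/prometheus
  /usr/local/prometheus/prometheus --version
}
```

Usage, as root: install_prometheus "$PROM_VERSION"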
3 Run Prometheus as a systemd service
cat>/lib/systemd/system/prometheus.service<
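A minimal prometheus.service for the /usr/local/prometheus layout might look like the sketch below. The contents are assumptions and do not reproduce the author's original unit. They include the Description text, the Restart policy, running as root, and the data directory path. The --config.file and --storage.tsdb.path flags come from the --help output at the end of this post. The sketch prints the unit to stdout so it can be reviewed before installation.

```shell
#!/usr/bin/env bash
# Sketch of a minimal prometheus.service; every value below is an assumption
# based on the /usr/local/prometheus layout used in this post.
set -euo pipefail

PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

# Print the unit file to stdout so it can be reviewed before installing it.
render_prometheus_unit() {
  cat <<EOF
[Unit]
Description=Prometheus server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=${PROM_HOME}/prometheus --config.file=${PROM_HOME}/prometheus.yml --storage.tsdb.path=${PROM_HOME}/data/
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Then, as root:
render_prometheus_unit > /lib/systemd/system/prometheus.service
systemctl daemon-reload
systemctl enable --now prometheus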
Installing node_exporter
1 Download and install
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
Install the agent:
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
cd node_exporter-0.17.0.linux-amd64
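Before wiring the agent into systemd, it can be worth a quick test run. Start it once in the foreground (./node_exporter), then confirm it serves node_* metrics on its default port 9100, for example with curl -s http://localhost:9100/metrics. The helper below is a hypothetical sketch for that check. It reads Prometheus text-format output on stdin and counts the distinct metric names that start with node_. Comment lines (# HELP, # TYPE) are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct node_* metric names in Prometheus
# text exposition format read from stdin.
set -euo pipefail

count_node_metrics() {
  awk '
    /^node_/ {
      name = $0
      sub(/[{ ].*/, "", name)   # drop the label set and the sample value
      seen[name] = 1
    }
    END { n = 0; for (k in seen) n++; print n }
  '
}
```

Usage: curl -s http://localhost:9100/metrics | count_node_metrics
A non-zero result means the default collectors are producing data.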
2 Run the agent as a systemd service
cat>/lib/systemd/system/node_exporter.service<
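A minimal node_exporter.service might look like the sketch below. It assumes the extracted directory has been moved to /usr/local/node_exporter to mirror the Prometheus layout. That path, the Description text and the Restart policy are assumptions. They do not reproduce the author's original unit. With no flags, node_exporter listens on port 9100, which matches the targets registered below.

```shell
#!/usr/bin/env bash
# Sketch of a minimal node_exporter.service. NODE_EXPORTER_HOME is an assumed
# install path (the extracted directory moved there, mirroring /usr/local/prometheus).
set -euo pipefail

NODE_EXPORTER_HOME="${NODE_EXPORTER_HOME:-/usr/local/node_exporter}"

# Print the unit file to stdout for review before installing it.
render_node_exporter_unit() {
  cat <<EOF
[Unit]
Description=Prometheus node_exporter
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=${NODE_EXPORTER_HOME}/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

Then, as root, from the directory where the archive was extracted:
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter
render_node_exporter_unit > /lib/systemd/system/node_exporter.service
systemctl daemon-reload
systemctl enable --now node_exporter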
3 Register with Prometheus (you only need to add a job to the main Prometheus configuration file)
vim prometheus.yml
1) One job per monitored host
Add under scrape_configs:
  - job_name: 'linux-node'
    static_configs:
      - targets: ['10.10.25.149:9100']
        labels:
          instance: node1
2) Alternatively, one job per group of hosts
Add under scrape_configs:
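  - job_name: 'linux-node-cluster'
    static_configs:
      - targets: ['node1:9100','node2:9100','node3:9100']
        labels:
          cluster: nodecluster
Do not set instance in a group job like this one. Every target in the list would get the same instance label, so their otherwise identical series would collide. Use a separate label, such as cluster, instead.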
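As the group grows, typing the targets list by hand gets error-prone. The hypothetical helper below builds the same YAML flow sequence from a port and a list of hosts. Its output can be pasted after targets:.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the YAML flow sequence used in "targets:" from a
# port and a list of hosts, e.g. targets_list 9100 node1 node2.
set -euo pipefail

targets_list() {
  local port="$1"
  shift
  local out="" host
  for host in "$@"; do
    # Prefix a comma only when something has already been appended.
    out+="${out:+,}'${host}:${port}'"
  done
  printf '[%s]\n' "$out"
}
```

Usage: targets_list 9100 node1 node2 node3 prints ['node1:9100','node2:9100','node3:9100'].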
Note: if you register targets by hostname or domain name, the Prometheus server must be able to resolve those names.
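One way to check this on the Prometheus server is getent hosts, which resolves names through /etc/hosts and DNS as configured in /etc/nsswitch.conf. Prometheus itself uses Go's resolver, so results may differ in edge cases. The sketch below prints every name that fails to resolve. unresolvable_hosts is a hypothetical name, and the sketch assumes a glibc-style getent is installed.

```shell
#!/usr/bin/env bash
# Sketch: report which target hostnames the local resolver cannot resolve.
# unresolvable_hosts is a hypothetical helper; getent must be available.
set -euo pipefail

unresolvable_hosts() {
  local host
  for host in "$@"; do
    # getent exits non-zero when the name cannot be resolved.
    if ! getent hosts "$host" >/dev/null; then
      printf '%s\n' "$host"
    fi
  done
}
```

Usage: unresolvable_hosts node1 node2 node3
Empty output means every name resolved.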
Restart Prometheus: systemctl restart prometheus
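Prometheus refuses to start with an invalid prometheus.yml, so it can help to validate the file before restarting. The prometheus tarball ships promtool for this. The sketch below restarts the service only when promtool check config passes. apply_prometheus_config is a hypothetical name, and the default path assumes the /usr/local/prometheus layout used above.

```shell
#!/usr/bin/env bash
# Sketch: validate prometheus.yml with promtool (shipped in the Prometheus
# tarball) and restart the service only if the check passes.
set -euo pipefail

apply_prometheus_config() {
  local home="${1:-/usr/local/prometheus}"
  # promtool exits non-zero when the configuration is invalid.
  if ! "$home/promtool" check config "$home/prometheus.yml"; then
    echo "config check failed; not restarting" >&2
    return 1
  fi
  systemctl restart prometheus
}
```

Usage, as root: apply_prometheus_config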
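A final note on startup options: the service definitions above start the daemons with their default options. To customize startup, check the built-in help. The same goes for node_exporter: some of its collectors are enabled by default and others are not, and its help output lists the options for turning the disabled ones on.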
[root@aliyun-hk-yabo-prod-jiranew prometheus210]# ./prometheus --help
usage: prometheus [<flags>]
The Prometheus monitoring server
Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"
                                 Address to listen on for UI, API, and telemetry.
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing
                                 idle connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=        The URL under which Prometheus is externally reachable (for example,
                                 if Prometheus is served via a reverse proxy). Used for generating
                                 relative and absolute links back to Prometheus itself. If the URL has
                                 a path portion, it will be used to prefix all HTTP endpoints served
                                 by Prometheus. If omitted, relevant URL components will be derived
                                 automatically.
      --web.route-prefix=        Prefix for the internal routes of web endpoints. Defaults to path of
                                 --web.external-url.
      --web.user-assets=         Path to static asset directory, available at /user.
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
      --web.enable-admin-api     Enable API endpoints for admin control actions.
      --web.console.templates="consoles"
                                 Path to the console template directory, available at /consoles.
      --web.console.libraries="console_libraries"
                                 Path to the console library directory.
      --web.page-title="Prometheus Time Series Collection and Processing Server"
                                 Document title of Prometheus instance.
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored. Example:
                                 'https?://(domain1|domain2)\.com'
      --storage.tsdb.path="data/"
                                 Base path for metrics storage.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION
                                 [DEPRECATED] How long to retain samples in storage. This flag has
                                 been deprecated, use "storage.tsdb.retention.time" instead.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
                                 How long to retain samples in storage. When this flag is set it
                                 overrides "storage.tsdb.retention". If neither this flag nor
                                 "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set,
                                 the retention time defaults to 15d.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
                                 [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks.
                                 Units supported: KB, MB, GB, TB, PB. This flag is experimental and
                                 can be changed in future releases.
      --storage.tsdb.no-lockfile
                                 Do not create lockfile in data directory.
      --storage.tsdb.allow-overlapping-blocks
                                 [EXPERIMENTAL] Allow overlapping blocks, which in turn enables
                                 vertical compaction and vertical query merge.
      --storage.remote.flush-deadline=
                                 How long to wait flushing sample on shutdown or config reload.
      --storage.remote.read-sample-limit=5e7
                                 Maximum overall number of samples to return via the remote read
                                 interface, in a single query. 0 means no limit.
      --storage.remote.read-concurrent-limit=10
                                 Maximum number of concurrent remote read calls. 0 means no limit.
      --rules.alert.for-outage-tolerance=1h
                                 Max time to tolerate prometheus outage for restoring "for" state of
                                 alert.
      --rules.alert.for-grace-period=10m
                                 Minimum duration between alert and restored "for" state. This is
                                 maintained only for alerts with configured "for" time greater than
                                 grace period.
      --rules.alert.resend-delay=1m
                                 Minimum amount of time to wait before resending an alert to
                                 Alertmanager.
      --alertmanager.notification-queue-capacity=10000
                                 The capacity of the queue for pending Alertmanager notifications.
      --alertmanager.timeout=10s
                                 Timeout for sending alerts to Alertmanager.
      --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during
                                 expression evaluations.
      --query.timeout=2m         Maximum time a query may take before being aborted.
      --query.max-concurrency=20
                                 Maximum number of queries executed concurrently.
      --query.max-samples=50000000
                                 Maximum number of samples a single query can load into memory. Note
                                 that queries will fail if they try to load more samples than this
                                 into memory, so this also limits the number of samples a query can
                                 return.
      --log.level=info           Only log messages with the given severity or above. One of: [debug,
                                 info, warn, error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]
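As an example of customizing startup, the sketch below builds an ExecStart line from flags that appear in the help output above. The chosen values are examples only and are assumptions, not recommendations. They are the 30d retention passed in the usage line, the explicit listen address, and enabling the lifecycle API. prometheus_execstart is a hypothetical helper name. The line it prints can replace the ExecStart line of prometheus.service.

```shell
#!/usr/bin/env bash
# Sketch: build a customized ExecStart line from flags listed in --help above.
# prometheus_execstart is hypothetical; the flag values are example choices.
set -euo pipefail

PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

prometheus_execstart() {
  local retention="${1:-15d}"   # 15d matches the documented default retention
  local flags=(
    "--config.file=${PROM_HOME}/prometheus.yml"
    "--storage.tsdb.path=${PROM_HOME}/data/"
    "--storage.tsdb.retention.time=${retention}"
    "--web.listen-address=0.0.0.0:9090"
    "--web.enable-lifecycle"    # allows shutdown/reload via HTTP
  )
  # "${flags[*]}" joins the array with single spaces.
  printf 'ExecStart=%s/prometheus %s\n' "$PROM_HOME" "${flags[*]}"
}
```

Usage: prometheus_execstart 30d
With --web.enable-lifecycle set, a configuration change can be applied without a restart by running curl -X POST http://localhost:9090/-/reload. For node_exporter, ./node_exporter --help lists collector switches of the form --collector.<name> and --no-collector.<name>.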