Installing Prometheus and node_exporter

Installing Prometheus

1 Download

https://prometheus.io/download/

2 Extract and install

wget https://github.com/prometheus/prometheus/releases/download/v2.11.1/prometheus-2.11.1.linux-amd64.tar.gz
tar xf prometheus-2.11.1.linux-amd64.tar.gz

mv prometheus-2.11.1.linux-amd64 /usr/local/prometheus

cd /usr/local/prometheus

./prometheus --version
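The download URL and the extracted directory name both embed the release version, so it is easy for them to drift apart. A minimal POSIX sh sketch that derives both from one variable, assuming the linux-amd64 build and the GitHub release naming pattern used above; PROM_VERSION, PROM_PKG, PROM_URL and install_prometheus are hypothetical names, not part of any Prometheus tooling:

```shell
# Hypothetical variables: change PROM_VERSION once and every name follows
PROM_VERSION="2.11.1"
PROM_PKG="prometheus-${PROM_VERSION}.linux-amd64"
PROM_URL="https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/${PROM_PKG}.tar.gz"

# Hypothetical helper: download, extract, install and print the version,
# stopping at the first step that fails
install_prometheus() {
    wget "$PROM_URL" &&
        tar xf "${PROM_PKG}.tar.gz" &&
        mv "$PROM_PKG" /usr/local/prometheus &&
        /usr/local/prometheus/prometheus --version
}

echo "$PROM_URL"
```

Defining the helper does not run anything; call install_prometheus as root to perform the steps.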

3 Run Prometheus as a system service

cat>/lib/systemd/system/prometheus.service<
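A minimal unit for /lib/systemd/system/prometheus.service could look like the sketch below. It assumes the /usr/local/prometheus layout from step 2 and runs as root (no dedicated user); adjust both to your setup.

```ini
[Unit]
Description=Prometheus monitoring server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# Run from the install directory so the relative defaults
# (data/, consoles, console_libraries) resolve inside it
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After writing the file, run systemctl daemon-reload, then systemctl enable prometheus and systemctl start prometheus.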

Installing node_exporter

1 Download and install

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
Install the agent:
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
cd node_exporter-0.17.0.linux-amd64

2 Run the agent as a system service

cat>/lib/systemd/system/node_exporter.service<
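A minimal unit for /lib/systemd/system/node_exporter.service, as a sketch. The binary path is an assumption: the steps above only extract the tarball, so this supposes the extracted directory was moved to /usr/local/node_exporter. By default node_exporter listens on port 9100, which is the port used in the scrape targets below.

```ini
[Unit]
Description=Prometheus node_exporter
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# Assumed install location; point this at wherever the binary actually lives
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then run systemctl daemon-reload, systemctl enable node_exporter and systemctl start node_exporter.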

3 Register the host with Prometheus (this only requires adding a job to the main Prometheus configuration file)

vim prometheus.yml 


1) You can use one job per monitored host.
Add under scrape_configs:
  - job_name: 'linux-node'         
    static_configs:
    - targets: ['10.10.25.149:9100']     
      labels: 
         instance: node1

2) Or you can use one job for a group of hosts.
Add under scrape_configs:
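  - job_name: 'linux-node-cluster'
    static_configs:
    - targets: ['node1:9100','node2:9100','node3:9100']
      labels:
         cluster: nodecluster
Here the group label is named cluster rather than instance: giving all three targets the same instance value would give their series identical label sets, so data from different hosts could no longer be told apart.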
Note: if you register targets by hostname or domain name, those names must be resolvable from the Prometheus server.
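For context, a complete but minimal prometheus.yml containing both styles of job might look like this. The scrape_interval value and the self-scrape job are illustrative, in the spirit of the default file shipped in the tarball; the cluster label name is an assumption chosen so each host keeps its own instance label.

```yaml
global:
  scrape_interval: 15s        # how often every target is scraped

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # One job per host; instance is overridden with a friendly name
  - job_name: 'linux-node'
    static_configs:
      - targets: ['10.10.25.149:9100']
        labels:
          instance: node1

  # One job for a group of hosts; each keeps its own instance label
  - job_name: 'linux-node-cluster'
    static_configs:
      - targets: ['node1:9100', 'node2:9100', 'node3:9100']
        labels:
          cluster: nodecluster
```

Before restarting, the file can be checked with ./promtool check config prometheus.yml (promtool ships in the same tarball).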


Restart Prometheus:

systemctl restart prometheus

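A final note on Prometheus startup options: the service defined above starts Prometheus with its default options. To customize them, consult the built-in help. The same applies to node_exporter: some of its collectors are enabled by default and others are not, and the disabled ones can be turned on with the options listed in its own help output (./node_exporter --help).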
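One way to pass non-default options without editing the main unit is a systemd drop-in. The sketch below is hypothetical: the file path /etc/systemd/system/prometheus.service.d/override.conf, the 30-day retention and the data path are example values, while the flags themselves appear in the --help output below.

```ini
[Service]
# An empty ExecStart= clears the command inherited from the main unit
ExecStart=
ExecStart=/usr/local/prometheus/prometheus \
    --config.file=/usr/local/prometheus/prometheus.yml \
    --storage.tsdb.path=/usr/local/prometheus/data \
    --storage.tsdb.retention.time=30d \
    --web.enable-lifecycle
```

Apply it with systemctl daemon-reload and systemctl restart prometheus. With --web.enable-lifecycle set, later configuration changes can be loaded with curl -X POST http://localhost:9090/-/reload instead of a full restart.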

[root@aliyun-hk-yabo-prod-jiranew prometheus210]# ./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"  
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"  
                                 Address to listen on for UI, API, and telemetry.
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing
                                 idle connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=<URL>   The URL under which Prometheus is externally reachable (for example,
                                 if Prometheus is served via a reverse proxy). Used for generating
                                 relative and absolute links back to Prometheus itself. If the URL has
                                 a path portion, it will be used to prefix all HTTP endpoints served
                                 by Prometheus. If omitted, relevant URL components will be derived
                                 automatically.
      --web.route-prefix=<path>  Prefix for the internal routes of web endpoints. Defaults to path of
                                 --web.external-url.
      --web.user-assets=<path>   Path to static asset directory, available at /user.
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
      --web.enable-admin-api     Enable API endpoints for admin control actions.
      --web.console.templates="consoles"  
                                 Path to the console template directory, available at /consoles.
      --web.console.libraries="console_libraries"  
                                 Path to the console library directory.
      --web.page-title="Prometheus Time Series Collection and Processing Server"  
                                 Document title of Prometheus instance.
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored. Example:
                                 'https?://(domain1|domain2)\.com'
      --storage.tsdb.path="data/"  
                                 Base path for metrics storage.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION  
                                 [DEPRECATED] How long to retain samples in storage. This flag has
                                 been deprecated, use "storage.tsdb.retention.time" instead.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME  
                                 How long to retain samples in storage. When this flag is set it
                                 overrides "storage.tsdb.retention". If neither this flag nor
                                 "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set,
                                 the retention time defaults to 15d.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE  
                                 [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks.
                                 Units supported: KB, MB, GB, TB, PB. This flag is experimental and
                                 can be changed in future releases.
      --storage.tsdb.no-lockfile  
                                 Do not create lockfile in data directory.
      --storage.tsdb.allow-overlapping-blocks  
                                 [EXPERIMENTAL] Allow overlapping blocks, which in turn enables
                                 vertical compaction and vertical query merge.
      --storage.remote.flush-deadline=<duration>  
                                 How long to wait flushing sample on shutdown or config reload.
      --storage.remote.read-sample-limit=5e7  
                                 Maximum overall number of samples to return via the remote read
                                 interface, in a single query. 0 means no limit.
      --storage.remote.read-concurrent-limit=10  
                                 Maximum number of concurrent remote read calls. 0 means no limit.
      --rules.alert.for-outage-tolerance=1h  
                                 Max time to tolerate prometheus outage for restoring "for" state of
                                 alert.
      --rules.alert.for-grace-period=10m  
                                 Minimum duration between alert and restored "for" state. This is
                                 maintained only for alerts with configured "for" time greater than
                                 grace period.
      --rules.alert.resend-delay=1m  
                                 Minimum amount of time to wait before resending an alert to
                                 Alertmanager.
      --alertmanager.notification-queue-capacity=10000  
                                 The capacity of the queue for pending Alertmanager notifications.
      --alertmanager.timeout=10s  
                                 Timeout for sending alerts to Alertmanager.
      --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during
                                 expression evaluations.
      --query.timeout=2m         Maximum time a query may take before being aborted.
      --query.max-concurrency=20  
                                 Maximum number of queries executed concurrently.
      --query.max-samples=50000000  
                                 Maximum number of samples a single query can load into memory. Note
                                 that queries will fail if they try to load more samples than this
                                 into memory, so this also limits the number of samples a query can
                                 return.
      --log.level=info           Only log messages with the given severity or above. One of: [debug,
                                 info, warn, error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]

 

Reposted from: https://my.oschina.net/54188zz/blog/3070599
