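Our company recently needed performance monitoring for its servers. After going through a lot of material, I will demonstrate step by step, from scratch, how to build a monitoring platform on CentOS.
To monitor CPU, memory, disk, I/O and other information on production servers, we rely on node_exporter to collect these machine metrics. The steps below walk through the full setup.
You can pick the node_exporter version you need from GitHub and download it.
Download command: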
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
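Extract and install:
mkdir -p /soft/service
tar -zxvf node_exporter-0.18.1.linux-amd64.tar.gz -C /soft/service
cd /soft/service
mv node_exporter-0.18.1.linux-amd64 node_exporter
Enter the node_exporter directory and run the service:
cd node_exporter
./node_exporter
After startup, the log output shows that the service is bound to port 9100.
You can open this port in a browser to check the result (the port must be opened in the firewall, or the firewall turned off).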
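As a command-line alternative to the browser check, the exporter's /metrics endpoint can be fetched with curl and filtered. The sketch below is only an illustration: metric_samples is a hypothetical helper name (not part of node_exporter), and the localhost address in the usage comment is an assumption about where the exporter runs.

```shell
# metric_samples NAME
# Read Prometheus exposition-format text on stdin and print only the
# sample lines of one metric. "# HELP" and "# TYPE" comment lines are
# skipped because they do not start with the metric name.
metric_samples() {
  local name="$1"
  # The name must be followed by "{" (labels) or a space (no labels),
  # so that node_load1 does not also match node_load15.
  grep -E "^${name}(\{| )"
}

# Assumed usage against a locally running exporter:
#   curl -s http://localhost:9100/metrics | metric_samples node_load1
```

If the pipeline prints at least one line, the exporter is reachable and serving that metric.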
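Prometheus is an open-source systems monitoring and alerting toolkit. It is written in Go and was inspired by Google's Borgmon monitoring system.
The architecture diagram from the official documentation is shown below:
Official download page: https://prometheus.io/download/
Download the Prometheus package: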
wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
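Extract and install:
tar -zxvf prometheus-2.18.1.linux-amd64.tar.gz -C /soft/service
cd /soft/service
mv prometheus-2.18.1.linux-amd64 prometheus
Configure the Prometheus monitoring target:
cd prometheus
vim prometheus.yml
Add the following to the configuration file, under scrape_configs: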
  - job_name: 'export_test2'
    static_configs:
      - targets: ['10.211.55.5:9100']
        labels:
          instance: 'node2'
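YAML indentation matters here, and the same stanza has to be repeated for every additional server. As a rough sketch, a small shell function can print a correctly indented job entry; scrape_job is a hypothetical helper, and the second host in the usage comment is made up for illustration.

```shell
# scrape_job NAME TARGET INSTANCE
# Print one static scrape job, indented so it can sit directly
# under the scrape_configs: key of prometheus.yml.
scrape_job() {
  local name="$1" target="$2" instance="$3"
  printf "  - job_name: '%s'\n" "$name"
  printf "    static_configs:\n"
  printf "      - targets: ['%s']\n" "$target"
  printf "        labels:\n"
  printf "          instance: '%s'\n" "$instance"
}

# Assumed usage (hypothetical second host):
#   scrape_job export_test3 10.211.55.6:9100 node3
```

The output is meant to be pasted under scrape_configs by hand. Appending it blindly to the file is only safe when scrape_configs is the last section.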
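Then start Prometheus (for example with ./prometheus --config.file=prometheus.yml) and open its web port in a browser; this is 9090 by default, whereas 9100 is the node_exporter port. If the Prometheus page appears, congratulations, Prometheus is installed successfully.
Next, enter the expression node_cpu_seconds_total and click Execute to check that data is returned.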
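node_cpu_seconds_total is a counter of CPU seconds spent in each mode, so a single raw value says less than its change over time. The following is a rough sketch of that arithmetic for one CPU core, using two readings of the idle-mode counter; the helper name cpu_busy_percent and the sample numbers are assumptions for illustration.

```shell
# cpu_busy_percent IDLE_START IDLE_END INTERVAL_SECONDS
# Given two readings of the idle counter for one core and the number of
# seconds between them, print the share of time the core was not idle.
cpu_busy_percent() {
  awk -v a="$1" -v b="$2" -v t="$3" \
    'BEGIN { printf "%.1f\n", (1 - (b - a) / t) * 100 }'
}

# Example: the idle counter grew by 45 seconds during a 60 second window.
cpu_busy_percent 1000 1045 60
```

Inside Prometheus the same idea is normally expressed with the PromQL rate() function over the mode="idle" series, rather than computed by hand.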
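By default Prometheus stores data on local disk. Local storage is simple to operate, but it cannot persist metrics at massive scale and carries a risk of data loss; a failed write can corrupt the WAL file, after which newly collected data can no longer be written.
To get around the limits of single-node storage, Prometheus does not implement clustered storage itself. Instead it provides remote read and write interfaces, so users can choose a suitable time-series database to scale Prometheus.
Prometheus provides one interface for writing data to a third-party storage system and another for reading data back from it. The principle is shown below:
Next we will persist the data that Prometheus collects from node_exporter into an InfluxDB database.
A common use case for InfluxDB (a time-series database) is server monitoring: the monitoring data is aggregated and then visualized with Grafana.
Official download page: https://portal.influxdata.com/downloads/
Download the package: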
wget https://dl.influxdata.com/influxdb/releases/influxdb-1.5.2.x86_64.rpm
Install the rpm package:
sudo yum localinstall influxdb-1.5.2.x86_64.rpm
Start the service and enable it at boot:
# Start the InfluxDB service and enable it at boot:
systemctl start influxdb
systemctl enable influxdb
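Once installation is complete, type influx and the following shell appears:
Next, create the database and a user:
# Create a database named prometheus
create database prometheus
# Switch to the prometheus database
use prometheus
# Create a user whose name and password are both node. The password must be wrapped in single quotes (''), otherwise InfluxDB reports an error
create user "node" with password 'node'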
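The quoting rule above is easy to get wrong when typing by hand: the user name goes in double quotes and the password in single quotes. As a small sketch, a shell function can print the two statements with the quoting already in place; influx_setup_statements is a hypothetical helper name, not an InfluxDB command.

```shell
# influx_setup_statements DB USER PASSWORD
# Print the InfluxQL statements used above, with the user name in
# double quotes and the password in single quotes.
influx_setup_statements() {
  local db="$1" user="$2" pass="$3"
  printf 'CREATE DATABASE "%s"\n' "$db"
  printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$user" "$pass"
}

# Print the statements for the values used in this article.
influx_setup_statements prometheus node node
```

The printed lines can then be pasted into the influx shell.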
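remote_storage_adapter is a write adapter provided by the Prometheus project. It receives samples through the Prometheus remote write protocol and stores them in Graphite, InfluxDB, or OpenTSDB.
Building the adapter requires a Go environment on the machine, so that you can compile remote_storage_adapter yourself. Setting up Go is not covered here; plenty of material on it is available online.
Once compiled, the adapter can be run directly. If you have no Go environment and do not want to compile it, you can also download a prebuilt copy of remote_storage_adapter and run it: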
./remote_storage_adapter --influxdb-url=http://127.0.0.1:8086/ --influxdb.database="prometheus" --influxdb.retention-policy=autogen
# Remote write configuration
remote_write:
  - url: "http://localhost:9201/write"
    # Username and password for the InfluxDB connection
    basic_auth:
      username: node
      password: node
# Remote read configuration
remote_read:
  - url: "http://localhost:9201/read"
    basic_auth:
      username: node
      password: node
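Before restarting Prometheus it is worth confirming that both remote sections made it into the file as top-level keys. The function below is only a minimal sanity check under that assumption, and has_remote_sections is a hypothetical name; it does not validate YAML syntax.

```shell
# has_remote_sections FILE
# Succeed only if FILE contains remote_write: and remote_read: at the
# start of a line, that is, as top-level keys.
has_remote_sections() {
  local file="$1"
  grep -q '^remote_write:' "$file" && grep -q '^remote_read:' "$file"
}

# Assumed usage from the Prometheus directory:
#   has_remote_sections prometheus.yml && echo "remote sections present"
#
# For a full syntax check, the promtool binary shipped in the Prometheus
# tarball can be used instead:
#   ./promtool check config prometheus.yml
```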
Then restart Prometheus; the startup log should show output like the following:
Next, enter the influx shell to check whether the data has arrived:
[root@donniegao prometheus-2.17.1.linux-amd64]# influx
Connected to http://localhost:8086 version 1.5.2
InfluxDB shell version: 1.5.2
> use prometheus
Using database prometheus
> show measurements
name: measurements
name
----
_
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_info
go_memstats_alloc_bytes
go_memstats_alloc_bytes_total
go_memstats_buck_hash_sys_bytes
go_memstats_frees_total
go_memstats_gc_cpu_fraction
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_objects
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_last_gc_time_seconds
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_other_sys_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
net_conntrack_dialer_conn_attempted_total
net_conntrack_dialer_conn_closed_total
net_conntrack_dialer_conn_established_total
net_conntrack_dialer_conn_failed_total
net_conntrack_listener_conn_accepted_total
net_conntrack_listener_conn_closed_total
node_arp_entries
node_boot_time_seconds
node_context_switches_total
node_cpu_guest_seconds_total
node_cpu_seconds_total
node_disk_io_now
node_disk_io_time_seconds_total
node_disk_io_time_weighted_seconds_total
node_disk_read_bytes_total
node_disk_read_time_seconds_total
node_disk_reads_completed_total
node_disk_reads_merged_total
node_disk_write_time_seconds_total
node_disk_writes_completed_total
node_disk_writes_merged_total
node_disk_written_bytes_total
node_entropy_available_bits
node_exporter_build_info
node_filefd_allocated
node_filefd_maximum
node_filesystem_avail_bytes
node_filesystem_device_error
node_filesystem_files
node_filesystem_files_free
node_filesystem_free_bytes
node_filesystem_readonly
node_filesystem_size_bytes
node_forks_total
node_hwmon_chip_names
node_hwmon_sensor_label
node_hwmon_temp_celsius
node_hwmon_temp_crit_alarm_celsius
node_hwmon_temp_crit_celsius
node_hwmon_temp_max_celsius
node_intr_total
node_load1
node_load15
node_load5
node_memory_Active_anon_bytes
node_memory_Active_bytes
node_memory_Active_file_bytes
node_memory_AnonHugePages_bytes
node_memory_AnonPages_bytes
node_memory_Bounce_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
node_memory_CmaFree_bytes
node_memory_CmaTotal_bytes
node_memory_CommitLimit_bytes
node_memory_Committed_AS_bytes
node_memory_DirectMap2M_bytes
node_memory_DirectMap4k_bytes
node_memory_Dirty_bytes
node_memory_HardwareCorrupted_bytes
node_memory_HugePages_Free
node_memory_HugePages_Rsvd
node_memory_HugePages_Surp
node_memory_HugePages_Total
node_memory_Hugepagesize_bytes
node_memory_Inactive_anon_bytes
node_memory_Inactive_bytes
node_memory_Inactive_file_bytes
node_memory_KernelStack_bytes
node_memory_Mapped_bytes
node_memory_MemAvailable_bytes
node_memory_MemFree_bytes
node_memory_MemTotal_bytes
node_memory_Mlocked_bytes
node_memory_NFS_Unstable_bytes
node_memory_PageTables_bytes
node_memory_SReclaimable_bytes
node_memory_SUnreclaim_bytes
node_memory_Shmem_bytes
node_memory_Slab_bytes
node_memory_SwapCached_bytes
node_memory_SwapFree_bytes
node_memory_SwapTotal_bytes
node_memory_Unevictable_bytes
node_memory_VmallocChunk_bytes
node_memory_VmallocTotal_bytes
node_memory_VmallocUsed_bytes
node_memory_WritebackTmp_bytes
node_memory_Writeback_bytes
node_netstat_Icmp6_InErrors
node_netstat_Icmp6_InMsgs
node_netstat_Icmp6_OutMsgs
node_netstat_Icmp_InErrors
node_netstat_Icmp_InMsgs
node_netstat_Icmp_OutMsgs
node_netstat_Ip6_InOctets
node_netstat_Ip6_OutOctets
node_netstat_IpExt_InOctets
node_netstat_IpExt_OutOctets
node_netstat_Ip_Forwarding
node_netstat_TcpExt_ListenDrops
node_netstat_TcpExt_ListenOverflows
node_netstat_TcpExt_SyncookiesFailed
node_netstat_TcpExt_SyncookiesRecv
node_netstat_TcpExt_SyncookiesSent
node_netstat_TcpExt_TCPSynRetrans
node_netstat_Tcp_ActiveOpens
node_netstat_Tcp_CurrEstab
node_netstat_Tcp_InErrs
node_netstat_Tcp_InSegs
node_netstat_Tcp_OutSegs
node_netstat_Tcp_PassiveOpens
node_netstat_Tcp_RetransSegs
node_netstat_Udp6_InDatagrams
node_netstat_Udp6_InErrors
node_netstat_Udp6_NoPorts
node_netstat_Udp6_OutDatagrams
node_netstat_UdpLite6_InErrors
node_netstat_UdpLite_InErrors
node_netstat_Udp_InDatagrams
node_netstat_Udp_InErrors
node_netstat_Udp_NoPorts
node_netstat_Udp_OutDatagrams
node_network_address_assign_type
node_network_carrier
node_network_carrier_changes_total
node_network_device_id
node_network_dormant
node_network_flags
node_network_iface_id
node_network_iface_link
node_network_iface_link_mode
node_network_info
node_network_mtu_bytes
node_network_net_dev_group
node_network_protocol_type
node_network_receive_bytes_total
node_network_receive_compressed_total
node_network_receive_drop_total
node_network_receive_errs_total
node_network_receive_fifo_total
node_network_receive_frame_total
node_network_receive_multicast_total
node_network_receive_packets_total
node_network_transmit_bytes_total
node_network_transmit_carrier_total
node_network_transmit_colls_total
node_network_transmit_compressed_total
node_network_transmit_drop_total
node_network_transmit_errs_total
node_network_transmit_fifo_total
node_network_transmit_packets_total
node_network_transmit_queue_length
node_network_up
node_procs_blocked
node_procs_running
node_scrape_collector_duration_seconds
node_scrape_collector_success
node_sockstat_FRAG_inuse
node_sockstat_FRAG_memory
node_sockstat_RAW_inuse
node_sockstat_TCP_alloc
node_sockstat_TCP_inuse
node_sockstat_TCP_mem
node_sockstat_TCP_mem_bytes
node_sockstat_TCP_orphan
node_sockstat_TCP_tw
node_sockstat_UDPLITE_inuse
node_sockstat_UDP_inuse
node_sockstat_UDP_mem
node_sockstat_UDP_mem_bytes
node_sockstat_sockets_used
node_textfile_scrape_error
node_time_seconds
node_timex_estimated_error_seconds
node_timex_frequency_adjustment_ratio
node_timex_loop_time_constant
node_timex_maxerror_seconds
node_timex_offset_seconds
node_timex_pps_calibration_total
node_timex_pps_error_total
node_timex_pps_frequency_hertz
node_timex_pps_jitter_seconds
node_timex_pps_jitter_total
node_timex_pps_shift_seconds
node_timex_pps_stability_exceeded_total
node_timex_pps_stability_hertz
node_timex_status
node_timex_sync_status
node_timex_tai_offset_seconds
node_timex_tick_seconds
node_uname_info
node_vmstat_pgfault
node_vmstat_pgmajfault
node_vmstat_pgpgin
node_vmstat_pgpgout
node_vmstat_pswpin
node_vmstat_pswpout
node_xfs_allocation_btree_compares_total
node_xfs_allocation_btree_lookups_total
node_xfs_allocation_btree_records_deleted_total
node_xfs_allocation_btree_records_inserted_total
node_xfs_block_map_btree_compares_total
node_xfs_block_map_btree_lookups_total
node_xfs_block_map_btree_records_deleted_total
node_xfs_block_map_btree_records_inserted_total
node_xfs_block_mapping_extent_list_compares_total
node_xfs_block_mapping_extent_list_deletions_total
node_xfs_block_mapping_extent_list_insertions_total
node_xfs_block_mapping_extent_list_lookups_total
node_xfs_block_mapping_reads_total
node_xfs_block_mapping_unmaps_total
node_xfs_block_mapping_writes_total
node_xfs_extent_allocation_blocks_allocated_total
node_xfs_extent_allocation_blocks_freed_total
node_xfs_extent_allocation_extents_allocated_total
node_xfs_extent_allocation_extents_freed_total
process_cpu_seconds_total
process_max_fds
process_open_fds
process_resident_memory_bytes
process_start_time_seconds
process_virtual_memory_bytes
process_virtual_memory_max_bytes
prometheus_api_remote_read_queries
prometheus_build_info
prometheus_config_last_reload_success_timestamp_seconds
prometheus_config_last_reload_successful
prometheus_engine_queries
prometheus_engine_queries_concurrent_max
prometheus_engine_query_duration_seconds
prometheus_engine_query_duration_seconds_count
prometheus_engine_query_duration_seconds_sum
prometheus_engine_query_log_enabled
prometheus_engine_query_log_failures_total
prometheus_http_request_duration_seconds_bucket
prometheus_http_request_duration_seconds_count
prometheus_http_request_duration_seconds_sum
prometheus_http_requests_total
prometheus_http_response_size_bytes_bucket
prometheus_http_response_size_bytes_count
prometheus_http_response_size_bytes_sum
prometheus_notifications_alertmanagers_discovered
prometheus_notifications_dropped_total
prometheus_notifications_queue_capacity
prometheus_notifications_queue_length
prometheus_remote_storage_dropped_samples_total
prometheus_remote_storage_enqueue_retries_total
prometheus_remote_storage_failed_samples_total
prometheus_remote_storage_highest_timestamp_in_seconds
prometheus_remote_storage_pending_samples
prometheus_remote_storage_queue_highest_sent_timestamp_seconds
prometheus_remote_storage_remote_read_queries
prometheus_remote_storage_retried_samples_total
prometheus_remote_storage_samples_in_total
prometheus_remote_storage_sent_batch_duration_seconds_bucket
prometheus_remote_storage_sent_batch_duration_seconds_count
prometheus_remote_storage_sent_batch_duration_seconds_sum
prometheus_remote_storage_sent_bytes_total
prometheus_remote_storage_shard_capacity
prometheus_remote_storage_shards
prometheus_remote_storage_shards_desired
prometheus_remote_storage_shards_max
prometheus_remote_storage_shards_min
prometheus_remote_storage_string_interner_zero_reference_releases_total
prometheus_remote_storage_succeeded_samples_total
prometheus_rule_evaluation_duration_seconds_count
prometheus_rule_evaluation_duration_seconds_sum
prometheus_rule_evaluation_failures_total
prometheus_rule_evaluations_total
prometheus_rule_group_duration_seconds_count
prometheus_rule_group_duration_seconds_sum
prometheus_rule_group_iterations_missed_total
prometheus_rule_group_iterations_total
prometheus_sd_consul_rpc_duration_seconds_count
prometheus_sd_consul_rpc_duration_seconds_sum
prometheus_sd_consul_rpc_failures_total
prometheus_sd_discovered_targets
prometheus_sd_dns_lookup_failures_total
prometheus_sd_dns_lookups_total
prometheus_sd_failed_configs
prometheus_sd_file_read_errors_total
prometheus_sd_file_scan_duration_seconds_count
prometheus_sd_file_scan_duration_seconds_sum
prometheus_sd_kubernetes_events_total
prometheus_sd_received_updates_total
prometheus_sd_updates_total
prometheus_target_interval_length_seconds
prometheus_target_interval_length_seconds_count
prometheus_target_interval_length_seconds_sum
prometheus_target_metadata_cache_bytes
prometheus_target_metadata_cache_entries
prometheus_target_scrape_pool_reloads_failed_total
prometheus_target_scrape_pool_reloads_total
prometheus_target_scrape_pool_sync_total
prometheus_target_scrape_pools_failed_total
prometheus_target_scrape_pools_total
prometheus_target_scrapes_cache_flush_forced_total
prometheus_target_scrapes_exceeded_sample_limit_total
prometheus_target_scrapes_sample_duplicate_timestamp_total
prometheus_target_scrapes_sample_out_of_bounds_total
prometheus_target_scrapes_sample_out_of_order_total
prometheus_target_sync_length_seconds
prometheus_target_sync_length_seconds_count
prometheus_target_sync_length_seconds_sum
prometheus_template_text_expansion_failures_total
prometheus_template_text_expansions_total
prometheus_treecache_watcher_goroutines
prometheus_treecache_zookeeper_failures_total
prometheus_tsdb_blocks_loaded
prometheus_tsdb_checkpoint_creations_failed_total
prometheus_tsdb_checkpoint_creations_total
prometheus_tsdb_checkpoint_deletions_failed_total
prometheus_tsdb_checkpoint_deletions_total
prometheus_tsdb_compaction_chunk_range_seconds_bucket
prometheus_tsdb_compaction_chunk_range_seconds_count
prometheus_tsdb_compaction_chunk_range_seconds_sum
prometheus_tsdb_compaction_chunk_samples_bucket
prometheus_tsdb_compaction_chunk_samples_count
prometheus_tsdb_compaction_chunk_samples_sum
prometheus_tsdb_compaction_chunk_size_bytes_bucket
prometheus_tsdb_compaction_chunk_size_bytes_count
prometheus_tsdb_compaction_chunk_size_bytes_sum
prometheus_tsdb_compaction_duration_seconds_bucket
prometheus_tsdb_compaction_duration_seconds_count
prometheus_tsdb_compaction_duration_seconds_sum
prometheus_tsdb_compaction_populating_block
prometheus_tsdb_compactions_failed_total
prometheus_tsdb_compactions_skipped_total
prometheus_tsdb_compactions_total
prometheus_tsdb_compactions_triggered_total
prometheus_tsdb_head_active_appenders
prometheus_tsdb_head_chunks
prometheus_tsdb_head_chunks_created_total
prometheus_tsdb_head_chunks_removed_total
prometheus_tsdb_head_gc_duration_seconds_count
prometheus_tsdb_head_gc_duration_seconds_sum
prometheus_tsdb_head_max_time
prometheus_tsdb_head_max_time_seconds
prometheus_tsdb_head_min_time
prometheus_tsdb_head_min_time_seconds
prometheus_tsdb_head_samples_appended_total
prometheus_tsdb_head_series
prometheus_tsdb_head_series_created_total
prometheus_tsdb_head_series_not_found_total
prometheus_tsdb_head_series_removed_total
prometheus_tsdb_head_truncations_failed_total
prometheus_tsdb_head_truncations_total
prometheus_tsdb_isolation_high_watermark
prometheus_tsdb_isolation_low_watermark
prometheus_tsdb_lowest_timestamp
prometheus_tsdb_lowest_timestamp_seconds
prometheus_tsdb_reloads_failures_total
prometheus_tsdb_reloads_total
prometheus_tsdb_retention_limit_bytes
prometheus_tsdb_size_retentions_total
prometheus_tsdb_storage_blocks_bytes
prometheus_tsdb_symbol_table_size_bytes
prometheus_tsdb_time_retentions_total
prometheus_tsdb_tombstone_cleanup_seconds_bucket
prometheus_tsdb_tombstone_cleanup_seconds_count
prometheus_tsdb_tombstone_cleanup_seconds_sum
prometheus_tsdb_vertical_compactions_total
prometheus_tsdb_wal_completed_pages_total
prometheus_tsdb_wal_corruptions_total
prometheus_tsdb_wal_fsync_duration_seconds
prometheus_tsdb_wal_fsync_duration_seconds_count
prometheus_tsdb_wal_fsync_duration_seconds_sum
prometheus_tsdb_wal_page_flushes_total
prometheus_tsdb_wal_segment_current
prometheus_tsdb_wal_truncate_duration_seconds_count
prometheus_tsdb_wal_truncate_duration_seconds_sum
prometheus_tsdb_wal_truncations_failed_total
prometheus_tsdb_wal_truncations_total
prometheus_tsdb_wal_writes_failed_total
prometheus_wal_watcher_current_segment
prometheus_wal_watcher_record_decode_failures_total
prometheus_wal_watcher_records_read_total
prometheus_wal_watcher_samples_sent_pre_tailing_total
promhttp_metric_handler_requests_in_flight
promhttp_metric_handler_requests_total
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
up
>
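The measurement list is long, so a quick way to confirm that node_exporter series arrived is to count the names with a given prefix. This is a sketch only: count_prefix is a hypothetical helper, and the usage comment assumes the influx 1.x client's -database and -execute options.

```shell
# count_prefix PREFIX
# Read one measurement name per line on stdin and print how many
# of them start with PREFIX.
count_prefix() {
  grep -c "^$1"
}

# Assumed usage on the InfluxDB host:
#   influx -database prometheus -execute 'show measurements' | count_prefix node_
```

A non-zero count for node_ suggests that samples scraped from node_exporter are reaching InfluxDB through the adapter.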
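If your InfluxDB database contains data, you can go on to install Grafana.
Grafana is an open-source application written in Go, used mainly to visualize large volumes of metric data. It is one of the most popular tools for displaying time-series data in infrastructure and application analytics.
Official download page: https://grafana.com/grafana/download
Download the package: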
wget https://dl.grafana.com/oss/release/grafana-6.7.2-1.x86_64.rpm
Install Grafana:
yum localinstall -y grafana-6.7.2-1.x86_64.rpm
Start the service and enable it at boot:
systemctl start grafana-server
systemctl enable grafana-server.service
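Grafana listens on port 3000, and besides the browser it can be checked from the command line through its /api/health endpoint, which returns a small JSON document. The sketch below inspects such a response; grafana_db_ok is a hypothetical helper name, and the localhost address in the usage comment is an assumption.

```shell
# grafana_db_ok
# Read a Grafana /api/health JSON response on stdin and succeed
# only if it reports "database": "ok".
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Assumed usage on the Grafana host:
#   curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is up"
```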
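Open port 3000 in a browser, as shown below:
Next, configure the Prometheus data source in Grafana.
Here I have already configured it as follows:
Then configure InfluxDB.
Select InfluxDB and set the connection properties as follows:
Next, visit https://grafana.com/grafana/dashboards and download the dashboard template you need, as shown below:
Here we click Node Exporter for Prometheus Dashboard CN v20191102 in the image above
and download this template.
Next, import the template. The detailed steps are shown in the screenshots below:
Choose to upload a JSON file, as shown below:
Then select the template JSON file we just downloaded, as shown below: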