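Data collection side
1) node_exporter collects server (host) metrics
The metrics currently collected are CPU, memory, and inbound/outbound bandwidth
2) rtc_exporter collects business metrics
Core code excerpt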
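// Collect is called on every scrape and emits one sample of the rtc_server metric.
func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
	i++ // counter declared outside this excerpt, used as a demo value
	timestamp := time.Now().Unix()
	tm := time.Unix(timestamp, 0) // truncated to whole seconds
	fmt.Println()
	fmt.Println("timestamp:", timestamp, " time.Unix:", tm, " value:", i)
	// Attach an explicit timestamp to the sample instead of relying on the scrape time.
	ch <- prometheus.NewMetricWithTimestamp(
		tm,
		prometheus.MustNewConstMetric(
			c.OOMCountDesc,
			prometheus.GaugeValue,
			float64(i),
			"testhost", // value of the "host" label
		),
	)
}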
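// NewClusterManager builds the collector; user and qps become constant labels.
func NewClusterManager(user string, qps string) *ClusterManager {
	return &ClusterManager{
		OOMCountDesc: prometheus.NewDesc(
			"rtc_server",
			"Data from rtc server...",
			[]string{"host"}, // variable label, filled in by Collect
			prometheus.Labels{"user": user, "qps": qps},
		),
	}
}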
// Since we are dealing with custom Collector implementations, it might
// be a good idea to try it out with a pedantic registry.
reg := prometheus.NewPedanticRegistry()
// workerDB is defined outside this excerpt (presumably the *ClusterManager
// returned by NewClusterManager).
reg.MustRegister(workerDB)
gatherers := prometheus.Gatherers{
	//prometheus.DefaultGatherer,
	reg,
}
h := promhttp.HandlerFor(gatherers,
	promhttp.HandlerOpts{
		ErrorLog:      log.NewErrorLogger(),
		ErrorHandling: promhttp.ContinueOnError,
	})
http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
	h.ServeHTTP(w, r)
})
//log.Infoln("Start server at :8088")
if err := http.ListenAndServe(":8088", nil); err != nil {
	log.Errorf("Error occurred when starting server: %v", err)
	os.Exit(1)
}
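To make it easier to picture what a scrape of /metrics returns for the collector above, here is a minimal sketch that renders one sample in the Prometheus text exposition format using only the standard library. The function name formatSample and the label values "demo" and "100" are hypothetical; the real output is produced by the client library, which also sorts labels by name and writes the explicit timestamp in milliseconds.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one sample as
//   name{label="value",...} value timestamp_ms
// Labels are sorted by name so the output is deterministic.
// unixSeconds mirrors the time.Now().Unix() value used in Collect.
func formatSample(name string, labels map[string]string, value float64, unixSeconds int64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes the value and escapes backslashes and double quotes,
		// which is adequate for plain ASCII label values.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	// The exposition format expects the timestamp in milliseconds.
	return fmt.Sprintf("%s{%s} %g %d", name, strings.Join(pairs, ","), value, unixSeconds*1000)
}

func main() {
	// "demo" and "100" are placeholder values for the user and qps labels.
	labels := map[string]string{"host": "testhost", "user": "demo", "qps": "100"}
	fmt.Println(formatSample("rtc_server", labels, 1, 1700000000))
}
```

With these placeholder inputs the program prints rtc_server{host="testhost",qps="100",user="demo"} 1 1700000000000, which is the shape of line Prometheus parses when it scrapes localhost:8088.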
Prometheus monitoring system
1) Responsible for collecting and querying data
prometheus.yml configuration excerpt
scrape_configs:
  - job_name: 'roma-test'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:8088']
remote_write:
  - url: "http://localhost:9201/write"
remote_read:
  - url: "http://localhost:9201/read"
Built-in query page
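The same data shown on the built-in page can also be fetched over the Prometheus HTTP API, whose range-query endpoint is /api/v1/query_range with the parameters query, start, end and step. The sketch below only builds such a URL. The address http://localhost:9090 (the Prometheus default port) and the helper name buildRangeQueryURL are assumptions, not something taken from this setup.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildRangeQueryURL returns the URL of a Prometheus range query.
// base is the Prometheus address; start and end are Unix timestamps
// (or RFC 3339 strings) and step is a duration such as "15s".
func buildRangeQueryURL(base, query, start, end, step string) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("start", start)
	params.Set("end", end)
	params.Set("step", step)
	// Encode sorts the parameters by key and escapes them.
	return base + "/api/v1/query_range?" + params.Encode()
}

func main() {
	// http://localhost:9090 is an assumed address for the Prometheus server.
	fmt.Println(buildRangeQueryURL("http://localhost:9090", "rtc_server", "1700000000", "1700000600", "15s"))
}
```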
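Grafana: monitoring data display
1) Responsible for displaying monitoring data
rtc_dispatcher: monitoring data dispatch
1) Responsible for dispatching business data
Format of the source data retrieved
Parsing in Go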
resp, err := http.PostForm(posturl, url.Values{"start": {start}, "end": {end}, "step": {step}})
if err != nil {
	fmt.Println("Error:", err)
	return // resp is nil here, so it must not be used
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)
if err != nil {
	fmt.Println("Error:", err)
	return
}
var bodyObj map[string]interface{}
if err := json.Unmarshal(body, &bodyObj); err != nil {
	fmt.Println("Error:", err)
	return
}
// Checked type assertions avoid a panic when the response has an unexpected shape.
data, ok := bodyObj["data"].(map[string]interface{})
if !ok {
	fmt.Println("Error: response has no \"data\" object")
	return
}
results, ok := data["result"].([]interface{})
if !ok {
	fmt.Println("Error: response has no \"result\" array")
	return
}
retData := make([]map[string]interface{}, 0, len(results))
for _, ite := range results {
	item, ok := ite.(map[string]interface{})
	if !ok {
		continue
	}
	metricObj, ok := item["metric"].(map[string]interface{})
	if !ok {
		continue
	}
	// Only the instance label is used; the other fields are placeholders.
	retData = append(retData, map[string]interface{}{
		"Hostname": "",
		"Eip":      metricObj["instance"],
		"Port":     "",
		"State":    "running",
	})
}
rest_resp.WriteEntity(&Resp{0, "GetRtcServerResponse", "GetRtcServer ok", 0, retData})
return
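As an alternative to walking nested map[string]interface{} values, the response can be decoded into typed structs. The sketch below assumes the source data is the standard Prometheus range-query envelope (status, data.resultType, data.result with metric and values). The type names queryResponse and server and the function parseServers are hypothetical and stand in for whatever rtc_dispatcher actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse mirrors the envelope returned by the Prometheus HTTP API
// for range queries (resultType "matrix").
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Values [][]interface{}   `json:"values"` // [unix_seconds, "value"] pairs
		} `json:"result"`
	} `json:"data"`
}

// server is a hypothetical stand-in for the entries the handler returns.
type server struct {
	Hostname string
	Eip      string
	Port     string
	State    string
}

// parseServers extracts one server entry per time series in the response.
func parseServers(body []byte) ([]server, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", resp.Status)
	}
	servers := make([]server, 0, len(resp.Data.Result))
	for _, series := range resp.Data.Result {
		// As in the code above, only the instance label is filled in.
		servers = append(servers, server{Eip: series.Metric["instance"], State: "running"})
	}
	return servers, nil
}

func main() {
	// Hand-written sample in the shape of a range-query response.
	sample := []byte(`{"status":"success","data":{"resultType":"matrix","result":[
		{"metric":{"__name__":"rtc_server","instance":"localhost:8088"},"values":[[1700000000,"1"]]}]}}`)
	servers, err := parseServers(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%+v\n", servers)
}
```

With typed structs a malformed response surfaces as an error from json.Unmarshal or the status check rather than as a failed type assertion deep inside the loop.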
ClickHouse: remote data storage
1) Responsible for persisting data and serving historical queries
For installation, see https://www.jianshu.com/p/4f3c6bbbbfa9
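A few notes on this architecture:
Each k8s cluster deploys one Prometheus-clickhouse-adapter. ClickHouse is deployed as a cluster and needs a ZooKeeper (ZK) cluster to replicate table data consistently.
The ClickHouse cluster layout is as follows:
ReplicatedMergeTree + Distributed. With ReplicatedMergeTree, tables that share the same ZK path replicate data to each other; note that the synchronization runs in both directions.
Each IDC has 3 shards, each holding 1/3 of the data
Each node relies on ZK and has 2 replicas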
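The layout described above can be sketched as a small enumeration: 3 shards per IDC, 2 replicas per shard, and one ZK path per shard that all of its replicas share. The sketch reads "2 replicas" as two copies of each shard, which is an assumption. The path prefix /clickhouse/tables, the table name metrics and the replica names are hypothetical; only the shard and replica counts come from the text.

```go
package main

import "fmt"

// replica describes one ReplicatedMergeTree table instance.
type replica struct {
	Shard  int
	Name   string // replica name, unique within a shard
	ZKPath string // ZooKeeper path shared by all replicas of the shard
}

// layout enumerates the table replicas of one IDC.
// Replicas of the same shard get the same ZK path, which is what makes
// them replicate to each other; different shards get different paths.
func layout(shards, replicasPerShard int, table string) []replica {
	var out []replica
	for s := 1; s <= shards; s++ {
		// Hypothetical path scheme: one path per shard and table.
		path := fmt.Sprintf("/clickhouse/tables/%02d/%s", s, table)
		for r := 1; r <= replicasPerShard; r++ {
			out = append(out, replica{Shard: s, Name: fmt.Sprintf("replica-%d", r), ZKPath: path})
		}
	}
	return out
}

func main() {
	for _, r := range layout(3, 2, "metrics") {
		fmt.Printf("shard %d  %s  %s\n", r.Shard, r.Name, r.ZKPath)
	}
}
```

Under these assumptions one IDC holds six table instances. A Distributed table placed on top of them would route each row to one of the three shards, and replication within the shard then copies it to the second replica.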