RTC Monitoring System Architecture

RTC监控体系架构图.png (RTC monitoring architecture diagram)

Data collection side
1) node_exporter collects server-level metrics.
The metrics currently used are CPU, memory, and inbound/outbound bandwidth.

2) rtc_exporter collects business metrics.
Part of the core code:

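package main

import (
    "log"
    "net/http"
    "os"
    "sync/atomic"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// ClusterManager is a custom prometheus.Collector for RTC business metrics.
// The type definition and the Describe method are not shown in the original
// excerpt; they are filled in here so that the fragment compiles.
type ClusterManager struct {
    OOMCountDesc *prometheus.Desc
    count        atomic.Int64 // demo value, incremented on every scrape
}

// Describe sends the descriptor of every metric this collector can emit.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.OOMCountDesc
}

// Collect runs on every scrape and emits one timestamped gauge sample.
func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
    i := c.count.Add(1) // atomic, because scrapes may run concurrently
    timestamp := time.Now().Unix()
    tm := time.Unix(timestamp, 0) // sample time, truncated to whole seconds
    log.Println("timestamp:", timestamp, "time.Unix:", tm, "value:", i)

    ch <- prometheus.NewMetricWithTimestamp(
        tm,
        prometheus.MustNewConstMetric(
            c.OOMCountDesc,
            prometheus.GaugeValue,
            float64(i),
            "testhost", // value of the variable label "host"
        ),
    )
}

// NewClusterManager builds a collector whose metric carries the variable
// label "host" and the constant labels "user" and "qps".
func NewClusterManager(user string, qps string) *ClusterManager {
    return &ClusterManager{
        OOMCountDesc: prometheus.NewDesc(
            "rtc_server",
            "Data from rtc server...",
            []string{"host"},
            prometheus.Labels{"user": user, "qps": qps},
        ),
    }
}

func main() {
    // Placeholder label values: the original excerpt does not show how
    // workerDB is constructed.
    workerDB := NewClusterManager("demo-user", "0")

    // Since we are dealing with custom Collector implementations, it might
    // be a good idea to try it out with a pedantic registry.
    reg := prometheus.NewPedanticRegistry()
    reg.MustRegister(workerDB)

    gatherers := prometheus.Gatherers{
        //prometheus.DefaultGatherer,
        reg,
    }

    h := promhttp.HandlerFor(gatherers,
        promhttp.HandlerOpts{
            ErrorLog:      log.New(os.Stderr, "", log.LstdFlags),
            ErrorHandling: promhttp.ContinueOnError,
        })
    http.Handle("/metrics", h)

    log.Println("Start server at :8088")
    if err := http.ListenAndServe(":8088", nil); err != nil {
        log.Printf("Error occurred when starting server: %v", err)
        os.Exit(1)
    }
}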

Prometheus monitoring system
1) Collects and queries the data.
Part of the prometheus.yml configuration:

scrape_configs:
  - job_name: 'roma-test'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:8088']

remote_write:
  - url: "http://localhost:9201/write"
remote_read:
  - url: "http://localhost:9201/read"

Built-in query page:


prometheus.png

Grafana: monitoring data display
1) Displays the monitoring data.

grafana.png

rtc_dispatcher: monitoring data dispatch
1) Dispatches the business data.
Format of the source data it retrieves:

metric.png

Parsing in Go:

// Handler excerpt. Needs encoding/json, fmt, io, net/http and net/url.
// posturl, start, end, step, rest_resp and Resp are defined outside this excerpt.
resp, err := http.PostForm(posturl, url.Values{"start": {start}, "end": {end}, "step": {step}})
if err != nil {
    fmt.Println("Error:", err)
    return // resp is nil on error, so stop here
}
defer resp.Body.Close()

body, err := io.ReadAll(resp.Body)
if err != nil {
    fmt.Println("Error:", err)
    return
}

// Decode generically; the expected shape is
// {"data": {"result": [{"metric": {...}}, ...]}}.
var bodyObj map[string]interface{}
if err := json.Unmarshal(body, &bodyObj); err != nil {
    fmt.Println("Error:", err)
    return
}
data, ok := bodyObj["data"].(map[string]interface{})
if !ok {
    fmt.Println("Error: response has no \"data\" object")
    return
}
results, _ := data["result"].([]interface{}) // nil if missing, so the loop is skipped

retData := make([]map[string]interface{}, 0, len(results))
for _, ite := range results {
    // Checked type assertions avoid a panic on unexpected input.
    item, ok := ite.(map[string]interface{})
    if !ok {
        continue
    }
    metricObj, ok := item["metric"].(map[string]interface{})
    if !ok {
        continue
    }
    retData = append(retData, map[string]interface{}{
        "Hostname": "",
        "Eip":      metricObj["instance"],
        "Port":     "",
        "State":    "running",
    })
}

rest_resp.WriteEntity(&Resp{0, "GetRtcServerResponse", "GetRtcServer ok", 0, retData})
return
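
As an alternative to walking nested map[string]interface{} values, the same extraction can be sketched with typed structs, which lets encoding/json do the shape checking. This is only a sketch under stated assumptions: queryResponse covers just the fields the parsing code reads (data.result[].metric) plus the status field of the Prometheus HTTP API, while Server, parseServers, and the sample payload are hypothetical names and data introduced for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// queryResponse mirrors the parts of a Prometheus HTTP API query response
// that the parsing code reads.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
		} `json:"result"`
	} `json:"data"`
}

// Server is a hypothetical typed version of the per-series map built by
// the parsing loop.
type Server struct {
	Hostname string
	Eip      string
	Port     string
	State    string
}

// parseServers returns one Server per result series, taking the
// "instance" label as Eip.
func parseServers(body []byte) ([]Server, error) {
	var qr queryResponse
	if err := json.Unmarshal(body, &qr); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if qr.Status != "success" {
		return nil, errors.New("query status: " + qr.Status)
	}
	servers := make([]Server, 0, len(qr.Data.Result))
	for _, series := range qr.Data.Result {
		servers = append(servers, Server{
			Eip:   series.Metric["instance"],
			State: "running",
		})
	}
	return servers, nil
}

func main() {
	// Made-up payload in the shape of a range-query response.
	sample := []byte(`{"status":"success","data":{"resultType":"matrix","result":[
		{"metric":{"__name__":"rtc_server","instance":"localhost:8088"},"values":[[1700000000,"1"]]}]}}`)

	servers, err := parseServers(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%+v\n", servers)
}
```

With typed structs, a payload of the wrong shape produces a decode error or an empty result instead of a runtime panic from a failed type assertion.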

ClickHouse: remote data storage
1) Persists the data and serves historical queries.
For installation, see https://www.jianshu.com/p/4f3c6bbbbfa9

集群架构.png (cluster architecture)

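A few notes on this architecture:
Each k8s cluster deploys one Prometheus-clickhouse-adapter. ClickHouse is deployed as a cluster and needs a ZooKeeper (ZK) cluster to replicate table data consistently.
The ClickHouse cluster layout is shown below: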


数据库集群.png (database cluster)

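The tables use ReplicatedMergeTree + Distributed. With ReplicatedMergeTree, tables that share the same ZK path synchronize data with each other; note that the synchronization is mutual, not one-way.
Each IDC has 3 shards, each holding 1/3 of the data.
Each node relies on ZK and has 2 replicas.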
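
The layout of 3 shards with 2 replicas each can be pictured with a small routing sketch: a key is mapped to one of the shards, and every replica of that shard ends up holding the row. This is only an illustration of the idea. The FNV-1a hash, the shardFor and replicasFor helpers, and the node names are all assumptions; in ClickHouse the shard is chosen by the sharding expression configured on the Distributed table, and replication between replicas is handled by ReplicatedMergeTree through ZK.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	numShards        = 3 // each shard holds roughly 1/3 of the data
	replicasPerShard = 2 // replicas of one shard hold the same data
)

// shardFor maps a sharding key to a shard index in [0, numShards).
// FNV-1a is an assumption for illustration only.
func shardFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numShards)
}

// replicasFor lists hypothetical node names that hold a shard's data.
func replicasFor(shard int) []string {
	nodes := make([]string, 0, replicasPerShard)
	for r := 0; r < replicasPerShard; r++ {
		nodes = append(nodes, fmt.Sprintf("shard%d-replica%d", shard, r))
	}
	return nodes
}

func main() {
	// Made-up host names used as sharding keys.
	for _, host := range []string{"testhost", "rtc-a", "rtc-b"} {
		s := shardFor(host)
		fmt.Println(host, "-> shard", s, "on", replicasFor(s))
	}
}
```

A write only needs to reach one replica of the target shard; the other replica catches up through replication, which is why tables sharing a ZK path converge to the same data.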
