SkyWalking: Customization

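To cover certain business-specific scenarios, you sometimes need to develop custom features. With a small amount of code added to the business code, you can implement monitoring tied to your own business, such as adding custom information to trace logs, or monitoring changes in the number of orders or users.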

I. Trace

Tracing a custom method is simple: add the @Trace annotation to the method you want to trace. This relies on the plugin activations/apm-toolkit-trace-activation-8.6.0.jar.

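  1. Add the dependency to the Spring Boot project's pom.xml:

     <dependency>
         <groupId>org.apache.skywalking</groupId>
         <artifactId>apm-toolkit-trace</artifactId>
         <version>${skywalking.version}</version>
     </dependency>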

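  2. Define a Controller and add the following endpoint:
import org.apache.skywalking.apm.toolkit.trace.ActiveSpan;
import org.apache.skywalking.apm.toolkit.trace.Tag;
import org.apache.skywalking.apm.toolkit.trace.Tags;
import org.apache.skywalking.apm.toolkit.trace.Trace;
import org.apache.skywalking.apm.toolkit.trace.TraceContext;

@GetMapping("tractAnnotation")
public User traceAnnotation(@RequestParam("name") String name) {
    log.info("param: [{}]", name);
    User user = trace(name);
    // Attach a custom tag to the currently active span
    ActiveSpan.tag("user-tag", user.toString());
    // Print the trace ID of the current request
    log.info("traceId: [{}]", TraceContext.traceId());
    return user;
}

// Creates a span named "myTrace"; the tags record the first argument and the name of the returned object
@Trace(operationName = "myTrace")
@Tags({
        @Tag(key = "param", value = "arg[0]"),
        @Tag(key = "returnValue", value = "returnedObj.name")
})
private User trace(String name) {
    User user = new User();
    user.setName(name);
    return user;
}
  3. Request http://localhost:9000/tractAnnotation?name=xxx, then check the record in the trace panel of the UI.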
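The @Tag values above use small expressions: arg[0] refers to the first method argument and returnedObj.name to the name property of the return value. The sketch below only illustrates that idea with plain reflection; it is not SkyWalking's implementation, and the class TagExpressionSketch, its nested User class, and the resolve method are hypothetical names introduced for this example.

```java
import java.lang.reflect.Method;

// Illustration only: resolves expressions such as "arg[0]" or "returnedObj.name".
final class TagExpressionSketch {

    /** Hypothetical stand-in for the User class used in the controller. */
    static final class User {
        private final String name;

        User(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    static Object resolve(String expression, Object[] args, Object returned) {
        String[] parts = expression.split("\\.");
        Object current;
        if (parts[0].equals("returnedObj")) {
            current = returned;
        } else if (parts[0].startsWith("arg[") && parts[0].endsWith("]")) {
            int index = Integer.parseInt(parts[0].substring(4, parts[0].length() - 1));
            current = args[index];
        } else {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        // Follow each property in the path through its public getter
        for (int i = 1; i < parts.length && current != null; i++) {
            String getter = "get" + Character.toUpperCase(parts[i].charAt(0)) + parts[i].substring(1);
            try {
                Method method = current.getClass().getMethod(getter);
                current = method.invoke(current);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read property " + parts[i], e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Object[] callArgs = {"bob"};
        User returned = new User("alice");
        System.out.println(resolve("arg[0]", callArgs, returned));
        System.out.println(resolve("returnedObj.name", callArgs, returned));
    }
}
```

In the controller above, this corresponds to the param tag carrying the request's name value and the returnValue tag carrying user.getName().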

II. Meter

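SkyWalking introduced metrics monitoring in 8.0 and also supports micrometer, so you can define custom metrics in your own business system, such as total visits or total orders, which improves extensibility. The following example demonstrates this feature.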

Modify the OAP configuration

  1. First add a custom metrics file, spring-meter.yaml, on the server side. It must follow the MAL (Meter Analysis Language) syntax.

Note: place spring-meter.yaml under config/meter-analyzer-config.

expSuffix: instance(['service'], ['instance'])
metricPrefix: meter_order
metricsRules:
  - name: new_increase_count
    exp: new_increase_count.increase("PT1M")
  2. In config/application.yml, find meterAnalyzerActiveFiles (around line 280) and set it to the file name above, spring-meter.yaml, without the extension:
agent-analyzer:
  selector: ${SW_AGENT_ANALYZER:default}
  default:
   ....
   meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:spring-meter}

If the storage is MySQL, a table named meter_order_new_increase_count is created after the server starts, which indicates that the server-side configuration works.
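The table name follows from the rule file: it is the metricPrefix joined to the rule name with an underscore, and the "PT1M" argument of increase is an ISO-8601 duration of one minute. A minimal sketch of both points, using only the JDK (the class and method names are hypothetical, and the naming rule is inferred from the example above rather than taken from the OAP source):

```java
import java.time.Duration;

// Illustration only: how the metric/table name and the MAL window relate to spring-meter.yaml.
final class MeterRuleNaming {

    // metricPrefix + "_" + rule name, as observed for meter_order_new_increase_count
    static String metricName(String metricPrefix, String ruleName) {
        return metricPrefix + "_" + ruleName;
    }

    // "PT1M" is an ISO-8601 duration; Duration.parse turns it into a length of time
    static long windowSeconds(String isoDuration) {
        return Duration.parse(isoDuration).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(metricName("meter_order", "new_increase_count"));
        System.out.println(windowSeconds("PT1M") + " seconds");
    }
}
```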

Application-side development

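Add the meter dependency to the Spring Boot application:

<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-meter</artifactId>
    <version>${skywalking.version}</version>
</dependency>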

Write a Controller and request meter several times to simulate changes in the order count, then check whether new records appear in the meter_order_new_increase_count table:

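import org.apache.skywalking.apm.toolkit.meter.Counter;
import org.apache.skywalking.apm.toolkit.meter.MeterFactory;
import org.apache.skywalking.apm.toolkit.meter.MeterId;

@GetMapping("meter")
public void meter() {
    // The meter name must match the name used in the rule's exp in spring-meter.yaml
    Counter counter = MeterFactory
            .counter(new MeterId("new_increase_count", MeterId.MeterType.COUNTER))
            .tag("Order Count", "100")
            .mode(Counter.Mode.INCREMENT)
            .build();
    // Simulate a random number of new orders
    counter.increment(Math.random() * 10);
    log.info("{}:{}", counter.getName(), counter.get());
}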
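The counter above is built with Counter.Mode.INCREMENT. As far as the toolkit documentation describes it, INCREMENT reports the accumulated value, while the alternative RATE mode reports the change since the previous report. The sketch below models that difference with JDK classes only; it is an assumption-based illustration, not the toolkit's code, and CounterModeSketch with its report method is a hypothetical name.

```java
import java.util.concurrent.atomic.DoubleAdder;

// Illustration only: models the assumed difference between INCREMENT and RATE reporting.
final class CounterModeSketch {

    enum Mode { INCREMENT, RATE }

    private final Mode mode;
    private final DoubleAdder total = new DoubleAdder();
    private double lastReported = 0.0;

    CounterModeSketch(Mode mode) {
        this.mode = mode;
    }

    void increment(double amount) {
        total.add(amount);
    }

    // Value that would be sent on each report cycle
    synchronized double report() {
        double current = total.sum();
        if (mode == Mode.INCREMENT) {
            return current;              // accumulated value
        }
        double delta = current - lastReported;
        lastReported = current;
        return delta;                    // change since the previous report
    }

    public static void main(String[] args) {
        CounterModeSketch counter = new CounterModeSketch(Mode.RATE);
        counter.increment(3);
        counter.increment(4);
        System.out.println(counter.report());
        counter.increment(1);
        System.out.println(counter.report());
    }
}
```

With INCREMENT on the client, the per-minute change is left to the server-side rule, which is what increase("PT1M") in spring-meter.yaml computes.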

Note: when starting the Spring Boot application, do not forget to add the javaagent arguments to the VM options:

-javaagent:skywalking-agent\skywalking-agent.jar -Dskywalking.agent.service_name=myapp -Dskywalking.agent.instance_name=myapp -Dskywalking.collector.backend_service=localhost:11800
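Since a missing -javaagent argument is easy to overlook, the application can check its own JVM arguments at startup. The sketch below uses the standard ManagementFactory.getRuntimeMXBean().getInputArguments() call; matching on the substring "skywalking-agent" is an assumption based on the jar name in the command above, and the class name JavaAgentCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustration only: detects whether the JVM was started with the SkyWalking agent.
final class JavaAgentCheck {

    static boolean hasSkyWalkingAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            // Assumes the agent jar path contains "skywalking-agent"
            if (arg.startsWith("-javaagent:") && arg.contains("skywalking-agent")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("SkyWalking agent attached: " + hasSkyWalkingAgent(jvmArgs));
    }
}
```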

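Using micrometer looks roughly like this. I have not tried it in practice; if you are interested, give it a test.

<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-micrometer-registry</artifactId>
    <version>${skywalking.version}</version>
</dependency>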
@GetMapping("micrometer")
public void micrometer() {
    // Names of the counters that should be rated on the agent side
    SkywalkingConfig config = new SkywalkingConfig(Arrays.asList("test_rate_counter"));
    SkywalkingMeterRegistry registry = new SkywalkingMeterRegistry(config);
    io.micrometer.core.instrument.Counter counter = registry.counter("order.count.total","china","beijing");
    counter.increment();

    log.info("Micrometer-{}:{}", registry.getMeters(), counter.measure());
}

UI charts

Edit the UI and add an item. For the metric, enter meter_order_new_increase_count (the metric defined on the OAP server above) and select read all values in..

Note: a metric added in the UI must already be defined on the OAP server; otherwise it cannot be added here.


III. Log

SkyWalking can collect application logs to the OAP server, so the logs belonging to a given request can be viewed within its trace.

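  1. Add a logback configuration, logback-spring.xml, to the Spring Boot application:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Appender and logger names below are illustrative -->
<configuration>
    <!-- Console output; %X{tid} prints the trace ID from the MDC -->
    <appender name="stdout" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
                <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n</Pattern>
            </layout>
        </encoder>
    </appender>

    <!-- Reports logs to the OAP server over gRPC -->
    <appender name="grpc-log" class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.log.GRPCLogClientAppender">
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.mdc.TraceIdMDCPatternLogbackLayout">
                <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n</Pattern>
            </layout>
        </encoder>
    </appender>

    <!-- File output; %sw_ctx prints the SkyWalking context -->
    <appender name="fileAppender" class="ch.qos.logback.core.FileAppender">
        <file>d:/temp/e2e-service-provider.log</file>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                <Pattern>[%sw_ctx] [%level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger:%line - %msg%n</Pattern>
            </layout>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="grpc-log"/>
        <appender-ref ref="stdout"/>
    </root>

    <logger name="fileLogger" level="INFO">
        <appender-ref ref="fileAppender"/>
    </logger>
</configuration>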

  2. Edit agent.config and add the following settings:
plugin.toolkit.log.grpc.reporter.server_host=${SW_GRPC_LOG_SERVER_HOST:192.168.x.x}
plugin.toolkit.log.grpc.reporter.server_port=${SW_GRPC_LOG_SERVER_PORT:11800}
plugin.toolkit.log.grpc.reporter.max_message_size=${SW_GRPC_LOG_MAX_MESSAGE_SIZE:10485760}
plugin.toolkit.log.grpc.reporter.upstream_timeout=${SW_GRPC_LOG_GRPC_UPSTREAM_TIMEOUT:30}
  3. When the application is accessed, its logs appear in SkyWalking.
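The agent.config values in step 2 use the ${NAME:default} form: the named environment variable is used when it is set, and the text after the colon otherwise. A simplified model of that substitution is sketched below; the real agent may consult other sources as well (system properties, for instance), and PlaceholderSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: resolves ${NAME:default} against a map standing in for the environment.
final class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher matcher = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Prefer the environment value, fall back to the default after the colon
            String replacement = env.getOrDefault(matcher.group(1), matcher.group(2));
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        System.out.println(resolve("${SW_GRPC_LOG_SERVER_PORT:11800}", env));
        env.put("SW_GRPC_LOG_SERVER_PORT", "12800");
        System.out.println(resolve("${SW_GRPC_LOG_SERVER_PORT:11800}", env));
    }
}
```

In practice this means the log reporter address can be overridden per environment, for example by exporting SW_GRPC_LOG_SERVER_HOST, without editing agent.config.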

IV. node-exporter

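SkyWalking can also import metrics from the Prometheus node-exporter, which makes it possible to monitor OS-level metrics. In SkyWalking, metrics of this kind are collected by an OpenTelemetry Collector and received by the OpenTelemetry receiver. Supporting node-exporter therefore takes three steps: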

  • Start a node-exporter on each operating system to be monitored.

  • Install and start an OpenTelemetry Collector.

  • Configure the OpenTelemetry receiver in SkyWalking.
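For orientation, a node-exporter serves its metrics as Prometheus text-format lines (by default on port 9100 under /metrics), and those lines are what the collector scrapes. The sketch below parses one such line into a name and a value. The sample lines are assumed examples, the parsing is deliberately simplified (no escaping or timestamps beyond taking the first number), and PrometheusLineSketch is a hypothetical name.

```java
// Illustration only: extracts the metric name and value from a Prometheus text-format sample line.
final class PrometheusLineSketch {

    static String metricName(String line) {
        int brace = line.indexOf('{');
        // The name ends at the label block if present, otherwise at the first space
        int end = brace >= 0 ? brace : line.indexOf(' ');
        return line.substring(0, end);
    }

    static double value(String line) {
        int closing = line.lastIndexOf('}');
        String rest = closing >= 0
                ? line.substring(closing + 1)
                : line.substring(line.indexOf(' '));
        // The first token after the name/labels is the sample value
        return Double.parseDouble(rest.trim().split("\\s+")[0]);
    }

    public static void main(String[] args) {
        String plain = "node_load1 0.42";
        String labelled = "node_cpu_seconds_total{cpu=\"0\",mode=\"idle\"} 1234.5";
        System.out.println(metricName(plain) + " = " + value(plain));
        System.out.println(metricName(labelled) + " = " + value(labelled));
    }
}
```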

  1. Start node_exporter on both vm01 and vm02:
$ tar -xzvf node_exporter-1.0.1.linux-amd64.tar.gz && cd node_exporter-1.0.1.linux-amd64 
$ nohup ./node_exporter &
  2. Install the OpenTelemetry Collector

    Start an otel-collector with docker-compose:

version: "2"
services:
  # Collector
  otel-collector:
    # Specify the image to start the container from
    image: otel/opentelemetry-collector:0.19.0
    # Set the  otel-collector configfile
    command: ["--config=/etc/otel-collector-config.yaml"]
    # Mapping the configfile to host directory
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "13133:13133" # health_check extension
      - "55678:55678"       # OpenCensus receiver

Edit otel-collector-config.yaml: vm01 and vm02 are the IPs of the machines running node_exporter, and oap should be replaced with the address of the OAP server.

Note: do not set the logging level to debug, otherwise the logs will fill up the disk.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 1s
          static_configs:
            - targets: ['vm01:9100']
            - targets: ['vm02:9100']

processors:
  batch:

exporters:
  opencensus:
    endpoint: "oap:11800" # The OAP Server address
    insecure: true
  # Exports data to the console
  logging:
    # Keep this log level from being too verbose (e.g. debug), otherwise the disk fills up
    logLevel: error

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [opencensus,logging]

If you deploy opentelemetry-collector on k8s instead, refer to the following:

# otel-collector-k8s.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 1s
              static_configs:
                - targets: ['vm-1:9100']
                - targets: ['vm-2:9100']     
    
    processors:
      batch:
    
    exporters:
      opencensus:
        endpoint: "oap.skywalking.svc.cluster.local:11800" # The OAP Server address
        insecure: true
      # Exports data to the console  
      #logging:
      #  logLevel: debug
    
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [opencensus]


---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  serviceName: otel-agent
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-agent-config.yaml"
          # Memory Ballast size should be max 1/3 to 1/2 of memory.
          - "--mem-ballast-size-mib=165"
        image: otel/opentelemetry-collector:0.19.0
        name: otel-agent
        resources:
          limits:
            cpu: 500m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 55679 # ZPages endpoint.
        - containerPort: 4317 # Default OpenTelemetry receiver port.
        - containerPort: 8888  # Metrics.
        volumeMounts:
        - name: otel-agent-config-vol
          mountPath: /conf
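        # Do not enable the probes here, otherwise the container exits automatically
        # (likely because the config above does not enable the health_check extension)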
        #livenessProbe:
        #  httpGet:
        #    path: /
        #    port: 13133 # Health Check extension default port.
        #readinessProbe:
        #  httpGet:
        #    path: /
        #    port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
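  3. Edit the OAP configuration file config/application.yml to activate the vm rules. The rule files are stored in the otel-oc-rules directory; to enable several rules, separate them with commas. To customize the metrics, edit vm.yaml.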

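    After following the official documentation step by step, nothing showed up in the UI. The catch is that the default selector of receiver-otel is -, so the receiver-otel module is not loaded at all; the selector has to be set to default.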

receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
    enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"vm,oap"}
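  4. The VM view in the UI now shows the machines' metrics, although the values seem to differ somewhat from the real ones; I have left that aside for now.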
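The enabledOcRules value in step 3 is a comma-separated list of rule names, each corresponding to a YAML file in the otel-oc-rules directory (vm maps to vm.yaml). The sketch below spells out that mapping; it is inferred from the description above rather than from the OAP source, and OcRuleFilesSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: maps the enabledOcRules setting to the rule files it refers to.
final class OcRuleFilesSketch {

    static List<String> ruleFiles(String enabledOcRules) {
        List<String> files = new ArrayList<>();
        for (String rule : enabledOcRules.split(",")) {
            String name = rule.trim();
            if (!name.isEmpty()) {
                // Each rule name is assumed to match a file otel-oc-rules/<name>.yaml
                files.add("otel-oc-rules/" + name + ".yaml");
            }
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(ruleFiles("vm,oap"));
    }
}
```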
