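To cover certain business-specific scenarios, some functionality has to be custom-built. By adding a small amount of code to the business code, we can implement monitoring tied to our own business, such as adding special information to trace logs, or monitoring changes in the number of orders or the number of users.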
1. Trace
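Defining a custom traced method is simple: just add the @Trace annotation to the method you want to trace. It also requires the activations/apm-toolkit-trace-activation-8.6.0.jar plugin.
- Add the dependency to the Spring Boot pom.xml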
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-trace</artifactId>
    <version>${skywalking.version}</version>
</dependency>
- Define a Controller and add the following request handlers
@GetMapping("tractAnnotation")
public User traceAnnotation(@RequestParam("name") String name) {
    log.info("param:[{}]", name);
    User user = trace(name);
    // Attach a custom tag to the currently active span
    ActiveSpan.tag("user-tag", user.toString());
    log.info("traceId:[{}]", TraceContext.traceId());
    return user;
}

// Traced as operation "myTrace"; the tags record the first argument and the name of the returned object
@Trace(operationName = "myTrace")
@Tags({
    @Tag(key = "param", value = "arg[0]"),
    @Tag(key = "returnValue", value = "returnedObj.name")
})
private User trace(String name) {
    User user = new User();
    user.setName(name);
    return user;
}
- After requesting http://localhost:9000/tractAnnotation?name=xxx, check the record in the trace panel of the UI.
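2. Meter
SkyWalking has supported metrics monitoring since 8.0, and it can also work with Micrometer. This makes it possible to define custom metrics in your own business system, such as total visits or total orders, which improves extensibility. The example below demonstrates this feature.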
Modify the OAP configuration
- First, add a custom meter file, spring-meter.yaml, on the server side. It must follow MAL syntax.
!!! Place spring-meter.yaml under config/meter-analyzer-config
expSuffix: instance(['service'], ['instance'])
metricPrefix: meter_order
metricsRules:
  - name: new_increase_count
    exp: new_increase_count.increase("PT1M")
- In config/application.yml, find meterAnalyzerActiveFiles (around line 280) and set it to the file name above, spring-meter.yaml, without the extension
agent-analyzer:
  selector: ${SW_AGENT_ANALYZER:default}
  default:
    ....
    meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:spring-meter}
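If MySQL is used for storage, a table named meter_order_new_increase_count is created after the service starts, which indicates that the server-side configuration succeeded.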
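To make the increase("PT1M") expression in the rule above more concrete, here is a minimal sketch of a sliding-window increase over a counter that only grows. It is a simplified illustration, not the OAP implementation: the class name CounterWindowSketch and its method are hypothetical, and it assumes that "increase over one minute" means the current value minus the oldest sample still inside the window.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: how much a growing counter increased within a time window. */
final class CounterWindowSketch {

    /** One reported counter value with the time it was observed. */
    private static final class Sample {
        final long timestampMillis;
        final double value;

        Sample(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    private final Deque<Sample> samples = new ArrayDeque<>();
    private final long windowMillis;

    CounterWindowSketch(Duration window) {
        this.windowMillis = window.toMillis();
    }

    /** Records a new sample and returns the growth relative to the oldest sample in the window. */
    double increase(long nowMillis, double value) {
        samples.addLast(new Sample(nowMillis, value));
        // Drop samples older than the window, but always keep one sample as the baseline.
        while (samples.size() > 1 && samples.peekFirst().timestampMillis < nowMillis - windowMillis) {
            samples.removeFirst();
        }
        return value - samples.peekFirst().value;
    }

    public static void main(String[] args) {
        CounterWindowSketch window = new CounterWindowSketch(Duration.parse("PT1M"));
        System.out.println(window.increase(0L, 10.0));
        System.out.println(window.increase(30_000L, 15.0));
        System.out.println(window.increase(90_000L, 22.0));
    }
}
```

The window length is written the same way as in the rule file: "PT1M" is the ISO-8601 duration for one minute, which java.time.Duration can parse directly.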
Application-side development
Add the meter dependency to the Spring Boot application
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-meter</artifactId>
    <version>${skywalking.version}</version>
</dependency>
Write a Controller and request meter several times to simulate changes in the order count, then check whether new records appear in the meter_order_new_increase_count table
@GetMapping("meter")
public void meter() {
    Counter counter = MeterFactory.counter(new MeterId("new_increase_count", MeterId.MeterType.COUNTER))
            .tag("Order Count", "100")
            .mode(Counter.Mode.INCREMENT)
            .build();
    counter.increment(Math.random() * 10);
    log.info("{}:{}", counter.getName(), counter.get());
}
Note!!!: When starting Spring Boot, do not forget to add the javaagent parameter to the VM options
-javaagent:skywalking-agent\skywalking-agent.jar -Dskywalking.agent.service_name=myapp -Dskywalking.agent.instance_name=myapp -Dskywalking.collector.backend_service=localhost:11800
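Since a forgotten -javaagent option is easy to miss, a small startup check can help. The sketch below uses only the JDK: it reads the JVM input arguments and looks for a -javaagent entry whose jar is named skywalking-agent.jar. The class name AgentCheck and its helper method are hypothetical, and the check assumes the agent jar keeps its default file name.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: detects whether the JVM was started with the SkyWalking agent jar. */
final class AgentCheck {

    private static final String PREFIX = "-javaagent:";

    /** Returns true if any JVM argument attaches a jar named skywalking-agent.jar. */
    static boolean hasSkyWalkingAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (!arg.startsWith(PREFIX)) {
                continue;
            }
            String path = arg.substring(PREFIX.length());
            // Agent options, if any, follow the jar path after '='.
            int optionsStart = path.indexOf('=');
            if (optionsStart >= 0) {
                path = path.substring(0, optionsStart);
            }
            if (path.endsWith("skywalking-agent.jar")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, for example the VM options configured in the IDE.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("SkyWalking agent attached: " + hasSkyWalkingAgent(jvmArgs));
    }
}
```

Calling hasSkyWalkingAgent at application startup and logging a warning when it returns false is one way to catch the missing parameter before wondering why no data reaches the OAP.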
Micrometer usage looks roughly like the following. I have not tried it in practice; feel free to test it if you are interested.
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-micrometer-registry</artifactId>
    <version>${skywalking.version}</version>
</dependency>
@GetMapping("micrometer")
public void micrometer() {
    // List the counters whose rate should be calculated on the agent side
    SkywalkingConfig config = new SkywalkingConfig(Arrays.asList("test_rate_counter"));
    SkywalkingMeterRegistry registry = new SkywalkingMeterRegistry(config);
    io.micrometer.core.instrument.Counter counter = registry.counter("order.count.total", "china", "beijing");
    counter.increment();
    log.info("Micrometer-{}:{}", registry.getMeters(), counter.measure());
}
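UI charts
Edit the UI, add an item, enter meter_order_new_increase_count as the metric (the one defined on the OAP server above), and select read all values in..
Note: A metric added in the UI must already be defined on the OAP server; otherwise it cannot be added here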
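3. Log
SkyWalking can collect application logs to the OAP server, so that the logs related to a request can be viewed in its call chain.
- Add the logback configuration to the Spring Boot application: logback-spring.xml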
%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n
%d{yyyy-MM-dd HH:mm:ss.SSS} [%X{tid}] [%thread] %-5level %logger{36} -%msg%n
d:/temp/e2e-service-provider.log
[%sw_ctx] [%level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger:%line - %msg%n
- Modify agent.config and add the following settings:
plugin.toolkit.log.grpc.reporter.server_host=${SW_GRPC_LOG_SERVER_HOST:192.168.x.x}
plugin.toolkit.log.grpc.reporter.server_port=${SW_GRPC_LOG_SERVER_PORT:11800}
plugin.toolkit.log.grpc.reporter.max_message_size=${SW_GRPC_LOG_MAX_MESSAGE_SIZE:10485760}
plugin.toolkit.log.grpc.reporter.upstream_timeout=${SW_GRPC_LOG_GRPC_UPSTREAM_TIMEOUT:30}
- When the application is accessed, the logs show up in SkyWalking
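4. node-exporter
SkyWalking can also import metrics from Prometheus node-exporter, which makes it possible to monitor operating-system-level metrics. In SkyWalking, metrics of this kind are collected by an OpenTelemetry Collector and received by the OpenTelemetry receiver. Supporting node-exporter therefore takes three steps:
Start a node-exporter on each operating system to be monitored.
Install and start an OpenTelemetry Collector.
Configure the OpenTelemetry receiver in SkyWalking.
- On vm01 and vm02, start node_exporter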
$ tar -xzvf node_exporter-1.0.1.linux-amd64.tar.gz && cd node_exporter-1.0.1.linux-amd64
$ nohup ./node_exporter &
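Once node_exporter is running it serves its metrics as plain text (on port 9100 in this setup, under the /metrics path), one sample per line. To get a feel for what the collector will scrape, the following sketch parses a single sample line into name, labels, and value. The class NodeExporterLine is hypothetical and deliberately simplified: it does not handle label values that contain a closing brace, and it ignores optional timestamps.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified parser for one sample line of the Prometheus text format. */
final class NodeExporterLine {

    // Metric name, optional {labels}, whitespace, then the sample value.
    private static final Pattern SAMPLE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(\\{[^}]*\\})?\\s+(\\S+)");

    final String name;
    final String labels;
    final double value;

    private NodeExporterLine(String name, String labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    /** Returns the parsed sample, or null for blank lines, comments (# HELP, # TYPE), and unparsable lines. */
    static NodeExporterLine parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        Matcher m = SAMPLE.matcher(trimmed);
        if (!m.find()) {
            return null;
        }
        String labels = m.group(2) == null ? "" : m.group(2);
        return new NodeExporterLine(m.group(1), labels, parseValue(m.group(3)));
    }

    /** The text format writes infinities as +Inf and -Inf, which Double.parseDouble does not accept. */
    private static double parseValue(String raw) {
        if (raw.equals("+Inf") || raw.equals("Inf")) {
            return Double.POSITIVE_INFINITY;
        }
        if (raw.equals("-Inf")) {
            return Double.NEGATIVE_INFINITY;
        }
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        NodeExporterLine sample = parse("node_cpu_seconds_total{cpu=\"0\",mode=\"idle\"} 12345.5");
        System.out.println(sample.name + " " + sample.labels + " = " + sample.value);
    }
}
```

Feeding it the lines returned by a request to http://vm01:9100/metrics is a quick way to confirm that the exporter is reachable and which metric names it exposes.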
- Install the OpenTelemetry Collector
Start an otel-collector with docker-compose
version: "2"
services:
  # Collector
  otel-collector:
    # Specify the image to start the container from
    image: otel/opentelemetry-collector:0.19.0
    # Set the otel-collector configfile
    command: ["--config=/etc/otel-collector-config.yaml"]
    # Mapping the configfile to host directory
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "13133:13133" # health_check extension
      - "55678:55678" # OpenCensus receiver
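Edit otel-collector-config.yaml: vm01 and vm02 are the IPs of the machines running node_exporter, and oap should be replaced with the address of the OAP server.
Note: do not set the logging level to debug, otherwise the logs will fill up the disk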
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 1s
          static_configs:
            - targets: ['vm01:9100']
            - targets: ['vm02:9100']
processors:
  batch:
exporters:
  opencensus:
    endpoint: "oap:11800" # The OAP Server address
    insecure: true
  # Exports data to the console
  logging:
    # Note: do not make this log level too verbose, otherwise the disk will fill up
    logLevel: error
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [opencensus,logging]
If you deploy opentelemetry-collector on k8s, refer to the following
# otel-collector-k8s.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 1s
              static_configs:
                - targets: ['vm-1:9100']
                - targets: ['vm-2:9100']
    processors:
      batch:
    exporters:
      opencensus:
        endpoint: "oap.skywalking.svc.cluster.local:11800" # The OAP Server address
        insecure: true
      # Exports data to the console
      #logging:
      #  logLevel: debug
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [batch]
          exporters: [opencensus]
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: otel-agent
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  serviceName: otel-agent
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      containers:
        - command:
            - "/otelcol"
            - "--config=/conf/otel-agent-config.yaml"
            # Memory Ballast size should be max 1/3 to 1/2 of memory.
            - "--mem-ballast-size-mib=165"
          image: otel/opentelemetry-collector:0.19.0
          name: otel-agent
          resources:
            limits:
              cpu: 500m
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 55679 # ZPages endpoint.
            - containerPort: 4317 # Default OpenTelemetry receiver port.
            - containerPort: 8888 # Metrics.
          volumeMounts:
            - name: otel-agent-config-vol
              mountPath: /conf
          # Do not enable the probes here, otherwise the container exits automatically
          #livenessProbe:
          #  httpGet:
          #    path: /
          #    port: 13133 # Health Check extension default port.
          #readinessProbe:
          #  httpGet:
          #    path: /
          #    port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
          name: otel-agent-config-vol
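- Modify the OAP configuration file config/application.yml to activate the vm rule. The rule files are stored in the otel-oc-rules directory; to enable several rules, separate them with commas. To customize the metrics, edit vm.yaml.
After following the official documentation step by step, nothing showed up in the UI. The catch is that the default selector of receiver-otel is -, so the receiver-otel plugin is never loaded. The selector therefore has to be set to default.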
receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"oc"}
    enabledOcRules: ${SW_OTEL_RECEIVER_ENABLED_OC_RULES:"vm,oap"}
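The values in this block use the ${NAME:default} form. Assuming it means "use the environment variable NAME if it is set, otherwise the text after the colon", which is how these settings read, the resolution can be sketched as follows. The class PlaceholderSketch is hypothetical and is not how the OAP itself loads its configuration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: resolves ${NAME:default} placeholders against a map of environment variables. */
final class PlaceholderSketch {

    // Group 1 is the variable name, group 2 the default value (which may be empty).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+):([^}]*)\\}");

    static String resolve(String raw, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Prefer the environment value; fall back to the default written in the file.
            String value = env.getOrDefault(m.group(1), m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // With no variable set, the default from the file wins.
        System.out.println(resolve("${SW_OTEL_RECEIVER:default}", Collections.<String, String>emptyMap()));
        // With the variable set, the environment overrides the file.
        System.out.println(resolve("${SW_OTEL_RECEIVER:default}", Collections.singletonMap("SW_OTEL_RECEIVER", "-")));
    }
}
```

Read this way, the selector can be switched either by editing the default in application.yml, as done here, or by setting SW_OTEL_RECEIVER in the environment of the OAP process.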
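- In the UI, the VM dashboard now shows the metrics scraped from the machines. The values seem to differ somewhat from the real ones, but that is left aside for now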