APM的全称是 Application Performance Monitor 即 应用性能监控,APM致力于监控和管理应用软件性能和可用性。通过监测和诊断处理复杂应用程序的性能问题,来保证软件应用程序的良好运行(预期的服务)。
有 zipkin,pinpoint,skywalking ,下面主要对pinpoint和skywalking 进行对比:
Pinpoint的优势在于:追踪数据粒度非常细、功能强大的用户界面,以及使用HBase作为存储带来的海量存储能力。
skywalking的优势在于:非常活跃的中文社区,支持多种语言的探针,对国产开源软件非常全面的支持,以及使用es作为底层存储带来的强大的检索能力,并且skywalking的扩展性以及定制化要更优于Pinpoint。
SkyWalking 是中国人开发的,也是国内唯一一个发展成为Apache顶级项目的个人开源项目。SkyWalking 是一个为了微服务,容器化,和分布式系统而生的高度组件化的APM项目。
SkyWalking 基础架构图
SkyWalking 探针在使用上是无代码侵入的,而这种无侵入的自动埋点基于Java 的JavaAgent 技术。
JavaAgent 主要有两个方法实现–premain 和transform;
premain是javaAgent 的入口方法
public static void premain(String agentArgs, Instrumentation instrumentation) throws PluginException {
final PluginFinder pluginFinder;
try {
SnifferConfigInitializer.initialize(agentArgs);
//创建插件查找器
pluginFinder = new PluginFinder(new PluginBootstrap().loadPlugins());
} catch (Exception e) {
logger.error(e, "Skywalking agent initialized failure. Shutting down.");
return;
}
final ByteBuddy byteBuddy = new ByteBuddy()
.with(TypeValidation.of(Config.Agent.IS_OPEN_DEBUGGING_CLASS));
AgentBuilder agentBuilder = new AgentBuilder.Default(byteBuddy)
.ignore(
nameStartsWith("net.bytebuddy.")
.or(nameStartsWith("org.slf4j."))
.or(nameStartsWith("org.apache.logging."))
.or(nameStartsWith("org.groovy."))
.or(nameContains("javassist"))
.or(nameContains(".asm."))
.or(nameStartsWith("sun.reflect"))
.or(allSkyWalkingAgentExcludeToolkit())
.or(ElementMatchers.<TypeDescription>isSynthetic()));
try {
agentBuilder = BootstrapInstrumentBoost.inject(pluginFinder, agentBuilder, instrumentation);
} catch (Exception e) {
logger.error(e, "SkyWalking agent inject bootstrap instrumentation failure. Shutting down.");
return;
}
agentBuilder
.type(pluginFinder.buildMatch())
.transform(new Transformer(pluginFinder))
.with(AgentBuilder.RedefinitionStrategy.RETRANSFORMATION)
.with(new Listener())
.installOn(instrumentation);
instrumentation.addTransformer(new MainClassTransformer());
try {
ServiceManager.INSTANCE.boot();
} catch (Exception e) {
logger.error(e, "Skywalking agent boot failure.");
}
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override public void run() {
ServiceManager.INSTANCE.shutdown();
}
}, "skywalking service shutdown thread"));
}
transfrom的主要作用是进行字节码数据转换,码返回转换后的字节码;
@Override
public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {
if (ContainerLifecycle.INSTANCE.getCurState().equals(ContainerLifecycleState.NEW)){
String mainClassName = findMainClass();
if (!StringUtil.isEmpty(mainClassName)){
logger.info("Find main class :{}",mainClassName);
if (mainClassName.equals(ContainerType.TOMCAT.getMainClassName())){
ContainerInfo.INSTANCE.setType(ContainerType.TOMCAT);
ContainerLifecycle.INSTANCE.fireLifecycleEvent(ContainerLifecycleState.INIT);
} else {
ContainerInfo.INSTANCE.setType(ContainerType.UNKNOWN);
ContainerLifecycle.INSTANCE.fireLifecycleEvent(ContainerLifecycleState.STARTED);
}
}
}
return classfileBuffer;
}
classFileTransformer:用途— classFileTransformer.md
OAL 是用户可自定义的描述分析过程的可扩展,轻量级,编译型语言。SkyWalking 可以将OAL 编译成Java class 文件,使用Skywalking 流计算引擎加载运行。
OAL实现原理,SkyWalking 流计算引擎可以接受类型为Source的数据源,每种Source 会通过指定类型的Dispatcher 转换为流计算引擎可以处理的原始数据类型。目前,存在以下4中原始数据类型:
Inventory 数据,即元数据,如服务名称定义,EndPoint,通过注册模型处理数据;
Record 数据,即明细数据,如Trace,访问日志,通过明细模型处理数据;
Metrics 数据,指标数据,绝大多数OAL指标都会生成这个类型,通过指标模型处理数据;
TopN 数据,周期抽样数据,如慢SQL的周期数据,通过采样模型处理数据;
OAL 工作的三个阶段: 语法与词法解析-》动态代码生成-》流计算注册。
OAL 语法忽略–自查。
通过上面讲解OAL,我们了解了SkyWalking 的四种模型,数据的存储就是通过这四种模型来实现的。
四种模型的结构不详细讲了。讲下数据存储的过程:
SkyWalking 的追踪模型主要包含几个元素:Segment,Trace,Span
解释:一个Segment 是Trace 在一个进程内的所有Span 的集合。
SkyWalking 的轻量级队列内核是基于无锁环状队列的生产者----消费者内存消息队列,主要是在生产者与消费者之间创建一个缓冲的异步内存队列。该缓存队列的组成元素有Buffer,Channel 和 DataCarrier。
Buffer 是SkyWalking队列内核中的数据载体。
public class Buffer {
private final Object[] buffer; //用于数据存储
private BufferStrategy strategy; //淘汰策略
private AtomicRangeInteger index; //一个原子的循环索引,与buffer 一起实现环状队列
private List> callbacks;
}
Channel 是管理Buffer 的载体。
public class Channels<T> {
private final Buffer<T>[] bufferChannels; // Buffer 的数组, 队列数据载体
private IDataPartitioner<T> dataPartitioner;// 确定数据写入那个Buffer以及写入失败后的重试
private BufferStrategy strategy;//队列策略
private final long size;//channel 能容纳的数据量 = Buffer.size()*Buffer.buffer.size()
}
DataCarrier 是轻量队列的门户。
public class DataCarrier<T> {
private final int bufferSize; //每个Buffer队列 的大小
private final int channelSize; //channal 数量
private Channels<T> channels;
private IDriver driver;
private String name;
}
消息生产
消息消费
主要通过实现IConsumer接口——consumer() 来实现,为了防止多线程重复消费消息,Skywalking采取一个线程对一个Buffer 消费,如果存在Buffer 数小于线程数的情况,会采取分割Buffer 给每个线程,以此杜绝重复消费。
模块化结构设计是指,整个程序不使用硬编码耦合的方式进行程序链接,而是按照预先的设计,划分成多个模块(Module),模块间无耦合关系,只定义了模块对外开放的服务接口(API)。每个模块有多个实现,但每次启动只能将一个实现激活。
SkyWalkeing的几种协议介绍:
探针 与后端的通信模式主要分为两种:注册通信和数据上报通信,官方支持gRPC与HTTP /1.1 通信。
注册通信
主要通过定时器 向后端发送注册信息,详情看代码:
//监听器确认 定时任务开启
public void run() {
logger.debug("ServiceAndEndpointRegisterClient running, status:{}.", status);
boolean shouldTry = true;
while (GRPCChannelStatus.CONNECTED.equals(status) && shouldTry) {
shouldTry = false;
JsonObject appInfoJson = AppInfoLoader.getAppInfoJson();
try {
//SERVICE_ID 由Agent 在本地内存中维护的,由后端分发的Service 唯一ID
//判断 SERVICE_ID 是否存在,若不存在 执行注册
if (RemoteDownstreamConfig.Agent.SERVICE_ID == DictionaryUtil.nullValue()) {
if (registerBlockingStub != null) {
ServiceRegisterMapping serviceRegisterMapping = registerBlockingStub.doServiceRegister(
Services.newBuilder().addServices(Service.newBuilder()
.addProperties(KeyStringValuePair.newBuilder().setKey("service_info").setValue(appInfoJson.toString()).build())
.setServiceName(Config.Agent.SERVICE_NAME)).build());
if (serviceRegisterMapping != null) {
for (KeyIntValuePair registered : serviceRegisterMapping.getServicesList()) {
if (Config.Agent.SERVICE_NAME.equals(registered.getKey())) {
RemoteDownstreamConfig.Agent.SERVICE_ID = registered.getValue();
RegisterNotifier.notify(new RegisterSource(RegisterSource.TYPE.SERVICE,Config.Agent.SERVICE_NAME,RemoteDownstreamConfig.Agent.SERVICE_ID));
shouldTry = true;
}
}
}
}
} else {
if (registerBlockingStub != null) {
// SERVICE_INSTANCE_ID Serivce中某个进程的具体实例ID,也是有Agent 本地内存维护,由后端分发 的ID
//SERVICE_ID注册后 ,执行SERVICE_INSTANCE_ID 注册
if (RemoteDownstreamConfig.Agent.SERVICE_INSTANCE_ID == DictionaryUtil.nullValue()) {
ServiceInstance.Builder instanceBuilder = ServiceInstance.newBuilder()
.setServiceId(RemoteDownstreamConfig.Agent.SERVICE_ID)
.setInstanceUUID(INSTANCE_UUID)
.setTime(System.currentTimeMillis())
.addProperties(KeyStringValuePair.newBuilder().setKey("instance_name").setValue(INSTANCE_NAME).build())
.addProperties(KeyStringValuePair.newBuilder().setKey("agent_version").setValue(Config.Agent.VERSION).build())
.addAllProperties(OSUtil.buildOSInfo())
.addAllProperties(ContainerInfo.INSTANCE.build())
.addProperties(KeyStringValuePair.newBuilder().setKey("app_info").setValue(appInfoJson.toString()).build());
//向后端发起 注册
ServiceInstanceRegisterMapping instanceMapping = registerBlockingStub.doServiceInstanceRegister(ServiceInstances.newBuilder()
.addInstances(instanceBuilder).build());
for (KeyIntValuePair serviceInstance : instanceMapping.getServiceInstancesList()) {
if (INSTANCE_UUID.equals(serviceInstance.getKey())) {
//获取 后端返回的注册 serviceInstanceId
int serviceInstanceId = serviceInstance.getValue();
if (serviceInstanceId != DictionaryUtil.nullValue()) {
RegisterNotifier.notify(new RegisterSource(RegisterSource.TYPE.SERVICE_INSTANCE,null,serviceInstanceId));
//保存 后端返回的注册 serviceInstanceId
RemoteDownstreamConfig.Agent.SERVICE_INSTANCE_ID = serviceInstanceId;
}
}
}
} else {
//发送 心跳通信
// Commands 由后端返回前端的指令消息,
Commands commands = serviceInstancePingStub.doPing(ServiceInstancePingPkg.newBuilder()
.setServiceInstanceId(RemoteDownstreamConfig.Agent.SERVICE_INSTANCE_ID)
.setTime(System.currentTimeMillis())
.setServiceInstanceUUID(INSTANCE_UUID)
.build());
//发起 NetworkAddress 注册
//NetworkAddress 指的是 网络IP 地址
NetworkAddressDictionary.INSTANCE.syncRemoteDictionary(registerBlockingStub);
//发起 Endpoint 注册
//EndpointName 端口名称
EndpointNameDictionary.INSTANCE.syncRemoteDictionary(registerBlockingStub);
}
}
}
} catch (Throwable t) {
logger.error(t, "ServiceAndEndpointRegisterClient execute fail.");
ServiceManager.INSTANCE.findService(GRPCChannelManager.class).reportError(t);
}
}
}
后端收到注册后,仅以doServiceRegister 为例:
@Override public void doServiceRegister(Services request, StreamObserver<ServiceRegisterMapping> responseObserver) {
ServiceRegisterMapping.Builder builder = ServiceRegisterMapping.newBuilder();
request.getServicesList().forEach(service -> {
String serviceName = service.getServiceName();
if (logger.isDebugEnabled()) {
logger.debug("Register service, service code: {}", serviceName);
}
List<KeyStringValuePair> propertiesList = service.getPropertiesList();
JsonObject propJson = null;
if (propertiesList!=null&&!propertiesList.isEmpty()){
propJson = new JsonObject();
for (KeyStringValuePair valuePair:propertiesList) {
propJson.addProperty(valuePair.getKey(),valuePair.getValue());
}
}
//获取 已经存在的service 的serviceId 或者创建一个新的
int serviceId = serviceInventoryRegister.getOrCreate(serviceName, propJson);
if (serviceId != Const.NONE) {
// 返回 serviceId
KeyIntValuePair value = KeyIntValuePair.newBuilder().setKey(serviceName).setValue(serviceId).build();
builder.addServices(value);
}
});
responseObserver.onNext(builder.build());
responseObserver.onCompleted();
}
数据上报通信:
初始化队列:
public void boot() throws Throwable {
lastLogTime = System.currentTimeMillis();
segmentUplinkedCounter = 0;
segmentAbandonedCounter = 0;
carrier = new DataCarrier<TraceSegment>(CHANNEL_SIZE, BUFFER_SIZE);
carrier.setBufferStrategy(BufferStrategy.IF_POSSIBLE);
carrier.consume(this, 1);
}
收集消息:
@Override
public void afterFinished(TraceSegment traceSegment) {
if (traceSegment.isIgnore()) {
return;
}
if (!carrier.produce(traceSegment)) {
if (logger.isDebugEnable()) {
logger.debug("One trace segment has been abandoned, cause by buffer is full.");
}
}
}
上报消息:
public void consume(List<TraceSegment> data) {
if (CONNECTED.equals(status)) {
final GRPCStreamServiceStatus status = new GRPCStreamServiceStatus(false);
StreamObserver<UpstreamSegment> upstreamSegmentStreamObserver = serviceStub.collect(new StreamObserver<Commands>() {
@Override
public void onNext(Commands commands) {
}
@Override
public void onError(Throwable throwable) {
status.finished();
if (logger.isErrorEnable()) {
logger.error(throwable, "Send UpstreamSegment to collector fail with a grpc internal exception.");
}
ServiceManager.INSTANCE.findService(GRPCChannelManager.class).reportError(throwable);
}
@Override
public void onCompleted() {
status.finished();
}
});
try {
for (TraceSegment segment : data) {
//发送数据到后端
UpstreamSegment upstreamSegment = segment.transform();
upstreamSegmentStreamObserver.onNext(upstreamSegment);
}
upstreamSegmentStreamObserver.onCompleted();
status.wait4Finish();
segmentUplinkedCounter += data.size();
} catch (Throwable t) {
logger.error(t, "Transform and send UpstreamSegment to collector fail.");
}
} else {
segmentAbandonedCounter += data.size();
}
printUplinkStatus();
}
探针开发中主要涉及到的概念:Span,Trace Segment ,ContextCarrier 和ContextSnapshop
Span 泛指 一次方法调用,一个程序块的调用,一次RPC 调用和SQL调用
Trace Segment 值 一个线程中 同一操作的所有span 的集合
ContextCarrier 处理跨进程 调用链
ContextSnapshop 处理跨线程调用链
探针开发的API…(省略)
拿金融网关为例子,举例说明;
探针开发首先需要 “定义拦截”,其次在定义的拦截器中调用 探针接口,补充链路关键信息,实现消费者与提供者的链路串联。
定义拦截:
class OutBoundMethodInstrumentation extends ClassInstanceMethodsEnhancePluginDefine {
//插件中定义的具体拦截类
private static final String INTERCEPT_CLASS = "org.apache.skywalking.apm.msoa.plugin.rpc.OutBoundMethodHandlerInterceptor";
//网关 需要拦截的接口类
private static final String ENHANCE_CLASS = "com.baidu.ub.msoa.rpc.outbound.DefaultOutboundMethodHandler";
//增强的实列类
@Override
protected ClassMatch enhanceClass() {
return NameMatch.byName(ENHANCE_CLASS);
}
// 构造器方法的拦截形式
@Override
protected ConstructorInterceptPoint[] getConstructorsInterceptPoints() {
return new ConstructorInterceptPoint[]{
new ConstructorInterceptPoint() {
@Override
public ElementMatcher<MethodDescription> getConstructorMatcher() {
return any();
}
@Override
public String getConstructorInterceptor() {
return INTERCEPT_CLASS;
}
}
};
}
//实例方法的拦截形式
@Override
protected InstanceMethodsInterceptPoint[] getInstanceMethodsInterceptPoints() {
return new InstanceMethodsInterceptPoint[] {
new InstanceMethodsInterceptPoint() {
//增强的实列类方法匹配器
@Override
public ElementMatcher<MethodDescription> getMethodsMatcher() {
return named("createInvoker");
}
//方法拦截类
@Override
public String getMethodsInterceptor() {
return getInstanceMethodsInterceptor();
}
//不更改引用参数
@Override
public boolean isOverrideArgs() {
return false;
}
}
};
}
}
拦截器具体实现:
public class OutBoundMethodHandlerInterceptor implements InstanceConstructorInterceptor,InstanceMethodsAroundInterceptor {
private static final ILog logger = LogManager.getLogger(OutBoundMethodHandlerInterceptor.class);
@Override
public void onConstruct(EnhancedInstance objInst, Object[] allArguments) {
Method getBundleServiceMetaMethod = null;
try {
getBundleServiceMetaMethod = DefaultOutboundMethodHandler.class.getDeclaredMethod("getBundleServiceMeta", new Class[]{BundleService.class, Object.class, Method.class});
getBundleServiceMetaMethod.setAccessible(true);
objInst.setSkyWalkingDynamicField(getBundleServiceMetaMethod);
} catch (NoSuchMethodException e) {
logger.error("NoSuchMethodException OutBoundMethodHandlerInterceptor.getBundleServiceMeta",e);
}
}
@Override
public void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, MethodInterceptResult result) throws Throwable {
Object skyWalkingDynamicField = objInst.getSkyWalkingDynamicField();
if (skyWalkingDynamicField == null)return;
try {
BundleService rpcSettings = (BundleService) allArguments[0];
Object rpcObject = allArguments[1];
Method rpcMethod = (Method) allArguments[2];
Object[] rpcArgs = (Object[]) allArguments[3];
Method getBundleServiceMetaMethod = (Method) skyWalkingDynamicField;
BundleServiceMeta metaInfo = (BundleServiceMeta) getBundleServiceMetaMethod.invoke(objInst,rpcSettings,rpcObject,rpcMethod);
String operationName = new StringBuilder()
.append("Consumer:")
.append(metaInfo.provider).append("/")
.append(metaInfo.service).append("/")
.append(metaInfo.version).append("/")
.append(metaInfo.method).toString();
AbstractSpan span = ContextManager.createLocalSpan(operationName);
RpcHeadCacheManager.extractRequestIntoSpan(rpcArgs,span);
span.setComponent(ComponentsDefine.MSOA);
SpanLayer.asRPCFramework(span);
}catch (Exception ex){
logger.error("parse msoa request failed",ex);
}
}
@Override
public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Object ret) throws Throwable {
Object skyWalkingDynamicField = objInst.getSkyWalkingDynamicField();
if (skyWalkingDynamicField == null)return ret;
AbstractSpan activeSpan = ContextManager.activeSpan();
try {
RpcHeadCacheManager.extractResponseIntoSpan(ret,activeSpan);
}catch (Exception ex) {
logger.error("parse msoa response failed",ex);
}
ContextManager.stopSpan();
return ret;
}
@Override
public void handleMethodException(EnhancedInstance objInst, Method method, Object[] allArguments, Class<?>[] argumentsTypes, Throwable throwable) {
Object skyWalkingDynamicField = objInst.getSkyWalkingDynamicField();
if (skyWalkingDynamicField == null)return;
AbstractSpan activeSpan = ContextManager.activeSpan();
activeSpan.errorOccurred();
activeSpan.log(throwable);
}
}
下一代监控体系–SkyWalking 观测Service Mesh。
Service Mesh (服务网格),被看作是未来新一代服务间通信的基础设施。Service Mesh 有如下特点:
应用之间通信的中间层;
轻量级网络代理;
应用程序无感知;
解耦应用程序的各种机制。
目前两款流行的Service Mesh 开源软件 Istio 和 Linkerd ,都可以直接在K8S中继承和单独在VM中部署。
简单解释,Istio 包含 服务发现,和负载均衡的基本服务还可以对网络流量进行更精细的控制。
Service Mesh 的监控往往称之为可观测性,其内涵是要超越传统的监控体系。它一般包含,监控,告警,可视化,分布式追踪与日志分析。