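CAT (Central Application Tracking) is a real-time, near-full-volume monitoring system focused on Java applications. It is widely used in middleware frameworks (MVC, RPC, database, cache, etc.) and provides Meituan's business lines with system performance metrics, health status, and monitoring and alerting.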
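Since its inception, CAT has followed the principle that the simplest architecture is the best one. It consists of three main modules: CAT-client, CAT-consumer and CAT-home.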
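The main job of a monitoring system is to locate online anomalies quickly and limit the damage, so a good monitoring system must achieve at least the following two things: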
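A monitoring system works by having clients report instrumentation data to the server. The server analyzes the data and produces reports for display. So the first thing to understand is the structure of the messages.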
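All messages are organized into a message tree (MessageTree). Transaction messages can be internal nodes of the tree, while all other message types can only be leaves. In other words, Transaction is a nestable, recursive structure. The structure can be represented as in the figure below: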
Sometimes a sequence diagram makes this clearer:
Here is a real example:
Transaction transaction0 = Cat.newTransaction("m_type", "/home/hello");
Cat.logEvent("event0", "eventName0");
Transaction transaction1 = Cat.newTransaction("type1", "name1");
Transaction transaction2 = Cat.newTransaction("type2", "name2");
Cat.logEvent("event1", "eventName1");
How is this hierarchy implemented? The key is m_stack, which is held in a ThreadLocal. Only Transaction messages are pushed onto m_stack.
See https://blog.csdn.net/mnmlist/article/details/114293423
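Every time an application reports instrumentation data, it wraps the messages in a MessageTree, which contains Transaction, Event and Metric data. Heartbeat messages, by contrast, need no instrumentation in application code. The CAT client collects memory, disk and CPU information every minute and reports it to the server.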
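Every node of the message tree has a messageId attribute that uniquely identifies it, in the form {domain}-{ip}-{timestamp}-{auto-increment index}. There are two further attributes, parentMessageId and rootMessageId. parentMessageId is the messageId of the parent node, and rootMessageId is the messageId of the root node of the whole tree. These two attributes play a key role in CAT's call-chain analysis and distributed call-chain analysis.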
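To make the ID format concrete, here is a minimal sketch of a generator that follows the {domain}-{ip}-{timestamp}-{index} layout described above. The class MessageIdSketch is hypothetical. As far as I know, CAT's own MessageIdFactory hex-encodes the IP and uses an hour-based timestamp, so real IDs look different. The point is only that the domain plus IP identify the process, and the auto-increment index keeps IDs with the same timestamp distinct.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical generator for IDs of the form {domain}-{ip}-{timestamp}-{index}.
// Simplified: real CAT IDs encode the IP and timestamp differently.
class MessageIdSketch {
    private final String domain;
    private final String ip;
    private final AtomicInteger index = new AtomicInteger();

    MessageIdSketch(String domain, String ip) {
        this.domain = domain;
        this.ip = ip;
    }

    // The index is shared by all threads of the process and only ever increases,
    // so two IDs with the same domain/ip/timestamp still differ.
    String nextId(long timestamp) {
        return domain + "-" + ip + "-" + timestamp + "-" + index.getAndIncrement();
    }
}
```

A child node would store the parent's ID as its parentMessageId and the root's ID as its rootMessageId. That is how the call-chain analysis mentioned above can link nodes back together.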
https://blog.csdn.net/heihaozi/article/details/103089668
This thread is simple: every minute it reports various information about the application (OS, MXBean information and so on).
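Because the Context is kept in a ThreadLocal, every thread has its own Context. The Context maintains a stack of transactions: a transaction is pushed when it starts and popped when it ends. When the first transaction is pushed onto the stack, construction of a MessageTree begins. When the stack becomes empty, the MessageTree is considered finished and is handed to the pending-send queue.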
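The stack mechanism above can be sketched in a few dozen lines. This is a heavily simplified, hypothetical stand-in for DefaultMessageManager.Context, not CAT's code. The names Msg, TreeBuilder and outbox are made up, and outbox plays the role of the pending-send queue. Each thread gets its own stack of open transactions. A transaction started with an empty stack becomes the root. Events attach to the innermost open transaction, or form a tree of their own when nothing is open. When the stack empties, only the root is queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical message node: transactions can have children, events are leaves.
class Msg {
    final boolean transaction;
    final String type, name;
    final List<Msg> children = new ArrayList<>();

    Msg(boolean transaction, String type, String name) {
        this.transaction = transaction;
        this.type = type;
        this.name = name;
    }
}

// Hypothetical, simplified stand-in for the per-thread Context.
class TreeBuilder {
    // Stands in for the pending-send queue; it receives finished root messages.
    static final Queue<Msg> outbox = new ConcurrentLinkedQueue<>();
    // Each thread gets its own stack of open transactions.
    private static final ThreadLocal<Deque<Msg>> STACK = ThreadLocal.withInitial(ArrayDeque::new);

    static Msg newTransaction(String type, String name) {
        Deque<Msg> stack = STACK.get();
        Msg t = new Msg(true, type, name);
        if (!stack.isEmpty()) {
            stack.peek().children.add(t); // nested: child of the innermost open transaction
        }
        stack.push(t); // if the stack was empty, t becomes the root of a new tree
        return t;
    }

    static void logEvent(String type, String name) {
        Deque<Msg> stack = STACK.get();
        Msg e = new Msg(false, type, name);
        if (stack.isEmpty()) {
            outbox.add(e); // no open transaction: the event forms a tree on its own
        } else {
            stack.peek().children.add(e); // leaf under the current transaction
        }
    }

    static void complete(Msg t) {
        Deque<Msg> stack = STACK.get();
        if (!stack.contains(t)) {
            return; // already completed, or never started on this thread
        }
        Msg root = stack.peekLast(); // the bottom of the stack is the root
        // Pop until t is reached; inner transactions left open are closed with it.
        while (stack.pop() != t) { }
        if (stack.isEmpty()) {
            outbox.add(root); // tree finished: sending the root sends all descendants
            STACK.remove();   // like m_context.remove() in DefaultMessageManager.end()
        }
    }
}
```

Unlike the real Context, this sketch ignores sampling, forked transactions and validation. It only shows why a per-thread stack is enough to rebuild the nesting.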
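Under high concurrency, logs are often buffered in a ThreadLocal so that all logs of one transaction are printed together. A transaction is normally executed by a single thread (for example, one HTTP request). Its logs can therefore be kept in a thread-local variable and printed together once the transaction completes.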
public void run() {
try {
Thread.sleep(10000L);
} catch (InterruptedException var23) {
return;
}
while(true) {
Calendar cal = Calendar.getInstance();
int second = cal.get(13); // 13 == Calendar.SECOND (decompiled constant)
if (second >= 2 && second <= 58) {
try {
this.buildClasspath();
} catch (Exception var21) {
var21.printStackTrace();
}
// report basic info (a System/Reboot transaction)
MessageProducer cat = Cat.getProducer();
Transaction reboot = cat.newTransaction("System", "Reboot");
reboot.setStatus("0");
cat.logEvent("Reboot", NetworkInterfaceManager.INSTANCE.getLocalHostAddress(), "0", (String)null);
cat.logEvent("cat_client_version", "2.0.0");
reboot.complete();
while(this.m_active) {
long start = MilliSecondTimer.currentTimeMillis();
if (this.m_manager.isCatEnabled()) {
// send the heartbeat
Transaction t = cat.newTransaction("System", "Status");
Heartbeat h = cat.newHeartbeat("Heartbeat", this.m_ipAddress);
StatusInfo status = new StatusInfo();
t.addData("dumpLocked", this.m_manager.isDumpLocked());
StatusInfoCollector collector = new StatusInfoCollector(this.m_statistics, this.m_jars);
try {
// collect status information into the heartbeat
status.accept(collector.setDumpLocked(this.m_manager.isDumpLocked()));
this.buildExtensionData(status);
h.addData(status.toString());
h.setStatus("0");
} catch (Throwable var19) {
h.setStatus(var19);
cat.logError(var19);
} finally {
h.complete();
}
// report thread stack (jstack) information
Cat.logEvent("Heartbeat", "jstack", "0", collector.getJstackInfo());
cat.logEvent("cat_client_version", "2.0.0");
t.setStatus("0");
t.complete();
}
long elapsed;
try {
elapsed = System.currentTimeMillis() / 1000L / 60L;
int min = (int)(elapsed % 60L);
if (min % 3 == 0) {
this.m_manager.refreshConfig();
}
if (min % 2 == 0) {
log.info("mark CAT client produced info {}", this.m_statistics);
cat.logEvent("cat_client_info", "stat", "0", this.m_statistics.toString());
}
} catch (Exception var18) {
;
}
elapsed = MilliSecondTimer.currentTimeMillis() - start;
if (elapsed < this.m_interval) {
try {
Thread.sleep(this.m_interval - elapsed);
} catch (InterruptedException var24) {
break;
}
}
}
return;
}
try {
Thread.sleep(1000L);
} catch (InterruptedException var22) {
;
}
}
}
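The heartbeat loop above relies on two small pieces of timing arithmetic. First, it sleeps only for what is left of m_interval after the work is done. Second, it derives the minute of the hour to run periodic chores: refreshing config when the minute is divisible by 3 and logging statistics when it is divisible by 2. Below is a sketch that extracts that arithmetic into pure functions. The class HeartbeatTiming is hypothetical, and the 60 000 ms interval in the usage is an assumption based on the "every minute" description.

```java
// Hypothetical helpers extracting the timing arithmetic of the heartbeat loop.
class HeartbeatTiming {
    // Sleep only for the remainder of the interval, so each round starts roughly
    // intervalMs after the previous one regardless of how long collection took.
    static long sleepMillis(long intervalMs, long startMs, long nowMs) {
        long elapsed = nowMs - startMs;
        return elapsed < intervalMs ? intervalMs - elapsed : 0L;
    }

    // Minute of the hour (0..59) derived from epoch milliseconds, as in the loop.
    static int minuteOfHour(long epochMillis) {
        return (int) ((epochMillis / 1000L / 60L) % 60L);
    }

    // Same conditions as the loop: refreshConfig() every 3rd minute...
    static boolean shouldRefreshConfig(long epochMillis) {
        return minuteOfHour(epochMillis) % 3 == 0;
    }

    // ...and statistics logging every 2nd minute.
    static boolean shouldLogStatistics(long epochMillis) {
        return minuteOfHour(epochMillis) % 2 == 0;
    }
}
```

Compensating for the elapsed time keeps heartbeats close to one per interval, even when collecting status (including jstack) takes a noticeable fraction of it.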
When m_stack holds several transactions, the bottom one is the root and the others are its descendants. Sending the root is therefore enough to send all of its children along with it.
The following walkthrough of the source code shows in detail how instrumentation data is reported:
MessageProducer cat = Cat.getProducer();
Transaction reboot = cat.newTransaction("System", "Reboot");
reboot.setStatus("0");
cat.logEvent("Reboot", NetworkInterfaceManager.INSTANCE.getLocalHostAddress(), "0", (String)null);
cat.logEvent("cat_client_version", "2.0.0");
reboot.complete();
Let's analyze these calls in detail:
DefaultMessageProducer:
public Transaction newTransaction(String type, String name) {
if (!this.messageManager.hasContext()) {
// set up the messageManager: create a Context in the ThreadLocal
this.messageManager.setup();
}
DefaultTransaction transaction = new DefaultTransaction(type, name, this.messageManager);
this.messageManager.start(transaction, false);
return transaction;
}
DefaultMessageManager:
public void setup() {
DefaultMessageManager.Context ctx;
// create a Context object
if (this.m_domain != null) {
ctx = new DefaultMessageManager.Context(this.m_domain.getId(), this.m_hostName, this.m_domain.getIp());
} else {
ctx = new DefaultMessageManager.Context("Unknown", this.m_hostName, "");
}
// get the sampling ratio from clientConfigManager, a global config initialized in BasicComponentAutoConfigure
double samplingRate = this.clientConfigManager.getSampleRatio();
// if samplingRate < 1, check whether this context is hit by sampling
if (samplingRate < 1.0D && this.hitSample(samplingRate)) {
// ctx has a MessageTree field m_tree, which in turn holds the boolean m_hitSample
ctx.m_tree.setHitSample(true);
}
// m_context is a ThreadLocal, i.e. store ctx in the ThreadLocal
this.m_context.set(ctx);
}
private boolean hitSample(double sampleRatio) {
int count = this.m_sampleCount.incrementAndGet();
return count % (int)(1.0D / sampleRatio) == 0;
}
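hitSample() is deterministic, counter-based sampling rather than random sampling. Every n-th call hits, where n = (int) (1 / sampleRatio). The standalone copy below is hypothetical (CounterSampler is not a CAT class) and makes that behavior easy to check, including a side effect of the integer cast.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counter-based sampling mirroring hitSample(): every n-th call hits,
// where n = (int) (1 / sampleRatio).
class CounterSampler {
    private final AtomicInteger count = new AtomicInteger();

    boolean hitSample(double sampleRatio) {
        int c = count.incrementAndGet();
        // The cast truncates: 0.3 gives n = 3, and any ratio above 0.5 gives n = 1.
        return c % (int) (1.0D / sampleRatio) == 0;
    }
}
```

Because of the truncation, the effective rate is 1/n rather than sampleRatio. A ratio of 0.25 samples exactly 1 in 4 calls, 0.3 actually samples 1 in 3, and 0.6 samples every call.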
DefaultMessageManager:
public void start(Transaction transaction, boolean forked) {
// get the Context from the ThreadLocal
DefaultMessageManager.Context ctx = this.getContext();
if (ctx != null) {
ctx.start(transaction, forked);
if (transaction instanceof TaggedTransaction) {
// TODO: purpose of TaggedTransaction still to be confirmed
TaggedTransaction tt = (TaggedTransaction)transaction;
this.m_taggedTransactions.put(tt.getTag(), tt);
}
} else if (this.m_firstMessage) {
this.m_firstMessage = false;
log.warn("CAT client is not enabled because it's not initialized yet");
}
}
Context:
public void start(Transaction transaction, boolean forked) {
if (!this.m_stack.isEmpty()) {
if (!(transaction instanceof ForkedTransaction)) {
// TODO: purpose of ForkedTransaction still to be confirmed
Transaction parent = (Transaction)this.m_stack.peek();
this.addTransactionChild(transaction, parent);
}
} else {
this.m_tree.setMessage(transaction);
}
if (!forked) {
this.m_stack.push(transaction);
}
}
All that Transaction reboot = cat.newTransaction("System", "Reboot"); does is add this transaction as a message to the m_tree referenced by the context and push it onto m_stack.
DefaultMessageProducer:
public void logEvent(String type, String name, String status, String nameValuePairs) {
// create an Event; newEvent() also calls messageManager.setup() if this thread has no context yet
Event event = this.newEvent(type, name);
if (nameValuePairs != null && nameValuePairs.length() > 0) {
// add the event's m_data
event.addData(nameValuePairs);
}
event.setStatus(status);
event.complete();
}
DefaultEvent:
public void complete() {
this.setCompleted(true);
if (this.m_manager != null) {
this.m_manager.add(this);
}
}
DefaultMessageManager:
public void add(Message message) {
DefaultMessageManager.Context ctx = this.getContext();
if (ctx != null) {
// the message is again added through ctx
ctx.add(message);
}
}
Context:
public void add(Message message) {
if (this.m_stack.isEmpty()) {
// if m_stack is empty, copy the MessageTree, make this message its root and flush it to the send queue
MessageTree tree = this.m_tree.copy();
tree.setMessage(message);
DefaultMessageManager.this.flush(tree, true);
} else {
// otherwise attach the message as a child of the current transaction
Transaction parent = (Transaction)this.m_stack.peek();
this.addTransactionChild(message, parent);
}
}
So m_stack is the key. When it is not empty, the message is attached to the transaction on top of it, which was pushed there earlier by cat.newTransaction("System", "Reboot").
public void complete() {
this.setCompleted(true);
if (this.m_manager != null) {
this.m_manager.add(this);
}
}
When complete() is called, the message adds itself to the Context through its MessageManager:
public void add(Message message) {
DefaultMessageManager.Context ctx = this.getContext();
if (ctx != null) {
ctx.add(message);
}
}
Context:
public void add(Message message) {
if (this.m_stack.isEmpty()) {
// if m_stack is empty, send it directly
MessageTree tree = this.m_tree.copy();
tree.setMessage(message);
DefaultMessageManager.this.flush(tree, true);
} else {
// otherwise peek (not pop) the current transaction and add the message as its child
Transaction parent = (Transaction)this.m_stack.peek();
this.addTransactionChild(message, parent);
}
}
DefaultTransaction:
public void complete() {
try {
if (this.isCompleted()) {
DefaultEvent event = new DefaultEvent("cat", "BadInstrument");
event.setStatus("TransactionAlreadyCompleted");
event.complete();
this.addChild(event);
} else {
if (this.m_durationInMicro == -1L) {
this.m_durationInMicro = (System.nanoTime() - this.m_durationStart) / 1000L;
}
this.setCompleted(true);
if (this.m_manager != null) {
// let the MessageManager end this transaction; the tree is sent once the stack is empty
this.m_manager.end(this);
}
}
} catch (Exception var2) {
;
}
}
DefaultMessageManager:
public void end(Transaction transaction) {
DefaultMessageManager.Context ctx = this.getContext();
if (ctx != null && transaction.isStandalone() && ctx.end(this, transaction)) {
this.m_context.remove();
}
}
Context:
public boolean end(DefaultMessageManager manager, Transaction transaction) {
if (!this.m_stack.isEmpty()) {
Transaction current = (Transaction)this.m_stack.pop();
if (transaction == current) {
DefaultMessageManager.this.m_validator.validate(this.m_stack.isEmpty() ? null : (Transaction)this.m_stack.peek(), current);
} else {
while(transaction != current && !this.m_stack.empty()) {
DefaultMessageManager.this.m_validator.validate((Transaction)this.m_stack.peek(), current);
current = (Transaction)this.m_stack.pop();
}
}
if (this.m_stack.isEmpty()) {
MessageTree tree = this.m_tree.copy();
this.m_tree.setMessageId((String)null);
this.m_tree.setMessage((Message)null);
if (this.m_totalDurationInMicros > 0L) {
this.adjustForTruncatedTransaction((Transaction)tree.getMessage());
}
// send this MessageTree out
manager.flush(tree, true);
return true;
}
}
return false;
}
DefaultMessageManager:
public void flush(MessageTree tree, boolean clearContext) {
MessageSender sender = this.transportManager.getSender();
if (sender != null && this.isMessageEnabled()) {
// get the sender and use it to send the tree
sender.send(tree);
if (clearContext) {
this.reset();
}
} else {
++this.m_throttleTimes;
if (this.m_throttleTimes % 10000L == 0L || this.m_throttleTimes == 1L) {
log.info("Cat Message is throttled! Times:" + this.m_throttleTimes);
}
}
}
TcpSocketSender:
public void send(MessageTree tree) {
if (!this.clientConfigManager.isBlock()) {
double sampleRatio = this.clientConfigManager.getSampleRatio();
if (tree.canDiscard() && sampleRatio < 1.0D && !tree.isHitSample()) {
// TODO: discardable trees that were NOT hit by sampling end up here; what processTreeInClient does remains to be confirmed
this.processTreeInClient(tree);
} else {
// put it into the message queue
this.offer(tree);
}
}
}
private void offer(MessageTree tree) {
boolean result;
if (this.clientConfigManager.isAtomicMessage(tree)) {
// atomic tree: put it into the atomic message queue (non-Transaction messages are atomic)
result = this.m_atomicQueue.offer(tree);
if (!result) {
this.logQueueFullInfo(tree);
}
} else {
// non-atomic: put it into the regular message queue
result = this.m_queue.offer(tree);
if (!result) {
this.logQueueFullInfo(tree);
}
}
}
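Each business thread puts the messages it produces into the client's in-memory queue, and CAT starts a background thread dedicated to sending the queued messages to the server.
The client thus makes message reporting multithreaded, asynchronous and queue-based, so that a malfunction in the CAT system does not affect the main business threads.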
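The producer/consumer pattern just described can be sketched with a bounded queue and one daemon thread. This is a hypothetical illustration (AsyncSender is not a CAT class), using java.util.concurrent's ArrayBlockingQueue in place of CAT's own MessageQueue. The important property is that offer() never blocks the business thread. When the queue is full, the item is dropped and counted, similar to the logQueueFullInfo() branch in offer() above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical async sender: business threads call offer(), which never blocks;
// one daemon thread drains the queue and hands items to a transport.
class AsyncSender<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean active = true;

    AsyncSender(int capacity, Consumer<T> transport) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            // Keep draining after shutdown until the queue is empty.
            while (active || !queue.isEmpty()) {
                try {
                    T item = queue.poll(5, TimeUnit.MILLISECONDS);
                    if (item != null) {
                        transport.accept(item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } catch (RuntimeException e) {
                    // A failing transport must not kill the sender thread (a real client would log it).
                }
            }
        }, "sketch-sender");
        worker.setDaemon(true);
        worker.start();
    }

    // Returns false and counts a drop instead of blocking when the queue is full.
    boolean offer(T item) {
        boolean ok = queue.offer(item);
        if (!ok) {
            dropped.incrementAndGet();
        }
        return ok;
    }

    long droppedCount() {
        return dropped.get();
    }

    // Stop the worker once the queue has drained, and wait for it.
    void shutdown() {
        active = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping instead of blocking is the design choice that protects the business threads. Losing some monitoring data is preferable to slowing down requests.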
public void run() {
this.m_active = true;
while(this.m_active) {
this.processAtomicMessage();
this.processNormalMessage();
}
this.processAtomicMessage();
while(true) {
MessageTree tree = this.m_queue.poll();
if (tree == null) {
return;
}
ChannelFuture channel = this.m_channelManager.channel();
if (channel != null) {
this.sendInternal(channel, tree);
} else {
this.offer(tree);
}
}
}
private void processAtomicMessage() {
while(this.shouldMerge(this.m_atomicQueue)) {
MessageTree tree = this.mergeTree(this.m_atomicQueue);
// put the merged MessageTree into m_queue
boolean result = this.m_queue.offer(tree);
if (!result) {
this.logQueueFullInfo(tree);
}
}
}
private boolean shouldMerge(MessageQueue queue) {
MessageTree tree = queue.peek(); // look at the first tree in the queue
if (tree != null) {
long firstTime = tree.getMessage().getTimestamp();
if (System.currentTimeMillis() - firstTime > 30000L || queue.size() >= 200) {
// merge if the first message is more than 30 s old or the queue holds 200 or more trees
return true;
}
}
return false;
}
private MessageTree mergeTree(MessageQueue handler) {
int max = 200;
// merging also puts the messages under one transaction, so create that transaction first
DefaultTransaction tran = new DefaultTransaction("System", "_CatMergeTree", (MessageManager)null);
// take the first MessageTree out of the queue
MessageTree first = handler.poll();
tran.setStatus("0");
tran.setCompleted(true);
tran.setDurationInMicros(0L);
// add the first tree's message to the new transaction's m_children list
tran.addChild(first.getMessage());
// take the remaining trees and add their messages to m_children too: at most 201 more (max counts down from 200 to 0)
while(max >= 0) {
MessageTree tree = handler.poll();
if (tree == null) {
break;
}
tran.addChild(tree.getMessage());
--max;
}
// the result is a MessageTree whose transaction has at most 202 child messages
((DefaultMessageTree)first).setMessage(tran);
return first;
}
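shouldMerge() and mergeTree() together implement batching by age or size. Atomic messages wait until the oldest one is older than 30 s or 200 have accumulated. Then up to 202 of them are folded into one synthetic transaction. The sketch below reproduces just those rules on hypothetical AtomicMsg objects. The class names and the explicit nowMs parameter (added for testability) are assumptions, not CAT's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for an atomic message tree.
class AtomicMsg {
    final long timestamp;
    final String name;

    AtomicMsg(long timestamp, String name) {
        this.timestamp = timestamp;
        this.name = name;
    }
}

class MergeSketch {
    static final long MAX_AGE_MS = 30_000L;
    static final int MAX_SIZE = 200;

    // Merge when the oldest message has waited more than 30 s or 200 have piled up.
    static boolean shouldMerge(Queue<AtomicMsg> queue, long nowMs) {
        AtomicMsg head = queue.peek();
        return head != null && (nowMs - head.timestamp > MAX_AGE_MS || queue.size() >= MAX_SIZE);
    }

    // Call only when shouldMerge() is true (the queue must be non-empty).
    // Takes the head plus up to 201 more (max counts down 200..0): at most 202
    // children, the same bounds as mergeTree() above.
    static List<AtomicMsg> merge(Queue<AtomicMsg> queue) {
        List<AtomicMsg> children = new ArrayList<>();
        children.add(queue.poll());
        for (int max = 200; max >= 0; max--) {
            AtomicMsg next = queue.poll();
            if (next == null) {
                break;
            }
            children.add(next);
        }
        return children;
    }
}
```

Batching many small, independent messages into one tree reduces per-tree encoding and network overhead. The cost is up to 30 s of extra delay for atomic messages.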
private void processNormalMessage() {
// keep taking MessageTrees from m_queue and sending them
while(true) {
ChannelFuture channel = this.m_channelManager.channel();
if (channel != null) {
try {
MessageTree tree = this.m_queue.poll();
if (tree == null) {
try {
Thread.sleep(5L);
} catch (Exception var4) {
this.m_active = false;
}
return;
}
this.sendInternal(channel, tree);
tree.setMessage((Message)null);
} catch (Throwable var5) {
log.error("Error when sending message over TCP socket!", var5);
}
} else {
try {
Thread.sleep(5L);
} catch (Exception var6) {
this.m_active = false;
}
}
}
}
// this is the method that actually sends the data
public void sendInternal(ChannelFuture channel, MessageTree tree) {
if (tree.getMessageId() == null) {
tree.setMessageId(this.messageIdFactory.getNextId());
}
ByteBuf buf = this.m_codec.encode(tree);
int size = buf.readableBytes();
ChannelFuture f = channel.channel().writeAndFlush(buf);
if (this.messageStatistics != null) {
this.messageStatistics.onBytes(size);
}
}
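This thread monitors the connection to the server. Every 10 s it checks whether the routed server IPs have changed and makes sure the connection is healthy. It is a typical pull-based configuration mechanism.
The check compares the local server list with the list provided by the remote service. If they differ, it re-establishes the ChannelFuture to the first usable server, following the order of the remote list.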
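The decision rule just described (compare lists, then pick the first usable server in the remote order) can be isolated from the networking code. In the hypothetical sketch below, addresses are plain strings and "usable" is an injected predicate; in the real client that role is played by actually opening a Netty channel. ServerListCheck and chooseServer are made-up names.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical reconnection decision: if the router's server list differs from
// the local one, connect to the first address (in the router's order) that works.
class ServerListCheck {
    // Returns the address to connect to, or null to keep the current connection.
    static String chooseServer(List<String> local, List<String> remote, Predicate<String> reachable) {
        if (Objects.equals(local, remote)) {
            return null; // routing unchanged
        }
        for (String address : remote) {
            if (reachable.test(address)) {
                return address; // first usable server wins, so the remote order is a priority list
            }
        }
        return null; // nothing reachable: keep whatever connection we have
    }
}
```

Because the router's order is respected, the server side can steer clients simply by reordering the list it publishes.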
public void run() {
while (m_active) {
// save the message ID index mark asynchronously
m_idFactory.saveMark();
checkServerChanged();
ChannelFuture activeFuture = m_activeChannelHolder.getActiveFuture();
List<InetSocketAddress> serverAddresses = m_activeChannelHolder.getServerAddresses();
doubleCheckActiveServer(m_activeChannelHolder);
reconnectDefaultServer(activeFuture, serverAddresses);
try {
Thread.sleep(10 * 1000L); // check every 10 seconds
} catch (InterruptedException e) {
// ignore
}
}
}
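For a stream of messages, we must be able to find the message boundaries and extract the bytes of a single message. That segment is then deserialized according to fixed rules to produce the corresponding message object.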
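In Java, any class that implements java.io.Serializable can be serialized. However, the bytes produced by this generic mechanism carry a lot of redundant information so that arbitrary objects round-trip correctly. CAT only needs to transmit a single kind of object, MessageTree. A custom serialization scheme saves many unnecessary bytes and keeps network transfer efficient.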
public ByteBuf encode(MessageTree tree) {
ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(4096);
try {
NativeMessageCodec.Context ctx = new NativeMessageCodec.Context(tree);
buf.writeInt(0);
NativeMessageCodec.Codec.HEADER.encode(ctx, buf, (Message)null);
Message msg = tree.getMessage();
if (msg != null) {
this.encodeMessage(ctx, buf, msg);
}
int readableBytes = buf.readableBytes();
buf.setInt(0, readableBytes - 4);
return buf;
} catch (RuntimeException var6) {
buf.release();
throw var6;
}
}
private void encodeMessage(NativeMessageCodec.Context ctx, ByteBuf buf, Message msg) {
if (msg instanceof Transaction) {
Transaction transaction = (Transaction)msg;
List<Message> children = transaction.getChildren();
NativeMessageCodec.Codec.TRANSACTION_START.encode(ctx, buf, msg);
Iterator var6 = children.iterator();
while(var6.hasNext()) {
Message child = (Message)var6.next();
if (child != null) {
this.encodeMessage(ctx, buf, child);
}
}
NativeMessageCodec.Codec.TRANSACTION_END.encode(ctx, buf, msg);
} else if (msg instanceof Event) {
NativeMessageCodec.Codec.EVENT.encode(ctx, buf, msg);
} else if (msg instanceof Metric) {
NativeMessageCodec.Codec.METRIC.encode(ctx, buf, msg);
} else if (msg instanceof Heartbeat) {
NativeMessageCodec.Codec.HEARTBEAT.encode(ctx, buf, msg);
} else {
if (!(msg instanceof Trace)) {
throw new RuntimeException(String.format("Unsupported message(%s).", msg));
}
NativeMessageCodec.Codec.TRACE.encode(ctx, buf, msg);
}
}
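encodeMessage() is a pre-order walk. A transaction is written as a START marker, then its children, then an END marker, while leaves are written once. The markers alone are enough to rebuild the nesting, so no child counts are needed. The sketch below shows this with readable string tokens; the "t:", "T:" and "E:" prefixes are illustrative and are not CAT's actual wire format. Node and MarkerCodec are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical node: a transaction has children, an event does not.
class Node {
    final boolean transaction;
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(boolean transaction, String name) {
        this.transaction = transaction;
        this.name = name;
    }
}

class MarkerCodec {
    // Pre-order walk: START, children, END for transactions; one token for events.
    static void encode(Node n, List<String> out) {
        if (n.transaction) {
            out.add("t:" + n.name);
            for (Node child : n.children) {
                encode(child, out);
            }
            out.add("T:" + n.name);
        } else {
            out.add("E:" + n.name);
        }
    }

    // Rebuilds the tree from the markers; assumes well-formed input.
    static Node decode(Deque<String> in) {
        String token = in.pollFirst();
        String name = token.substring(2);
        if (token.startsWith("E:")) {
            return new Node(false, name);
        }
        Node t = new Node(true, name);
        while (!in.peekFirst().startsWith("T:")) {
            t.children.add(decode(in)); // everything before END belongs to this transaction
        }
        in.pollFirst(); // consume the END marker
        return t;
    }
}
```
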
protected void encode(NativeMessageCodec.Context ctx, ByteBuf buf, Message msg) {
MessageTree tree = ctx.getMessageTree();
ctx.writeVersion(buf, "NT1");
ctx.writeString(buf, tree.getDomain());
ctx.writeString(buf, tree.getHostName());
ctx.writeString(buf, tree.getIpAddress());
ctx.writeString(buf, tree.getThreadGroupName());
ctx.writeString(buf, tree.getThreadId());
ctx.writeString(buf, tree.getThreadName());
ctx.writeString(buf, tree.getMessageId());
ctx.writeString(buf, tree.getParentMessageId());
ctx.writeString(buf, tree.getRootMessageId());
ctx.writeString(buf, tree.getSessionToken());
}
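The message-boundary problem described earlier is solved in encode() with a length prefix. The encoder reserves 4 bytes with writeInt(0), writes the body, and then back-fills the length with setInt(0, readableBytes - 4). The sketch below shows the same trick with java.nio.ByteBuffer instead of Netty's ByteBuf. It also includes the matching splitter a receiver needs. LengthPrefixFraming is a hypothetical name, and UTF-8 strings stand in for encoded trees. In a Netty pipeline, LengthFieldBasedFrameDecoder is the stock handler for this kind of framing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixFraming {
    // Reserve 4 bytes, write the body, then back-fill the length of everything
    // after the prefix (position - 4), as encode() does with writeInt/setInt.
    static byte[] frame(String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(0);                     // placeholder for the length
        buf.put(payload);
        buf.putInt(0, buf.position() - 4); // absolute write at index 0
        return buf.array();
    }

    // Splits a buffer that may hold several frames back-to-back. An incomplete
    // trailing frame is left unconsumed (a real decoder would wait for more bytes).
    static List<String> unframe(ByteBuffer stream) {
        List<String> bodies = new ArrayList<>();
        while (stream.remaining() >= 4) {
            stream.mark();
            int length = stream.getInt();
            if (length < 0) {
                throw new IllegalStateException("corrupt frame length " + length);
            }
            if (stream.remaining() < length) {
                stream.reset(); // not enough bytes yet: rewind to the prefix
                break;
            }
            byte[] body = new byte[length];
            stream.get(body);
            bodies.add(new String(body, StandardCharsets.UTF_8));
        }
        return bodies;
    }
}
```
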
On the server side, CAT's real-time processing is almost entirely asynchronous:
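After the client uploads log data to the server, MessageDecoder decodes it and hands it on for further processing. Once decoding is complete, MessageConsumer.consume passes the message to its consumers.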
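MessageAnalyzer and PeriodTask have a one-to-one relationship. getAnalyzerCount() decides how many instances of each analyzer type exist. The default is 1, but some analysis tasks are very time-consuming and need several threads to keep up; TransactionAnalyzer, for example, uses 2.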
When messages are dispatched, each message is by default sent to every analyzer type. If a type has several MessageAnalyzer instances, one of them is chosen by hashing the domain.
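The dispatch rule just described can be sketched as follows. Each message goes to every analyzer type, and within a type the domain hash picks one instance. DispatchSketch is hypothetical, plain lists stand in for each PeriodTask's queue, and Math.floorMod is my choice for mapping the hash to an index; CAT's own code may compute the index differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dispatcher: every message goes to every analyzer type; within a
// type with several instances, the domain hash picks exactly one of them.
class DispatchSketch {
    // analyzer name -> one inbox per instance (stand-ins for PeriodTask queues)
    final Map<String, List<List<String>>> inboxes = new LinkedHashMap<>();

    void register(String analyzer, int instances) {
        List<List<String>> queues = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            queues.add(new ArrayList<>());
        }
        inboxes.put(analyzer, queues);
    }

    void distribute(String domain, String message) {
        for (List<List<String>> queues : inboxes.values()) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(domain.hashCode(), queues.size());
            queues.get(index).add(message);
        }
    }
}
```

Because one domain always lands on the same instance, that domain's report is updated by a single thread and needs no locking across analyzer instances.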
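Why one-hour granularity? It is a trade-off between the complexity of real-time in-memory processing and memory overhead. When the hour ends, the generated Transaction, Event, Problem and other reports are stored in the DB. For real-time access, the current hour's reports are kept in memory.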
public void run() {
while (m_active) {
try {
long now = System.currentTimeMillis();
long value = m_strategy.next(now);
if (value > 0) {
startPeriod(value);
} else if (value < 0) {
// last period is over, make it asynchronous
Threads.forGroup("cat").start(new EndTaskThread(-value));
}
} catch (Throwable e) {
Cat.logError(e);
}
try {
Thread.sleep(1000L);
} catch (InterruptedException e) {
break;
}
}
}
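The loop above depends on m_strategy.next(now) returning a positive value to open a new period and a negative value to close the previous one. The sketch below is a hypothetical, simplified strategy with that contract. It aligns periods to the hour and closes the previous hour after a grace time (extraTimeMs). I have not reproduced CAT's actual PeriodStrategy, which as far as I know also supports starting periods slightly ahead of time.

```java
// Hypothetical simplified period strategy:
//   > 0 : start time of a new hour that has just begun (open a Period)
//   < 0 : negated start of the previous hour, whose grace time has passed (close it)
//   0   : nothing to do
class HourlyPeriodStrategy {
    static final long HOUR = 3_600_000L;
    private final long extraTimeMs;
    private long currentStart = -1;  // period currently open
    private long previousStart = -1; // period waiting to be closed

    HourlyPeriodStrategy(long extraTimeMs) {
        this.extraTimeMs = extraTimeMs;
    }

    long next(long now) {
        long start = now - now % HOUR; // align to the hour
        if (start != currentStart) {
            previousStart = currentStart;
            currentStart = start;
            return start;
        }
        // Keep the old hour open a little longer so late messages still land in it.
        if (previousStart > 0 && now >= previousStart + HOUR + extraTimeMs) {
            long ended = previousStart;
            previousStart = -1;
            return -ended;
        }
        return 0;
    }
}
```

Calling next() once per second, as the run() loop does, yields exactly one "open" and one "close" signal per hour.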
private void startPeriod(long startTime) {
long endTime = startTime + m_strategy.getDuration();
Period period = new Period(startTime, endTime, m_analyzerManager, m_serverStateManager, m_logger);
m_periods.add(period);
period.start();
}
public Period(long startTime, long endTime, MessageAnalyzerManager analyzerManager,
ServerStatisticManager serverStateManager, Logger logger) {
m_startTime = startTime;
m_endTime = endTime;
m_analyzerManager = analyzerManager;
m_serverStateManager = serverStateManager;
m_logger = logger;
List<String> names = m_analyzerManager.getAnalyzerNames();
m_tasks = new HashMap<String, List<PeriodTask>>();
for (String name : names) {
List<MessageAnalyzer> messageAnalyzers = m_analyzerManager.getAnalyzer(name, startTime);
for (MessageAnalyzer analyzer : messageAnalyzers) {
MessageQueue queue = new DefaultMessageQueue(QUEUE_SIZE);
PeriodTask task = new PeriodTask(analyzer, queue, startTime);
task.enableLogging(m_logger);
List<PeriodTask> analyzerTasks = m_tasks.get(name);
if (analyzerTasks == null) {
analyzerTasks = new ArrayList<PeriodTask>();
m_tasks.put(name, analyzerTasks);
}
analyzerTasks.add(task);
}
}
}
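On the consumer side, the most important concept is the message analyzer (MessageAnalyzer). All message analysis, statistics and report creation are done by message analyzers.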
PeriodManager: processes each hour's monitoring data on a rolling basis.
General structure of an analyzer:
Each analyzer holds several reports, which are managed by a report manager (ReportManager). The report manager stores them as follows:
Map<Long, Map<String, T>> m_reports
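The outer Map is keyed by a long that identifies the report's time period. Its value is another Map keyed by a String, the domain (a domain can be thought of as a project), whose values are the report objects. When an analyzer processes a report, it calls the report manager's (DefaultReportManager) getHourlyReport method to fetch the report for a given period and domain.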
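The two-level layout described above (period start, then domain, then report) maps naturally onto nested concurrent maps. The sketch below is hypothetical: HourlyReports, its factory parameter and removePeriod are made up. getHourlyReport only mirrors the three-argument call used by EventAnalyzer, with the boolean taken to mean "create if missing".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical report store: period start (ms) -> domain -> report.
class HourlyReports<T> {
    private final Map<Long, Map<String, T>> reports = new ConcurrentHashMap<>();
    private final Function<String, T> factory; // builds an empty report for a domain

    HourlyReports(Function<String, T> factory) {
        this.factory = factory;
    }

    // Mirrors getHourlyReport(startTime, domain, true/false) as used by EventAnalyzer.
    T getHourlyReport(long startTime, String domain, boolean createIfMissing) {
        Map<String, T> byDomain = createIfMissing
                ? reports.computeIfAbsent(startTime, k -> new ConcurrentHashMap<>())
                : reports.get(startTime);
        if (byDomain == null) {
            return null;
        }
        return createIfMissing ? byDomain.computeIfAbsent(domain, factory) : byDomain.get(domain);
    }

    // A whole hour can be dropped at once after it has been persisted.
    void removePeriod(long startTime) {
        reports.remove(startTime);
    }
}
```

Keying the outer map by period is what makes the hourly roll-over cheap: finishing an hour means persisting and removing a single entry.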
Take EventAnalyzer as an example:
cat.logEvent("cat_client_version", "2.0.0");
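The event report records aggregated statistics for Event messages. For each period and each domain there is one EventReport, and each EventReport contains several Machine objects, one per IP. Under the same IP, Events of different types (Type) are kept in separate EventType objects. An EventType records the total count, failure count and failure percentage for its type, plus a successful MessageID, a failed MessageID, the TPS, and the named messages under that type.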
EventAnalyzer:
public void process(MessageTree tree) {
String domain = tree.getDomain();
String ip = tree.getIpAddress();
// get the EventReport for this period and domain
EventReport report = m_reportManager.getHourlyReport(getStartTime(), domain, true);
List<Event> events = tree.findOrCreateEvents();
for (Event event : events) {
String data = String.valueOf(event.getData());
int total = 1;
int fail = 0;
boolean batchData = data.length() > 0 && data.charAt(0) == CatConstants.BATCH_FLAG;
if (batchData) {
String[] tab = data.substring(1).split(CatConstants.SPLIT);
total = Integer.parseInt(tab[0]);
fail = Integer.parseInt(tab[1]);
} else {
if (!event.isSuccess()) {
fail = 1;
}
}
// process each event of the MessageTree in turn
processEvent(report, tree, event, ip, total, fail, batchData);
}
if (System.currentTimeMillis() > m_nextClearTime) {
m_nextClearTime = m_nextClearTime + TimeHelper.ONE_MINUTE;
Threads.forGroup("cat").start(new Runnable() {
@Override
public void run() {
cleanUpReports();
}
});
}
}
private void processEvent(EventReport report, MessageTree tree, Event event, String ip, int total, int fail,
boolean batchData) {
// get or create the Machine object for this IP
Machine machine = report.findOrCreateMachine(ip);
// get or create the EventType object for this event type
EventType type = findOrCreateType(machine, event.getType());
// get or create the EventName object for this event name
EventName name = findOrCreateName(type, event.getName(), report.getDomain());
String messageId = tree.getMessageId();
type.incTotalCount(total);
name.incTotalCount(total);
if (fail > 0) {
type.incFailCount(fail);
name.incFailCount(fail);
}
if (type.getSuccessMessageUrl() == null) {
type.setSuccessMessageUrl(messageId);
}
if (name.getSuccessMessageUrl() == null) {
name.setSuccessMessageUrl(messageId);
}
if (!batchData) {
if (event.isSuccess()) {
type.setSuccessMessageUrl(messageId);
name.setSuccessMessageUrl(messageId);
} else {
type.setFailMessageUrl(messageId);
name.setFailMessageUrl(messageId);
String statusCode = formatStatus(event.getStatus());
findOrCreateStatusCode(name, statusCode).incCount();
}
}
// compute the failure rate of this event type
type.setFailPercent(type.getFailCount() * 100.0 / type.getTotalCount());
// compute the failure rate of this event name
name.setFailPercent(name.getFailCount() * 100.0 / name.getTotalCount());
processEventGrpah(name, event, total, fail);
}
private void processEventGrpah(EventName name, Event event, int total, int fail) {
// compute which minute of the hour the event occurred in
long current = event.getTimestamp() / 1000 / 60;
int min = (int) (current % (60));
// get or create the Range object for that minute
Range range = name.findOrCreateRange(min);
// update the counters of that range
range.incCount(total);
if (fail > 0) {
range.incFails(fail);
}
}
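The per-name bookkeeping in processEvent() and processEventGrpah() reduces to a few counters and a 60-slot per-minute histogram. The sketch below is hypothetical (EventNameStats is not a CAT class). It assumes batch data looks like "@10;2", that is, a flag followed by total and fail separated by ";". The '@' and ';' values are assumptions standing in for CatConstants.BATCH_FLAG and CatConstants.SPLIT.

```java
// Hypothetical per-name counters: totals, failures, fail percentage and a
// per-minute histogram indexed by the minute within the hour.
class EventNameStats {
    long total;
    long fail;
    final long[] countPerMinute = new long[60];
    final long[] failsPerMinute = new long[60];

    // Assumed batch format "<flag><total><split><fail>", e.g. "@10;2" -> {10, 2}.
    static long[] parseBatch(String data) {
        String[] tab = data.substring(1).split(";");
        return new long[] {Long.parseLong(tab[0]), Long.parseLong(tab[1])};
    }

    void record(long timestampMs, long totalDelta, long failDelta) {
        total += totalDelta;
        fail += failDelta;
        int minute = (int) ((timestampMs / 1000 / 60) % 60); // same formula as processEventGrpah()
        countPerMinute[minute] += totalDelta;
        failsPerMinute[minute] += failDelta;
    }

    // Guarded against division by zero; the analyzer never divides before counting.
    double failPercent() {
        return total == 0 ? 0.0 : fail * 100.0 / total;
    }
}
```
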
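As a monitoring service, CAT must affect business systems as little as possible and respond to anomalies as quickly as possible. It is therefore optimized at every stage: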
CAT client
Data transfer
CAT server
Afterword:
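CAT's logview is built on ThreadLocal. It aggregates the instrumentation points within one thread and reports them together, which gives a weakened form of tracing. CAT is not, however, a standard end-to-end tracing system. For those, see the Dapper paper and well-known industry systems such as EagleEye (鹰眼) and Zipkin. Comparing CAT with such systems is not really appropriate. CAT's logview is a poor fit for some scenarios, such as work on asynchronous threads, because CAT's model itself is not designed for them.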
References:
http://www.javashuo.com/article/p-duotqdxz-a.html
https://www.jianshu.com/p/7a2e6c722b04
https://www.freesion.com/article/21471416726/
https://www.freesion.com/article/34341394263/
https://www.freesion.com/article/57301403415/
https://blog.csdn.net/mnmlist/article/details/114293423
https://www.cnblogs.com/xiaowenshu/p/10319762.html
Explanation of ForkedTransaction: https://www.shuzhiduo.com/A/6pdDBDAGJw/
https://mp.weixin.qq.com/s/9OFtmyX5IPvs2ljhPeiESw
https://blog.csdn.net/caohao0591/article/details/80200769
https://www.freesion.com/article/74931172436/
https://www.iocoder.cn/Spring-Boot/CAT/
https://www.iocoder.cn/CAT/install/?self
https://www.meiwen.com.cn/subject/alxjqqtx.html
https://www.freesion.com/article/22221033453/
https://blog.csdn.net/lemon89/article/details/76273404
https://blog.csdn.net/cd18333612683/article/details/83505137
https://blog.csdn.net/szwandcj/article/details/50992580