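Troubleshooting a Performance Problem with BTrace

BTrace is a tool for dynamically tracing the execution of Java code. Plenty of articles already introduce it, so here I'll just share an example from my own work where I used it to track down a performance problem.

Symptom

One of my REST endpoints was very slow: Postman reported 3873 ms. All the endpoint does is read a batch of data from Redis. Redis is fast, so in theory the call shouldn't take that long, so I used BTrace to trace the method calls.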

/* BTrace Script Template */
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class TracingScript {
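    // Start timestamps, one per traced method. @TLS makes each field thread-local,
    // so concurrent requests do not overwrite each other's start time.
    @TLS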
    private static long startlTime = 0;
    @TLS  
    private static long startmTime = 0;

    @TLS  
    private static long startjTime = 0;
    @TLS  
    private static long startrTime = 0;
    @TLS  
    private static long startbTime = 0;

    @OnMethod(clazz="com.xx.monitoring.api.util.JedisHelper", method="listSize")
    public static void startjMethod() {
        startjTime = timeMillis();
    }

    @OnMethod(clazz="com.xx.monitoring.api.util.JedisHelper", method="listSize", location=@Location(Kind.RETURN))
    public static void endjMethod() {
        println(strcat("listSize method execute time=>", str(timeMillis()-startjTime)));  
        println("-------------------------------------------");  
    }

    @OnMethod(clazz="com.xx.monitoring.api.util.JedisHelper", method="listRange")
    public static void startrMethod() {
        startrTime = timeMillis();
    }

    @OnMethod(clazz="com.xx.monitoring.api.util.JedisHelper", method="listRange", location=@Location(Kind.RETURN))
    public static void endrMethod() {
        println(strcat("listRange method execute time=>", str(timeMillis()-startrTime)));  
        println("-------------------------------------------");  
    }

    @OnMethod(clazz="com.xx.monitoring.api.persistence.MetricRedisDao", method="listMetricByIds")
    public static void startbMethod() {
        startbTime = timeMillis();
    }

    @OnMethod(clazz="com.xx.monitoring.api.persistence.MetricRedisDao", method="listMetricByIds", location=@Location(Kind.RETURN))
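    public static void endbMethod(java.util.List metricIds) {
        // Safe mode does not allow calling methods on traced objects,
        // so use the BTraceUtils helpers size(), str() and strcat() instead
        println(strcat("metricIds.size: ", str(size(metricIds))));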
        println(strcat("listMetricByIds method execute time=>", str(timeMillis()-startbTime)));  
        println("-------------------------------------------");  
    }

    @OnMethod(clazz="com.xx.monitoring.api.service.MetricServiceImpl", method="listMetricsInternal")
    public static void startlMethod() {
        startlTime = timeMillis();
    }

    @OnMethod(clazz="com.xx.monitoring.api.service.MetricServiceImpl", method="listMetricsInternal", location=@Location(Kind.RETURN))
    public static void endlMethod() {
        println(strcat("listMetricsInternal method execute time=>", str(timeMillis()-startlTime)));  
        println("-------------------------------------------");  
    }

    @OnMethod(clazz="com.xx.monitoring.api.service.MetricServiceImpl", method="listMetricsMap")
    public static void startmMethod() {
        startmTime = timeMillis();
    }

    @OnMethod(clazz="com.xx.monitoring.api.service.MetricServiceImpl", method="listMetricsMap", location=@Location(Kind.RETURN))
    public static void endmMethod() {
        println(strcat("listMetricsMap execute time=>", str(timeMillis()-startmTime)));  
        println("-------------------------------------------");  
    }
}
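
The entry/return pairs in the script all follow one pattern: store a start time in a thread-local slot on entry, then subtract it from the current time on return. As a rough picture of what that amounts to, here is a plain-Java sketch. It is not BTrace code; the class name TlsTimingSketch, the methods onEntry and onReturn, and the injected clock are all made up for illustration.

```java
import java.util.function.LongSupplier;

// Plain-Java picture of the entry/return timing pattern; not BTrace code.
// All names here are hypothetical.
final class TlsTimingSketch {
    // Plays the role of one @TLS field: each thread sees its own start value.
    private static final ThreadLocal<Long> START = ThreadLocal.withInitial(() -> 0L);

    // What an entry probe does: remember when this thread entered the method.
    static void onEntry(LongSupplier clock) {
        START.set(clock.getAsLong());
    }

    // What a return probe does: elapsed time since this thread's entry.
    static long onReturn(LongSupplier clock) {
        return clock.getAsLong() - START.get();
    }

    public static void main(String[] args) throws InterruptedException {
        onEntry(System::currentTimeMillis);
        Thread.sleep(20); // stands in for the traced method body
        System.out.println("execute time=>" + onReturn(System::currentTimeMillis));
    }
}
```

One consequence of keeping a single slot per method: if a traced method is re-entered on the same thread before it returns (recursion, for example), the second entry overwrites the first start time. The Profiling script later in this post sidesteps that bookkeeping by letting BTrace pass the elapsed time in through @Duration.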

Compile:

$ btracec TracingScript.java

Start tracing (19416 is the PID of the target JVM):

$ btrace 19416 TracingScript.class

Result:

DEBUG: received com.sun.btrace.comm.MessageCommand@5b6f7412
listSize method execute time=>3
DEBUG: received com.sun.btrace.comm.MessageCommand@27973e9b
-------------------------------------------
DEBUG: received com.sun.btrace.comm.MessageCommand@312b1dae
listRange method execute time=>18
DEBUG: received com.sun.btrace.comm.MessageCommand@7530d0a
-------------------------------------------
DEBUG: received com.sun.btrace.comm.MessageCommand@27bc2616
listRange method execute time=>19
DEBUG: received com.sun.btrace.comm.MessageCommand@3941a79c
-------------------------------------------
DEBUG: received com.sun.btrace.comm.MessageCommand@506e1b77
listMetricsInternal method execute time=>4820
DEBUG: received com.sun.btrace.comm.MessageCommand@4fca772d
-------------------------------------------
DEBUG: received com.sun.btrace.comm.MessageCommand@9807454
listMetricsMap execute time=>4821

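As you can see, listMetricsInternal took 4820 ms.

This method loops over the metric IDs and fetches the metric bean for each one from Redis. My initial guess was that there were simply too many IDs, so I kept tracing.

Add this method (in place of the earlier endbMethod, since the two have the same signature):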

    @OnMethod(clazz="com.xx.monitoring.api.persistence.MetricRedisDao", method="listMetricByIds", location=@Location(Kind.RETURN))
    public static void endbMethod(java.util.List<String> metricIds) {
        // str() renders the whole list, so this prints the IDs themselves, not just their count
        println(strcat("metrcIds.size: ", str(metricIds)));
        println("-------------------------------------------");
    }

Output:

metrcIds.size: [weblogic.datasource.waiting_for_connection_high_count, weblogic.datasource.leaked_connection_count, weblogic.datasource.active_connections_currentCount, weblogic.threadpool.standby_thread_count, weblogic.threadpool.active_thread_idle_count, weblogic.threadpool.execute_thread_total_count, weblogic.session.open_sessions_current_count, weblogic.session.open_sessions_high_count, weblogic.session.sessions_opened_total_count, weblogic.server.health_state, weblogic.server.activation_time, server.os.kernel_version, server.os.version, server.os.id, server.interfaces.tx, server.interfaces.rx, server.disk.used_percent, server.cpu.load, server.memory.free_percent, server.memory.used_percent, server.cpu.idle, server.cpu.iowait, server.cpu.sys, server.cpu.user, oracle.info.status, oracle.worst.sql, oracle.time_ratio.cpu_time_ratio, oracle.time_ratio.wait_time_ratio, oracle.connection.count, jvm.memory.used, jvm.memory.max, jvm.memory.committed, jvm.thread.total_started_thread_count, jvm.thread.daemon_thread_count, jvm.thread.peak_thread_count, jvm.thread.count, jvm.gc.collection_time, jvm.gc.collection_count, jvm.classsloading.unloaded_class_count, jvm.classsloading.loaded_class_count, jvm.classsloading.total_loaded_class_count, jvm.info.input_arguments, jvm.info.vm_id, jvm.info.vm_version, weblogic.datasource.waiting_for_connection_high_count, weblogic.datasource.leaked_connection_count, weblogic.datasource.active_connections_currentCount, weblogic.threadpool.standby_thread_count, weblogic.threadpool.active_thread_idle_count, weblogic.threadpool.execute_thread_total_count, weblogic.session.open_sessions_current_count, weblogic.session.open_sessions_high_count, weblogic.session.sessions_opened_total_count, weblogic.server.health_state, weblogic.server.activation_time, server.os.kernel_version, server.os.version, serve



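(and many more)
...

There are 3610 IDs in total. Even if each fetch from Redis takes only 1 ms, that adds up to 3610 ms. No wonder it was slow.

With the cause identified, fixing the code is straightforward.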
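
The post does not show the actual fix. One common way to remove this kind of per-ID cost is to fetch the IDs in a batch (Redis offers MGET and pipelining for this), so that 3610 IDs cost one round trip instead of 3610. The sketch below only illustrates that idea with a made-up in-memory store that counts round trips. FakeStore, fetchOneByOne and fetchBatched are hypothetical names, not code from the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: compares round-trip counts for per-ID and batched reads.
final class BatchFetchSketch {
    // In-memory stand-in for Redis that counts how many calls reach it.
    static final class FakeStore {
        private final Map<String, String> data = new HashMap<>();
        int roundTrips = 0;

        void put(String key, String value) {
            data.put(key, value);
        }

        // One key per call: one round trip each.
        String get(String key) {
            roundTrips++;
            return data.get(key);
        }

        // Many keys per call: one round trip for the whole batch.
        List<String> getAll(List<String> keys) {
            roundTrips++;
            List<String> values = new ArrayList<>(keys.size());
            for (String key : keys) {
                values.add(data.get(key));
            }
            return values;
        }
    }

    // The pattern the trace exposed: N IDs cost N round trips.
    static List<String> fetchOneByOne(FakeStore store, List<String> ids) {
        List<String> values = new ArrayList<>(ids.size());
        for (String id : ids) {
            values.add(store.get(id));
        }
        return values;
    }

    // Batched alternative: N IDs cost a single round trip.
    static List<String> fetchBatched(FakeStore store, List<String> ids) {
        return store.getAll(ids);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 3610; i++) {
            ids.add("metric." + i);
            store.put("metric." + i, "bean-" + i);
        }
        fetchOneByOne(store, ids);
        System.out.println("one by one: " + store.roundTrips + " round trips");
        store.roundTrips = 0;
        fetchBatched(store, ids);
        System.out.println("batched: " + store.roundTrips + " round trip(s)");
    }
}
```

With a real Redis client the saving comes from network latency rather than a counter, but the shape of the change is the same: move the loop from the caller into a single request.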

I also found that BTrace ships a utility dedicated to performance profiling, BTraceUtils.Profiling. The code:

import com.sun.btrace.BTraceUtils;
import com.sun.btrace.Profiler;
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;


@BTrace class Profiling {
    @Property
    Profiler swingProfiler = BTraceUtils.Profiling.newProfiler();
    
    @OnMethod(clazz="/com\\.xx\\.monitoring\\.api\\..*/", method="/.*/")
    void entry(@ProbeMethodName(fqn=false) String probeMethod) { // fqn=false: use the short method name, not the fully qualified one
        BTraceUtils.Profiling.recordEntry(swingProfiler, probeMethod);
    }
    
    @OnMethod(clazz="/com\\.xx\\.monitoring\\.api\\..*/", method="/.*/", location=@Location(value=Kind.RETURN))
    void exit(@ProbeMethodName(fqn=false) String probeMethod, @Duration long duration) { 
        BTraceUtils.Profiling.recordExit(swingProfiler, probeMethod, duration);
    }
    
    @OnTimer(5000) // print a snapshot every 5 seconds
    void timer() {
        BTraceUtils.Profiling.printSnapshot("Performance profile", swingProfiler);
    }
}

Result (all times are in nanoseconds):

Performance profile
  Block                Invocations  SelfTime.Total  SelfTime.Avg  SelfTime.Min  SelfTime.Max  WallTime.Total  WallTime.Avg  WallTime.Min  WallTime.Max
  preHandle                      1           96000         96000         96000         96000           96000         96000         96000         96000
  listSize                       1         2231000       2231000       2231000       2231000         2231000       2231000       2231000       2231000
  listRange                      2        18447000       9223500         23000      18424000        36871000      18435500      18424000      18447000
  getMetric                   3610        20915000          5793          1000        384000      3805159000       1054060        689000       5752000
  getMetricIndexOfId          3610       189140000         52393         19000        930000       189140000         52393         19000        930000
  get                         3610      3583769000        992733        629000       4758000      3595104000        995873        633000       4787000
  <init>                      3610          760000           210             0         10000          760000           210             0         10000
  setId                       3610         1983000           549             0         20000         1983000           549             0         20000
  setUnit                     3610         2019000           559             0         51000         2019000           559             0         51000
  setCreated                  3610         2111000           584             0         32000         2111000           584             0         32000
  setUpdated                  3610         1556000           431             0         26000         1556000           431             0         26000
  setValueType                3610         1251000           346             0         19000         1251000           346             0         19000
  setDisplayName              3610         1476000           408             0         54000         1476000           408             0         54000
  setCustomized                271          179000           660             0         16000          179000           660             0         16000
  doFilter                       1        22833000      22833000      22833000      22833000      3867649000    3867649000    3867649000    3867649000
  listMetricsMap                 2         2151000       1075500         49000       2102000      7685645000    3842822500    3842798000    3842847000
  listMetricsInternal            1           13000         13000         13000         13000      3839969000    3839969000    3839969000    3839969000
  listMetrics                    1         2332000       2332000       2332000       2332000      3839956000    3839956000    3839956000    3839956000
  listMetricByIds                1        11787000      11787000      11787000      11787000      3816946000    3816946000    3816946000    3816946000
  getId                       3676          737000           200             0         12000          737000           200             0         12000
  isCustomized                  66           11000           166             0          1000           11000           166             0          1000
  getUnit                       66            8000           121             0          1000            8000           121             0          1000
  getCreated                    66            9000           136             0          1000            9000           136             0          1000
  getUpdated                    66            9000           136             0          1000            9000           136             0          1000
  getValueType                  66           14000           212             0          1000           14000           212             0          1000
  getDisplayName                66            9000           136             0          1000            9000           136             0          1000
  postHandle                     1         1803000       1803000       1803000       1803000         1803000       1803000       1803000       1803000

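As the table shows, getMetric was called 3610 times. The get call, also invoked 3610 times, averaged about 1 ms each (992733 ns), roughly 3.58 s in total, which accounts for almost all of the request time.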
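
To read the table, it helps to know how the two time columns relate: self time is the wall time of a block minus the wall time spent in the profiled blocks it calls. The small check below uses figures copied from the table. The assumption that getMetric's profiled callees are get and getMetricIndexOfId is inferred from the numbers, not stated anywhere in the output, and the class and method names are made up for illustration.

```java
// Checks two readings of the profile table using figures copied from it.
// Class and method names are hypothetical.
final class ProfileTableCheck {
    // Self time = wall time minus the wall time spent in profiled callees.
    static long selfTime(long wallTotal, long... calleeWallTotals) {
        long self = wallTotal;
        for (long callee : calleeWallTotals) {
            self -= callee;
        }
        return self;
    }

    // The table is in nanoseconds; convert for comparison with the Postman figure.
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        long getMetricWall = 3_805_159_000L;          // getMetric, WallTime.Total
        long getWall = 3_595_104_000L;                // get, WallTime.Total
        long getMetricIndexOfIdWall = 189_140_000L;   // getMetricIndexOfId, WallTime.Total

        // Assumption: getMetric's profiled callees are get and getMetricIndexOfId.
        // Prints 20915000, the SelfTime.Total listed for getMetric.
        System.out.println(selfTime(getMetricWall, getWall, getMetricIndexOfIdWall));

        // doFilter's wall time in milliseconds, to compare with the 3873 ms seen in Postman.
        System.out.println(nanosToMillis(3_867_649_000L));
    }
}
```

The subtraction reproducing getMetric's listed self time exactly is what supports the inferred call relationship, and doFilter's wall time of about 3868 ms lines up with the 3873 ms measured from the client.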

 

So, BTrace is pretty powerful, isn't it? It makes analyzing problems in production very convenient.