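Problem description:
After the test had been running for a while, CPU usage on the test client reached 100% and the LoadRunner interface started reporting errors.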
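Investigation:
Capture a thread dump.
The dump contained more than 1,000 threads, most of them BLOCKED. The active threads were almost all NIO threads, so nothing suspicious showed up at first.
Search the dump for the test code's class names to see whether the test code itself was causing the problem.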
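The two checks above (tallying thread states, then looking for the test code's classes) were done on an external dump. As a rough in-process alternative, and not the tool used in this investigation, the sketch below runs the same two checks on Thread.getAllStackTraces(). Note that the Java API reports a thread running native code as RUNNABLE rather than IN_NATIVE, and countByState reads each thread's state when it runs, not when the snapshot was taken. ThreadDumpScan is a made-up name, and the prefix com.example.perf. is a hypothetical placeholder for the elided test-code package.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class ThreadDumpScan {

    // Tally threads by their current Thread.State, like counting BLOCKED threads in a dump.
    static Map<Thread.State, Integer> countByState(Map<Thread, StackTraceElement[]> dump) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : dump.keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    // Names of threads with at least one frame from a class whose fully
    // qualified name starts with classPrefix (e.g. the test code's package).
    static List<String> threadsInClasses(Map<Thread, StackTraceElement[]> dump, String classPrefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    hits.add(e.getKey().getName());
                    break;   // one matching frame is enough for this thread
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        System.out.println("threads by state: " + countByState(dump));
        // "com.example.perf." is a hypothetical package name for the test code.
        System.out.println("test-code threads: " + threadsInClasses(dump, "com.example.perf."));
    }
}
```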
Several threads in the test code had stacks like this one:
Thread 1134: (state = IN_NATIVE)
- java.net.NetworkInterface.getAll() @bci=0 (Compiled frame; information may be imprecise)
- java.net.NetworkInterface.getNetworkInterfaces() @bci=0, line=334 (Compiled frame)
- com.alibaba.rocketmq.remoting.common.RemotingUtil.getLocalAddress() @bci=0, line=112 (Compiled frame)
- com.alibaba.rocketmq.client.ClientConfig.<init>() @bci=19, line=32 (Compiled frame)
- com.alibaba.rocketmq.client.producer.DefaultMQProducer.<init>(java.lang.String, com.alibaba.rocketmq.remoting.RPCHook) @bci=1, line=95 (Compiled frame)
- com.alibaba.rocketmq.client.producer.DefaultMQProducer.<init>(java.lang.String) @bci=3, line=86 (Compiled frame)
- ********************MQProducer.<init>(java.lang.String, java.lang.String) @bci=71, line=62 (Compiled frame)
- ********************.RocketMQ.sendMessage() @bci=76, line=119 (Compiled frame)  // 119 is the source line number
- ********************.RocketMQ$1$1.safeRun() @bci=7, line=53 (Compiled frame)
- ********************.SafeRunnable.run() @bci=1, line=13 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
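The top frame is java.net.NetworkInterface.getAll(), a fairly CPU-intensive method; we had run into a similar case before. The next question was why these threads were stuck there.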
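To get a rough feel for that cost on a given machine, the sketch below times repeated calls to NetworkInterface.getNetworkInterfaces(), which the stack trace shows calling the native getAll(). It is a naive wall-clock loop, not a rigorous benchmark (a harness such as JMH would be needed for that), and the numbers will depend on the OS and on how many interfaces the machine has. NetIfCost and avgNanosPerCall are names made up for this illustration.

```java
import java.io.UncheckedIOException;
import java.net.NetworkInterface;
import java.net.SocketException;

class NetIfCost {

    // Calls NetworkInterface.getNetworkInterfaces() `calls` times and returns the
    // average wall-clock nanoseconds per call. Per the stack trace, each call
    // reaches the native NetworkInterface.getAll().
    static long avgNanosPerCall(int calls) {
        if (calls <= 0) {
            throw new IllegalArgumentException("calls must be positive");
        }
        long start = System.nanoTime();
        try {
            for (int i = 0; i < calls; i++) {
                NetworkInterface.getNetworkInterfaces();   // result deliberately ignored
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / calls;
    }

    public static void main(String[] args) {
        avgNanosPerCall(100);   // warm-up pass, excluded from the reported number
        System.out.println("avg ns per getNetworkInterfaces(): " + avgNanosPerCall(1000));
    }
}
```

Multiplying the printed figure by the number of producers being re-created at once gives a rough idea of why many concurrent constructions can keep the CPU busy.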
The problematic code follows; source line 119 is the line marked (1).
public void sendMessage() {
    try {
        // omitted…
        t = UserManager.getTransManager().CreateTransaction("Performace-client", trace);
        Message msg = new Message("Performace", msgContent.getBytes("UTF-8"));
        if (producer == null) {
            producer = new MQProducer("Performace", "192.168.143.135:9876"); // (1)
            producer.start();
            rst = producer.product(msg);
        } else {
            rst = producer.product(msg);
        }
        // omitted…
    } catch (Exception e) {
        // omitted…
        if (producer != null) {
            producer.shutdown();
            producer = null;
        }
    }
}
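Root cause:
sendMessage is registered at random on a timer thread pool, so several threads can run it at the same moment or within a very short interval.
producer.product(msg) sends the message to the remote end. If it throws an Exception, because of a network problem or some other unknown cause, the catch block shuts producer down and sets it to null.
The next call to sendMessage then re-creates producer. If several threads happen to run sendMessage concurrently at that point, the unsynchronized null check lets each of them create its own producer, on top of other concurrency problems. Every one of those constructions goes through the DefaultMQProducer constructor and NetworkInterface.getAll() seen in the stack trace, and the situation turns into a vicious cycle.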
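The race can be reproduced with a small sketch. FakeProducer is a hypothetical stand-in for MQProducer whose constructor sleeps 50 ms to imitate the expensive construction; unsafeInit mirrors the original unsynchronized null check and safeInit mirrors the synchronized version below. The thread count of 8 is arbitrary. With the workers released together, the unsafe version typically creates several producers (the exact number varies between runs), while the safe version always creates exactly one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitRace {

    // Hypothetical stand-in for MQProducer with a deliberately slow constructor,
    // playing the role of the time DefaultMQProducer spends in getAll().
    static class FakeProducer {
        static final AtomicInteger CREATED = new AtomicInteger();

        FakeProducer() {
            CREATED.incrementAndGet();
            try {
                Thread.sleep(50);   // widens the window between the null check and the assignment
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private FakeProducer producer;

    // Same check-then-act shape as the original sendMessage(): no lock.
    void unsafeInit() {
        if (producer == null) {
            producer = new FakeProducer();
        }
    }

    // Same shape as the fixed sendMessage(): check and create under one lock.
    void safeInit() {
        synchronized (this) {
            if (producer == null) {
                producer = new FakeProducer();
            }
        }
    }

    // Releases `threads` workers together through a latch, runs `task` on each,
    // and returns how many FakeProducer instances were constructed.
    static int run(int threads, Runnable task) {
        FakeProducer.CREATED.set(0);
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    go.await();
                    task.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        go.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return FakeProducer.CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + run(8, new LazyInitRace()::unsafeInit) + " producers");
        System.out.println("safe:   " + run(8, new LazyInitRace()::safeInit) + " producers");
    }
}
```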
After the fix:
public void sendMessage() {
    try {
        // omitted…
        t = UserManager.getTransManager().CreateTransaction("Performace-client", trace);
        Message msg = new Message("Performace", msgContent.getBytes("UTF-8"));
        synchronized (this) {
            if (producer == null) {
                producer = new MQProducer("Performace", "192.168.143.135:9876");
                producer.start();
            }
        }
        rst = producer.product(msg);
        // omitted…
    } catch (Exception e) {
        // omitted…
        if (producer != null) {
            producer.shutdown();
            producer = null;
        }
    }
}
Adding synchronization around the producer initialization solved the problem.
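One caveat, offered as an observation rather than as part of the original fix: in the fixed code, producer.product(msg) reads the producer field outside the synchronized block, and the catch block calls shutdown() and sets the field to null without the lock, so one thread's failure can shut down or null the producer while another thread is about to use it. The sketch below is one way to tighten this, under stated assumptions: MqClient is a hypothetical interface covering the start()/shutdown() calls the code makes on MQProducer, and ProducerHolder is a hypothetical helper, not part of RocketMQ or the original test code.

```java
import java.util.function.Supplier;

// Hypothetical minimal interface for what sendMessage() needs from MQProducer.
interface MqClient {
    void start() throws Exception;
    void shutdown();
}

// Holds the shared client; every read and write of `current` happens under the lock.
class ProducerHolder<P extends MqClient> {
    private final Supplier<P> factory;
    private P current;   // guarded by `this`

    ProducerHolder(Supplier<P> factory) {
        this.factory = factory;
    }

    // Returns the shared client, creating and starting it only when none exists.
    // Callers should keep the result in a local variable instead of re-reading a field.
    synchronized P get() throws Exception {
        if (current == null) {
            P p = factory.get();
            try {
                p.start();
            } catch (Exception e) {
                p.shutdown();   // do not leak a client that failed to start
                throw e;
            }
            current = p;        // publish only a started client
        }
        return current;
    }

    // Shuts down and clears `failed` only if it is still the shared client, so a
    // late failure report cannot discard a client another thread just created.
    synchronized void reset(P failed) {
        if (failed != null && current == failed) {
            current.shutdown();
            current = null;
        }
    }
}
```

In sendMessage() each call would store holder.get() in a local variable, call product(msg) on that local, and in the catch block call holder.reset() with the same local; MQProducer would need a small adapter to fit the hypothetical MqClient interface.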
Author: No.40