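Problem description: after the release went live, the machine died outright and the service became unavailable.
Analysis: once the machine was live, mdc monitoring showed CPU usage spiking, and Tomcat's catalina.out file grew to 6 GB within a few minutes. Part of the log follows: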
10:30:03.973 [localhost-startStop-1-SendThread(10.168.180.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x2497977a96e368f, packet:: clientPath:null serverPath:null finished:false header:: 2054411,4 replyHeader:: 2054411,30431451472,0 request:: '/cache-redis/cluster-cache-kop/monitors/redis-10.168.163.13-6379/monitor-5063e8eb-2c44-4560-8f32-821d9da0928e,F response:: #7b226865616c7468223a36383539347d,s{30421042529,30431451467,1437462953967,1437532203953,68595,0,0,164796416226375311,16,0,30421042529}
10:30:03.974 [localhost-startStop-1-SendThread(10.168.180.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x2497977a96e368f, packet:: clientPath:null serverPath:null finished:false header:: 2054412,4 replyHeader:: 2054412,30431451472,0 request:: '/cache-redis/cluster-cache-kop/monitors/redis-10.168.163.13-6379/monitor-fd44c3c8-49ed-4493-9fb3-15c5e9b32b48,F response:: #7b226865616c7468223a36363534397d,s{30421293932,30431451332,1437465020671,1437532203056,66550,0,0,164796416226375313,16,0,30421293932}
10:30:03.974 [localhost-startStop-1-SendThread(10.168.180.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x2497977a96e368f, packet:: clientPath:null serverPath:null finished:false header:: 2054413,4 replyHeader:: 2054413,30431451472,0 request:: '/cache-redis/cluster-cache-kop/monitors/redis-10.168.163.13-6379/monitor-a0485714-92a8-4ec8-b1cf-eb52f134f53c,F response:: #7b226865616c7468223a3438383434327d,s{30369309074,30431451356,1437030682965,1437532203246,488443,0,0,236854010264761444,17,0,30369309074}
10:30:03.974 [localhost-startStop-1-SendThread(10.168.180.94:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x2497977a96e368f, packet:: clientPath:null serverPath:null finished:false header:: 2054414,4 replyHeader:: 2054414,30431451472,0 request:: '/cache-redis/cluster-cache-kop/monitors/redis-10.168.163.13-6379/monitor-6cbc3490-d818-4f87-9528-9caa948f1595,F response:: #7b226865616c7468223a3438383330307d,s{30369305395,30431451394,1437030658897,1437532203574,488301,0,0,164796416226375097,17,0,30369305395}
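Almost all of these entries are DEBUG-level heartbeat logs from zookeeper. Yet the root logger in our log4j configuration is set to INFO, and the previous version of the system never printed DEBUG-level logs like these.
Nearly all of the output comes from org.apache.zookeeper.ClientCnxn. Reading the ClientCnxn source shows that in zookeeper 3.3.3 the class logs through log4j directly, whereas in zookeeper 3.4.5 it logs through slf4j, as follows: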
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class ClientCnxn {
    private static final Logger LOG = LoggerFactory.getLogger(ClientCnxn.class);
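At this point it is clear that our log4j configuration is not taking effect here: a jar that our application depends on transitively pulls in the logback logging framework.
The problem therefore reduces to a conflict between log4j and slf4j, more precisely between the log4j and logback implementations that slf4j can bind to.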
Solution:
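To resolve a conflict between multiple versions of the same jar:
First, work out the dependency relationships between the jars. For a Maven project in Eclipse, open the Dependency Hierarchy view of the pom file, or run the command mvn dependency:tree to print the dependency tree. Either way, this shows which transitive version is causing the problem.
Then choose a suitable version and exclude the unwanted one. Usually the higher version is preferred: either declare the higher version explicitly near the top of the pom file, or exclude the dependency that brings in the lower version.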
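On a large project the dependency tree can run to hundreds of lines, so it may help to search it mechanically. The sketch below assumes the tree lines of mvn dependency:tree have been saved as text, and it prints the chain of ancestors leading to every artifact whose coordinates contain a given string. The class name DependencyTreeFilter, the method pathsTo, and every artifact in the sample (com.example:app, com.example:cache-client, and the version numbers) are made up for illustration and are not taken from the project described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds which dependency chain pulls in a given artifact.
class DependencyTreeFilter {

    // Assumes the input holds only tree lines, each level indented by 3 characters
    // and introduced by "+- " or "\- ", optionally prefixed with "[INFO] ".
    static List<String> pathsTo(List<String> treeLines, String needle) {
        List<String> paths = new ArrayList<>();
        List<String> stack = new ArrayList<>(); // stack.get(d) is the current artifact at depth d
        for (String raw : treeLines) {
            String line = raw.startsWith("[INFO] ") ? raw.substring(7) : raw;
            int marker = Math.max(line.lastIndexOf("+- "), line.lastIndexOf("\\- "));
            int depth = (marker < 0) ? 0 : marker / 3 + 1;
            String artifact = (marker < 0) ? line.trim() : line.substring(marker + 3).trim();
            if (artifact.isEmpty()) {
                continue;
            }
            // Drop entries from deeper or sibling branches before recording this one.
            while (stack.size() > depth) {
                stack.remove(stack.size() - 1);
            }
            stack.add(artifact);
            if (artifact.contains(needle)) {
                paths.add(String.join(" -> ", stack));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Made-up sample tree, not the real output of the project in this article.
        List<String> sample = Arrays.asList(
                "com.example:app:war:1.0",
                "+- org.apache.zookeeper:zookeeper:jar:3.4.5:compile",
                "+- com.example:cache-client:jar:1.0:compile",
                "|  \\- ch.qos.logback:logback-classic:jar:1.0.13:compile",
                "|     \\- ch.qos.logback:logback-core:jar:1.0.13:compile",
                "\\- log4j:log4j:jar:1.2.17:compile");
        for (String path : pathsTo(sample, "ch.qos.logback")) {
            System.out.println(path);
        }
    }
}
```

Each printed path starts at the project itself, so the second element is the direct dependency on which an exclusion would be declared.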
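When both the log4j and logback packages are present under slf4j, the classpath contains more than one StaticLoggerBinder class and they conflict. slf4j then loads and instantiates whichever StaticLoggerBinder the class loader happens to return, which is effectively a random choice. The fix is therefore to change the Maven dependencies so that only one of them remains.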
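To confirm that more than one StaticLoggerBinder really is visible, the classpath can be asked directly. The following is a minimal diagnostic sketch, not the fix itself. It relies on the fact that each SLF4J 1.x binding jar ships a class at org/slf4j/impl/StaticLoggerBinder.class, and it lists every copy the class loader can see. The class name BindingScanner and its methods are hypothetical. Inside Tomcat the check would have to run with the web application's class loader (for example from code deployed in the webapp), which is an assumption about the deployment rather than something shown in the original.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic: lists every SLF4J 1.x binding visible to a class loader.
class BindingScanner {

    // Resource path of the binder class that each SLF4J 1.x binding jar provides.
    static final String BINDER_RESOURCE = "org/slf4j/impl/StaticLoggerBinder.class";

    // Returns the URLs of all copies of the given resource on the class loader's path.
    static List<URL> findResources(ClassLoader loader, String resource) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Turns the number of bindings found into a short verdict.
    static String diagnose(int bindingCount) {
        if (bindingCount == 0) {
            return "no binding found";
        }
        if (bindingCount == 1) {
            return "single binding";
        }
        return "multiple bindings: conflict";
    }

    public static void main(String[] args) {
        ClassLoader loader = BindingScanner.class.getClassLoader();
        List<URL> bindings = findResources(loader, BINDER_RESOURCE);
        for (URL url : bindings) {
            System.out.println(url); // the jar in each URL shows where the binding comes from
        }
        System.out.println(diagnose(bindings.size()));
    }
}
```

If two URLs are printed, one inside a logback jar and one inside a log4j binding jar, that matches the situation described above. The jar names in the URLs show which artifacts to look for in the dependency tree.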
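Summary: log4j and logback conflict with each other under slf4j. When logback is the one that gets picked, the log4j configuration is bypassed, and with no logback configuration of its own logback falls back to its default root level of DEBUG, so the effective log level drops to DEBUG. The fix is to remove the implicit (transitive) logback dependency from the Maven dependencies.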