一、问题描述:
最近使用storm之后发现zookeeper总是出现警告,故此跟踪了一下源码,知道问题产生在哪里。警告信息如下:
2014-09-19 13:41:29,400 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:1 took 4575ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2014-09-19 13:41:30,453 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
二、问题定位:(FileTxnLog)
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.class
/**
* commit the logs. make sure that evertyhing hits the
* disk
*/
public synchronized void commit() throws IOException {
if (logStream != null) {
logStream.flush();
}
for (FileOutputStream log : streamsToFlush) {
log.flush();
if (forceSync) {
long startSyncNS = System.nanoTime();
//执行这个任务时间如果超过fsyncWarningThresholdMS就告警
log.getChannel().force(false);
long syncElapsedMS =
TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startSyncNS);
if (syncElapsedMS > fsyncWarningThresholdMS) {
LOG.warn("fsync-ing the write ahead log in "
+ Thread.currentThread().getName()
+ " took " + syncElapsedMS
+ "ms which will adversely effect operation latency. "
+ "See the ZooKeeper troubleshooting guide");
}
}
}
while (streamsToFlush.size() > 1) {
streamsToFlush.removeFirst().close();
}
}
三、分析解决方案:
1)fsyncWarningThresholdMS默认值是多少
/** Maximum time we allow for elapsed fsync before WARNing */
private final static long fsyncWarningThresholdMS;
static {
LOG = LoggerFactory.getLogger(FileTxnLog.class);
String size = System.getProperty("zookeeper.preAllocSize");
if (size != null) {
try {
preAllocSize = Long.parseLong(size) * 1024;
} catch (NumberFormatException e) {
LOG.warn(size + " is not a valid value for preAllocSize");
}
}
//如果获取不到默认值就是1000
fsyncWarningThresholdMS = Long.getLong("fsync.warningthresholdms", 1000);
}
public static Long getLong(String nm, long val) {
Long result = Long.getLong(nm, null);
return (result == null) ? Long.valueOf(val) : result;
}
public static Long getLong(String nm, Long val) {
String v = null;
try {
v = System.getProperty(nm);
} catch (IllegalArgumentException e) {
} catch (NullPointerException e) {
}
if (v != null) {
try {
return Long.decode(v);
} catch (NumberFormatException e) {
}
}
return val;
}
此处参考源码阅读一的这个方法:也没有发现给fsyncWarningThresholdMS 赋值的情况
/**
* Parse config from a Properties.
* @param zkProp Properties to parse from.
* @throws IOException
* @throws ConfigException
*/
public void parseProperties(Properties zkProp)
throws IOException, ConfigException {
int clientPort = 0;
String clientPortAddress = null;
for (Entry<Object, Object> entry : zkProp.entrySet()) {
String key = entry.getKey().toString().trim();
String value = entry.getValue().toString().trim();
if (key.startsWith("group")) {
int dot = key.indexOf('.');
long gid = Long.parseLong(key.substring(dot + 1));
numGroups++;
String parts[] = value.split(":");
for(String s : parts){
long sid = Long.parseLong(s);
if(serverGroup.containsKey(sid))
throw new ConfigException("Server " + sid + "is in multiple groups");
else
serverGroup.put(sid, gid);
}
} else if(key.startsWith("weight")) {
int dot = key.indexOf('.');
long sid = Long.parseLong(key.substring(dot + 1));
serverWeight.put(sid, Long.parseLong(value));
} else {
System.setProperty("zookeeper." + key, value);
}
}
}
fsync.warningthresholdms(java:fsync.warningthresholdms)
当日志中的fsync函数超出了该值的长度,就会在日志出输出警告信息。默认值是1000(毫秒级),是系统属性。
解决方案:
在zoo.cfg中配置不生效。在conf/java.env中添加java系统属性:、
export JVMFLAGS="-Dfsync.warningthresholdms=20 $JVMFLAGS"