While running YCSB I noticed that under a heavy get workload the latency reported by YCSB was very high: with 100 threads it was close to 100 ms, while the "get_avg_time" metric for HBase in Ganglia showed only about 20~30 ms. Digging into the code, it turns out that all 100 threads share a single connection, and every Call's request data goes over that one connection, so under heavy concurrency the requests pile up and latency grows. See the following method:
public Writable[] call(Writable[] params, InetSocketAddress[] addresses)
    throws IOException {
  if (addresses.length == 0) return new Writable[0];

  ParallelResults results = new ParallelResults(params.length);
  synchronized (results) {
    for (int i = 0; i < params.length; i++) {
      ParallelCall call = new ParallelCall(params[i], results, i);
      try {
        Connection connection = getConnection(addresses[i], null, call);
        connection.sendParam(call);             // send each parameter
      } catch (IOException e) {
        // log errors
        LOG.info("Calling "+addresses[i]+" caught: " + e.getMessage(),e);
        results.size--;                         // wait for one fewer result
      }
    }
    while (results.count != results.size) {
      try {
        results.wait();                         // wait for all results
      } catch (InterruptedException e) {}
    }
    return results.values;
  }
}
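For context, the shared connection comes from getConnection above: the client keeps a cache of connections keyed by the remote address, so every thread calling the same RegionServer gets the same Connection object back. A minimal sketch of that caching pattern (the class and field names here are illustrative, not the exact Hadoop/HBase code):

// Simplified sketch of per-address connection caching; names are illustrative.
import java.net.InetSocketAddress;
import java.util.Hashtable;

class SimplifiedClient {
  // one entry per remote server: all callers to that server share it
  private final Hashtable<InetSocketAddress, Connection> connections = new Hashtable<>();

  Connection getConnection(InetSocketAddress address) {
    Connection connection;
    synchronized (connections) {
      connection = connections.get(address);
      if (connection == null) {
        connection = new Connection(address);   // first caller creates it
        connections.put(address, connection);
      }
    }
    return connection;                          // everyone else reuses the same instance
  }

  static class Connection {
    final InetSocketAddress remote;
    Connection(InetSocketAddress remote) { this.remote = remote; }
  }
}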
And the sendParam method is synchronized on the connection's output stream, so concurrent calls over the same connection are written out one at a time:
public void sendParam(Call call) {
  if (shouldCloseConnection.get()) {
    return;
  }

  DataOutputBuffer d = null;
  try {
    synchronized (this.out) {
      if (LOG.isDebugEnabled())
        LOG.debug(getName() + " sending #" + call.id);

      //for serializing the
      //data to be written
      d = new DataOutputBuffer();
      d.writeInt(call.id);
      call.param.write(d);
      byte[] data = d.getData();
      int dataLength = d.getLength();
      out.writeInt(dataLength);      //first put the data length
      out.write(data, 0, dataLength);//write the data
      out.flush();
    }
  } catch(IOException e) {
    markClosed(e);
  } finally {
    //the buffer is just an in-memory buffer, but it is still polite to
    // close early
    IOUtils.closeStream(d);
  }
}
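A toy simulation (not HBase code) of what this serialization does to client-observed latency: if 100 threads each need roughly 1 ms of exclusive time on the shared output stream, the average wait grows with the thread count even though each individual request stays cheap, which matches the gap between the YCSB numbers and the server-side get_avg_time metric.

// Minimal, self-contained simulation of many threads funnelling writes
// through one shared lock; the 1 ms "send" time is an assumption for illustration.
import java.util.concurrent.atomic.AtomicLong;

public class SharedConnectionLatencyDemo {
  private static final Object sharedOut = new Object(); // stands in for this.out
  private static final int THREADS = 100;

  public static void main(String[] args) throws InterruptedException {
    AtomicLong totalLatencyMs = new AtomicLong();
    Thread[] workers = new Thread[THREADS];
    for (int i = 0; i < THREADS; i++) {
      workers[i] = new Thread(() -> {
        long start = System.nanoTime();
        synchronized (sharedOut) {   // all threads serialize here, like sendParam
          sleepQuietly(1);           // pretend the actual write takes ~1 ms
        }
        totalLatencyMs.addAndGet((System.nanoTime() - start) / 1_000_000);
      });
    }
    for (Thread t : workers) t.start();
    for (Thread t : workers) t.join();
    // Each request only "costs" ~1 ms, but the average client-observed latency
    // approaches THREADS/2 ms because of queueing on the single lock.
    System.out.println("avg client-observed latency: "
        + (totalLatencyMs.get() / THREADS) + " ms");
  }

  private static void sleepQuietly(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}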
For a given destination RegionServer there is only one connection. The original intent of this design was to share the connection and avoid the overhead of maintaining many connections, but in this scenario it inflates latency. The two links below explain this in more detail:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/11534
https://issues.apache.org/jira/browse/HBASE-2939
The latter link provides a compromise patch for the problem: instead of using just one connection, the client uses a few of them, which is a reasonable tradeoff.
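As a rough illustration of the idea behind that patch (not the actual HBASE-2939 code), the client can keep a small fixed-size pool of connections per server and hand them out round-robin, so concurrent calls spread across several sockets instead of queueing on one:

// Illustrative round-robin pool of connections to one server; names are hypothetical.
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinConnectionPool {
  private final Connection[] pool;
  private final AtomicInteger next = new AtomicInteger();

  RoundRobinConnectionPool(InetSocketAddress address, int size) {
    pool = new Connection[size];
    for (int i = 0; i < size; i++) {
      pool[i] = new Connection(address);   // a handful of sockets instead of one
    }
  }

  Connection get() {
    // spread concurrent callers across the pool
    int idx = Math.floorMod(next.getAndIncrement(), pool.length);
    return pool[idx];
  }

  static class Connection {
    final InetSocketAddress remote;
    Connection(InetSocketAddress remote) { this.remote = remote; }
  }
}

Keeping the pool small preserves most of the original design's goal of cheap connection bookkeeping, while removing the single point of contention under heavy concurrency.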