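Benchmarking HBase with YCSB

Introduction to YCSB

YCSB (Yahoo! Cloud Serving Benchmark) is a general-purpose performance testing tool open-sourced by Yahoo!.

It can be used to benchmark a wide range of NoSQL products, including: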

  • PNUTS
  • BigTable
  • HBase
  • Hypertable
  • Azure
  • Cassandra
  • CouchDB
  • Voldemort
  • MongoDB
  • Dynomite

For more information about YCSB, see:

  1. Getting Started
  2. Running a Workload
  3. Adding a Database

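Compared with HBase's built-in performance testing tool (PerformanceEvaluation), YCSB has the following advantages:

  • Extensibility: the benchmark client is not tied to HBase alone. It can target other products, and also different versions of HBase.
  • Flexibility: you can choose the operation mix for a test (read+write, read+scan, and so on), as well as the relative frequency of each operation and how keys are selected.
  • Monitoring:
    • While a test is running, its progress is reported in real time:
      1340 sec: 751515 operations; 537.74 current ops/sec; [INSERT AverageLatency(ms)= 1.77 ]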
      1350 sec: 755945 operations; 442.82 current ops/sec; [INSERT AverageLatency(ms)= 2.18 ]
      1360 sec: 761545 operations; 559.72 current ops/sec; [INSERT AverageLatency(ms)= 1.71 ]
      1370 sec: 767616 operations; 606.92 current ops/sec; [INSERT AverageLatency(ms)= 1.58 ]
    • When the test finishes, an overall summary is printed:
      [OVERALL], RunTime(ms), 1762019.0
      [OVERALL], Throughput(ops/sec), 567.5307700995279
      [INSERT], Operations, 1000000
      [INSERT], AverageLatency(ms), 1.698302
      [INSERT], MinLatency(ms), 0
      [INSERT], MaxLatency(ms), 14048
      [INSERT], 95thPercentileLatency(ms), 2
      [INSERT], 99thPercentileLatency(ms), 3
      [INSERT], Return= 0 , 1000000
      [INSERT], 0 , 29
      [INSERT], 1 , 433925
      [INSERT], 2 , 549176
      [INSERT], 3 , 10324
      [INSERT], 4 , 3629
      [INSERT], 5 , 1303
      [INSERT], 6 , 454
      [INSERT], 7 , 140
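
The figures in the summary are related in a simple way: the overall throughput is the number of operations divided by the run time. As a quick sanity check, the following minimal Java sketch recomputes it from the values printed above. The class and method names are illustrative and are not part of YCSB.

```java
import java.util.Locale;

public class ThroughputCheck {
    /** Operations per second, given an operation count and a run time in milliseconds. */
    static double throughput(long operations, double runTimeMs) {
        return operations / (runTimeMs / 1000.0);
    }

    public static void main(String[] args) {
        // Values copied from the [INSERT] Operations and [OVERALL] RunTime(ms) lines.
        long operations = 1000000L;
        double runTimeMs = 1762019.0;
        System.out.println(String.format(Locale.ROOT, "%.2f ops/sec",
                throughput(operations, runTimeMs)));
    }
}
```

The result should agree with the Throughput(ops/sec) value that YCSB reports, which is a cheap way to confirm that a summary was copied or parsed correctly.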

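YCSB's shortcomings:

The bundled workload models are still rather simple, and YCSB cannot run a test as a MapReduce (MR) job, so increasing the concurrency of a test is cumbersome.

For example, to parallelize a load you have to start several load processes and give each one a different starting key in its startup parameters. For the transaction phase, the only option is to run multiple threads on multiple machines.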
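
One way to prepare such a split is to compute, for each load process, the index of its first record and the number of records it should insert. The sketch below assumes the insertstart and insertcount workload properties described in YCSB's notes on running a workload in parallel; check that the YCSB version in use supports them before relying on this. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadPartitioner {
    /**
     * Splits recordCount records across the given number of load processes and
     * returns the extra "-p" arguments for each process, in order.
     */
    static List<String> partitionArgs(long recordCount, int processes) {
        List<String> args = new ArrayList<String>();
        long base = recordCount / processes;
        long remainder = recordCount % processes;
        long start = 0;
        for (int i = 0; i < processes; i++) {
            // The first "remainder" processes take one extra record each.
            long count = base + (i < remainder ? 1 : 0);
            args.add("-p insertstart=" + start + " -p insertcount=" + count);
            start += count;
        }
        return args;
    }

    public static void main(String[] args) {
        // Example: split a one-million-record load across four processes.
        for (String a : partitionArgs(1000000L, 4)) {
            System.out.println(a);
        }
    }
}
```

Each printed line would be appended to the load command of one process, so that the processes insert disjoint key ranges that together cover the whole record count.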

Testing HBase-0.90.4 with YCSB

Download the YCSB-0.1.3 source code from the official site

http://github.com/brianfrankcooper/YCSB/tarball/0.1.3

Build YCSB

[hdfs@hd0004-sw1 guopeng]$ cd YCSB-0.1.3/
[hdfs@hd0004-sw1 YCSB-0.1.3]$ pwd
/home/hdfs/guopeng/YCSB-0.1.3
[hdfs@hd0004-sw1 YCSB-0.1.3]$ ant
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml

compile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

BUILD SUCCESSFUL
Total time: 0 seconds
[hdfs@hd0004-sw1 YCSB-0.1.3]$

The HBase client bundled with YCSB has some compatibility problems, so we replace the bundled file (db/hbase/src/com/yahoo/ycsb/db/HBaseClient.java) with the following code:

package com.yahoo.ycsb.db;

import java.io.IOException;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.Vector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import com.yahoo.ycsb.DBException;

/**
 * HBase client for YCSB framework
 *
 * @see http://blog.data-works.org
 * @see http://gpcuster.cnblogs.com/
 */
public class HBaseClient extends com.yahoo.ycsb.DB {
    private static final Configuration config = new Configuration();

    static {
        config.addResource("hbase-default.xml");
        config.addResource("hbase-site.xml");
    }

    public boolean _debug = false;

    public String _table = "";
    public HTable _hTable = null;
    public String _columnFamily = "";
    public byte _columnFamilyBytes[];

    public static final int Ok = 0;
    public static final int ServerError = -1;
    public static final int HttpError = -2;
    public static final int NoMatchingRecord = -3;

    public static final Object tableLock = new Object();

    /**
     * Initialize any state for this DB. Called once per DB instance; there is
     * one DB instance per client thread.
     */
    public void init() throws DBException {
        if ((getProperties().getProperty("debug") != null)
                && (getProperties().getProperty("debug").compareTo("true") == 0)) {
            _debug = true;
        }

        _columnFamily = getProperties().getProperty("columnfamily");
        if (_columnFamily == null) {
            System.err.println("Error, must specify a columnfamily for HBase table");
            throw new DBException("No columnfamily specified");
        }
        _columnFamilyBytes = Bytes.toBytes(_columnFamily);

        // read hbase client settings.
        for (Object key : getProperties().keySet()) {
            String pKey = key.toString();
            if (pKey.startsWith("hbase.")) {
                String pValue = getProperties().getProperty(pKey);
                if (pValue != null) {
                    config.set(pKey, pValue);
                }
            }
        }
    }

    /**
     * Cleanup any state for this DB. Called once per DB instance; there is one
     * DB instance per client thread.
     */
    public void cleanup() throws DBException {
        try {
            if (_hTable != null) {
                _hTable.flushCommits();
            }
        } catch (IOException e) {
            throw new DBException(e);
        }
    }

    public void getHTable(String table) throws IOException {
        synchronized (tableLock) {
            _hTable = new HTable(config, table);
        }
    }

    /**
     * Read a record from the database. Each field/value pair from the result
     * will be stored in a HashMap.
     *
     * @param table
     *            The name of the table
     * @param key
     *            The record key of the record to read.
     * @param fields
     *            The list of fields to read, or null for all of them
     * @param result
     *            A HashMap of field/value pairs for the result
     * @return Zero on success, a non-zero error code on error
     */
    public int read(String table, String key, Set<String> fields,
            HashMap<String, String> result) {
        // if this is a "new" table, init HTable object. Else, use existing one
        if (!_table.equals(table)) {
            _hTable = null;
            try {
                getHTable(table);
                _table = table;
            } catch (IOException e) {
                System.err.println("Error accessing HBase table: " + e);
                return ServerError;
            }
        }

        Result r = null;
        try {
            if (_debug) {
                System.out.println("Doing read from HBase columnfamily "
                        + _columnFamily);
                System.out.println("Doing read for key: " + key);
            }
            Get g = new Get(Bytes.toBytes(key));
            if (fields == null) {
                g.addFamily(_columnFamilyBytes);
            } else {
                for (String field : fields) {
                    g.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
                }
            }
            r = _hTable.get(g);
        } catch (IOException e) {
            System.err.println("Error doing get: " + e);
            return ServerError;
        } catch (ConcurrentModificationException e) {
            // do nothing for now...need to understand HBase concurrency model
            // better
            return ServerError;
        }

        for (KeyValue kv : r.raw()) {
            result.put(Bytes.toString(kv.getQualifier()),
                    Bytes.toString(kv.getValue()));
            if (_debug) {
                System.out.println("Result for field: "
                        + Bytes.toString(kv.getQualifier()) + " is: "
                        + Bytes.toString(kv.getValue()));
            }
        }
        return Ok;
    }

    /**
     * Perform a range scan for a set of records in the database. Each
     * field/value pair from the result will be stored in a HashMap.
     *
     * @param table
     *            The name of the table
     * @param startkey
     *            The record key of the first record to read.
     * @param recordcount
     *            The number of records to read
     * @param fields
     *            The list of fields to read, or null for all of them
     * @param result
     *            A Vector of HashMaps, where each HashMap is a set of
     *            field/value pairs for one record
     * @return Zero on success, a non-zero error code on error
     */
    public int scan(String table, String startkey, int recordcount,
            Set<String> fields, Vector<HashMap<String, String>> result) {
        // if this is a "new" table, init HTable object. Else, use existing one
        if (!_table.equals(table)) {
            _hTable = null;
            try {
                getHTable(table);
                _table = table;
            } catch (IOException e) {
                System.err.println("Error accessing HBase table: " + e);
                return ServerError;
            }
        }

        Scan s = new Scan(Bytes.toBytes(startkey));
        // HBase has no record limit. Here, assume recordcount is small enough
        // to bring back in one call.
        // We get back recordcount records
        s.setCaching(recordcount);

        // add specified fields or else all fields
        if (fields == null) {
            s.addFamily(_columnFamilyBytes);
        } else {
            for (String field : fields) {
                s.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
            }
        }

        // get results
        ResultScanner scanner = null;
        try {
            scanner = _hTable.getScanner(s);
            int numResults = 0;
            for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
                // get row key
                String key = Bytes.toString(rr.getRow());
                if (_debug) {
                    System.out.println("Got scan result for key: " + key);
                }

                HashMap<String, String> rowResult = new HashMap<String, String>();

                for (KeyValue kv : rr.raw()) {
                    rowResult.put(Bytes.toString(kv.getQualifier()),
                            Bytes.toString(kv.getValue()));
                }
                // add rowResult to result vector
                result.add(rowResult);
                numResults++;
                if (numResults >= recordcount) {
                    // if hit recordcount, bail out
                    break;
                }
            }
            // done with row
        } catch (IOException e) {
            if (_debug) {
                System.out.println("Error in getting/parsing scan result: " + e);
            }
            return ServerError;
        } finally {
            // getScanner() may have thrown before scanner was assigned,
            // so guard against a NullPointerException here
            if (scanner != null) {
                scanner.close();
            }
        }

        return Ok;
    }

    /**
     * Update a record in the database. Any field/value pairs in the specified
     * values HashMap will be written into the record with the specified record
     * key, overwriting any existing values with the same field name.
     *
     * @param table
     *            The name of the table
     * @param key
     *            The record key of the record to write
     * @param values
     *            A HashMap of field/value pairs to update in the record
     * @return Zero on success, a non-zero error code on error
     */
    public int update(String table, String key, HashMap<String, String> values) {
        // if this is a "new" table, init HTable object. Else, use existing one
        if (!_table.equals(table)) {
            _hTable = null;
            try {
                getHTable(table);
                _table = table;
            } catch (IOException e) {
                System.err.println("Error accessing HBase table: " + e);
                return ServerError;
            }
        }

        if (_debug) {
            System.out.println("Setting up put for key: " + key);
        }
        Put p = new Put(Bytes.toBytes(key));
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (_debug) {
                System.out.println("Adding field/value " + entry.getKey() + "/"
                        + entry.getValue() + " to put request");
            }
            p.add(_columnFamilyBytes, Bytes.toBytes(entry.getKey()),
                    Bytes.toBytes(entry.getValue()));
        }

        try {
            _hTable.put(p);
        } catch (IOException e) {
            if (_debug) {
                System.err.println("Error doing put: " + e);
            }
            return ServerError;
        } catch (ConcurrentModificationException e) {
            // do nothing for now...hope this is rare
            return ServerError;
        }

        return Ok;
    }

    /**
     * Insert a record in the database. Any field/value pairs in the specified
     * values HashMap will be written into the record with the specified record
     * key.
     *
     * @param table
     *            The name of the table
     * @param key
     *            The record key of the record to insert.
     * @param values
     *            A HashMap of field/value pairs to insert in the record
     * @return Zero on success, a non-zero error code on error
     */
    public int insert(String table, String key, HashMap<String, String> values) {
        return update(table, key, values);
    }

    /**
     * Delete a record from the database.
     *
     * @param table
     *            The name of the table
     * @param key
     *            The record key of the record to delete.
     * @return Zero on success, a non-zero error code on error
     */
    public int delete(String table, String key) {
        // if this is a "new" table, init HTable object. Else, use existing one
        if (!_table.equals(table)) {
            _hTable = null;
            try {
                getHTable(table);
                _table = table;
            } catch (IOException e) {
                System.err.println("Error accessing HBase table: " + e);
                return ServerError;
            }
        }

        if (_debug) {
            System.out.println("Doing delete for key: " + key);
        }

        Delete d = new Delete(Bytes.toBytes(key));
        try {
            _hTable.delete(d);
        } catch (IOException e) {
            if (_debug) {
                System.err.println("Error doing delete: " + e);
            }
            return ServerError;
        }

        return Ok;
    }
}

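With the modified HBase client, the client parameters needed for a test can be given directly on the command line, for example the ZooKeeper connection details: -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com, or the size of the client-side write buffer: -p hbase.client.write.buffer=100, and so on.

Next, copy the JARs and configuration files needed to compile and run the client.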
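
The forwarding is just a prefix filter over the YCSB properties, as done in init() in the listing above. Here is a stand-alone sketch of the same logic. It uses a plain Map in place of the Hadoop Configuration object so that it runs without HBase on the classpath; the class and method names are illustrative.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class HBasePropertyFilter {
    /** Returns only the properties whose key starts with "hbase.". */
    static Map<String, String> hbaseSettings(Properties props) {
        Map<String, String> settings = new TreeMap<String, String>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("hbase.")) {
                settings.put(key, props.getProperty(key));
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        // Stand-in for the properties YCSB collects from -p options.
        Properties props = new Properties();
        props.setProperty("columnfamily", "f1");
        props.setProperty("hbase.client.write.buffer", "100");
        props.setProperty("hbase.zookeeper.quorum", "hd0004-sw1.dc.sh-wgq.sdo.com");
        // Only the two hbase.* entries are kept; columnfamily is not forwarded.
        System.out.println(hbaseSettings(props));
    }
}
```

Properties without the hbase. prefix, such as columnfamily or recordcount, are left to YCSB and the workload and never reach the HBase configuration.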

[hdfs@hd0004-sw1 YCSB-0.1.3]$ cp ~/hbase-current/*.jar ~/hbase-current/lib/*.jar ~/hbase-current/conf/hbase-*.xml db/hbase/lib/
[hdfs@hd0004-sw1 YCSB-0.1.3]$ 

Now the HBase client can be compiled:

[hdfs@hd0004-sw1 YCSB-0.1.3]$ ant dbcompile-hbase
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml

compile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

dbcompile-hbase:

dbcompile:
    [javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:63: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

makejar:

BUILD SUCCESSFUL
Total time: 0 seconds
[hdfs@hd0004-sw1 YCSB-0.1.3]$ 

Finally, create the HBase table used for the test (usertable):

hbase(main):004:0> create 'usertable', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
0 row(s) in 1.2940 seconds

The environment is now ready.

The following command starts loading the test data:

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=f1 -p recordcount=1000000 -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com,hd0003-sw1.dc.sh-wgq.sdo.com,hd0149-sw18.dc.sh-wgq.sdo.com,hd0165-sw13.dc.sh-wgq.sdo.com -s
