There are many new serving databases available, including:

  • PNUTS
  • BigTable
  • HBase
  • Hypertable
  • Azure
  • Cassandra
  • CouchDB
  • Voldemort
  • MongoDB
  • Dynomite
    …and many others

It is difficult to decide which system is right for your application, partly because the features differ between systems and partly because there is no easy way to compare the performance of one system against another.


The goal of the YCSB project is to develop a framework and common set of workloads for evaluating the performance of different “key-value” and “cloud” serving stores. The project comprises two things:

  • The YCSB Client, an extensible workload generator
  • The Core workloads, a set of workload scenarios to be executed by the generator

Although the core workloads provide a well-rounded picture of a system’s performance, the Client is extensible so that you can define new and different workloads to examine system aspects, or application scenarios, not adequately covered by the core workloads. Similarly, the Client is extensible to support benchmarking different databases. Although we include sample code for benchmarking HBase, Cassandra and MongoDB, it is straightforward to write a new interface layer to benchmark your favorite database.


A common use of the tool is to benchmark multiple systems and compare them. For example, you can install multiple systems on the same hardware configuration, and run the same workloads against each system. Then you can plot the performance of each system (for example, as latency versus throughput curves) to see when one system does better than another.
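One way to collect points for such a curve is to repeat the same run at several target throughputs and keep each result file. The loop below is only a sketch: it borrows the HBase command line from the walkthrough further down, the target values and the output file names are arbitrary, and `YCSB_CMD` is a hypothetical variable that defaults to a dry run which merely writes the command into each file.

```shell
# Sketch: sweep the -target value to get one result file per throughput level.
# YCSB_CMD is a hypothetical variable; the default "echo java" is a dry run.
# Set YCSB_CMD=java in a built YCSB checkout to actually run the benchmark.
YCSB_CMD=${YCSB_CMD:-echo java}

for target in 100 200 400 800; do
  $YCSB_CMD -cp "build/ycsb.jar:db/hbase/lib/*" com.yahoo.ycsb.Client -t \
    -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada \
    -p columnfamily=family -p operationcount=100000 \
    -threads 10 -target "$target" > "run-target-$target.dat"
done
```

Each `run-target-*.dat` file then holds the statistics for one throughput level, which can be plotted against the measured latency.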

 

 

Source: http://blog.lars-francke.de/2010/08/16/performance-testing-hbase-using-ycsb/

 

 

I assume most of you know what HBase is, but just in case, here is a snippet from Wikipedia:

HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java.

Yahoo has published a paper, Benchmarking Cloud Serving Systems with YCSB, and the accompanying tool (YCSB). At the moment I am not interested in comparing different database systems against each other; I only want to benchmark HBase. This is useful for testing custom patches and their performance impact, or for trying out different configuration options.

No matter which kind of workload you choose, however, keep in mind that this is an artificial benchmark and it can’t replace a test with your real data and load.

In this short blog post I’m going to outline how to get YCSB running against a current version of HBase. I’m going to show this on a single machine. In a real test setup you should of course run YCSB on a different machine (or multiple machines) than your HBase cluster. A YCSB benchmark consists of two phases: a load phase and a transaction phase. The load phase imports a batch of data into the database and records statistics while doing so; the transaction phase then runs operations against that data. There are multiple predefined workloads that mimic typical database usage scenarios, and you can also define your own.

Requirements/Setup

I am using a clean Ubuntu 10.04 installation but this should work on other distributions just as well.

While you’ll probably run it against an already set up cluster, I will be using HBase in standalone mode here, in its second development release of 0.89.

For YCSB I’ve used the latest version checked out from GitHub, but the latest released version (0.1.2 at the time of this writing) should work equally well. So do this:

$ sudo apt-get -y install ant openjdk-6-jdk git-core
$ export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/
$ wget http://apache.easy-webs.de/hbase/hbase-0.89.20100726/hbase-0.89.20100726-bin.tar.gz
$ tar xvzf hbase-0.89.20100726-bin.tar.gz
$ hbase-0.89.20100726/bin/start-hbase.sh
$ hbase-0.89.20100726/bin/hbase shell
create 'usertable', 'family'
exit
$ git clone http://github.com/brianfrankcooper/YCSB.git
$ cp hbase-0.89.20100726/lib/* YCSB/db/hbase/lib
$ cd YCSB
$ ant
$ ant dbcompile-hbase

 

   

As you can see, YCSB requires a table called usertable in HBase, and it has to contain one column family with an arbitrary name (family in my case). YCSB also needs all the libraries (jars) that the HBase client needs to run. The easiest way is to just copy everything from HBase’s lib directory to the appropriate directory in YCSB.

Running YCSB

At this point we should have HBase running somewhere and YCSB and its HBase driver compiled. Time to load some data into HBase.

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=1000 -s > load.dat

A few things to note here:

  • This loads only 1000 records into HBase. You will want to increase the number to 100 million or more in a real test.
  • The documentation is pretty good, so make sure to read it should you have problems.
  • The documentation suggests not specifying properties (like recordcount) on the command line but in a property file instead. You’ll find instructions on how to do this on the aforementioned page.
  • The -s parameter causes YCSB to print status messages to System.err every ten seconds. Remove it if you don’t want them.
  • After the load operation has finished you can find statistics in the load.dat file.
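The property-file approach from the list above could look roughly like this. It is a sketch, not the documented procedure: `hbase-load.properties` is a made-up file name, and it assumes (as I understand the YCSB documentation) that `-P` may be given more than once, with later files overriding values from earlier ones.

```shell
# Sketch: keep run parameters in a property file instead of on the command line.
# "hbase-load.properties" is a hypothetical file name.
cat > hbase-load.properties <<'EOF'
columnfamily=family
recordcount=1000
EOF

# Pass the file with a second -P after the workload file (assumed to override it).
# The load only runs if YCSB has been built in the current directory.
if [ -f build/ycsb.jar ]; then
  java -cp "build/ycsb.jar:db/hbase/lib/*" com.yahoo.ycsb.Client -load \
    -db com.yahoo.ycsb.db.HBaseClient \
    -P workloads/workloada -P hbase-load.properties -s > load.dat
fi
```

Keeping the values in a file makes it easier to repeat exactly the same run later, or to keep one file per test size.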

Now we’ll run the transactions part of the workload (again, for explanations see the documentation of YCSB):

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=1000000 -s -threads 10 -target 100 > transactions.dat

or

java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p operationcount=1000000 -s -threads 10 -target 100 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat

After each run you should inspect the transactions.dat file. For explanations I’ll once again refer to the documentation. We’ve used workloada in these examples, but there are in fact multiple predefined workloads (which are listed and explained in the documentation).
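For a quick look at the headline numbers, a small filter over the result file can help. The sketch below assumes the default (non-timeseries) text output, where each summary line has the form `[SECTION], Metric, value`; the sample file and its numbers are made up for illustration, and the latency unit may differ between YCSB versions.

```shell
# Stand-in for a real transactions.dat; the numbers are invented for illustration.
cat > sample-transactions.dat <<'EOF'
[OVERALL], RunTime(ms), 10000.0
[OVERALL], Throughput(ops/sec), 100.0
[UPDATE], Operations, 500
[UPDATE], AverageLatency(ms), 2.5
[READ], Operations, 500
[READ], AverageLatency(ms), 1.5
EOF

# Print "section metric value" for the throughput and average-latency lines.
# summarize is a hypothetical helper name, not part of YCSB.
summarize() {
  grep -E 'Throughput|AverageLatency' "$1" | tr -d '[]' | awk -F', ' '{ print $1, $2, $3 }'
}

summarize sample-transactions.dat
```

Pointing `summarize` at a real transactions.dat should give one line for the overall throughput and one average-latency line per operation type.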

That’s it. As you can see YCSB is pretty easy to set up. I still hope this guide was helpful in getting started with it. Let me know if you have any questions.

 

So you have an HBase cluster running somewhere and now you’re trying to run YCSB from another machine, but it doesn’t work because it can’t connect to ZooKeeper?

If so, try copying the hbase-site.xml config from your cluster into the classpath of YCSB and try again.

Copy your hbase-site.xml with all the configuration options to the db/hbase/conf directory and add it to your classpath like this: java -cp "build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/" ...
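The copy step can be wrapped in a small check that the staged file actually names a ZooKeeper quorum. This is a sketch under assumptions: `CLUSTER_CONF` and `stage_conf` are hypothetical names, and the script expects that you have already fetched a copy of the cluster’s hbase-site.xml to the local machine. `hbase.zookeeper.quorum` is the HBase property that lists the ZooKeeper hosts.

```shell
# Sketch: stage hbase-site.xml for YCSB and check that it names a ZooKeeper quorum.
# CLUSTER_CONF is a hypothetical path to a local copy of the cluster's config file.
CLUSTER_CONF=${CLUSTER_CONF:-hbase-site.xml}

stage_conf() {
  mkdir -p db/hbase/conf || return 1
  cp "$1" db/hbase/conf/hbase-site.xml || return 1
  # Succeeds only if the staged file mentions the quorum property.
  grep -q 'hbase.zookeeper.quorum' db/hbase/conf/hbase-site.xml
}

if [ -f "$CLUSTER_CONF" ]; then
  stage_conf "$CLUSTER_CONF" ||
    echo "warning: could not stage $CLUSTER_CONF, or it lacks hbase.zookeeper.quorum" >&2
fi
```

If the warning appears, the client will most likely fall back to its default ZooKeeper settings, which matches the connection problem described above.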

 

For more information, see:

Getting Started

https://github.com/brianfrankcooper/YCSB/wiki/Getting-Started

Running a Workload:

https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload