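One of the new features in Hadoop 2.6 is tracing: a distributed tracing tool similar to Google Dapper that provides request tracing and performance analysis for applications in the Hadoop family. Hadoop 2.6 still uses the pre-Apache version of HTrace, but 2.7 supports the Apache version of HTrace. HTrace is also integrated into HBase, and HBase 1.0.0 uses the Apache 3.1.0 release. See http://events.linuxfoundation.org/sites/events/files/slides/2015-03-05_apachecon2015__introducing_apache_htrace.pdf.
This article describes how to enable HTrace in HDFS, in HBase, and in an HBase client (using the hbase client program in YCSB as the example).
In the Maven repository, htrace is published under several group IDs: org.htrace (latest version 3.0.4), org.cloudera.htrace (latest version 2.05), and org.apache.htrace (3.1.0, 3.2.0). We use org.apache.htrace (htrace-core-3.1.0-incubating.jar). See http://search.maven.org/#browse%7C-1954581700.
Prerequisites: Hadoop (both hdfs and yarn) is up and running, and HBase (including Zookeeper) is up and running.
The configuration below must be applied on every Hadoop and HBase node.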
1. Hadoop 2.7.0
Download htrace-core-3.1.0-incubating.jar into the hdfs library directory.
Library directory: hadoop-2.7.0/share/hadoop/hdfs/lib/
Add the hadoop.htrace properties to hadoop-2.7.0/etc/hadoop/core-site.xml:
<property>
<name>hadoop.htrace.spanreceiver.classes</name>
<value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
<name>hadoop.htrace.local-file-span-receiver.path</name>
<value>/var/log/hadoop/htrace.out</value>
</property>
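Before restarting, it can be worth confirming that both keys really are present in the site file, since a typo in a property name is easy to miss. The following is only a sketch in Java and is not part of Hadoop: the class name SiteConfigCheck and the method readProperties are made up for illustration, and the XML is embedded as a string instead of being read from core-site.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a Hadoop class): reads <property> name/value pairs
// from Hadoop-style site XML so the HTrace keys can be checked.
public class SiteConfigCheck {

    public static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = prop.getElementsByTagName("value").item(0).getTextContent().trim();
                result.put(name, value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable site XML", e);
        }
    }

    public static void main(String[] args) {
        // Same two properties as in the core-site.xml fragment above.
        String xml = "<configuration>"
                + "<property><name>hadoop.htrace.spanreceiver.classes</name>"
                + "<value>org.apache.htrace.impl.LocalFileSpanReceiver</value></property>"
                + "<property><name>hadoop.htrace.local-file-span-receiver.path</name>"
                + "<value>/var/log/hadoop/htrace.out</value></property>"
                + "</configuration>";
        Map<String, String> props = readProperties(xml);
        System.out.println(props.get("hadoop.htrace.spanreceiver.classes"));
        System.out.println(props.get("hadoop.htrace.local-file-span-receiver.path"));
    }
}
```

A null in the output would mean the key is missing or misspelled in the XML that was parsed.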
Restart Hadoop. If all goes well, htrace.out now exists under /var/log/hadoop/. Run hadoop fs -ls / to exercise HDFS and check the trace log it writes, which looks like this:
{"i":"f19bfdb594269132","s":"befdd7ab18bc944f","b":1434990777657,"e":1434990777700,"d":"org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo","r":"NameNode","p":["e65fa1aa1d4ea5f9"]}
{"i":"f19bfdb594269132","s":"e65fa1aa1d4ea5f9","b":1434990777462,"e":1434990777708,"d":"ClientNamenodeProtocol#getFileInfo","r":"FsShell","p":["3266f445372f0b7d"],"t":[{"t":1434990777481,"m":"IPC client connecting to centos6-1/10.10.10.20:8020"},{"t":1434990777507,"m":"IPC client connected to centos6-1/10.10.10.20:8020"}]}
{"i":"f19bfdb594269132","s":"3266f445372f0b7d","b":1434990777453,"e":1434990777793,"d":"getFileInfo","r":"FsShell","p":[],"n":{"path":"/"}}
{"i":"d36e237682b818fa","s":"72d96c8b2749892b","b":1434990777798,"e":1434990777804,"d":"org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing","r":"NameNode","p":["8f48cc4ceefdabf0"]}
{"i":"d36e237682b818fa","s":"8f48cc4ceefdabf0","b":1434990777797,"e":1434990777804,"d":"ClientNamenodeProtocol#getListing","r":"FsShell","p":["c19f4b5baba59f34"]}
{"i":"d36e237682b818fa","s":"c19f4b5baba59f34","b":1434990777796,"e":1434990777809,"d":"listPaths","r":"FsShell","p":[],"n":{"path":"/"}}
This shows that HTrace has been enabled successfully in HDFS.
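To read these lines it helps to know the field names. As far as I can tell from the HTrace 3.x span serialization, "i" is the trace ID, "s" the span ID, "b" and "e" the begin and end times in milliseconds, "d" the description, "r" the process ID, and "p" the list of parent span IDs. Under that assumption, the sketch below computes how long a span took. The class SpanDurations is illustrative only and not part of HTrace; it uses regular expressions instead of a JSON library to stay self-contained, which is good enough for lines shaped like the ones above but is not a general JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not part of HTrace): pulls a few fields out of one
// line of LocalFileSpanReceiver output with regular expressions.
public class SpanDurations {
    private static final Pattern BEGIN = Pattern.compile("\"b\":(\\d+)");
    private static final Pattern END = Pattern.compile("\"e\":(\\d+)");
    private static final Pattern DESC = Pattern.compile("\"d\":\"([^\"]*)\"");

    // Returns the first capture of the pattern, or fails if the field is absent.
    private static String first(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found in: " + line);
        }
        return m.group(1);
    }

    public static String description(String line) {
        return first(DESC, line);
    }

    // Span duration in milliseconds: end time minus begin time.
    public static long durationMillis(String line) {
        return Long.parseLong(first(END, line)) - Long.parseLong(first(BEGIN, line));
    }

    public static void main(String[] args) {
        // Two spans copied from the hadoop fs -ls / output above.
        String[] lines = {
            "{\"i\":\"f19bfdb594269132\",\"s\":\"befdd7ab18bc944f\",\"b\":1434990777657,\"e\":1434990777700,"
                + "\"d\":\"org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo\",\"r\":\"NameNode\","
                + "\"p\":[\"e65fa1aa1d4ea5f9\"]}",
            "{\"i\":\"f19bfdb594269132\",\"s\":\"3266f445372f0b7d\",\"b\":1434990777453,\"e\":1434990777793,"
                + "\"d\":\"getFileInfo\",\"r\":\"FsShell\",\"p\":[],\"n\":{\"path\":\"/\"}}"
        };
        for (String line : lines) {
            System.out.println(description(line) + ": " + durationMillis(line) + " ms");
        }
    }
}
```

For the two sample spans this gives 43 ms for the NameNode-side getFileInfo and 340 ms for the enclosing FsShell getFileInfo span, so most of the time in this request was spent on the client side.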
2. HBase-1.0.1.1
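The hbase-1.0.1.1/lib/ directory already ships with htrace-core-3.1.0-incubating.jar.
http://apache-hbase.679495.n3.nabble.com/HTrace-td4056705.html says to enable HTrace in HBase with hbase.trace.spanreceiver.classes. However, after configuring HBase that way and restarting it, errors occurred:
There were two problems. First, hbase.local-file-span-receiver.path is deprecated; use hbase.htrace.local-file-span-receiver.path instead. Second, SpanReceiverBuilder could not be found in org.cloudera.htrace, probably because of an htrace-core version mismatch. So we follow the version and configuration used for Hadoop.
Leave hbase-1.0.1.1/lib/htrace-core-3.1.0-incubating.jar as it is, and configure hbase-1.0.1.1/conf/hbase-site.xml: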
<property>
<name>hbase.htrace.sampler</name>
<value>AlwaysSampler</value>
</property>
<property>
<name>hbase.trace.spanreceiver.classes</name>
<value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
<name>hbase.htrace.local-file-span-receiver.path</name>
<value>/var/log/hbase/htrace.out</value>
</property>
Start HBase. /var/log/hbase/htrace.out is created, but it is an empty file. We therefore need an HBase client program to generate traces; the hbase benchmark program in YCSB is enough for that.
3. YCSB
# git clone git://github.com/brianfrankcooper/YCSB.git
# cd YCSB
# vim hbase/pom.xml
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.0.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.htrace</groupId>
<artifactId>htrace-core</artifactId>
<version>3.1.0-incubating</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>com.yahoo.ycsb</groupId>
<artifactId>core</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
These dependencies are downloaded from the maven2 repository into ~/.m2/repository/.
Then edit hbase/src/main/java/com/yahoo/ycsb/db/HBaseClient.java and HBaseClient10.java and add the htrace imports:
import org.apache.hadoop.hbase.trace.SpanReceiverHost; //hbase-common-1.0.1.1.jar
import org.apache.htrace.Trace; //htrace-core.jar
import org.apache.htrace.Sampler;
import org.apache.htrace.TraceScope;
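Modify the public int read method, adding the tracing code (the SpanReceiverHost.getInstance call, the Trace.startSpan call, and the finally block that closes the scope):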
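public int read(String table, String key, Set<String> fields, HashMap<String, ByteIterator> result)
{
    // Load the span receivers configured under hbase.trace.spanreceiver.classes.
    SpanReceiverHost.getInstance(config);
    // Start a span named "Read"; Sampler.ALWAYS traces every call.
    TraceScope ts = Trace.startSpan("Read", Sampler.ALWAYS);
    ...
    try {
        ...
        Get g = new Get(Bytes.toBytes(key));
        ...
        r = _hTable.get(g);
    }
    finally {
        // Close the scope so the finished span is handed to the receiver.
        if (ts != null) ts.close();
    }
}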
Build YCSB:
# mvn clean package
For details on testing HBase with YCSB, see http://blog.csdn.net/hustsselbj/article/details/46540377; here we go straight to the test.
# bin/ycsb run hbase -P workloads/workloada -cp $CLASSPATH -p table=usertable -p columnfamily=family
The htrace log is written successfully:
{"i":"b5513fcf4efa6045","s":"f821e0cb5fad7a11","b":1435074896908,"e":1435074897129,"d":"RecoverableZookeeper.exists","r":"Client","p":["2f1a5e4d23cda084"]}
{"i":"b5513fcf4efa6045","s":"3a46953dc1de9d01","b":1435074897130,"e":1435074897132,"d":"RecoverableZookeeper.getData","r":"Client","p":["2f1a5e4d23cda084"]}
{"i":"b5513fcf4efa6045","s":"9bd89473eb3e5e22","b":1435074897876,"e":1435074897877,"d":"RecoverableZookeeper.getData","r":"Client","p":["2f1a5e4d23cda084"]}
{"i":"b5513fcf4efa6045","s":"119e73bde052c260","b":1435074897888,"e":1435074898077,"d":"hconnection-0x3a6460b2-shared--pool2-t1","r":"Client","p":["2f1a5e4d23cda084"]}
{"i":"b5513fcf4efa6045","s":"2f1a5e4d23cda084","b":1435074896265,"e":1435074898276,"d":"Read","r":"Client","p":[]}
{"i":"3c37f2097526ab67","s":"1a60cf8765bcd808","b":1435074898363,"e":1435074898363,"d":"Read","r":"Client","p":[]}
{"i":"82a0c0b41ab0af24","s":"bb9e4db8db9d1f59","b":1435074898367,"e":1435074898368,"d":"Read","r":"Client","p":[]}
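In this output the first "Read" span (2f1a5e4d23cda084) is listed as the parent of the RecoverableZookeeper and hconnection spans, while the later "Read" spans have no children, which is what one would expect if only the first read has to set up the connection. A rough way to see this structure is to group spans by their parent ID. The sketch below assumes the same field meanings as before ("s" = span ID, "d" = description, "p" = parent span IDs); the class TraceChildren is hypothetical, not part of HTrace, and again relies on regular expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper (not part of HTrace): groups span descriptions by
// parent span ID. Root spans (empty "p" list) are grouped under the key "".
public class TraceChildren {
    private static final Pattern DESC = Pattern.compile("\"d\":\"([^\"]*)\"");
    private static final Pattern PARENTS = Pattern.compile("\"p\":\\[([^\\]]*)\\]");
    private static final Pattern ID = Pattern.compile("\"([0-9a-f]+)\"");

    private static String first(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found in: " + line);
        }
        return m.group(1);
    }

    public static Map<String, List<String>> childrenByParent(List<String> lines) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String line : lines) {
            String desc = first(DESC, line);
            Matcher ids = ID.matcher(first(PARENTS, line));
            boolean hasParent = false;
            while (ids.find()) {
                hasParent = true;
                children.computeIfAbsent(ids.group(1), k -> new ArrayList<>()).add(desc);
            }
            if (!hasParent) {
                children.computeIfAbsent("", k -> new ArrayList<>()).add(desc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        // Three spans copied from the YCSB output above (timestamps omitted).
        List<String> lines = Arrays.asList(
            "{\"i\":\"b5513fcf4efa6045\",\"s\":\"f821e0cb5fad7a11\",\"d\":\"RecoverableZookeeper.exists\","
                + "\"r\":\"Client\",\"p\":[\"2f1a5e4d23cda084\"]}",
            "{\"i\":\"b5513fcf4efa6045\",\"s\":\"3a46953dc1de9d01\",\"d\":\"RecoverableZookeeper.getData\","
                + "\"r\":\"Client\",\"p\":[\"2f1a5e4d23cda084\"]}",
            "{\"i\":\"b5513fcf4efa6045\",\"s\":\"2f1a5e4d23cda084\",\"d\":\"Read\",\"r\":\"Client\",\"p\":[]}");
        for (Map.Entry<String, List<String>> e : childrenByParent(lines).entrySet()) {
            String parent = e.getKey().isEmpty() ? "(root)" : e.getKey();
            System.out.println(parent + " -> " + e.getValue());
        }
    }
}
```

Combined with the begin and end timestamps, this kind of grouping makes it easier to tell which child operation accounts for most of a slow "Read".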
My YCSB repository: https://github.com/hustlbj/YCSB
Appendix: some of the classes in the jars: