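Recently, a friend from one of my chat groups was asked by his boss to look into HBase and push its ingest rate above 50,000 records per second. That was a real headache. The cluster was just four personal computers, and after a long stretch of tuning multithreaded inserts the rate was still only about 10,000/s, nowhere near the target. That is when this approach came up: write HFiles directly and bulk-load them. Most examples online generate HFiles with MapReduce, but the requirement here was real-time ingestion without producing intermediate files first, so it had to be done in our own code. Searching turned up nothing useful. Eventually, pointed in the right direction by another user, I read the HBase source and found how to generate HFiles. Once it worked, single-threaded ingestion reached only about 14,000/s, roughly the same as the earlier multithreaded peak. While I was puzzling over this, I changed the code so that the column qualifiers were converted with Bytes.toBytes(cols) only once instead of on every iteration, and the rate jumped straight to 30,000/s, a very noticeable improvement. That is the speed on my own machine; it should be somewhat faster on his cluster. Here is the code.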
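To illustrate why moving the qualifier conversion out of the loop matters, here is a stdlib-only sketch; it is not the author's code. It uses `String.getBytes(StandardCharsets.UTF_8)` as a stand-in for HBase's `Bytes.toBytes(String)`, which also produces UTF-8 bytes, and the class and method names are hypothetical. The per-row version encodes and allocates a new array for every qualifier of every row, while the hoisted version builds eight arrays once and reuses them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: encoding column qualifiers on every row vs. once up front.
// String.getBytes(UTF_8) stands in for HBase's Bytes.toBytes(String).
public class QualifierHoisting {

    static final String[] COLS = {"cn", "dt", "ic", "if", "ip", "le", "mn", "pi"};

    // Re-encodes every qualifier on every row, like the slower version of the loader.
    static long perRow(int rows) {
        long total = 0;
        for (int i = 0; i < rows; i++) {
            for (String col : COLS) {
                byte[] q = col.getBytes(StandardCharsets.UTF_8);
                total += q.length; // sum lengths so the loop produces an observable result
            }
        }
        return total;
    }

    // Encodes each qualifier once and reuses the same arrays for all rows.
    static long hoisted(int rows) {
        byte[][] qualifiers = new byte[COLS.length][];
        for (int c = 0; c < COLS.length; c++) {
            qualifiers[c] = COLS[c].getBytes(StandardCharsets.UTF_8);
        }
        long total = 0;
        for (int i = 0; i < rows; i++) {
            for (byte[] q : qualifiers) {
                total += q.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int rows = 3_000_000;
        long t0 = System.nanoTime();
        long a = perRow(rows);
        long t1 = System.nanoTime();
        long b = hoisted(rows);
        long t2 = System.nanoTime();
        System.out.println("per-row: " + (t1 - t0) / 1_000_000 + " ms, hoisted: "
                + (t2 - t1) / 1_000_000 + " ms, same total: " + (a == b));
    }
}
```

The timings from a single run like this are rough (JIT warm-up and escape analysis can narrow the gap), so treat them only as a hint. Reusing the qualifier arrays is safe in the real loop because the `KeyValue` constructor copies the qualifier bytes into its own backing array.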
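```java
import java.text.DecimalFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoder;
import org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

// Written against the HBase 0.94-era API (StoreFile.WriterBuilder, StoreFile.BloomType).
public class HFileDirectLoad {

    public static void main(String[] args) throws Exception {
        String tableName = "taglog";
        byte[] family = Bytes.toBytes("logs");

        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.master", "192.168.1.133:60000");
        conf.set("hbase.zookeeper.quorum", "192.168.1.135");
        conf.set("hbase.metrics.showTableName", "false");

        // doBulkLoad expects the layout <outputdir>/<column family>/<hfile>
        String outputdir = "hdfs://hadoop.Master:8020/user/SEA/hfiles/";
        Path dir = new Path(outputdir);
        Path familydir = new Path(outputdir, Bytes.toString(family));
        FileSystem fs = familydir.getFileSystem(conf);

        StoreFile.BloomType bloomType = StoreFile.BloomType.NONE;
        HFileDataBlockEncoder encoder = NoOpDataBlockEncoder.INSTANCE;
        int blockSize = 64000;

        // Separate copy of the configuration, used only to build the writer's CacheConfig
        Configuration tempConf = new Configuration(conf);
        tempConf.set("hbase.metrics.showTableName", "false");
        tempConf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, 1.0f);

        // The builder picks a random file name inside familydir
        StoreFile.Writer writer = new StoreFile.WriterBuilder(conf, new CacheConfig(tempConf), fs, blockSize)
                .withOutputDir(familydir)
                .withBloomType(bloomType)
                .withComparator(KeyValue.COMPARATOR)
                .withDataBlockEncoder(encoder)
                .build();

        long start = System.currentTimeMillis();
        DecimalFormat df = new DecimalFormat("0000000");

        // Qualifiers are converted to bytes once, outside the loop (the change that raised throughput)
        byte[] cn = Bytes.toBytes("cn");
        byte[] dt = Bytes.toBytes("dt");
        byte[] ic = Bytes.toBytes("ic");
        byte[] ifs = Bytes.toBytes("if");
        byte[] ip = Bytes.toBytes("ip");
        byte[] le = Bytes.toBytes("le");
        byte[] mn = Bytes.toBytes("mn");
        byte[] pi = Bytes.toBytes("pi");

        int maxLength = 3000000;
        for (int i = 0; i < maxLength; i++) {
            // Row key: millisecond timestamp + 7-digit counter, so rows are strictly ascending
            byte[] row = Bytes.toBytes("" + System.currentTimeMillis() + df.format(i));
            long current = System.currentTimeMillis();
            // Cells must be appended in sorted order; these qualifiers are already alphabetical
            writer.append(new KeyValue(row, family, cn, current, KeyValue.Type.Put, Bytes.toBytes("3")));
            writer.append(new KeyValue(row, family, dt, current, KeyValue.Type.Put, Bytes.toBytes("6")));
            writer.append(new KeyValue(row, family, ic, current, KeyValue.Type.Put, Bytes.toBytes("8")));
            writer.append(new KeyValue(row, family, ifs, current, KeyValue.Type.Put, Bytes.toBytes("7")));
            writer.append(new KeyValue(row, family, ip, current, KeyValue.Type.Put, Bytes.toBytes("4")));
            writer.append(new KeyValue(row, family, le, current, KeyValue.Type.Put, Bytes.toBytes("2")));
            writer.append(new KeyValue(row, family, mn, current, KeyValue.Type.Put, Bytes.toBytes("5")));
            writer.append(new KeyValue(row, family, pi, current, KeyValue.Type.Put, Bytes.toBytes("1")));
        }
        writer.close();
        System.out.println("HFile written in " + (System.currentTimeMillis() - start) + " ms");

        // Move the finished HFile(s) into the table's regions
        HTable table = new HTable(conf, tableName);
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(dir, table);
        table.close();
    }
}
```

Finally, here is how to inspect an HFile, so you can compare a known-good HFile with one you generated yourself and track down problems.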
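One problem such a comparison often turns up is cells appended out of order: an HFile writer rejects a key that does not sort after the previous one (row first, then family, then qualifier, compared as unsigned bytes). The following stdlib-only sketch is not part of the original code. It rebuilds row keys the same way as the loader (millisecond timestamp plus a 7-digit zero-padded counter) and checks that the (row, qualifier) sequence is strictly ascending. `compareUnsigned` is a hypothetical re-implementation of unsigned lexicographic comparison in the spirit of HBase's `Bytes.compareTo`, not that method itself, and the family is left out because it is constant.

```java
import java.nio.charset.StandardCharsets;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker: verifies cells come out in the ascending (row, qualifier)
// order an HFile writer expects. The column family is constant, so it is omitted.
public class HFileOrderCheck {

    static final String[] COLS = {"cn", "dt", "ic", "if", "ip", "le", "mn", "pi"};

    // Unsigned lexicographic comparison; a shorter prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Builds {row, qualifier} pairs the way the loader builds its keys.
    static List<byte[][]> buildCells(int rows) {
        DecimalFormat df = new DecimalFormat("0000000");
        List<byte[][]> cells = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            byte[] row = ("" + System.currentTimeMillis() + df.format(i))
                    .getBytes(StandardCharsets.UTF_8);
            for (String col : COLS) {
                cells.add(new byte[][] {row, col.getBytes(StandardCharsets.UTF_8)});
            }
        }
        return cells;
    }

    // Index of the first cell that does not sort strictly after its predecessor, or -1.
    static int firstOutOfOrder(List<byte[][]> cells) {
        for (int i = 1; i < cells.size(); i++) {
            byte[][] prev = cells.get(i - 1);
            byte[][] cur = cells.get(i);
            int c = compareUnsigned(prev[0], cur[0]);
            if (c == 0) {
                c = compareUnsigned(prev[1], cur[1]);
            }
            if (c >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<byte[][]> cells = buildCells(100_000);
        System.out.println("first out-of-order cell: " + firstOutOfOrder(cells));
    }
}
```

The counter keeps rows distinct within the same millisecond, and while the timestamp stays 13 digits a later millisecond always sorts higher, so this scheme should stay ordered unless the system clock steps backwards during a run.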