Ways to Bulk-Generate Files in HFile Format

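Lately I have been working on code that bulk-generates files in HFile format, and I tried several approaches along the way. Here they are: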


Method 1: build a KeyValue directly. The code is roughly:

KeyValue kev = new KeyValue(Bytes.toBytes(row.toString()),
                Bytes.toBytes("info"), Bytes.toBytes(tableField), Bytes.toBytes(value));

But it always fails with the following error:

Added a key not lexically larger than previous

I looked up some fixes online, but they all treated the symptom rather than the cause, and none of them worked.
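A note on what this message means: the HFile writer rejects any key that does not sort after the key written just before it, so a likely cause (an assumption, since the surrounding job code is not shown here) is that cells for the same row reach the writer with their qualifiers out of byte order. The sketch below uses plain Java only and no HBase classes. `CellKey` and `CellOrderSketch` are hypothetical names, and the comparison is simplified to row, family, qualifier; the real KeyValue ordering also takes timestamp and type into account.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a cell key; this is NOT an HBase class.
class CellKey {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;

    CellKey(String row, String family, String qualifier) {
        this.row = row.getBytes(StandardCharsets.UTF_8);
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    }
}

class CellOrderSketch {
    // Lexicographic comparison that treats each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Simplified ordering: row, then family, then qualifier.
    static final Comparator<CellKey> ORDER = (p, q) -> {
        int c = compareBytes(p.row, q.row);
        if (c != 0) {
            return c;
        }
        c = compareBytes(p.family, q.family);
        if (c != 0) {
            return c;
        }
        return compareBytes(p.qualifier, q.qualifier);
    };

    // Index of the first cell that is not strictly larger than its
    // predecessor, or -1 if the whole list is in increasing order.
    static int firstOutOfOrder(List<CellKey> cells) {
        for (int i = 1; i < cells.size(); i++) {
            if (ORDER.compare(cells.get(i - 1), cells.get(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<CellKey> cells = new ArrayList<>();
        cells.add(new CellKey("row1", "info", "name"));
        cells.add(new CellKey("row1", "info", "age")); // "age" sorts before "name"
        System.out.println(firstOutOfOrder(cells));    // position of the offending cell
        cells.sort(ORDER);                             // sort before handing cells to a writer
        System.out.println(firstOutOfOrder(cells));
    }
}
```

If this is indeed the cause, sorting each row's cells before emitting them (or emitting them in field order that is already sorted) would be the thing to try; the same check also flags exact duplicate keys, since a repeated key is not larger than the previous one.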



Method 2: build a Put. The code is roughly:

Put put = new Put(Bytes.toBytes(row));

put.add(Bytes.toBytes("info"), Bytes.toBytes(tableField), Bytes.toBytes(value));


But it always fails with the following error:

12/05/29 09:35:00 INFO mapred.JobClient: Task Id : attempt_201205181722_0988_r_000000_0, Status : FAILED
org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:160)
         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1209)
         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:511)
         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:502)
         at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:172)
         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:175)

Again I looked up fixes online, again they only treated the symptom rather than the cause, and again none of them worked.

Method 3: follow the code of HBase's bulk-import class, src\org\apache\hadoop\hbase\mapreduce\ImportTsv.java. The key part that builds the KeyValue is roughly:

                  KeyValue kv = new KeyValue(
                            lineBytes, parsed.getRowKeyOffset(), parsed.getRowKeyLength(),
                            parser.getFamily(i), 0, parser.getFamily(i).length,
                            parser.getQualifier(i), 0, parser.getQualifier(i).length,
                            ts, KeyValue.Type.Put,
                            lineBytes, parsed.getColumnOffset(i), parsed.getColumnLength(i));
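What stands out in this constructor is that the row key and the value are not copied into separate arrays; they are given as an offset and a length into the same `lineBytes` buffer. To make that idea concrete, here is a minimal plain-Java sketch of how such offsets and lengths could be computed for a tab-separated line. `LineSliceSketch`, `columnSlices` and `slice` are hypothetical names for illustration, not the ImportTsv parser itself, and the tab separator is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is NOT the ImportTsv parser.
class LineSliceSketch {
    // Returns one {offset, length} pair per tab-separated column,
    // pointing into the original byte array without copying it.
    static List<int[]> columnSlices(byte[] line) {
        List<int[]> slices = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length; i++) {
            if (i == line.length || line[i] == '\t') {
                slices.add(new int[] {start, i - start});
                start = i + 1;
            }
        }
        return slices;
    }

    // Decodes one slice; only used here to show what a slice refers to.
    static String slice(byte[] line, int[] s) {
        return new String(line, s[0], s[1], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] lineBytes = "row1\tAlice\t30".getBytes(StandardCharsets.UTF_8);
        List<int[]> slices = columnSlices(lineBytes);
        // Column 0 plays the role of the row key; the rest are column values.
        for (int[] s : slices) {
            System.out.println(s[0] + "," + s[1] + " -> " + slice(lineBytes, s));
        }
    }
}
```

In the snippet above, the first pair would correspond to what `getRowKeyOffset()` and `getRowKeyLength()` return, and the later pairs to `getColumnOffset(i)` and `getColumnLength(i)`.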

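It took a whole morning of fiddling, and because this had to tie into our business logic I ended up changing a lot of code. But it works!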


This write-up is brief, but getting there was quite painful!

Criticism is welcome! Heh.

I hope to keep improving together with everyone who has cloud computing experience.

