HBase Java API Development: Batch Operations, Level 3: Batch Importing Data into HBase

Adding one piece of data at a time hardly looks like big-data development; real projects inevitably involve operating on large amounts of data.

Batch data operations in Java essentially come down to adding data to Put objects in a loop and then submitting them through the Table object.

So how do we perform batch operations? When batch operations come up, a loop is probably the first thing that springs to mind.

Exactly: a loop is all it takes to add multiple pieces of data. For example:

Table tableStep3 = connection.getTable(tableStep3Name);
// Add several columns to the same row in a loop
byte[] row = Bytes.toBytes("20001");
Put put = new Put(row);
for (int i = 1; i <= 4; i++) {
    byte[] columnFamily = Bytes.toBytes("data");
    byte[] qualifier = Bytes.toBytes(String.valueOf(i));
    byte[] value = Bytes.toBytes("value" + i);
    put.addColumn(columnFamily, qualifier, value);
}
tableStep3.put(put);

After running this code, you can see that it adds four columns to a single row.
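To double-check, you can read the row back with a Get and print the four columns. This is just a quick sketch that reuses the tableStep3 object from the snippet above:

Get get = new Get(Bytes.toBytes("20001"));
Result result = tableStep3.get(get);
for (int i = 1; i <= 4; i++) {
    // Fetch the value of column data:i and print it as a string
    byte[] value = result.getValue(Bytes.toBytes("data"), Bytes.toBytes(String.valueOf(i)));
    System.out.println("data:" + i + " = " + Bytes.toString(value));
}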

So how do we add multiple rows? You have probably already guessed it: use a collection!

List<Put> puts = new ArrayList<>();
// Build one Put per row in a loop
for (int i = 1; i <= 4; i++) {
    byte[] row = Bytes.toBytes("row" + i);
    Put put = new Put(row);
    byte[] columnFamily = Bytes.toBytes("data");
    byte[] qualifier = Bytes.toBytes(String.valueOf(i));
    byte[] value = Bytes.toBytes("value" + i);
    put.addColumn(columnFamily, qualifier, value);
    puts.add(put);
}
Table table = connection.getTable(tableName);
table.put(puts);

The code above adds four rows to HBase. Taken together with the previous exercise, you can see that the Table object's put() method is overloaded: it accepts either a single Put or a List of Puts.
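As a side note (not required for this exercise), for very large imports the HBase client also provides BufferedMutator, which buffers Puts on the client side and sends them to the server in batches. A minimal sketch, assuming the same open connection and the tableStep3Name table from the first example (it additionally needs import org.apache.hadoop.hbase.client.BufferedMutator):

try (BufferedMutator mutator = connection.getBufferedMutator(tableStep3Name)) {
    for (int i = 1; i <= 1000; i++) {
        Put put = new Put(Bytes.toBytes("row" + i));
        put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("1"), Bytes.toBytes("value" + i));
        mutator.mutate(put);   // buffered on the client, not sent immediately
    }
    mutator.flush();           // push any remaining buffered Puts to the server
}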

(Figure: the table after the data has been added.)

Programming Requirements

Now it's your turn. In the Begin-End block of the editor on the right, write Java code to add the following data to the HBase table stu (you need to create the table yourself):

Table Name  Row Key   Column Family:Column   Value
stu         20181122  basic_info:name        阿克蒙德
stu         20181122  basic_info:gender      male
stu         20181122  basic_info:birthday    1987-05-23
stu         20181122  basic_info:connect     tel:13974036666
stu         20181122  basic_info:address     HuNan-ChangSha
stu         20181122  school_info:college    ChengXing
stu         20181122  school_info:class      class 1 grade 2
stu         20181122  school_info:object     Software
stu         20181123  basic_info:name        萨格拉斯
stu         20181123  basic_info:gender      male
stu         20181123  basic_info:birthday    1986-05-23
stu         20181123  basic_info:connect     tel:18774036666
stu         20181123  basic_info:address     HuNan-ChangSha
stu         20181123  school_info:college    ChengXing
stu         20181123  school_info:class      class 2 grade 2
stu         20181123  school_info:object     Software

Notice that there are two column families here. How do we add more than one column family?

Recall the table-creation step covered earlier: the setColumnFamily(family) method can be called multiple times, once per column family.
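For example, a minimal table-creation sketch using the builder API (it assumes an Admin object obtained from the connection; the reference solution below does the same thing in full):

TableName stuTable = TableName.valueOf("stu");
TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(stuTable);
builder.setColumnFamily(ColumnFamilyDescriptorBuilder.of("basic_info"));   // first column family
builder.setColumnFamily(ColumnFamilyDescriptorBuilder.of("school_info"));  // second column family
admin.createTable(builder.build());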

package step3;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableDescriptors;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;
public class Task {
 public void batchPut() throws Exception {
   /********* Begin *********/
    Configuration config = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(config);
   Admin admin = conn.getAdmin();
    // Create the stu table with two column families
   TableName tableName = TableName.valueOf(Bytes.toBytes("stu"));
   TableDescriptorBuilder builder = TableDescriptorBuilder.newBuilder(tableName);
   ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("basic_info")).build();
   ColumnFamilyDescriptor family2 = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("school_info")).build();
   builder.setColumnFamily(family);
   builder.setColumnFamily(family2);
   admin.createTable(builder.build());
    List<Put> puts = new ArrayList<>();
    String[] rows = {"20181122", "20181123"};
    String[][] basic_infos = {
        {"阿克蒙德", "male", "1987-05-23", "tel:13974036666", "HuNan-ChangSha"},
        {"萨格拉斯", "male", "1986-05-23", "tel:18774036666", "HuNan-ChangSha"}};
    String[] basic_columns = {"name", "gender", "birthday", "connect", "address"};
    String[][] school_infos = {
        {"ChengXing", "class 1 grade 2", "Software"},
        {"ChengXing", "class 2 grade 2", "Software"}};
    String[] school_columns = {"college", "class", "object"};
    for (int x = 0; x < rows.length; x++) {
      // One Put per row key
      Put put = new Put(Bytes.toBytes(rows[x]));
      // Columns of the basic_info family
      for (int i = 0; i < basic_columns.length; i++) {
        byte[] columnFamily = Bytes.toBytes("basic_info");
        byte[] qualifier = Bytes.toBytes(basic_columns[i]);
        byte[] value = Bytes.toBytes(basic_infos[x][i]);
        put.addColumn(columnFamily, qualifier, value);
      }
      // Columns of the school_info family
      for (int i = 0; i < school_columns.length; i++) {
        byte[] columnFamily = Bytes.toBytes("school_info");
        byte[] qualifier = Bytes.toBytes(school_columns[i]);
        byte[] value = Bytes.toBytes(school_infos[x][i]);
        put.addColumn(columnFamily, qualifier, value);
      }
      puts.add(put);
    }
   Table table = conn.getTable(tableName);
   table.put(puts);
   /********* End *********/
 }
}
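To verify the import, you can scan the stu table and print every cell. This sketch is not part of the graded solution; it assumes the conn object from batchPut() is still open and additionally needs org.apache.hadoop.hbase.Cell and org.apache.hadoop.hbase.CellUtil on the import list:

Table stu = conn.getTable(TableName.valueOf("stu"));
Scan scan = new Scan();
try (ResultScanner scanner = stu.getScanner(scan)) {
    for (Result result : scanner) {
        // Print every cell as "rowkey family:qualifier = value"
        for (Cell cell : result.rawCells()) {
            System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + " "
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}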
