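Using MapReduce to Parse Files in HDFS, Generate HFiles, and Import Them into HBase (Part 3)

Generating HFiles with MapReduce and then bulk loading them is generally the fastest way to import large amounts of data into HBase, because the load bypasses the normal write path (WAL and MemStore).

The process has two parts: generating the HFiles and loading them into HBase.

I. Generating the HFiles

1. Main program: ConvertToHFiles.java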


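import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConvertToHFiles extends Configured implements Tool {

    private static final Log LOG = LogFactory.getLog(ConvertToHFiles.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ConvertToHFiles(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("Usage: ConvertToHFiles <inputPath> <outputPath> <tableName>");
            return 1;
        }

        // Build on the configuration parsed by ToolRunner so that -D options are honored
        Configuration conf = HBaseConfiguration.create(getConf());
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

        String inputPath = args[0];
        String outputPath = args[1];
        final TableName tableName = TableName.valueOf(args[2]);

        // Open the HBase connection; try-with-resources closes everything on exit
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(tableName);
             RegionLocator regionLocator = connection.getRegionLocator(tableName)) {

            // Create the job
            Job job = Job.getInstance(conf, "ConvertToHFiles: Convert File to HFiles");
            job.setInputFormatClass(TextInputFormat.class);
            job.setJarByClass(ConvertToHFiles.class);

            job.setMapperClass(ConvertToHFilesMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);

            // Sets the partitioner, sort reducer and output format to match the table's regions
            HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator);

            FileInputFormat.setInputPaths(job, inputPath);
            HFileOutputFormat2.setOutputPath(job, new Path(outputPath));

            if (job.waitForCompletion(true)) {
                LOG.info("Success");
                return 0;
            }
            LOG.error("Failure");
        } catch (Exception e) {
            LOG.error("Failed to generate HFiles", e);
        }
        return 1;
    }
}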
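The call to HFileOutputFormat2.configureIncrementalLoad is what ties the job to the table's layout: it uses the region start keys as split points for a total-order partitioner and sets the number of reducers to the number of regions, so each reducer writes HFiles for exactly one region. The idea can be sketched in plain Java as below. This is only an illustration, not HBase's code; the class name RegionPartitionSketch, the helper names and the sample start keys are all made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RegionPartitionSketch {

    // Unsigned lexicographic comparison of byte arrays, the ordering used for HBase row keys
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the last region whose start key is <= rowKey.
    // startKeys must be sorted, and the first region starts at the empty key.
    static int regionFor(List<byte[]> startKeys, byte[] rowKey) {
        int region = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            if (compare(rowKey, startKeys.get(i)) >= 0) {
                region = i;
            }
        }
        return region;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: ["", "5"), ["5", "a"), ["a", end)
        List<byte[]> startKeys = Arrays.asList(b(""), b("5"), b("a"));
        for (String rowKey : new String[] {"0cc1", "5f4d", "9e10", "c4ca"}) {
            System.out.println(rowKey + " -> region " + regionFor(startKeys, b(rowKey)));
        }
    }
}
```

Because the row keys in this article are MD5 hex strings, they spread fairly evenly over a table that was pre-split on hex prefixes, which keeps the reducers balanced.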

2. Mapper: ConvertToHFilesMapper.java

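import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The output value type is KeyValue, matching setMapOutputValueClass(KeyValue.class) in the driver
public class ConvertToHFilesMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    public static final byte[] CF = Bytes.toBytes("f");

    // Reused for every record to avoid allocating a new writable per line
    private final ImmutableBytesWritable rowKey = new ImmutableBytesWritable();
    // Instance field rather than static, so repeated setup() calls in one JVM cannot add duplicates
    private final List<byte[]> qualifiers = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        context.getCounter("Convert", "mapper").increment(1);

        // Column qualifiers; three columns in this example
        qualifiers.add(Bytes.toBytes("name"));
        qualifiers.add(Bytes.toBytes("xxx"));
        qualifiers.add(Bytes.toBytes("score"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Fields are separated by commas: the first is the id, the rest are column values
        String[] line = value.toString().split(",");
        if (line.length < 3) {
            // Skip malformed lines instead of failing the task with an index error
            context.getCounter("Convert", "malformed").increment(1);
            return;
        }

        // The row key is the MD5 hex digest of the first field
        byte[] rowKeyBytes = Bytes.toBytes(DigestUtils.md5Hex(line[0]));
        rowKey.set(rowKeyBytes);

        // One counter per distinct value of the third field; only sensible when that field
        // has few distinct values, because Hadoop caps the number of counters per job
        context.getCounter("Convert", line[2]).increment(1);

        // Never read past the known qualifiers, even if a line has extra fields
        int columns = Math.min(line.length - 1, qualifiers.size());
        for (int i = 0; i < columns; i++) {
            KeyValue kv = new KeyValue(rowKeyBytes, CF, qualifiers.get(i), Bytes.toBytes(line[i + 1]));
            context.write(rowKey, kv);
        }
    }
}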
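To make the mapping from an input line to cells concrete, here is a small stand-alone sketch in plain Java that mirrors the mapper's logic without any Hadoop or HBase dependency. The class name LineToCellsSketch, its helper names and the sample line are invented for illustration. The sketch assumes, as commons-codec's DigestUtils.md5Hex(String) does, that the string is encoded as UTF-8 before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineToCellsSketch {

    // Same qualifiers the mapper registers in setup()
    static final String[] QUALIFIERS = {"name", "xxx", "score"};

    // Lower-case hex MD5 digest of the UTF-8 bytes of s
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte x : digest) {
                sb.append(String.format("%02x", x & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Maps fields 1..n of a comma-separated line to qualifier -> value pairs
    static Map<String, String> toColumns(String line) {
        String[] fields = line.split(",");
        Map<String, String> columns = new LinkedHashMap<>();
        int n = Math.min(fields.length - 1, QUALIFIERS.length);
        for (int i = 0; i < n; i++) {
            columns.put(QUALIFIERS[i], fields[i + 1]);
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "abc,Tom,demo,90"; // hypothetical input line
        String rowKey = md5Hex(line.split(",")[0]);
        System.out.println("row key: " + rowKey);
        for (Map.Entry<String, String> e : toColumns(line).entrySet()) {
            System.out.println("f:" + e.getKey() + " = " + e.getValue());
        }
    }
}
```

Hashing the id gives fixed-length row keys that are evenly distributed, at the cost of losing the ability to scan rows in id order.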

When the job finishes, the output directory contains a _SUCCESS marker and one folder per column family; the HFiles are inside those folders.
[Screenshots in the original post: the listing of the output directory, and the HFile files inside the column family folder.]

II. Loading the generated HFiles into HBase

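import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
// HBase 1.x package; in HBase 2.x this class lives in org.apache.hadoop.hbase.tool
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HFile2HBase {

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: HFile2HBase <tableName> <hfileDir>");
            System.exit(1);
        }
        TableName tableName = TableName.valueOf(args[0]);
        // Directory written by ConvertToHFiles (the parent of the column family folders)
        Path dir = new Path(args[1]);

        // Configuration
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.x.xx");
        conf.set("hbase.metrics.showTableName", "false");

        // Load the generated HFiles into HBase; all resources are closed automatically
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(tableName);
             RegionLocator regionLocator = conn.getRegionLocator(tableName)) {

            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            // Run the bulk load; Admin comes from the connection instead of new HBaseAdmin(conn)
            loader.doBulkLoad(dir, admin, table, regionLocator);

        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}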

That's it. Check the table in HBase to confirm the data has arrived.

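I did run into one open question. Some blog posts say that importing via HFiles is only suitable for an initial load, that is, when the table is empty or is emptied before every import. However, when I loaded a second, different set of HFiles into the same table, the load succeeded and the new rows were added. I don't know why; perhaps HBase has changed since those posts were written. I have not resolved this yet.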
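One likely explanation for the observation above: a bulk load adds the new HFiles to the region's store directories next to the store files that are already there, and reads merge across all of them, so the table does not need to be empty. The toy model below sketches that behavior in plain Java. It is a simplification for illustration only: the class StoreFilesSketch and its methods are invented, and real HBase resolves overlapping cells by timestamp and version rather than by load order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class StoreFilesSketch {

    // Each "store file" is an immutable sorted map of row key -> value; later entries are newer
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    // A bulk load only adds another file; existing files are left untouched
    void bulkLoad(Map<String, String> hfile) {
        storeFiles.add(new TreeMap<>(hfile));
    }

    // A read checks the newest file first and falls back to older ones
    String get(String row) {
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(row);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Number of distinct rows visible across all files
    int rowCount() {
        TreeSet<String> rows = new TreeSet<>();
        for (TreeMap<String, String> file : storeFiles) {
            rows.addAll(file.keySet());
        }
        return rows.size();
    }

    public static void main(String[] args) {
        StoreFilesSketch table = new StoreFilesSketch();

        Map<String, String> first = new TreeMap<>();
        first.put("row1", "a");
        first.put("row2", "b");
        table.bulkLoad(first);

        Map<String, String> second = new TreeMap<>();
        second.put("row2", "B");
        second.put("row3", "c");
        table.bulkLoad(second);

        System.out.println("rows: " + table.rowCount());
        System.out.println("row2: " + table.get("row2"));
    }
}
```

In this model, loading into a non-empty table simply makes more rows visible, which matches what the second import showed.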
