Bulkload cannot find the partitions file: IllegalArgumentException: Can't read partitions file

Problem

When importing data into HBase via bulkload, the mapper tasks may fail because the partitions file cannot be found.

The error stack trace looks like this:

Error: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:707)
    at ...


Root Cause Analysis

If you want to skip the details, jump straight to the Solution Summary at the end.

Bulkload jobs use HFileOutputFormat2 as the OutputFormat, configured with:

HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());

Reading the source shows that this method automatically configures the reduce side: it fetches the table's region boundaries from the RegionLocator, generates the partitions file from them, and writes it to HDFS.
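
Roughly speaking, configureIncrementalLoad turns the table's region layout into the reduce-side configuration. A simplified sketch of the idea (illustrative only, not the actual HBase source; variable names are mine):

// Simplified sketch: one reducer per region, and the region start keys
// become the TotalOrderPartitioner split points.
RegionLocator regionLocator = table.getRegionLocator();
byte[][] startKeys = regionLocator.getStartKeys();

List<ImmutableBytesWritable> splitPoints = new ArrayList<>();
for (byte[] startKey : startKeys) {
    if (startKey.length != 0) {               // the first region's start key is empty
        splitPoints.add(new ImmutableBytesWritable(startKey));
    }
}
job.setNumReduceTasks(startKeys.size());      // one reducer per region
// configurePartitioner(job, splitPoints) then writes these points to the partitions file on HDFS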

When the client program starts, the partitions file write shows up in the log:

19/07/19 14:36:20 INFO mapreduce.HFileOutputFormat2: Writing partition information to /user/vsearch/offline/bangxi/tmp/partitions_66430514-f3de-488b-84fa-967bee9a5cc7

Observation showed that when the client exits immediately after submitting the MapReduce job, the mapper tasks fail, and the file /user/vsearch/offline/bangxi/tmp/partitions_66430514-f3de-488b-84fa-967bee9a5cc7 is indeed gone from HDFS. So the suspicion is that the file gets deleted when the client exits.
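
To confirm this, you can check from a separate client whether the file is still on HDFS once the submitting client has exited. A minimal diagnostic sketch (the path is the one printed in the client log above):

// Minimal diagnostic sketch: verify whether the partitions file is still on HDFS.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// path taken from the "Writing partition information to ..." log line
Path partitions = new Path("/user/vsearch/offline/bangxi/tmp/partitions_66430514-f3de-488b-84fa-967bee9a5cc7");
System.out.println("partitions file exists: " + fs.exists(partitions));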


Reading the source of HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator()) again, one line stands out as the likely culprit: fs.deleteOnExit(partitionsPath).

static void configurePartitioner(Job job, List<ImmutableBytesWritable> splitPoints)
      throws IOException {
    Configuration conf = job.getConfiguration();
    // create the partitions file
    FileSystem fs = FileSystem.get(conf);
    String hbaseTmpFsDir =
        conf.get(HConstants.TEMPORARY_FS_DIRECTORY_KEY,
          HConstants.DEFAULT_TEMPORARY_HDFS_DIRECTORY);
    Path partitionsPath = new Path(hbaseTmpFsDir, "partitions_" + UUID.randomUUID());
    fs.makeQualified(partitionsPath);
    writePartitions(conf, partitionsPath, splitPoints);
    // this is the line that schedules the file for deletion
    fs.deleteOnExit(partitionsPath);

    // configure job to use it
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(conf, partitionsPath);
  }
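
FileSystem.deleteOnExit only registers the path on the client side; the file is actually removed when that FileSystem instance is closed, which the FileSystem shutdown hook does when the client JVM exits. A minimal sketch of the behavior (the path here is hypothetical, for illustration only):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path tmp = new Path("/tmp/partitions_demo");   // hypothetical path
fs.create(tmp).close();                        // create an empty file on HDFS
fs.deleteOnExit(tmp);                          // mark it for deletion when fs is closed
// ... submit the MapReduce job and exit the client ...
// When the client JVM shuts down, the FileSystem shutdown hook closes fs, and
// processDeleteOnExit() removes /tmp/partitions_demo -- even though remote map
// tasks may still need to read it.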

So I added logic to cancel the scheduled deletion. Note that configureIncrementalLoad has already called TotalOrderPartitioner.setPartitionFile, so reading PARTITIONER_PATH from the configuration resolves to the freshly written partitions file:

HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());
// get the file system object
FileSystem fs = FileSystem.get(config);
// get the partitions file path (set by configureIncrementalLoad via TotalOrderPartitioner.setPartitionFile)
Path partitionsPath = new Path(config.get(PARTITIONER_PATH, DEFAULT_PATH));
fs.makeQualified(partitionsPath);
// cancel the delete-on-exit scheduled for this file
fs.cancelDeleteOnExit(partitionsPath);

After repackaging and resubmitting, I exited the client right after the MapReduce job was submitted; the mappers no longer reported the missing partitions file error. Problem solved.


Solution Summary

Add one line of code and one helper method (see the comments).

public static void main(String[] args) throws Exception {
    Configuration config = ConfigUtils.getConfig();
    // delay reducer start until 98% of the mappers have finished
    config.setFloat(COMPLETED_MAPS_FOR_REDUCE_SLOWSTART, 0.98f);
    new GenericOptionsParser(config, args);

    Job job = Job.getInstance(config, "bulkLoad il 2 hbase");
    job.setJarByClass(BulkLoadIl2HBase.class);
    job.setMapperClass(MyMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths((JobConf) job.getConfiguration(), "/user/vsearch/offline/bangxi/il/");
    String hFilePath = "/user/vsearch/offline/bangxi/bulkload/v9";
    FileOutputFormat.setOutputPath(job, new Path(hFilePath));

    HTable table = (HTable) HBaseUtils.getInstance(ConfigUtils.getConfig()).getTable(HbaseTable.LINK_TABLE_V9);
    HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());
    // add this line
    BulkLoadUtil.cancelDeleteOnExit(job);

    boolean success = job.waitForCompletion(true);
    if (success) {
        BulkLoadUtil.doBulkLoad(hFilePath, HbaseTable.LINK_TABLE_V9, args);
    } else {
        System.exit(1);
    }
}
public class BulkLoadUtil {

    public static void doBulkLoad(String hFilePath, String tableName, String[] args) throws Exception {
        Configuration config = ConfigUtils.getConfig();
        new GenericOptionsParser(config, args);
        LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(config);
        loadFiles.doBulkLoad(new Path(hFilePath),
            (HTable) ConnectionFactory.createConnection(config).getTable(TableName.valueOf(tableName)));
    }

    // add this method
    // PARTITIONER_PATH / DEFAULT_PATH: constants from TotalOrderPartitioner (static import assumed)
    public static void cancelDeleteOnExit(Job job) throws IOException {
        Configuration conf = job.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        String partitionsFile = conf.get(PARTITIONER_PATH, DEFAULT_PATH);
        Path partitionsPath = new Path(partitionsFile);
        fs.makeQualified(partitionsPath);
        fs.cancelDeleteOnExit(partitionsPath);
    }

}
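
One side effect of cancelling deleteOnExit is that the partitions file now stays behind in the HDFS temp directory. If you want to tidy it up, here is a sketch of a cleanup helper (the method name is hypothetical) that could be called after the job finishes:

// Hypothetical cleanup helper: since delete-on-exit was cancelled, remove the
// partitions file ourselves once the job has completed.
public static void deletePartitionsFile(Job job) throws IOException {
    Configuration conf = job.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path partitionsPath = new Path(conf.get(PARTITIONER_PATH, DEFAULT_PATH));
    if (fs.exists(partitionsPath)) {
        fs.delete(partitionsPath, false);      // single file, non-recursive delete
    }
}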

