Problem
When importing data into HBase with bulkload, the mapper tasks may fail because they cannot find the partitions file.
The error looks like this:
Error: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:707)
	...
Root Cause Analysis
If you'd rather skip the details, jump straight to the Solution Summary at the end.
Bulkload jobs all use HFileOutputFormat2 as the OutputFormat, configured like this:
HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());
Reading the source shows that this method automatically configures the reduce stage: it uses the RegionLocator to fetch the table's region boundaries, generates a partitions file from them, and writes it to HDFS.
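For reference, the effect of the call can be inspected directly on the Job object. Below is a minimal verification sketch of mine (the class and method names are hypothetical; the getters and the TotalOrderPartitioner constants are standard Hadoop API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class IncrementalLoadInspector {
    // After configureIncrementalLoad(job, ...) returns, the job carries one
    // reduce task per region of the target table, TotalOrderPartitioner as
    // the partitioner class, and the HDFS path of the freshly written
    // partitions file in its configuration.
    public static void dump(Job job) throws ClassNotFoundException {
        Configuration conf = job.getConfiguration();
        System.out.println("reducers    = " + job.getNumReduceTasks());
        System.out.println("partitioner = " + job.getPartitionerClass().getName());
        System.out.println("partitions  = " + conf.get(
                TotalOrderPartitioner.PARTITIONER_PATH,
                TotalOrderPartitioner.DEFAULT_PATH));
    }
}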
When the client program starts, writing the partitions file is logged like this:
19/07/19 14:36:20 INFO mapreduce.HFileOutputFormat2: Writing partition information to /user/vsearch/offline/bangxi/tmp/partitions_66430514-f3de-488b-84fa-967bee9a5cc7
Observation showed that whenever the client exited right after submitting the MapReduce job, the mapper tasks failed, and the file /user/vsearch/offline/bangxi/tmp/partitions_66430514-f3de-488b-84fa-967bee9a5cc7 was indeed gone from HDFS. So the suspicion was that the file gets deleted when the client exits.
Re-reading the source of HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator()) turned up one line, fs.deleteOnExit(partitionsPath), as the likely cause of the deletion.
static void configurePartitioner(Job job, List<ImmutableBytesWritable> splitPoints)
        throws IOException {
    Configuration conf = job.getConfiguration();
    // create the partitions file
    FileSystem fs = FileSystem.get(conf);
    String hbaseTmpFsDir =
            conf.get(HConstants.TEMPORARY_FS_DIRECTORY_KEY,
                    HConstants.DEFAULT_TEMPORARY_HDFS_DIRECTORY);
    Path partitionsPath = new Path(hbaseTmpFsDir, "partitions_" + UUID.randomUUID());
    fs.makeQualified(partitionsPath);
    writePartitions(conf, partitionsPath, splitPoints);
    // this is the line in question
    fs.deleteOnExit(partitionsPath);

    // configure job to use it
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(conf, partitionsPath);
}
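The semantics of deleteOnExit explain the disappearing file: the client-side FileSystem keeps a set of registered paths and deletes them in processDeleteOnExit(), which runs when the FileSystem is closed, i.e. at client JVM shutdown. Here is a small self-contained demo of that behavior (the path and class name are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOnExitDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // newInstance bypasses the shared FileSystem cache, so close() below
        // affects only this handle
        FileSystem fs = FileSystem.newInstance(conf);
        Path p = new Path("/tmp/partitions_demo"); // hypothetical path
        fs.create(p).close();
        fs.deleteOnExit(p); // same call configurePartitioner makes
        System.out.println("before close: " + fs.exists(p)); // true

        // close() runs processDeleteOnExit() and removes the file, which is
        // exactly what happens to the partitions file when the client JVM exits
        fs.close();

        FileSystem check = FileSystem.newInstance(conf);
        System.out.println("after close:  " + check.exists(p)); // false
        check.close();
    }
}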
So I added logic to cancel the scheduled deletion:
HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());
// get a handle on the filesystem
FileSystem fs = FileSystem.get(config);
// look up the partitions file path that configureIncrementalLoad stored in the config
Path partitionsPath = new Path(config.get(TotalOrderPartitioner.PARTITIONER_PATH,
        TotalOrderPartitioner.DEFAULT_PATH));
// note: makeQualified returns a new Path and leaves partitionsPath unqualified,
// which matches the Path that configurePartitioner registered with deleteOnExit
fs.makeQualified(partitionsPath);
// cancel the scheduled delete-on-exit for the partitions file
fs.cancelDeleteOnExit(partitionsPath);
After repackaging and uploading, I submitted the MapReduce job and killed the client immediately; the mappers no longer complained about a missing partitions file. Done.
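One side effect worth noting: with the delete-on-exit cancelled, the temporary partitions file is never cleaned up automatically. Below is a hedged sketch of an explicit cleanup step after the job finishes (cleanupPartitionsFile is a hypothetical helper, not part of the original fix):

// Hypothetical cleanup helper: once the job has completed, the partitions
// file is no longer needed and can be deleted explicitly.
public static void cleanupPartitionsFile(Job job) throws IOException {
    Configuration conf = job.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path partitionsPath = new Path(conf.get(TotalOrderPartitioner.PARTITIONER_PATH,
            TotalOrderPartitioner.DEFAULT_PATH));
    fs.delete(partitionsPath, false); // single file, non-recursive delete
}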
Solution Summary
It comes down to adding one line of code and one method (see the comments).
public static void main(String[] args) throws Exception {
    Configuration config = ConfigUtils.getConfig();
    // delay the reduce start until 98% of the maps have finished
    config.setFloat(MRJobConfig.COMPLETED_MAPS_FOR_REDUCE_SLOWSTART, 0.98f);
    new GenericOptionsParser(config, args);
    Job job = Job.getInstance(config, "bulkLoad il 2 hbase");
    job.setJarByClass(BulkLoadIl2HBase.class);
    job.setMapperClass(MyMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    // use the new-API FileInputFormat; no JobConf cast is needed
    FileInputFormat.setInputPaths(job, "/user/vsearch/offline/bangxi/il/");
    String hFilePath = "/user/vsearch/offline/bangxi/bulkload/v9";
    FileOutputFormat.setOutputPath(job, new Path(hFilePath));
    HTable table = (HTable) HBaseUtils.getInstance(ConfigUtils.getConfig())
            .getTable(HbaseTable.LINK_TABLE_V9);
    HFileOutputFormat2.configureIncrementalLoad(job, table, table.getRegionLocator());
    // add this one line
    BulkLoadUtil.cancelDeleteOnExit(job);
    boolean success = job.waitForCompletion(true);
    if (success) {
        BulkLoadUtil.doBulkLoad(hFilePath, HbaseTable.LINK_TABLE_V9, args);
    } else {
        System.exit(1);
    }
}
public class BulkLoadUtil {

    public static void doBulkLoad(String hFilePath, String tableName, String[] args) throws Exception {
        Configuration config = ConfigUtils.getConfig();
        new GenericOptionsParser(config, args);
        LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(config);
        loadFiles.doBulkLoad(new Path(hFilePath),
                (HTable) ConnectionFactory.createConnection(config)
                        .getTable(TableName.valueOf(tableName)));
    }

    // add this method
    public static void cancelDeleteOnExit(Job job) throws IOException {
        Configuration conf = job.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        // same lookup configurePartitioner used when it stored the path
        String partitionsFile = conf.get(TotalOrderPartitioner.PARTITIONER_PATH,
                TotalOrderPartitioner.DEFAULT_PATH);
        Path partitionsPath = new Path(partitionsFile);
        fs.makeQualified(partitionsPath);
        // unschedule the client-exit deletion so mappers can still read the file
        fs.cancelDeleteOnExit(partitionsPath);
    }
}