HBase Data Import: Bulkload Problem

We import the data with bulkload by running the following command:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/pingz/data/$logDate $tableName

The load turns out to be very slow: roughly half an hour for 300 GB of data. The root cause is that the generated HFiles are not moved into place but copied, which happens because the cluster uses HDFS federation (ViewFS).

For details, see the issue 强哥 filed upstream, HBASE-17429: https://issues.apache.org/jira/browse/HBASE-17429?jql=project%20%3D%20HBASE%20AND%20text%20~%20viewfs
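
One quick way to confirm where each path actually lands is FileSystem#resolvePath, which follows the ViewFS mount table to the backing hdfs:// URI. A minimal sketch (the viewfs://XX authority and the expected outputs are assumptions based on the mount table shown later):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResolveMounts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "XX" is the mount-table name from core-site.xml (assumption).
    FileSystem viewFs = FileSystem.get(URI.create("viewfs://XX/"), conf);
    // resolvePath() resolves a ViewFS path through the mount table.
    System.out.println(viewFs.resolvePath(new Path("/user/pingz/data")));
    // expected: hdfs://XXNameNode2:8020/user/pingz/data
    System.out.println(viewFs.resolvePath(new Path("/hbase")));
    // expected: hdfs://XXNameNode:8020/hbase
  }
}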

The call eventually lands in HRegionFileSystem's bulkLoadStoreFile method:

Path bulkLoadStoreFile(final String familyName, Path srcPath, long seqNum)
      throws IOException {
    // Copy the file if it's on another filesystem
    FileSystem srcFs = srcPath.getFileSystem(conf);
    FileSystem desFs = fs instanceof HFileSystem ? ((HFileSystem)fs).getBackingFs() : fs;

    // We can't compare FileSystem instances as equals() includes UGI instance
    // as part of the comparison and won't work when doing SecureBulkLoad
    // TODO deal with viewFS
    if (!FSHDFSUtils.isSameHdfs(conf, srcFs, desFs)) {
      LOG.info("Bulk-load file " + srcPath + " is on different filesystem than " +
          "the destination store. Copying file over to destination filesystem.");
      Path tmpPath = createTempName();
      FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
      LOG.info("Copied " + srcPath + " to temporary path on destination filesystem: " + tmpPath);
      srcPath = tmpPath;
    }
    return commitStoreFile(familyName, srcPath, seqNum, true);
  }
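
The expensive part is that branch: when source and destination are the same HDFS, commitStoreFile can move the HFile into place with a rename, a pure NameNode metadata operation; across filesystems, FileUtil.copy has to stream every block through. A simplified sketch of the contrast (illustrative only, not HBase's exact code):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.util.FSHDFSUtils;

public class BulkLoadCommitSketch {
  // Illustrative helper: srcPath is the generated HFile, dstPath its final
  // location under the region's column-family directory.
  static void commit(Configuration conf, FileSystem srcFs, FileSystem dstFs,
      Path srcPath, Path dstPath) throws IOException {
    if (FSHDFSUtils.isSameHdfs(conf, srcFs, dstFs)) {
      // Same HDFS: rename only updates NameNode metadata, O(1) per file.
      dstFs.rename(srcPath, dstPath);
    } else {
      // Different filesystems (e.g. two federation namespaces): every
      // block is read and rewritten -- ~30 minutes for 300 GB here.
      FileUtil.copy(srcFs, srcPath, dstFs, dstPath, false, conf);
    }
  }
}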

The job writes its output under /user, which is mounted on XXNameNode2, while the HBase root directory is mounted on XXNameNode; isSameHdfs therefore sees two different filesystems and takes the copy branch.

There are two ways to fix this: upgrade HBase (or backport the patch from HBASE-17429), or create a new directory mounted in the same namespace as HBase. Keeping the existing /user and /hbase mounts, we add a /bulkload mount that points at the same NameNode as /hbase (the entries live in core-site.xml):

    <property>
      <name>fs.viewfs.mounttable.XX.link./user</name>
      <value>hdfs://XXNameNode2:8020/user</value>
    </property>

    <property>
      <name>fs.viewfs.mounttable.XX.link./hbase</name>
      <value>hdfs://XXNameNode:8020/hbase</value>
    </property>

    <property>
      <name>fs.viewfs.mounttable.XX.link./bulkload</name>
      <value>hdfs://XXNameNode:8020/bulkload</value>
    </property>

and update the load command to read from the new mount:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://XXNameNode:8020/bulkload/appdata/$logDate $tableName
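
Note that the job generating the HFiles must also write its output under the new /bulkload mount (here appdata/$logDate); only then do source and destination resolve to the same NameNode, so commitStoreFile can rename the files instead of copying them.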

