AlreadyBeingCreatedException When Appending to a File in HDFS

Background

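Some time ago, when our company first started appending to files on HDFS, we did it through the hadoop-hdfs WebHDFS API. It ran without problems for more than three months. Then I refactored the project, and less than two weeks after the release we started seeing AlreadyBeingCreatedException, with every append to the file rejected as Forbidden. My boss assumed my code changes were to blame. Since the cause had not been confirmed, I had to take the blame, grit my teeth, and read through the code looking for the bug. I later found that the HDFS append feature really does have a problem and that other companies had run into it as well. That was a relief. Enough preamble; on to the topic.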

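The HDFS Append Feature

The append feature is a relatively recent addition to HDFS. The design document is listed in the references at the end; the problem itself is described in a JIRA issue as follows: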

Further investigation revealed that the following sequence leads to AlreadyBeingCreatedException:
  1. LEASE_LIMIT=500; cluster.setLeasePeriod(LEASE_LIMIT, LEASE_LIMIT);
  2. thread A gets a lease on a file
  3. thread B sleeps 2*soft limit
  4. thread B tries to get lease on a file, triggers lease recovery and gets RecoveryInProgressException
  5. before lease recovery ends, namenode LeaseManager.java:checkLeases finds out that hard limit was also expired, start a new recovery, resets timeouts
  6. thread B tries to get lease again, timeout is not expired (it was reset in previous step) so it gets AlreadyBeingCreatedException
There are two problems in the code that lead to this:
  1. hard limit should not be set to such a low value, it makes it very likely for recovery to not finish before it's taken over by another recovery (because of expired hard limit)
  2. namenode should recognize that even though limit is not expired the recovery is ongoing and return RecoveryInProgressException instead of AlreadyBeingCreatedException (in FSNamesystem.java:startFileInternal, when it's deciding what to do if the file is under construction)
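
The sequence quoted above can be replayed with a small toy model. This is only a sketch to make the timing visible: `ToyLeaseManager`, its methods, and the `check_recovery` flag are hypothetical names invented for illustration and are not the real NameNode code. The model assumes a single file, a soft and hard limit of 500 time units, and that the periodic lease check resets the lease timestamp when it restarts recovery.

```python
class RecoveryInProgress(Exception):
    """Toy stand-in for RecoveryInProgressException."""


class AlreadyBeingCreated(Exception):
    """Toy stand-in for AlreadyBeingCreatedException."""


class ToyLeaseManager:
    """Simplified single-file lease model; NOT the real NameNode logic."""

    def __init__(self, soft_limit, hard_limit, check_recovery=False):
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        # Hypothetical switch modelling the fix suggested as problem 2.
        self.check_recovery = check_recovery
        self.holder = None
        self.last_renewal = None
        self.recovering = False

    def acquire(self, client, now):
        if self.holder is None:
            self.holder, self.last_renewal = client, now
            return
        if self.check_recovery and self.recovering:
            # With the fix: an ongoing recovery is always reported as such.
            raise RecoveryInProgress(client)
        if now - self.last_renewal > self.soft_limit:
            # Soft limit expired: another client may trigger lease recovery.
            self.recovering = True
            raise RecoveryInProgress(client)
        # Lease looks fresh, so the file appears to be held by someone else.
        raise AlreadyBeingCreated(client)

    def check_leases(self, now):
        # Periodic check: an expired hard limit restarts recovery
        # and resets the timestamp (step 5 in the quoted sequence).
        if self.holder is not None and now - self.last_renewal > self.hard_limit:
            self.recovering = True
            self.last_renewal = now


def replay(manager):
    """Replay steps 2 to 6 and return the outcome of thread B's two attempts."""
    manager.acquire("A", now=0)                      # step 2
    outcomes = []
    for t, run_check in ((1000, True), (1100, False)):
        try:
            manager.acquire("B", now=t)              # steps 4 and 6
            outcomes.append("ok")
        except (RecoveryInProgress, AlreadyBeingCreated) as exc:
            outcomes.append(type(exc).__name__)
        if run_check:
            manager.check_leases(now=t + 1)          # step 5
    return outcomes
```

With `ToyLeaseManager(500, 500)` the second attempt by thread B ends in `AlreadyBeingCreated`; with `check_recovery=True` it ends in `RecoveryInProgress`, which is the behavior the quoted comment asks for.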


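I believe the same cause was behind the problem in our project: although we write from a single process, we appended very frequently. After we switched to batched appends, the problem no longer occurred.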
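
As a rough illustration of the batching idea, the sketch below buffers records in memory and hands them to a writer only once a threshold is reached, so the file is opened for writing far less often. `BatchAppender`, `flush_fn`, and `max_records` are hypothetical names chosen for this example; in our project the write step was an upload to HDFS, but here it is just a callable supplied by the caller.

```python
class BatchAppender:
    """Collect records and write them in batches instead of one by one."""

    def __init__(self, flush_fn, max_records=1000):
        # flush_fn is a caller-supplied callable that receives one string
        # containing the whole batch (hypothetical interface).
        self.flush_fn = flush_fn
        self.max_records = max_records
        self._buffer = []

    def add(self, record):
        """Buffer one record; flush automatically when the batch is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self.max_records:
            self.flush()

    def flush(self):
        """Write all buffered records in a single call."""
        if not self._buffer:
            return
        payload = "".join(self._buffer)
        self.flush_fn(payload)
        # Clear only after a successful write, so a failed batch is kept.
        self._buffer = []
```

A caller would invoke `add()` for every record and call `flush()` once more on shutdown to write out the remainder.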

Python

    cmd = "hdfs dfs -stat {0}".format(hfile_path)
    lg.info("EXECUTE CMD:{0}".format(cmd))
    ret = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = ret.communicate()
    if err:
        lg.info("HDFS PATH:{0} DOES NOT EXIST. ERROR:{1}".format(hfile_path, err))
        cmd = "hdfs dfs -mkdir -p {0}".format(hfile_path)
        lg.info("EXECUTE CMD:{0}".format(cmd))
        ret = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = ret.communicate()
        if err:
            lg.error("HDFS MKDIR ERROR: {0}".format(err))
            return ErrorCode.MKDIR_ERROR

    hfilename = "{bpath}/{key}/{ymd}/".format(bpath=hadoop_info['base_path'], ymd=ymd, key=key)
    # The current user must be an HDFS user
    cmd = 'hdfs dfs -copyFromLocal {0} {1}'.format(file_path, hfilename)
    lg.info("EXECUTE CMD:{0}".format(cmd))
    p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if err:
        if err.find('File exists') != -1:
            lg.error("WRITE FILE DATA TO HADOOP SERVER FAILED,  FILE ALREADY EXISTS,  path:{name}, error:{e}.".format(name=file_path, e=err))
            return ErrorCode.FILE_EXISTS
        elif err.find('No such file or directory') != -1:
            lg.error("WRITE FILE DATA TO HADOOP SERVER FAILED,  NO SUCH FILE OR DIRECTORY, {name}, error:{e}.".format(name=file_path, e=err))
            return ErrorCode.NO_SUCH_FILE_OR_DIR
        else:
            lg.error("WRITE FILE DATA TO HADOOP SERVER FAILED:{name}, error :{e}".format(name=file_path, e=err))
        return False
    else:
        lg.info("WRITE FILE DATA TO HADOOP SERVER OK:path:{name}.".format(name=file_path))
        # Delete the local file; it is recreated on the next write
        os.remove(file_path)
        return True

The code above is an excerpt from the Python implementation.
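
One caveat about the excerpt: it treats any output on stderr as a failure, yet a command-line tool may print warnings to stderr and still succeed. A more robust variant, sketched below under that assumption, decides on the process exit code instead. `run_cmd` and `succeeded` are hypothetical helper names and are not part of the original code.

```python
import subprocess
import sys


def run_cmd(args):
    """Run a command given as a list; return (exit code, stdout, stderr)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str instead of bytes
    )
    return proc.returncode, proc.stdout, proc.stderr


def succeeded(args):
    """Treat the command as successful only if it exits with code 0."""
    code, _out, _err = run_cmd(args)
    return code == 0
```

For example, the existence check could then be written as `succeeded(["hdfs", "dfs", "-test", "-e", hfile_path])`, with stderr used only for logging.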


References

1. http://yanbohappy.sinaapp.com/?p=175

2. http://blog.csdn.net/chenpingbupt/article/details/7972589

3. https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf

