HDFS Trash (Garbage Collection) Source Code Analysis

1. TrashPolicy: every trash policy must implement this abstract class. The default HDFS implementation is TrashPolicyDefault, and a different policy can be plugged in via the fs.trash.classname configuration key (a usage sketch follows the excerpts below).

2. The TrashPolicy and TrashPolicyDefault classes look as follows (only partial excerpts are shown):

/**
 * This interface is used for implementing different Trash policies.
 * Provides factory method to create instances of the configured Trash policy.
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public abstract class TrashPolicy extends Configured {
  protected FileSystem fs; // the FileSystem
  protected Path trash; // path to trash directory
  protected long deletionInterval; // deletion interval for Emptier

  ......

public class TrashPolicyDefault extends TrashPolicy {
  private static final Logger LOG =
      LoggerFactory.getLogger(TrashPolicyDefault.class);

  private static final Path CURRENT = new Path("Current");

  private static final FsPermission PERMISSION =
    new FsPermission(FsAction.ALL, FsAction.NONE, FsAction.NONE);

  private static final DateFormat CHECKPOINT = new SimpleDateFormat("yyMMddHHmmss");
  /** Format of checkpoint directories used prior to Hadoop 0.23. */
  private static final DateFormat OLD_CHECKPOINT =
      new SimpleDateFormat("yyMMddHHmm");
  private static final int MSECS_PER_MINUTE = 60*1000;

  private long emptierInterval;
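
For context, here is a minimal sketch of how client code obtains the configured policy. The TrashPolicyExample class name is mine, and it assumes the two-argument getInstance(conf, fs) overload available since Hadoop 2.8:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.TrashPolicy;

// Hypothetical wrapper class; only the getInstance call is the point here.
public class TrashPolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Explicitly select the default policy; a custom policy's fully
    // qualified class name would go here instead.
    conf.set("fs.trash.classname", "org.apache.hadoop.fs.TrashPolicyDefault");
    FileSystem fs = FileSystem.get(conf);
    // The factory reads fs.trash.classname, instantiates the class via
    // reflection, and initializes it with conf and fs.
    TrashPolicy policy = TrashPolicy.getInstance(conf, fs);
    System.out.println(policy.getClass().getName());
  }
}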

A few key fields and methods explained:

protected Path trash; // the trash directory

protected long deletionInterval; // a checkpoint is deleted once (current time - deletionInterval) > checkpoint timestamp

private long emptierInterval; // how often a round of checkpoint deletion and creation runs, i.e. deleteCheckpoint() followed by createCheckpoint()

If these intervals are not configured they default to 0, which disables trash entirely. deletionInterval is read from fs.trash.interval and emptierInterval from fs.trash.checkpoint.interval (both in minutes); if the latter is unset or non-positive it falls back to deletionInterval, and if it is larger than deletionInterval it is capped at deletionInterval.
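
A minimal sketch of this resolution logic, condensed from TrashPolicyDefault and its Emptier constructor (the wrapper class is illustrative; the configuration keys and the fallback rule are the real ones):

import org.apache.hadoop.conf.Configuration;

class TrashIntervals {
  static final int MSECS_PER_MINUTE = 60 * 1000;
  long deletionInterval; // ms a checkpoint survives; 0 disables trash
  long emptierInterval;  // ms between Emptier rounds

  TrashIntervals(Configuration conf) {
    // Both keys are given in minutes and may be fractional.
    deletionInterval = (long) (conf.getFloat("fs.trash.interval", 0)
        * MSECS_PER_MINUTE);
    emptierInterval = (long) (conf.getFloat("fs.trash.checkpoint.interval", 0)
        * MSECS_PER_MINUTE);
    // Unset, non-positive, or larger than deletionInterval:
    // fall back to deletionInterval.
    if (emptierInterval > deletionInterval || emptierInterval <= 0) {
      emptierInterval = deletionInterval;
    }
  }
}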

public void createCheckpoint() throws IOException
Creates a checkpoint: renames the Current directory to the current time formatted as yyMMddHHmmss.

public void deleteCheckpoint() throws IOException
Deletes checkpoints: removes every checkpoint directory whose yyMMddHHmmss timestamp is more than deletionInterval in the past.
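
A minimal sketch of the expiry test inside deleteCheckpoint(). The isExpired helper is my name for it; the two date formats match the fields shown earlier, and the real code likewise falls back to the pre-0.23 format when parsing fails:

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;

class CheckpointExpiry {
  static final DateFormat CHECKPOINT = new SimpleDateFormat("yyMMddHHmmss");
  /** Format of checkpoint directories used prior to Hadoop 0.23. */
  static final DateFormat OLD_CHECKPOINT = new SimpleDateFormat("yyMMddHHmm");

  // dirName is the last path component of a checkpoint directory,
  // e.g. "240101123000".
  static boolean isExpired(String dirName, long now, long deletionInterval) {
    long time;
    try {
      time = CHECKPOINT.parse(dirName).getTime();
    } catch (ParseException e) {
      try {
        time = OLD_CHECKPOINT.parse(dirName).getTime();
      } catch (ParseException e2) {
        return false; // not a checkpoint directory, leave it alone
      }
    }
    // Older than (now - deletionInterval): eligible for deletion.
    return (now - deletionInterval) > time;
  }
}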

The emptier thread's main logic: sleep until the next multiple of emptierInterval, then, for each trash root, first delete expired checkpoints and then create a new checkpoint.

protected class Emptier implements Runnable {
     @Override
    public void run() {
      if (emptierInterval == 0)
        return;                                   // trash disabled
      long now = Time.now();
      long end;
      while (true) {
        end = ceiling(now, emptierInterval);
        try {                                     // sleep for interval
          Thread.sleep(end - now);
        } catch (InterruptedException e) {
          break;                                  // exit on interrupt
        }

        try {
          now = Time.now();
          if (now >= end) {
            Collection<FileStatus> trashRoots;
            trashRoots = fs.getTrashRoots(true);      // list all trash dirs

            for (FileStatus trashRoot : trashRoots) {   // dump each trash
              if (!trashRoot.isDirectory())
                continue;
              try {
                TrashPolicyDefault trash = new TrashPolicyDefault(fs, conf);
                trash.deleteCheckpoint(trashRoot.getPath());
                trash.createCheckpoint(trashRoot.getPath(), new Date(now));
              } catch (IOException e) {
                LOG.warn("Trash caught: "+e+". Skipping " +
                    trashRoot.getPath() + ".");
              } 
            }
          }
        } catch (Exception e) {
          LOG.warn("RuntimeException during Trash.Emptier.run(): ", e); 
        }
      }
      try {
        fs.close();
      } catch(IOException e) {
        LOG.warn("Trash cannot close FileSystem: ", e);
      }
    }
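
The ceiling(now, emptierInterval) call above rounds now up to the next multiple of emptierInterval, so the Emptier always wakes on fixed interval boundaries rather than drifting. A sketch of the private helpers in the same class (reconstructed here, so treat the exact bodies as an assumption):

// floor() truncates a timestamp down to a multiple of interval;
// ceiling() rounds it up to the next multiple.
private long ceiling(long time, long interval) {
  return floor(time, interval) + interval;
}

private long floor(long time, long interval) {
  return (time / interval) * interval;
}

For example, with emptierInterval set to one hour, an Emptier started at 10:17 first wakes at 11:00 and then once every hour after that.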
