How HBase selects store files for compaction

The algorithm is basically as follows:

1. Run over the set of all store files, from oldest to youngest.

2. If at least 3 (hbase.hstore.compactionThreshold) store files are left, and the current store file is more than 20% larger than the sum of the younger store files (the code below sums at most hbase.hstore.compaction.max - 1 of them), and it is larger than the memstore flush size (the default for hbase.hstore.compaction.min.size), then go on to the next, younger store file and repeat step 2.

3. Once one of the conditions in step 2 no longer holds, the store files from the current one to the youngest one are the ones that will be merged together. If there are fewer than compactionThreshold of them, no merge is performed. There is also a limit that prevents more than 10 (hbase.hstore.compaction.max) store files from being merged in one compaction.
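
The steps above can be sketched as a small standalone Java class. This is only an illustration under stated assumptions, not the HBase implementation: the class name CompactionSelectionSketch and the method names select and sumOfYounger are made up for this example, file sizes are plain long values ordered oldest to newest, and the sum of younger files is recomputed naively on every step.

```java
import java.util.Arrays;

// Hypothetical sketch of the selection steps; not part of HBase.
class CompactionSelectionSketch {

    /**
     * Returns {start, end}, the half-open index range of files to merge,
     * or null when fewer than `threshold` files qualify.
     * `sizes` must be ordered from oldest to newest.
     */
    static int[] select(long[] sizes, int threshold, int maxFiles,
                        long minCompactSize, double ratio) {
        int n = sizes.length;
        int start = 0;
        // Step 2: skip the current file while enough files remain and it is
        // larger than both minCompactSize and ratio * (sum of younger files).
        while (n - start >= threshold
                && sizes[start] > Math.max(minCompactSize,
                        (long) (sumOfYounger(sizes, start, maxFiles) * ratio))) {
            start++;
        }
        // Step 3: merge at most maxFiles files, beginning at `start`.
        int end = Math.min(n, start + maxFiles);
        return (end - start < threshold) ? null : new int[] {start, end};
    }

    // Sum of the files younger than index i, limited to maxFiles - 1 files.
    static long sumOfYounger(long[] sizes, int i, int maxFiles) {
        long sum = 0;
        for (int j = i + 1; j < Math.min(sizes.length, i + maxFiles); j++) {
            sum += sizes[j];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two large old files followed by three small, recently flushed ones.
        long[] sizes = {1000, 400, 30, 20, 10};
        // threshold = 3, maxFiles = 10, minCompactSize = 50, ratio = 1.2
        System.out.println(Arrays.toString(select(sizes, 3, 10, 50, 1.2))); // prints [2, 5]
    }
}
```

In this example the two oldest files are skipped because each is much larger than 1.2 times the files after it, and the three small files at the end are selected. With only two small files left after the skipping, the method returns null and nothing is merged.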

The compaction-related configuration parameters can be viewed or set in hbase-default.xml or hbase-site.xml.

this.minFilesToCompact = Math.max(2,
    conf.getInt("hbase.hstore.compaction.min",
        /*old name*/ conf.getInt("hbase.hstore.compactionThreshold", 3)));
this.majorCompactionTime = getNextMajorCompactTime();
this.maxFilesToCompact = conf.getInt("hbase.hstore.compaction.max", 10);
this.minCompactSize = conf.getLong("hbase.hstore.compaction.min.size",
    this.region.memstoreFlushSize);
this.maxCompactSize = conf.getLong("hbase.hstore.compaction.max.size", Long.MAX_VALUE);
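
The nested lookup for minFilesToCompact means: use hbase.hstore.compaction.min if it is set, otherwise fall back to the old name hbase.hstore.compactionThreshold, otherwise use 3, and never go below 2. A minimal sketch of that fallback follows. It assumes a plain Map<String, String> as a stand-in for the real configuration object, and the class CompactionConfigSketch and its getInt helper are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the new-name / old-name / default fallback.
class CompactionConfigSketch {

    // Stand-in for a configuration lookup: parse the value if the key is
    // present, otherwise return the supplied default.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return (value == null) ? defaultValue : Integer.parseInt(value.trim());
    }

    // Mirrors the expression that initializes minFilesToCompact.
    static int minFilesToCompact(Map<String, String> conf) {
        return Math.max(2,
            getInt(conf, "hbase.hstore.compaction.min",
                /* old name */ getInt(conf, "hbase.hstore.compactionThreshold", 3)));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(minFilesToCompact(conf)); // nothing set: prints 3

        conf.put("hbase.hstore.compactionThreshold", "5");
        System.out.println(minFilesToCompact(conf)); // old name only: prints 5

        conf.put("hbase.hstore.compaction.min", "1");
        System.out.println(minFilesToCompact(conf)); // new name wins, floor of 2: prints 2
    }
}
```

Because the old name is only consulted as the default of the new one, setting both keys makes the new key take effect.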

Updated 2011/7/11: annotated code showing which store files are selected for a minor compaction:

  //////////////////////////////////////////////////////////////////////////////
  // Compaction
  //////////////////////////////////////////////////////////////////////////////

  /**
   * Compact the StoreFiles.  This method may take some time, so the calling
   * thread must be able to block for long periods.
   *
   * <p>During this time, the Store can work as usual, getting values from
   * StoreFiles and writing new StoreFiles from the memstore.
   *
   * Existing StoreFiles are not destroyed until the new compacted StoreFile is
   * completely written-out to disk.
   *
   * <p>The compactLock prevents multiple simultaneous compactions.
   * The structureLock prevents us from interfering with other write operations.
   *
   * <p>We don't want to hold the structureLock for the whole time, as a compact()
   * can be lengthy and we want to allow cache-flushes during this period.
   *
   * @param forceMajor True to force a major compaction regardless of thresholds
   * @return row to split around if a split is needed, null otherwise
   * @throws IOException
   */
  StoreSize compact(final boolean forceMajor) throws IOException {
    boolean forceSplit = this.region.shouldForceSplit();
    boolean majorcompaction = forceMajor;
    synchronized (compactLock) { // Only one thread may compact at a time; is the lock scoped to the store, the region, or the region server?
      this.lastCompactSize = 0;

      // filesToCompact are sorted oldest to newest.
      List<StoreFile> filesToCompact = this.storefiles;
      if (filesToCompact.isEmpty()) {
        LOG.debug(this.storeNameStr + ": no store files to compact");
        return null;
      }

      // Check to see if we need to do a major compaction on this region.
      // If so, change doMajorCompaction to true to skip the incremental
      // compacting below. Only check if doMajorCompaction is not true.
      if (!majorcompaction) {
        majorcompaction = isMajorCompaction(filesToCompact);
      }

      boolean references = hasReferences(filesToCompact);
      if (!majorcompaction && !references &&
          (forceSplit || (filesToCompact.size() < compactionThreshold))) {
        return checkSplit(forceSplit);
      }

      /* get store file sizes for incremental compacting selection.
       * normal skew:
       *
       *         older ----> newer
       *     _
       *    | |   _
       *    | |  | |   _
       *  --|-|- |-|- |-|---_-------_-------  minCompactSize (configurable)
       *    | |  | |  | |  | |  _  | |
       *    | |  | |  | |  | | | | | |
       *    | |  | |  | |  | | | | | |
       */
      int countOfFiles = filesToCompact.size();
      long [] fileSizes = new long[countOfFiles];
      long [] sumSize = new long[countOfFiles];
      for (int i = countOfFiles-1; i >= 0; --i) {
        StoreFile file = filesToCompact.get(i);
        Path path = file.getPath();
        if (path == null) {
          LOG.error("Path is null for " + file);
          return null;
        }
        StoreFile.Reader r = file.getReader();
        if (r == null) {
          LOG.error("StoreFile " + file + " has a null Reader");
          return null;
        }
        fileSizes[i] = file.getReader().length();
        // calculate the sum of fileSizes[i,i+maxFilesToCompact-1) for algo
        int tooFar = i + this.maxFilesToCompact - 1;
        /**
         * e.g.: sumSize[i] holds the combined size of the (maxFilesToCompact - 1)
         * adjacent store files starting at index i
         * index : 0,   1,  2,  3,  4,  5
         * f size: 10, 20, 15, 25, 15, 10
         * tooFar:  2,  3,  4,  5,  6,  7
         * s size: 30, 35, 40, 40, 25, 10 (maxFilesToCompact = 3, countOfFiles = 6)
         */
        sumSize[i] = fileSizes[i]
                   + ((i+1    < countOfFiles) ? sumSize[i+1]      : 0)
                   - ((tooFar < countOfFiles) ? fileSizes[tooFar] : 0);
      }

      long totalSize = 0;
      if (!majorcompaction && !references) {
        // we're doing a minor compaction, let's see what files are applicable
        int start = 0;
        double r = this.compactRatio;

        /* Start at the oldest file and stop when you find the first file that
         * meets compaction criteria:
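         * Walk from the oldest store file to the newest. Scanning stops at the
         * first store file whose size is no larger than minCompactSize, or no
         * larger than compactRatio (default 1.2) times the combined size of
         * the next (maxFilesToCompact - 1) store files.
         *
         * X <= minCompactSize || X <= SUM_ * compactRatio ==> stop
         * X > minCompactSize && X > SUM_ * compactRatio   ==> keep scanning
         * X > max(minCompactSize, SUM_ * compactRatio)    ==> keep scanning
         * i.e. scanning stops at a file that is either: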
         *   (1) a recently-flushed, small file (i.e. <= minCompactSize)
         *      OR
         *   (2) within the compactRatio of sum(newer_files)
         * Given normal skew, any newer files will also meet this criteria
         *
         * Additional Note:
         * If fileSizes.size() >> maxFilesToCompact, we will recurse on
         * compact().  Consider the oldest files first to avoid a
         * situation where we always compact [end-threshold,end).  Then, the
         * last file becomes an aggregate of the previous compactions.
         */
        /*
         * At least compactionThreshold store files must remain
         * and stop condition (1) or (2) must be met
         * ==>
         * only then can a minor compaction proceed
         */
        while(countOfFiles - start >= this.compactionThreshold &&
              fileSizes[start] >
                Math.max(minCompactSize, (long)(sumSize[start+1] * r))) {
          ++start;
        }
        
        // A single minor compaction may include at most maxFilesToCompact store files
        int end = Math.min(countOfFiles, start + this.maxFilesToCompact);
        // Total size of the store files included in this minor compaction
        totalSize = fileSizes[start]
                  + ((start+1 < countOfFiles) ? sumSize[start+1] : 0);

        // if we don't have enough files to compact, just wait
        if (end - start < this.compactionThreshold) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Skipped compaction of " + this.storeNameStr
              + " because only " + (end - start) + " file(s) of size "
              + StringUtils.humanReadableInt(totalSize)
              + " meet compaction criteria.");
          }
          return checkSplit(forceSplit);
        }

        if (0 == start && end == countOfFiles) {
          // we decided all the files were candidates! major compact
          majorcompaction = true;
        } else {
          // Slice the qualifying store files out of the candidate list for this minor compaction
          filesToCompact = new ArrayList<StoreFile>(filesToCompact.subList(start,
            end));
        }
        // majorcompaction was false on entering this branch; on leaving it may be either false or true
      } else {
        // all files included in this compaction
        for (long i : fileSizes) {
          totalSize += i;
        }
      }
      this.lastCompactSize = totalSize;

      // Max-sequenceID is the last key in the files we're compacting
      long maxId = StoreFile.getMaxSequenceIdInList(filesToCompact);

      // Ready to go.  Have list of files to compact.
      LOG.info("Started compaction of " + filesToCompact.size() + " file(s) in cf=" +
          this.storeNameStr +
        (references? ", hasReferences=true,": " ") + " into " +
          region.getTmpDir() + ", seqid=" + maxId +
          ", totalSize=" + StringUtils.humanReadableInt(totalSize));
      // The store files have been chosen; compact() is the function that actually performs the merge
      StoreFile.Writer writer = compact(filesToCompact, majorcompaction, maxId); 
      // Move the compaction into place.
      StoreFile sf = completeCompaction(filesToCompact, writer);
      if (LOG.isInfoEnabled()) {
        LOG.info("Completed" + (majorcompaction? " major ": " ") +
          "compaction of " + filesToCompact.size() +
          " file(s), new file=" + (sf == null? "none": sf.toString()) +
          ", size=" + (sf == null? "none": StringUtils.humanReadableInt(sf.getReader().length())) +
          "; total size for store is " + StringUtils.humanReadableInt(storeSize));
      }
    }
    return checkSplit(forceSplit);
  }
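
The sumSize recurrence in the loop above can be checked in isolation. The sketch below copies only that recurrence into a hypothetical class SumSizeSketch (the class and the method name windowSums are assumptions for this example) and runs it on the sizes 10, 20, 15, 25, 15, 10 with maxFilesToCompact = 3. Each entry ends up covering a window of maxFilesToCompact - 1 files, as the original comment "fileSizes[i,i+maxFilesToCompact-1)" states.

```java
import java.util.Arrays;

// Hypothetical standalone copy of the sumSize recurrence; not part of HBase.
class SumSizeSketch {

    // sumSize[i] = sum of fileSizes over the half-open range
    // [i, i + maxFilesToCompact - 1), computed from the newest file backwards.
    static long[] windowSums(long[] fileSizes, int maxFilesToCompact) {
        int countOfFiles = fileSizes.length;
        long[] sumSize = new long[countOfFiles];
        for (int i = countOfFiles - 1; i >= 0; --i) {
            // First index that falls outside the window starting at i.
            int tooFar = i + maxFilesToCompact - 1;
            sumSize[i] = fileSizes[i]
                       + ((i + 1 < countOfFiles) ? sumSize[i + 1] : 0)
                       - ((tooFar < countOfFiles) ? fileSizes[tooFar] : 0);
        }
        return sumSize;
    }

    public static void main(String[] args) {
        long[] fileSizes = {10, 20, 15, 25, 15, 10};
        // prints [30, 35, 40, 40, 25, 10]
        System.out.println(Arrays.toString(windowSums(fileSizes, 3)));
    }
}
```

Each step reuses the previous window, adding the file that enters it and subtracting the one that drops out, so the whole array is filled in a single backward pass instead of re-summing every window.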


Added 2011.12.13

A few HBase JIRA issues related to compaction:

HBASE-3189 Stagger Major Compactions

HBASE-3209 New Compaction Algorithm

HBASE-1476 Multithreaded Compactions

HBASE-3857 Change the HFile Format

