MetadataCleaner

MetadataCleaner runs a timer task that periodically cleans up metadata. There are six types of metadata:

MAP_OUTPUT_TRACKER: the data used to track the storage location of each map task's output for the executors; its cleanup interval is set via spark.cleaner.ttl.MAP_OUTPUT_TRACKER, default -1, meaning no cleanup;

SPARK_CONTEXT: the data structure in SparkContext that records RDDs cached in memory; configured via spark.cleaner.ttl.SPARK_CONTEXT, default -1, meaning no cleanup;

HTTP_BROADCAST: metadata for broadcast variables distributed over HTTP; configured via spark.cleaner.ttl.HTTP_BROADCAST, default -1, meaning no cleanup;

BLOCK_MANAGER: non-broadcast blocks held by the BlockManager; configured via spark.cleaner.ttl.BLOCK_MANAGER, default -1, meaning no cleanup;

SHUFFLE_BLOCK_MANAGER: shuffle output data; configured via spark.cleaner.ttl.SHUFFLE_BLOCK_MANAGER, default -1, meaning no cleanup;

BROADCAST_VARS: metadata for Torrent-style broadcast variables, which are backed by the BlockManager; configured via spark.cleaner.ttl.BROADCAST_VARS, default -1, meaning no cleanup.
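The lookup behind these settings can be sketched as follows. This is a minimal sketch, not Spark's actual MetadataCleaner.getDelaySeconds: a plain Map stands in for SparkConf, and the object name TtlLookupSketch is hypothetical. The fallback order mirrors what is described above: check spark.cleaner.ttl.&lt;TYPE&gt; first, then the global spark.cleaner.ttl, with -1 meaning "never clean".

```scala
// Hypothetical sketch of the per-type TTL lookup; a plain Map stands in for SparkConf.
object TtlLookupSketch {
  def getDelaySeconds(conf: Map[String, String], cleanerType: String): Int = {
    // Global default: spark.cleaner.ttl, or -1 (no cleanup) if unset.
    val globalTtl = conf.getOrElse("spark.cleaner.ttl", "-1").toInt
    // Type-specific key overrides the global default.
    conf.getOrElse("spark.cleaner.ttl." + cleanerType, globalTtl.toString).toInt
  }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      "spark.cleaner.ttl" -> "3600",
      "spark.cleaner.ttl.MAP_OUTPUT_TRACKER" -> "600")
    println(getDelaySeconds(conf, "MAP_OUTPUT_TRACKER")) // 600: specific key wins
    println(getDelaySeconds(conf, "BLOCK_MANAGER"))      // 3600: falls back to the global TTL
    println(getDelaySeconds(Map.empty, "SPARK_CONTEXT")) // -1: nothing set, so no cleanup
  }
}
```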


private[spark] class MetadataCleaner(
    cleanerType: MetadataCleanerType.MetadataCleanerType,
    cleanupFunc: (Long) => Unit,
    conf: SparkConf)
  extends Logging {

  val name = cleanerType.toString

  private val delaySeconds = MetadataCleaner.getDelaySeconds(conf, cleanerType)
  private val periodSeconds = math.max(10, delaySeconds / 10)
  private val timer = new Timer(name + " cleanup timer", true)

  // The task run on each tick: invoke the cleanup function with the cutoff
  // timestamp (now minus the TTL); anything older than that should be removed.
  private val task = new TimerTask {
    override def run() {
      try {
        cleanupFunc(System.currentTimeMillis() - (delaySeconds * 1000))
        logInfo("Ran metadata cleaner for " + name)
      } catch {
        case e: Exception => logError("Error running cleanup task for " + name, e)
      }
    }
  }

  // The task is only scheduled when a positive TTL has been configured.
  if (delaySeconds > 0) {
    logDebug(
      "Starting metadata cleaner for " + name + " with delay of " + delaySeconds + " seconds " +
      "and period of " + periodSeconds + " secs")
    timer.schedule(task, delaySeconds * 1000, periodSeconds * 1000)
  }

  def cancel() {
    timer.cancel()
  }
}

Note on the * 1000: the TTL is configured in seconds, while java.util.Timer's schedule method expects milliseconds, so the values are converted before scheduling.


As an example, consider the SPARK_CONTEXT cleaner. SparkContext creates one and passes its own cleanup method as the callback:

    _metadataCleaner = new MetadataCleaner(MetadataCleanerType.SPARK_CONTEXT, this.cleanup, _conf)




  // Invoked periodically by the MetadataCleaner: drops persistentRdds entries
  // whose timestamps are older than cleanupTime.
  private[spark] def cleanup(cleanupTime: Long) {
    persistentRdds.clearOldValues(cleanupTime)
  }



  // All persisted RDDs, keyed by RDD id; the values are held via weak references.
  private[spark] val persistentRdds = new TimeStampedWeakValueHashMap[Int, RDD[_]]


  /** Remove entries older than threshTime (delegates to the underlying map). */
  def clearOldValues(threshTime: Long): Unit = internalMap.clearOldValues(threshTime)

  /** Remove entries with values that are no longer strongly reachable. */
  def clearNullValues() {
    val it = internalMap.getEntrySet.iterator
    while (it.hasNext) {
      val entry = it.next()
      if (entry.getValue.value.get == null) {
        logDebug("Removing key " + entry.getKey + " because it is no longer strongly reachable.")
        it.remove()
      }
    }
  }
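To make the two cleanup paths concrete, here is a hypothetical miniature of TimeStampedWeakValueHashMap, not Spark's actual implementation: the class name TimestampedWeakMapSketch and its method signatures are my own. Each entry carries an insertion timestamp and holds its value through a WeakReference, so clearOldValues drops entries older than a threshold while clearNullValues drops entries whose value has been garbage-collected.

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

// Hypothetical miniature of a timestamped weak-value map (name and API are sketches).
class TimestampedWeakMapSketch[K, V <: AnyRef] {
  private case class Entry(timestamp: Long, ref: WeakReference[V])
  private val map = mutable.HashMap[K, Entry]()

  def put(key: K, value: V, now: Long): Unit =
    map(key) = Entry(now, new WeakReference(value))

  def get(key: K): Option[V] = map.get(key).flatMap(e => Option(e.ref.get))

  // Mirrors clearOldValues: drop every entry inserted before threshTime.
  def clearOldValues(threshTime: Long): Unit = {
    val stale = map.collect { case (k, e) if e.timestamp < threshTime => k }
    stale.foreach(map.remove)
  }

  // Mirrors clearNullValues: drop entries whose referent was garbage-collected.
  def clearNullValues(): Unit = {
    val dead = map.collect { case (k, e) if e.ref.get == null => k }
    dead.foreach(map.remove)
  }

  def size: Int = map.size
}

object TimestampedWeakMapDemo {
  def main(args: Array[String]): Unit = {
    val m = new TimestampedWeakMapSketch[Int, String]
    m.put(1, "old", now = 100L)
    m.put(2, "new", now = 200L)
    m.clearOldValues(150L)        // entry 1 is older than the threshold
    println(m.get(1))             // None
    println(m.get(2))             // Some(new)
  }
}
```

Note that the values here are string literals, which stay strongly reachable, so only clearOldValues removes anything in this demo; in Spark, an RDD that is no longer referenced anywhere else becomes eligible for the clearNullValues path.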



MetadataCleaner, then, can be understood as an external timer: each component hands it a cleanup callback, and the timer invokes that callback periodically with a cutoff timestamp.


