大写的UFO

[spark] 内存管理 MemoryManager 解析

概述

spark的内存管理有两套方案，新旧方案分别对应的类是UnifiedMemoryManager和StaticMemoryManager。

旧方案是静态的，storageMemory（存储内存）和executionMemory（执行内存）拥有的内存是独享的不可相互借用，故在其中一方内存充足，另一方内存不足但又不能借用的情况下会造成资源的浪费。新方案是统一管理的，初始状态是内存各占一半，但其中一方内存不足时可以向对方借用，对内存资源进行合理有效的利用，提高了整体资源的利用率。

总的来说内存分为三大块，包括storageMemory、executionMemory、系统预留，其中storageMemory用来缓存rdd，unroll partition，存放direct task result、广播变量，在 Spark Streaming receiver 模式中存放每个 batch 的 blocks。executionMemory用于shuffle、join、sort、aggregation 中的缓存。除了这两者以外的内存都是预留给系统的。

旧方案 StaticMemoryManager

在SparkEnv中会创建memoryManager：

val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
    val memoryManager: MemoryManager =
      if (useLegacyMemoryManager) {
        new StaticMemoryManager(conf, numUsableCores)
      } else {
        UnifiedMemoryManager(conf, numUsableCores)
      }

默认使用的是统一管理方案UnifiedMemoryManager，这里我们简要的看看旧方案StaticMemoryManager。

storageMemory能分到的内存是：

systemMaxMemory * memoryFraction * safetyFraction

其中：

systemMaxMemory ：Runtime.getRuntime.maxMemory，即JVM能获得的最大内存空间。
memoryFraction：由参数spark.storage.memoryFraction控制，默认0.6。
safetyFraction：由参数spark.storage.safetyFraction控制，默认是0.9，因为cache block都是估算的，所以需要一个安全系数来保证安全。

executionMemory能分到的内存是：

systemMaxMemory * memoryFraction * safetyFraction

其中：

systemMaxMemory ：Runtime.getRuntime.maxMemory，即JVM能获得的最大内存空间。
memoryFraction：由参数spark.shuffle.memoryFraction控制，默认0.2。
safetyFraction：由参数spark.shuffle.safetyFraction控制，默认是0.8。

memoryFraction系数之外和安全系数之外的内存就是给系统预留的了。

executionMemory能分到的内存直接影响了shuffle中spill的频率，增加executionMemory可减少spill的次数，但storageMemory能cache的容量也相应减少。

execution 和 storage 被分配到内存后大小就一直不变了，每次申请内存都只能申请自己独有的不能相互借用，会造成资源的浪费。另外，只有 execution 内存支持 off heap，storage 内存不支持 off heap。

新方案 UnifiedMemoryManager

由于新方案中storageMemory和executionMemory是统一管理的，我们看看两者一共能拿到多少内存。

private def getMaxMemory(conf: SparkConf): Long = {
    val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
    val reservedMemory = conf.getLong("spark.testing.reservedMemory",
      if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
    val minSystemMemory = (reservedMemory * 1.5).ceil.toLong
    if (systemMemory < minSystemMemory) {
      throw new IllegalArgumentException(s"System memory $systemMemory must " +
        s"be at least $minSystemMemory. Please increase heap size using the --driver-memory " +
        s"option or spark.driver.memory in Spark configuration.")
    }
    // SPARK-12759 Check executor memory to fail fast if memory is insufficient
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong
  }

首先给系统内存reservedMemory预留了300M，若jvm能拿到的最大内存和配置的executor内存分别不足以reservedMemory的1.5倍即450M都会抛出异常，最后storage和execution能拿到的内存为：

 (heap space - 300) * spark.memory.fraction （默认为0.6）

storage和execution各占所获内存的50%。

申请storage内存

为某个blockId申请numBytes大小的内存：

override def acquireStorageMemory(
      blockId: BlockId,
      numBytes: Long,
      memoryMode: MemoryMode): Boolean = synchronized {
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, maxMemory) = memoryMode match {
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        maxOnHeapStorageMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        maxOffHeapMemory)
    }
    // 申请的内存大于storage和execution内存之和
    if (numBytes > maxMemory) {
      // Fail fast if the block simply won't fit
      logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
        s"memory limit ($maxMemory bytes)")
      return false
    }
    // 大于storage空闲内存
    if (numBytes > storagePool.memoryFree) {
      // There is not enough free memory in the storage pool, so try to borrow free memory from
      // the execution pool.
      val memoryBorrowedFromExecution = Math.min(executionPool.memoryFree, numBytes)
      executionPool.decrementPoolSize(memoryBorrowedFromExecution)
      storagePool.incrementPoolSize(memoryBorrowedFromExecution)
    }
    storagePool.acquireMemory(blockId, numBytes)
  }

若申请的numBytes比两者总共的内存还大，直接返回false，说明申请失败。
若numBytes比storage空闲的内存大，则需要向executionPool借用
- 借用的大小为此时execution的空闲内存和numBytes的较小值（个人观点应该是和(numBytes-storage空闲内存)的较小值）
- 减小execution的poolSize
- 增加storage的poolSize

即使向executionPool借用了内存，但不一定就够numBytes，因为不可能把execution正在使用的内存都接过来，接着调用了storagePool的acquireMemory方法在不够numBytes的情况下去释放storage中共cache的rdd，以增加storagePool.memoryFree的值：

def acquireMemory(blockId: BlockId, numBytes: Long): Boolean = lock.synchronized {
    val numBytesToFree = math.max(0, numBytes - memoryFree)
    acquireMemory(blockId, numBytes, numBytesToFree)
  }

计算出向execution借了内存后还差多少内存才能满足numBytes，即需要释放的内存numBytesToFree 。接着调用了acquireMemory方法：

def acquireMemory(
      blockId: BlockId,
      numBytesToAcquire: Long,
      numBytesToFree: Long): Boolean = lock.synchronized {
    assert(numBytesToAcquire >= 0)
    assert(numBytesToFree >= 0)
    assert(memoryUsed <= poolSize)
    if (numBytesToFree > 0) {
      memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, memoryMode)
    }
    // NOTE: If the memory store evicts blocks, then those evictions will synchronously call
    // back into this StorageMemoryPool in order to free memory. Therefore, these variables
    // should have been updated.
    val enoughMemory = numBytesToAcquire <= memoryFree
    if (enoughMemory) {
      _memoryUsed += numBytesToAcquire
    }
    enoughMemory
  }

当numBytesToFree 大于0的情况下，就真的要去释放缓存在memory中的block，释放完后再看空闲内存是否能满足numBytes，若满足则将numBytes加到已使用的变量里。

看看当需要从storay中释放block的时候是怎么释放的：

private[spark] def evictBlocksToFreeSpace(
      blockId: Option[BlockId],
      space: Long,
      memoryMode: MemoryMode): Long = {
    assert(space > 0)
    memoryManager.synchronized {
      var freedMemory = 0L
      val rddToAdd = blockId.flatMap(getRddId)
      val selectedBlocks = new ArrayBuffer[BlockId]
      def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
        entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
      }
      // This is synchronized to ensure that the set of entries is not changed
      // (because of getValue or getBytes) while traversing the iterator, as that
      // can lead to exceptions.
      entries.synchronized {
        val iterator = entries.entrySet().iterator()
        while (freedMemory < space && iterator.hasNext) {
          val pair = iterator.next()
          val blockId = pair.getKey
          val entry = pair.getValue
          if (blockIsEvictable(blockId, entry)) {
            // We don't want to evict blocks which are currently being read, so we need to obtain
            // an exclusive write lock on blocks which are candidates for eviction. We perform a
            // non-blocking "tryLock" here in order to ignore blocks which are locked for reading:
            if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
              selectedBlocks += blockId
              freedMemory += pair.getValue.size
            }
          }
        }
      }

      def dropBlock[T](blockId: BlockId, entry: MemoryEntry[T]): Unit = {
        val data = entry match {
          case DeserializedMemoryEntry(values, _, _) => Left(values)
          case SerializedMemoryEntry(buffer, _, _) => Right(buffer)
        }
        val newEffectiveStorageLevel =
          blockEvictionHandler.dropFromMemory(blockId, () => data)(entry.classTag)
        if (newEffectiveStorageLevel.isValid) {
          // The block is still present in at least one store, so release the lock
          // but don't delete the block info
          blockInfoManager.unlock(blockId)
        } else {
          // The block isn't present in any store, so delete the block info so that the
          // block can be stored again
          blockInfoManager.removeBlock(blockId)
        }
      }

      if (freedMemory >= space) {
        logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
          s"(${Utils.bytesToString(freedMemory)} bytes)")
        for (blockId <- selectedBlocks) {
          val entry = entries.synchronized { entries.get(blockId) }
          // This should never be null as only one task should be dropping
          // blocks and removing entries. However the check is still here for
          // future safety.
          if (entry != null) {
            dropBlock(blockId, entry)
          }
        }
        logInfo(s"After dropping ${selectedBlocks.size} blocks, " +
          s"free memory is ${Utils.bytesToString(maxMemory - blocksMemoryUsed)}")
        freedMemory
      } else {
        blockId.foreach { id =>
          logInfo(s"Will not store $id")
        }
        selectedBlocks.foreach { id =>
          blockInfoManager.unlock(id)
        }
        0L
      }
    }
  }

spark中内存中的block都是通过memoryStore来存储的，用

private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)

来维护了blockId和MemoryEntry（对应value的包装）的关联，另外方法中还定义了两个方法，blockIsEvictable方法是判断遍历到的blockId和当前blockId是否属于同一个rdd，因为不能提出同一个rdd的另外一个block。dropBlock方法就是真正执行从内存中移除block的，若StorageLevel包括了使用disk，则会写到磁盘文件。

整段代码的逻辑简单概述就是：遍历当前memoryStore中存的每个block（不是和当前请求的block属于同于同一rdd），直到block对应的内存之和大于所需释放的内存才停止遍历，也有可能遍历完了都还不能满足所需的内存。若能释放的内存满足所需的内存，则真正执行移除，否则不移除，因为不可能一个block在内存中一部分，在磁盘一部分，最后返回真正剔除block释放的内存。

总结一下向StorageMemory申请内存的过程（在MemoryMode.ON_HEAP模式下）：

若numBytes大于storage和execution内存之和，抛异常。
若numBytes大于storage空闲内存，向execution借用min（executionFree,numBytes）大的内存，并更新各自的poolSize。
若申请完后还不够，则释放storage中的block来补足。
- memoryStore缓存的block大小满足需要补足的大小，则真正执行剔除（遍历block直到内存满足需求对应的block），否则不剔除。
最终若空闲内存满足numBytes则返回true，否则返回false。

申请execution内存

在execution内存不足向storage借用时，还是不满足所需内存的情况下能借多少借多少。看看在需要向execution申请内存时是怎么处理的（MemoryMode.ON_HEAP模式下）：

override private[memory] def acquireExecutionMemory(
      numBytes: Long,
      taskAttemptId: Long,
      memoryMode: MemoryMode): Long = synchronized {
    assertInvariants()
    assert(numBytes >= 0)
    val (executionPool, storagePool, storageRegionSize, maxMemory) = memoryMode match {
      case MemoryMode.ON_HEAP => (
        onHeapExecutionMemoryPool,
        onHeapStorageMemoryPool,
        onHeapStorageRegionSize,
        maxHeapMemory)
      case MemoryMode.OFF_HEAP => (
        offHeapExecutionMemoryPool,
        offHeapStorageMemoryPool,
        offHeapStorageMemory,
        maxOffHeapMemory)
    }

    /**
     * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
     *
     * When acquiring memory for a task, the execution pool may need to make multiple
     * attempts. Each attempt must be able to evict storage in case another task jumps in
     * and caches a large block between the attempts. This is called once per attempt.
     */
    def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
      if (extraMemoryNeeded > 0) {
        // There is not enough free memory in the execution pool, so try to reclaim memory from
        // storage. We can reclaim any free memory from the storage pool. If the storage pool
        // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
        // the memory that storage has borrowed from execution.
        val memoryReclaimableFromStorage = math.max(
          storagePool.memoryFree,
          storagePool.poolSize - storageRegionSize)
        if (memoryReclaimableFromStorage > 0) {
          // Only reclaim as much space as is necessary and available:
          val spaceToReclaim = storagePool.freeSpaceToShrinkPool(
            math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
          storagePool.decrementPoolSize(spaceToReclaim)
          executionPool.incrementPoolSize(spaceToReclaim)
        }
      }
    }

    /**
     * The size the execution pool would have after evicting storage memory.
     *
     * The execution memory pool divides this quantity among the active tasks evenly to cap
     * the execution memory allocation for each task. It is important to keep this greater
     * than the execution pool size, which doesn't take into account potential memory that
     * could be freed by evicting storage. Otherwise we may hit SPARK-12155.
     *
     * Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness
     * in execution memory allocation across tasks, Otherwise, a task may occupy more than
     * its fair share of execution memory, mistakenly thinking that other tasks can acquire
     * the portion of storage memory that cannot be evicted.
     */
    def computeMaxExecutionPoolSize(): Long = {
      maxMemory - math.min(storagePool.memoryUsed, storageRegionSize)
    }

    executionPool.acquireMemory(
      numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)
  }

这里先讲解这里面的两个方法：

maybeGrowExecutionPool就是需要向storage借内存的方法，能借用的最大内存memoryReclaimableFromStorage 为storage的空闲内存和storage向execution借用的内存（即已经使用也要释放来归还）的较大值，若memoryReclaimableFromStorage为0，则说明storage之前没有向execution借用内存，并且此时storage没有空闲的内存可借。

最终申请借用的是所需内存和memoryReclaimableFromStorage的较小值（缺多少借多少），跟进storagePool.freeSpaceToShrinkPool方法看看其实现：

def freeSpaceToShrinkPool(spaceToFree: Long): Long = lock.synchronized {
    val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)
    val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory
    if (remainingSpaceToFree > 0) {
      // If reclaiming free memory did not adequately shrink the pool, begin evicting blocks:
      val spaceFreedByEviction =
        memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, memoryMode)
      // When a block is released, BlockManager.dropFromMemory() calls releaseMemory(), so we do
      // not need to decrement _memoryUsed here. However, we do need to decrement the pool size.
      spaceFreedByReleasingUnusedMemory + spaceFreedByEviction
    } else {
      spaceFreedByReleasingUnusedMemory
    }
  }

若storage空闲内存不足以所申请的内存，则需要通过释放storage中缓存的block来补充。

方法computeMaxExecutionPoolSize即计算的是execution拥有的最大可用内存。

接着通过这两个函数作为参数调用了方法executionPool.acquireMemory：

private[memory] def acquireMemory(
      numBytes: Long,
      taskAttemptId: Long,
      maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,
      computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {
    assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")

    // TODO: clean up this clunky method signature

    // Add this task to the taskMemory map just so we can keep an accurate count of the number
    // of active tasks, to let other tasks ramp down their memory in calls to `acquireMemory`
    if (!memoryForTask.contains(taskAttemptId)) {
      memoryForTask(taskAttemptId) = 0L
      // This will later cause waiting tasks to wake up and check numTasks again
      lock.notifyAll()
    }

    // Keep looping until we're either sure that we don't want to grant this request (because this
    // task would have more than 1 / numActiveTasks of the memory) or we have enough free
    // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
    // TODO: simplify this to limit each task to its own slot
    while (true) {
      val numActiveTasks = memoryForTask.keys.size
      val curMem = memoryForTask(taskAttemptId)

      // In every iteration of this loop, we should first try to reclaim any borrowed execution
      // space from storage. This is necessary because of the potential race condition where new
      // storage blocks may steal the free execution memory that this task was waiting for.
      maybeGrowPool(numBytes - memoryFree)

      // Maximum size the pool would have after potentially growing the pool.
      // This is used to compute the upper bound of how much memory each task can occupy. This
      // must take into account potential free memory as well as the amount this pool currently
      // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
      // we did not take into account space that could have been freed by evicting cached blocks.
      val maxPoolSize = computeMaxPoolSize()
      val maxMemoryPerTask = maxPoolSize / numActiveTasks
      val minMemoryPerTask = poolSize / (2 * numActiveTasks)

      // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
      val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
      // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
      val toGrant = math.min(maxToGrant, memoryFree)

      // We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
      // if we can't give it this much now, wait for other tasks to free up memory
      // (this happens if older tasks allocated lots of memory before N grew)
      if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
        logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")
        lock.wait()
      } else {
        memoryForTask(taskAttemptId) += toGrant
        return toGrant
      }
    }
    0L  // Never reached
  }

里面定义了一个Task能使用的execution内存：

val maxPoolSize = computeMaxPoolSize()
      val maxMemoryPerTask = maxPoolSize / numActiveTasks
      val minMemoryPerTask = poolSize / (2 * numActiveTasks)

其中maxPoolSize 为从 storage 借用了内存后，executionMemoryPool 的最大可用内存，保证一个Task可用的内存在 1/2*numActiveTasks ~ 1/numActiveTasks 范围内，整体保证各个Task资源占用平衡。

向execution申请内存代码流程：

先获取Task目前已经分配到的内存。
当numBytes大于execution空闲内存，则会通过maybeGrowPool方法向storage借内存。
能获取的最大内存maxToGrant为numBytes和（maxMemoryPerTask - curMem）的较小值。
本次循环能获取真正的内存toGrant为maxToGrant和（execution向memory借用后可用的内存）的较小值。
若最终能申请的内存小于numBytes且申请的内存加上原来有的内存还不足以一个Task最小的使用内存minMemoryPerTask，则会阻塞，直到有足够的内存或者有新的Task进来减小了minMemoryPerTask的值。
否则直接返回本次分配到的内存。

对于向storage和execution申请内存以及相互借用内存的方式至此讲解完成。用到storage和execution内存的地方很多（看概述），其中缓存rdd会向storage申请内存，运行Task会向execution申请内存，接下来分别看看是在什么时候申请的。

缓存 RDD

final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
    if (storageLevel != StorageLevel.NONE) {
      getOrCompute(split, context)
    } else {
      computeOrReadCheckpoint(split, context)
    }
  }

每个rdd分区的数据都是通过对应的迭代器得到，其中若存储级别不为NONE，则会先尝试从储存介质中（内存、磁盘文件等）获取，第一次获取当然都没有，只有先计算完缓存起来以供后续的计算直接获取。缓存序列化和非序列化的数据的缓存方式不一样，非序列化的缓存的代码是：

memoryStore.putIteratorAsValues(blockId, iterator(), classTag)

 private[storage] def putIteratorAsValues[T](
      blockId: BlockId,
      values: Iterator[T],
      classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {

    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")

    // Number of elements unrolled so far
    var elementsUnrolled = 0
    // Whether there is still enough memory for us to continue unrolling this block
    var keepUnrolling = true
    // Initial per-task memory to request for unrolling blocks (bytes).
    val initialMemoryThreshold = unrollMemoryThreshold
    // How often to check whether we need to request more memory
    val memoryCheckPeriod = 16
    // Memory currently reserved by this task for this particular unrolling operation
    var memoryThreshold = initialMemoryThreshold
    // Memory to request as a multiple of current vector size
    val memoryGrowthFactor = 1.5
    // Keep track of unroll memory used by this particular block / putIterator() operation
    var unrollMemoryUsedByThisBlock = 0L
    // Underlying vector for unrolling the block
    var vector = new SizeTrackingVector[T]()(classTag)

    // Request enough memory to begin unrolling
    keepUnrolling =
      reserveUnrollMemoryForThisTask(blockId, initialMemoryThreshold, MemoryMode.ON_HEAP)

    if (!keepUnrolling) {
      logWarning(s"Failed to reserve initial memory threshold of " +
        s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
    } else {
      unrollMemoryUsedByThisBlock += initialMemoryThreshold
    }

    // Unroll this block safely, checking whether we have exceeded our threshold periodically
    while (values.hasNext && keepUnrolling) {
      vector += values.next()
      if (elementsUnrolled % memoryCheckPeriod == 0) {
        // If our vector's size has exceeded the threshold, request more memory
        val currentSize = vector.estimateSize()
        if (currentSize >= memoryThreshold) {
          val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
          keepUnrolling =
            reserveUnrollMemoryForThisTask(blockId, amountToRequest, MemoryMode.ON_HEAP)
          if (keepUnrolling) {
            unrollMemoryUsedByThisBlock += amountToRequest
          }
          // New threshold is currentSize * memoryGrowthFactor
          memoryThreshold += amountToRequest
        }
      }
      elementsUnrolled += 1
    }

    if (keepUnrolling) {
      // We successfully unrolled the entirety of this block
      val arrayValues = vector.toArray
      vector = null
      val entry =
        new DeserializedMemoryEntry[T](arrayValues, SizeEstimator.estimate(arrayValues), classTag)
      val size = entry.size
      def transferUnrollToStorage(amount: Long): Unit = {
        // Synchronize so that transfer is atomic
        memoryManager.synchronized {
          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, amount)
          val success = memoryManager.acquireStorageMemory(blockId, amount, MemoryMode.ON_HEAP)
          assert(success, "transferring unroll memory to storage memory failed")
        }
      }
      // Acquire storage memory if necessary to store this block in memory.
      val enoughStorageMemory = {
        if (unrollMemoryUsedByThisBlock <= size) {
          val acquiredExtra =
            memoryManager.acquireStorageMemory(
              blockId, size - unrollMemoryUsedByThisBlock, MemoryMode.ON_HEAP)
          if (acquiredExtra) {
            transferUnrollToStorage(unrollMemoryUsedByThisBlock)
          }
          acquiredExtra
        } else { // unrollMemoryUsedByThisBlock > size
          // If this task attempt already owns more unroll memory than is necessary to store the
          // block, then release the extra memory that will not be used.
          val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
          releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
          transferUnrollToStorage(size)
          true
        }
      }
      if (enoughStorageMemory) {
        entries.synchronized {
          entries.put(blockId, entry)
        }
        logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
          blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
        Right(size)
      } else {
        assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
          "released too much unroll memory")
        Left(new PartiallyUnrolledIterator(
          this,
          MemoryMode.ON_HEAP,
          unrollMemoryUsedByThisBlock,
          unrolled = arrayValues.toIterator,
          rest = Iterator.empty))
      }
    } else {
      // We ran out of space while unrolling the values for this block
      logUnrollFailureMessage(blockId, vector.estimateSize())
      Left(new PartiallyUnrolledIterator(
        this,
        MemoryMode.ON_HEAP,
        unrollMemoryUsedByThisBlock,
        unrolled = vector.iterator,
        rest = values))
    }
  }

代码太长了，我自己看到都头大了，没事，咱一点一点的慢慢来~

参数中的blockId是一个block的唯一标示，格式是"rdd_" + rddId + "_" + splitIndex，value就是该partition对应数据的迭代器。

通过reserveUnrollMemoryForThisTask方法向Storage申请initialMemoryThreshold（初始值可通过spark.storage.unrollMemoryThreshold配置，默认1M）的内存来unroll 迭代器：

def reserveUnrollMemoryForThisTask(
   blockId: BlockId,
   memory: Long,
   memoryMode: MemoryMode): Boolean = {
 memoryManager.synchronized {
   val success = memoryManager.acquireUnrollMemory(blockId, memory, memoryMode)
   if (success) {
     val taskAttemptId = currentTaskAttemptId()
     val unrollMemoryMap = memoryMode match {
       case MemoryMode.ON_HEAP => onHeapUnrollMemoryMap
       case MemoryMode.OFF_HEAP => offHeapUnrollMemoryMap
     }
     unrollMemoryMap(taskAttemptId) = unrollMemoryMap.getOrElse(taskAttemptId, 0L) + memory
   }
   success
  }
}

跟进acquireUnrollMemory可看见底层调用的就是前面所讲的向storage申请内存的方法acquireStorageMemory，若申请成功则将对应的onHeapUnrollMemoryMap加上申请到的内存，即unroll使用的内存。

若申请成功则跟新unrollMemoryUsedByThisBlock的值，即在该block上unroll使用的内存。
接着进行遍历，停止遍历的条件有两个，一是迭代器全部遍历完，二是没有申请到内存。
- 每迭代一条数据都会加到SizeTrackingVector类型的vector中（底层由数组实现），每迭代16次都会估算vector的大小是否超过了memoryThreshold（申请的内存）。
- 若超过了memoryThreshold，则会计算再次申请内存的大小，1.5倍当前vector大小-已经申请到的内存大小。
- 再次向Storage申请内存，若申请成功，则跟新unrollMemoryUsedByThisBlock，继续遍历进入下次循环，否则停止遍历。
循环结束后，若keepUnrolling 为 true，则说明values 一定被全部展开了；若为false，则没有全部被展开，说明没有申请到足够的内存来展开这个values，意味着该partition缓存到内存失败。
在values全部成功展开的前提下，会将vector构造成一个DeserializedMemoryEntry对象，其中包括数据的大小，接着会将展开后的数据大小和申请的内存大小作比较：
- 若申请的内存比数据小，则再次向storage申请对应的大小，申请成功则将unroll使用的内存转化到storage中去，转化对应的逻辑是：释放掉该Task占用的所有unroll内存，又向storage申请对应的内存，其实unroll内存就是storage内存，即操作的都是storage的内存，减去某值又加上某值，结果没有变，但流程还得这么走，因为为了将 MemoryStore 和 MemoryManager 的解耦。
- 若申请的内存比数据大，则释放掉对应的unroll内存，接着将unroll使用的内存转化到storage中去。
- 最后将blockId和对应的entry加入到memorySore所管理的entries中去。

缓存序列化rdd支持 ON_HEAP 和 OFF_HEAP，和缓存非序列化rdd的方式类似，只是以流的形式写到bytebuffer中，其中MemoryMode 如果是 ON_HEAP，这里的 ByteBuffer 是 HeapByteBuffer（堆上内存）；而如果是 OFF_HEAP，这里的 ByteBuffer 则是 DirectByteBuffer（指向的是堆外内存）。最后根据数据构建成SerializedMemoryEntry来保存在memoryStore的entries中。

shuffle中execution内存的使用

在shuffle write的时候，并不会直接将数据写到磁盘（详情请看Shuffle Write解析），而是先写到一个集合中，此集合占用的内存就是execution内存，初始给的大小是5M，可通过spark.shuffle.spill.initialMemoryThreshold进行设置，每写一次数据就判断是否需要溢写到磁盘，溢写之前还尝试会向execution申请来避免溢写，代码如下：

protected def maybeSpill(collection: C, currentMemory: Long): Boolean = {
    var shouldSpill = false
    if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
      // Claim up to double our current memory from the shuffle memory pool
      val amountToRequest = 2 * currentMemory - myMemoryThreshold
      val granted = acquireMemory(amountToRequest)
      myMemoryThreshold += granted
      // If we were granted too little memory to grow further (either tryToAcquire returned 0,
      // or we already had more memory than myMemoryThreshold), spill the current collection
      shouldSpill = currentMemory >= myMemoryThreshold
    }
    shouldSpill = shouldSpill || _elementsRead > numElementsForceSpillThreshold
    // Actually spill
    if (shouldSpill) {
      _spillCount += 1
      logSpillage(currentMemory)
      spill(collection)
      _elementsRead = 0
      _memoryBytesSpilled += currentMemory
      releaseMemory()
    }
    shouldSpill
  }

当insert&update的次数是32的倍数且当前集合的大小已经大于等于了已经申请到的内存，此时会尝试向execution申请更多的内存来避免spill，申请的大小为2倍当前集合大小减去已经申请到的内存大小，跟进acquireMemory方法：

 public long acquireMemory(long size) {
    long granted = taskMemoryManager.acquireExecutionMemory(size, this);
    used += granted;
    return granted;
  }

这不就是我们前面讲的向execution申请内存的方法吗，这里就不再叙述。

参考

http://www.jianshu.com/p/999ef21dffe8

你可能感兴趣的:(spark)

Hadoop 和 Spark 的内存管理机制分析王子良. 经验分享 hadoop spark 大数据
欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark、Fli
大数据学习（五）：如何使用 Livy提交spark批量任务--转载 zuoseve01 livy
Livy是一个开源的REST接口，用于与Spark进行交互，它同时支持提交执行代码段和完整的程序。Livy封装了spark-submit并支持远端执行。启动服务器执行以下命令，启动livy服务器。./bin/livy-server这里假设spark使用yarn模式，所以所有文件路径都默认位于HDFS中。如果是本地开发模式的话，直接使用本地文件即可（注意必须配置livy.conf文件，设置livy.
Spark Livy 指南及livy部署访问实践 house.zhang 大数据-Spark 大数据
背景：ApacheSpark是一个比较流行的大数据框架、广泛运用于数据处理、数据分析、机器学习中，它提供了两种方式进行数据处理，一是交互式处理：比如用户使用spark-shell，编写交互式代码编译成spark作业提交到集群上去执行；二是批处理，通过spark-submit提交打包好的spark应用jar到集群中进行执行。这两种运行方式都需要安装spark客户端配置好yarn集群信息，并打通集群网
大数据学习（四）：Livy的安装配置及pyspark的会话执行猪笨是念来过倒大数据 pyspark
一个基于Spark的开源REST服务，它能够通过REST的方式将代码片段或是序列化的二进制代码提交到Spark集群中去执行。它提供了以下这些基本功能：提交Scala、Python或是R代码片段到远端的Spark集群上执行；提交Java、Scala、Python所编写的Spark作业到远端的Spark集群上执行；提交批处理应用在集群中运行。从Livy所提供的基本功能可以看到Livy涵盖了原生Spar
探索数据科学新边界：Apache Livy 开源项目详解毕艾琳
探索数据科学新边界：ApacheLivy开源项目详解incubator-livyApacheLivyisanopensourceRESTinterfaceforinteractingwithApacheSparkfromanywhere.项目地址:https://gitcode.com/gh_mirrors/in/incubator-livyApacheLivy是一个为ApacheSpark提供的
大数据公司 Databricks 详解 Bj陈默大数据
Databricks是一家在大数据和人工智能领域具有重要影响力的美国企业软件公司，以下是关于它的详细技术解析：1.起源与背景：Databricks成立于2013年，由来自加州大学伯克利分校AMP实验室的Spark大数据处理系统的多位创始人联合创立，包括AliGhodsi、AndyKonwinski、IonStoica、PatrickWendell、ReynoldXin、MateiZaharia、A
全面解读 Databricks：从架构、引擎到优化策略克里斯蒂亚诺罗纳尔多阿维罗架构 spark 大数据
导语：Databricks是一家由ApacheSpark创始团队成员创立的公司，同时也是一个统一分析平台，帮助企业构建数据湖与数据仓库一体化（Lakehouse）的架构。在Databricks平台上，数据工程、数据科学与数据分析团队能够协作使用Spark、DeltaLake、MLflow等工具高效处理数据与构建机器学习应用。本文将深入介绍Databricks的平台概念、架构特点、优化机制、功能特性
使用 Hadoop 实现大数据的高效存储与查询王子良. 经验分享大数据 hadoop 分布式
欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark、Fli
Spark 源码分析(一) SparkRpc中序列化与反序列化Serializer的抽象类解读（正在更新中~）别人能写出来的，你也能行！多学习别人的思路，形成自己的思路，高薪工作奔你而来！小白的大数据历程 Spark源码解析开发语言 spark 大数据分布式 scala
后一篇链接在这接上一章请先看解读序列化抽象类第一部分（这是一个链接）目录接上一章请先看解读序列化抽象类第一部分2.Java序列化实现类JavaSerializer(1)JavaSerializationStream类代码实际例子1：序列化(2)JavaDeserializationStream代码实际例子2：反序列化Spark源码下类图在学习过程中，抓住主要问题，请思考问题为什么Kryo序列化更加
Spark 源码分析(一) SparkRpc中序列化与反序列化Serializer的抽象类解读（java序列化部分完结，正在更新RpcEnv部分~）小白的大数据历程 Spark源码解析 spark java python
目录(3)JavaSerializerInstance定义了一个Java序列化实例(1)构造方法参数(2)方法1：serializeStream(3)方法2：deserializeStreamdefaultClassLoader(4)方法3：deserializeStreamloader(5)方法4：serialize(6)方法5：deserializeloader(7)方法6：deseriali
大数据-257 离线数仓 - 数据质量监控监控方法 Griffin架构武子康大数据离线数仓大数据数据仓库 java 后端 hadoop hive
点一下关注吧！！！非常感谢！！持续更新！！！Java篇开始了！目前开始更新MyBatis，一起深入浅出！目前已经更新到了：Hadoop（已更完）HDFS（已更完）MapReduce（已更完）Hive（已更完）Flume（已更完）Sqoop（已更完）Zookeeper（已更完）HBase（已更完）Redis（已更完）Kafka（已更完）Spark（已更完）Flink（已更完）ClickHouse（已
pyspark 中删除hdfs的文件夹 TDengine （老段）大数据 spark hadoop hdfs mapreduce
在pyspark中保存rdd的内存到文件的时候，会遇到文件夹已经存在而失败，所以如果文件夹已经存在，需要先删除。搜索了下资料，发现pyspark并没有提供直接管理hdfs文件系统的功能。寻找到一个删除的方法，是通过调用shell命令hadoopfs-rm-f来删除，这个方法感觉不怎么好，所以继续找。后来通过查找hadoophdfs的源代码发现hdfs是通过java的包org.appache.had
Python 爬虫：获取网页数据的 5 种方法王子良. 经验分享 python python 开发语言爬虫
欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark、Fli
python捕获异常青云游子 python
try:name="aaa"id="aaa"exceptExceptionase:print("任务报错")print(str(e))print(str(traceback.print_exc()))spark.sql("""insertintotabledim.aaaselect'1','666','{name}','{id}',null,null,null,null,current_times
Spark任务提交流程尘世壹俗人大数据Spark技术大数据
当包含在applicationmaster中的spark-driver启动后，会与资源调度平台交互获取其他执行器资源，并通过反向注册通知对应的node节点启动执行容器。此外，还会根据程序的执行规划生成两个非常重要的东西，一个是根据spark任务执行计划生成n个ADG有向无环图，另一个是根据有向无环图生成对应的taskset，也可以统称为stage，ADG和taskset由于宽窄依赖以及程序的复杂度
spark读取、写入Clickhouse以及遇到的问题 Alex_81D 大数据基础大数据从入门到精通 clickhouse spark
最近需要处理Clickhouse里面的数据，经过上网查找总结一下spark读写Clickhouse的工具类已经遇到的问题点。具体Clickhouse的讲解本篇不做讲解，后面专门讲解这个。一、clickhouse代码操作话不多说直接看代码1.引入依赖：ru.yandex.clickhouseclickhouse-jdbc0.2.40.2.4这个版本用的比较多一点2.spark对象创建valspark
2024年最新Python：Page Object设计模式_python page object，BTAJ大厂最新面试题汇集 m0_60707708 程序员 python 设计模式开发语言
最后硬核资料：关注即可领取PPT模板、简历模板、行业经典书籍PDF。技术互助：技术群大佬指点迷津，你的问题可能不是问题，求资源在群里喊一声。面试题库：由技术群里的小伙伴们共同投稿，热乎的大厂面试真题，持续更新中。知识体系：含编程语言、算法、大数据生态圈组件（Mysql、Hive、Spark、Flink）、数据仓库、Python、前端等等。网上学习资料一大堆，但如果学到的知识不成体系，遇到问题时只是
2024年总结：大转向年度总结
本文于2025年1月2号首发于公众号“狗哥琐话”。2024年是个打工人苦命年，我看到几乎每个人都比以往辛苦。这让我想起了六字真言，钱难赚屎难吃。职业转向今年我在职业上尝试做了一个转向，具体的结果可能需要比较长的时间来检验我选择是否正确，所以转向的细节我就不全部展开了，可以确定是我依然会专注在Infra和BigData，比如今年我发布了SparkSQL和FlinkSQL的IDEA提效插件。那么我为什
Java爬虫——使用Spark进行数据清晰 Future_yzx java 爬虫 spark
1.依赖引入 org.apache.spark spark-core_2.13 3.5.3 org.apache.spark spark-sql_2.13 3.5.32.数据加载从MySQL数据库中加载jobTest表中的数据，使用Spark的JDBC功能连接到数据库。代码片段：//数据库连接信息StringjdbcUrl="jdbc:mysql://82.157.185.251:3306/
万字详解数仓分层设计架构 ODS-DWD-DWS-ADS _Jordan 自己写的数据仓库
参考：万字详解数仓分层设计架构ODS-DWD-DWS-ADS数据分层的意义1、清晰数据结构2、数据血缘追踪3、数据复用，减少重复开发4、把复杂问题简单化5、屏蔽原始数据的(影响)，屏蔽业务的影响ETL操作1、数据抽取2、数据清洗3、数据转换4、数据加载数据中台包含的内容很多，对应到具体工作中的话，它可以包含下面的这些内容：系统架构：以Hadoop、Spark等组件为中心的架构体系数据架构：顶层设计
Java 大视界 -- Java 开发 Spark 应用：RDD 操作与数据转换一只蜗牛儿 java spark 开发语言
ApacheSpark是一个强大的分布式计算框架，提供了高效的数据处理能力，广泛应用于大数据分析与机器学习。Spark提供了多种高级API，支持批处理和流处理。Spark提供了两种主要的数据抽象：RDD（弹性分布式数据集）和DataFrame。本文将重点介绍如何使用Java开发Spark应用，并深入探讨RDD的操作与数据转换。一、Spark环境搭建首先，确保您的环境中安装了Java和Spark。您
Spring Boot 和微服务：快速入门指南王子良. Java 经验分享 spring boot 微服务后端
欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark、Fli
CDP中的Hive3之Hive Metastore（HMS）对许 #Hive #Spark hive cdp
CDP中的Hive3之HiveMetastore（HMS）1、CDP中的HMS2、HMS表的存储（转换）3、HWC授权1、CDP中的HMSCDP中的HiveMetastore（HMS）是一种服务，用于在后端RDBMS（例如MySQL或PostgreSQL）中存储与ApacheHive和其他服务相关的元数据。Impala、Spark、Hive和其他服务共享元存储。与HMS的连接包括HiveServe
【YashanDB知识库】Hive 命令工具insert崖山数据库报错数据库
本文内容来自YashanDB官网，原文内容请见https://www.yashandb.com/newsinfo/7919217.html?templateId=171...【问题分类】功能兼容【关键字】spark30041、不兼容【问题描述】本项目的架构是hadoop+hive+yashandb使用崖山数据库，初始化所有的原数据表和数据新建表之后，插入数据时候报错，hadoopcode30041
初学者如何用 Python 写第一个爬虫？王子良. python 经验分享 python 开发语言爬虫
欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark、Fli
Apache PAIMON 学习潇锐killer 学习
参考：ApachePAIMON：实时数据湖技术框架及其实践数据湖不仅仅是一个存储不同类数据的技术手段，更是提高数据分析效率、支持数据驱动决策、加速AI发展的基础设施。新一代实时数据湖技术，ApachePAIMON兼容ApacheFlink、Spark等主流计算引擎，并支持流批一体化处理、快速查询和性能优化，成为加速AI转型的重要工具。ApachePAIMON是一个支持大规模实时数据更新的存储和分析
应急救援路径规划中的蚁群算法与路径评价研究【附代码】拉勾科研工作室算法
数据科学与大数据专业|数据分析与模型构建|数据驱动决策✨专业领域：数据挖掘与清洗大数据处理与存储技术机器学习与深度学习模型数据可视化与报告生成分布式计算与云计算数据安全与隐私保护擅长工具：Python/R/Matlab数据分析与建模Hadoop/Spark大数据处理平台SQL数据库管理与优化Tableau/PowerBI数据可视化工具TensorFlow/PyTorch深度学习框架✅具体问题可以私
Java 大视界 -- Java 开发 Spark 应用：RDD 操作与数据转换（四）青云交大数据新视界 Java 大视界 Spark RDD 数据转换大数据数据分区性能优化社交网络 java
亲爱的朋友们，热烈欢迎你们来到青云交的博客！能与你们在此邂逅，我满心欢喜，深感无比荣幸。在这个瞬息万变的时代，我们每个人都在苦苦追寻一处能让心灵安然栖息的港湾。而我的博客，正是这样一个温暖美好的所在。在这里，你们不仅能够收获既富有趣味又极为实用的内容知识，还可以毫无拘束地畅所欲言，尽情分享自己独特的见解。我真诚地期待着你们的到来，愿我们能在这片小小的天地里共同成长，共同进步。本博客的精华专栏：大数
大数据新视界 --大数据大厂之 Spark Streaming 实时数据处理框架：案例与实践青云交大数据新视界 #Spark 之道 Spark Streaming 大数据新视界实时数据处理案例分析实践技巧框架比较应用场景
亲爱的朋友们，热烈欢迎你们来到青云交的博客！能与你们在此邂逅，我满心欢喜，深感无比荣幸。在这个瞬息万变的时代，我们每个人都在苦苦追寻一处能让心灵安然栖息的港湾。而我的博客，正是这样一个温暖美好的所在。在这里，你们不仅能够收获既富有趣味又极为实用的内容知识，还可以毫无拘束地畅所欲言，尽情分享自己独特的见解。我真诚地期待着你们的到来，愿我们能在这片小小的天地里共同成长，共同进步。本博客的精华专栏：大数
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
PHP，安卓，UI，java，linux视频教程合集 cocos2d-x小菜 java UI linux PHP android
╔-----------------------------------╗┆
zookeeper admin 笔记 braveCS zookeeper
Required Software 1) JDK>=1.6 2)推荐使用ensemble的ZooKeeper(至少3台)，并run on separate machines 3)在Yahoo!，zk配置在特定的RHEL boxes里，2个cpu，2G内存，80G硬盘数据和日志目录 1)数据目录里的文件是zk节点的持久化备份，包括快照和事务日
Spring配置多个连接池 easterfly spring
项目中需要同时连接多个数据库的时候，如何才能在需要用到哪个数据库就连接哪个数据库呢？ Spring中有关于dataSource的配置： <bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" &nb
Mysql 171815164 mysql
例如，你想myuser使用mypassword从任何主机连接到mysql服务器的话。 GRANT ALL PRIVILEGES ON *.* TO 'myuser'@'%'IDENTIFIED BY 'mypassword' WI TH GRANT OPTION; 如果你想允许用户myuser从ip为192.168.1.6的主机连接到mysql服务器，并使用mypassword作
CommonDAO（公共/基础DAO） g21121 DAO
好久没有更新博客了，最近一段时间工作比较忙，所以请见谅，无论你是爱看呢还是爱看呢还是爱看呢，总之或许对你有些帮助。 DAO(Data Access Object)是一个数据访问（顾名思义就是与数据库打交道）接口，DAO一般在业
直言有讳永夜-极光感悟随笔
1.转载地址:http://blog.csdn.net/jasonblog/article/details/10813313 精华: “直言有讳”是阿里巴巴提倡的一种观念，而我在此之前并没有很深刻的认识。为什么呢？就好比是读书时候做阅读理解，我喜欢我自己的解读，并不喜欢老师给的意思。在这里也是。我自己坚持的原则是互相尊重，我觉得阿里巴巴很多价值观其实是基本的做人
安装CentOS 7 和Win 7后，Win7 引导丢失随便小屋 centos
一般安装双系统的顺序是先装Win7，然后在安装CentOS，这样CentOS可以引导WIN 7启动。但安装CentOS7后，却找不到Win7 的引导，稍微修改一点东西即可。一、首先具有root 的权限。即进入Terminal后输入命令su，然后输入密码即可二、利用vim编辑器打开/boot/grub2/grub.cfg文件进行修改 v
Oracle备份与恢复案例 aijuans oracle
Oracle备份与恢复案例一. 理解什么是数据库恢复当我们使用一个数据库时，总希望数据库的内容是可靠的、正确的，但由于计算机系统的故障（硬件故障、软件故障、网络故障、进程故障和系统故障）影响数据库系统的操作，影响数据库中数据的正确性，甚至破坏数据库，使数据库中全部或部分数据丢失。因此当发生上述故障后，希望能重构这个完整的数据库，该处理称为数据库恢复。恢复过程大致可以分为复原(Restore)与
JavaEE开源快速开发平台G4Studio v5.0发布無為子
我非常高兴地宣布,今天我们最新的JavaEE开源快速开发平台G4Studio_V5.0版本已经正式发布。访问G4Studio网站 http://www.g4it.org 2013-04-06 发布G4Studio_V5.0版本功能新增 (1). 新增了调用Oracle存储过程返回游标，并将游标映射为Java List集合对象的标
Oracle显示根据高考分数模拟录取百合不是茶 PL/SQL编程 oracle例子模拟高考录取学习交流
题目要求: 1,创建student表和result表 2,pl/sql对学生的成绩数据进行处理 3,处理的逻辑是根据每门专业课的最低分线和总分的最低分数线自动的将录取和落选 1,创建student表,和result表学生信息表; create table student( student_id number primary key,--学生id
优秀的领导与差劲的领导 bijian1013 领导管理团队
责任优秀的领导：优秀的领导总是对他所负责的项目担负起责任。如果项目不幸失败了，那么他知道该受责备的人是他自己，并且敢于承认错误。差劲的领导：差劲的领导觉得这不是他的问题，因此他会想方设法证明是他的团队不行，或是将责任归咎于团队中他不喜欢的那几个成员身上。努力工作优秀的领导：团队领导应该是团队成员的榜样。至少，他应该与团队中的其他成员一样努力工作。这仅仅因为他
js函数在浏览器下的兼容 Bill_chen jquery 浏览器 IE DWR ext
做前端开发的工程师，少不了要用FF进行测试，纯js函数在不同浏览器下，名称也可能不同。对于IE6和FF，取得下一结点的函数就不尽相同： IE6：node.nextSibling,对于FF是不能识别的； FF：node.nextElementSibling,对于IE是不能识别的；兼容解决方式：var Div = node.nextSibl
【JVM四】老年代垃圾回收：吞吐量垃圾收集器(Throughput GC) bit1129 垃圾回收
吞吐量与用户线程暂停时间衡量垃圾回收算法优劣的指标有两个：吞吐量越高，则算法越好暂停时间越短，则算法越好首先说明吞吐量和暂停时间的含义。垃圾回收时，JVM会启动几个特定的GC线程来完成垃圾回收的任务，这些GC线程与应用的用户线程产生竞争关系，共同竞争处理器资源以及CPU的执行时间。GC线程不会对用户带来的任何价值，因此，好的GC应该占
J2EE监听器和过滤器基础白糖_ J2EE
Servlet程序由Servlet，Filter和Listener组成，其中监听器用来监听Servlet容器上下文。监听器通常分三类：基于Servlet上下文的ServletContex监听，基于会话的HttpSession监听和基于请求的ServletRequest监听。 ServletContex监听器 ServletContex又叫application
博弈AngularJS讲义(16) - 提供者 boyitech js AngularJS api Angular Provider
Angular框架提供了强大的依赖注入机制，这一切都是有注入器(injector)完成. 注入器会自动实例化服务组件和符合Angular API规则的特殊对象，例如控制器，指令，过滤器动画等。那注入器怎么知道如何去创建这些特殊的对象呢？ Angular提供了5种方式让注入器创建对象，其中最基础的方式就是提供者(provider), 其余四种方式(Value, Fac
java-写一函数f(a,b)，它带有两个字符串参数并返回一串字符，该字符串只包含在两个串中都有的并按照在a中的顺序。 bylijinnan java
public class CommonSubSequence { /** * 题目：写一函数f(a,b)，它带有两个字符串参数并返回一串字符，该字符串只包含在两个串中都有的并按照在a中的顺序。 * 写一个版本算法复杂度O(N^2)和一个O(N) 。 * * O(N^2)：对于a中的每个字符，遍历b中的每个字符，如果相同，则拷贝到新字符串中。 * O(
sqlserver 2000 无法验证产品密钥 Chen.H sql windows SQL Server Microsoft
在 Service Pack 4 (SP 4), 是运行 Microsoft Windows Server 2003、 Microsoft Windows Storage Server 2003 或 Microsoft Windows 2000 服务器上您尝试安装 Microsoft SQL Server 2000 通过卷许可协议 (VLA) 媒体。这样做, 收到以下错误信息CD KEY的 SQ
[新概念武器]气象战争 comsci
气象战争的发动者必须是拥有发射深空航天器能力的国家或者组织.... 原因如下: 地球上的气候变化和大气层中的云层涡旋场有密切的关系,而维持一个在大气层某个层次
oracle 中 rollup、cube、grouping 使用详解 daizj oracle grouping rollup cube
oracle 中 rollup、cube、grouping 使用详解 -- 使用oracle 样例表演示转自namesliu -- 使用oracle 的样列库，演示 rollup, cube, grouping 的用法与使用场景 --- ROLLUP ，为了理解分组的成员数量，我增加了分组的计数 COUNT(SAL)
技术资料汇总分享 Dead_knight 技术资料汇总分享
本人汇总的技术资料，分享出来，希望对大家有用。 http://pan.baidu.com/s/1jGr56uE 资料主要包含： Workflow->工作流相关理论、框架(OSWorkflow、JBPM、Activiti、fireflow...) Security->java安全相关资料(SSL、SSO、SpringSecurity、Shiro、JAAS...) Ser
初一下学期难记忆单词背诵第一课 dcj3sjt126com english word
could 能够 minute 分钟 Tuesday 星期二 February 二月 eighteenth 第十八 listen 听 careful 小心的，仔细的 short 短的 heavy 重的 empty 空的 certainly 当然 carry 携带；搬运 tape 磁带 basket 蓝子 bottle 瓶 juice 汁，果汁 head 头；头部
截取视图的图片, 然后分享出去 dcj3sjt126com OS Objective-C
OS 7 has a new method that allows you to draw a view hierarchy into the current graphics context. This can be used to get an UIImage very fast. I implemented a category method on UIView to get the vi
MySql重置密码 fanxiaolong MySql重置密码
方法一: 在my.ini的[mysqld]字段加入： skip-grant-tables 重启mysql服务，这时的mysql不需要密码即可登录数据库然后进入mysql mysql>use mysql; mysql>更新 user set password=password('新密码') WHERE User='root'; mysq
Ehcache（03）——Ehcache中储存缓存的方式 234390216 ehcache MemoryStore DiskStore 存储驱除策略
Ehcache中储存缓存的方式目录 1 堆内存（MemoryStore） 1.1 指定可用内存 1.2 驱除策略 1.3 元素过期 2 &nbs
spring mvc中的@propertysource jackyrong spring mvc
在spring mvc中，在配置文件中的东西，可以在java代码中通过注解进行读取了： @PropertySource 在spring 3.1中开始引入比如有配置文件 config.properties mongodb.url=1.2.3.4 mongodb.db=hello 则代码中 @PropertySource(&
重学单例模式 lanqiu17 单例 Singleton 模式
最近在重新学习设计模式，感觉对模式理解更加深刻。觉得有必要记下来。第一个学的就是单例模式，单例模式估计是最好理解的模式了。它的作用就是防止外部创建实例，保证只有一个实例。单例模式的常用实现方式有两种，就人们熟知的饱汉式与饥汉式，具体就不多说了。这里说下其他的实现方式静态内部类方式: package test.pattern.singleton.statics; publ
.NET开源核心运行时，且行且珍惜 netcome java .net 开源
背景 2014年11月12日，ASP.NET之父、微软云计算与企业级产品工程部执行副总裁Scott Guthrie，在Connect全球开发者在线会议上宣布，微软将开源全部.NET核心运行时，并将.NET 扩展为可在 Linux 和 Mac OS 平台上运行。.NET核心运行时将基于MIT开源许可协议发布，其中将包括执行.NET代码所需的一切项目——CLR、JIT编译器、垃圾收集器（GC）和核心
使用oscahe缓存技术减少与数据库的频繁交互 Everyday都不同 Web 高并发 oscahe缓存
此前一直不知道缓存的具体实现，只知道是把数据存储在内存中，以便下次直接从内存中读取。对于缓存的使用也没有概念，觉得缓存技术是一个比较”神秘陌生“的领域。但最近要用到缓存技术，发现还是很有必要一探究竟的。缓存技术使用背景：一般来说，对于web项目，如果我们要什么数据直接jdbc查库好了，但是在遇到高并发的情形下，不可能每一次都是去查数据库，因为这样在高并发的情形下显得不太合理——
Spring+Mybatis 手动控制事务 toknowme mybatis
@Override public boolean testDelete(String jobCode) throws Exception { boolean flag = false; &nbs
菜鸟级的android程序员面试时候需要掌握的知识点 xp9802 android
熟悉Android开发架构和API调用掌握APP适应不同型号手机屏幕开发技巧熟悉Android下的数据存储熟练Android Debug Bridge Tool 熟练Eclipse/ADT及相关工具熟悉Android框架原理及Activity生命周期熟练进行Android UI布局熟练使用SQLite数据库；熟悉Android下网络通信机制，S