
Spark Internals: Data Storage 5 (Part 6)

1. Spark Data Storage

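One reason Spark computes much faster than Hadoop is that intermediate results are cached in memory rather than written straight to disk. This article analyzes how Spark's storage subsystem is put together, and uses the data write and data read paths to show how its components interact.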

1.1 Storage Subsystem Overview

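The Storage module has two layers:
1) Communication layer: the storage module uses a master-slave structure. Control and status information exchanged between the master and the slaves travels through this layer.
2) Storage layer: the storage module has to store data on disk or in memory, and may also need to replicate it to a remote node. The storage layer implements this and exposes the corresponding interfaces.
For other modules that need to interact with storage, the module provides a single entry-point class, BlockManager. External classes work with the storage module by calling BlockManager's interfaces.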

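The figure above shows how the main modules of the Spark storage subsystem relate to one another. Briefly:
1) CacheManager: when an RDD is computed, it obtains data through CacheManager and stores its results through CacheManager.
2) BlockManager: CacheManager relies on the BlockManager interface for reading and writing data. BlockManager decides whether data comes from memory (MemoryStore) or from disk (DiskStore).
3) MemoryStore: saves data to memory and reads it back.
4) DiskStore: writes data to disk and reads it back.
5) BlockManagerWorker: writing data to the local MemoryStore or DiskStore is a synchronous operation. For fault tolerance the data is also replicated to other compute nodes so that it can be recovered if lost. Replication is asynchronous and is handled by BlockManagerWorker.
6) ConnectionManager: establishes connections to other compute nodes and sends and receives data.
7) BlockManagerMaster: note that this module runs only on the node hosting the Driver application. It records which slave worker stores each BlockId. For example, suppose an RDD task runs on machine A and needs the block with BlockId 3, but machine A does not hold that block. The slave worker then asks BlockManagerMaster, through BlockManager, where the data is stored, and fetches it through ConnectionManager.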
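
The lookup role described in item 7 can be sketched as a small registry. This is only an illustration under simplifying assumptions: `BlockLocationRegistrySketch`, `reportBlock`, and `getLocations` are hypothetical names, block and worker ids are plain strings, and no networking is involved.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the driver-side location registry.
object BlockLocationRegistrySketch {
  // blockId -> workers currently holding a copy of that block
  private val locations = mutable.HashMap[String, mutable.LinkedHashSet[String]]()

  // Called when a worker reports that it has stored a block.
  def reportBlock(blockId: String, workerId: String): Unit =
    locations.getOrElseUpdate(blockId, mutable.LinkedHashSet[String]()) += workerId

  // Called by a worker that needs a block it does not hold locally.
  def getLocations(blockId: String): List[String] =
    locations.get(blockId).map(_.toList).getOrElse(Nil)
}
```

A worker on machine A that misses block 3 would call `getLocations` and then contact one of the returned workers to fetch the bytes.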

1.2 Startup Process

The modules above are created by SparkEnv, inside SparkEnv.create:

val blockManagerMaster = new BlockManagerMaster(registerOrLookup(
        "BlockManagerMaster",
        new BlockManagerMasterActor(isLocal, conf)), conf)
val blockManager = new BlockManager(executorId, actorSystem, blockManagerMaster, serializer, conf)

val connectionManager = blockManager.connectionManager
val broadcastManager = new BroadcastManager(isDriver, conf)
val cacheManager = new CacheManager(blockManager)

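This code can be confusing: it looks as if a BlockManagerMasterActor is created on every cluster node. That is not the case. Look closely at the implementation of registerOrLookup: if the current node is the driver, it creates the actor; otherwise it establishes a connection to the driver.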

def registerOrLookup(name: String, newActor: => Actor): ActorRef = {
    if (isDriver) {
        logInfo("Registering " + name)
        actorSystem.actorOf(Props(newActor), name = name)
    } else {
        val driverHost: String = conf.get("spark.driver.host", "localhost")
        val driverPort: Int = conf.getInt("spark.driver.port", 7077)
        Utils.checkHost(driverHost, "Expected hostname")
        val url = s"akka.tcp://spark@$driverHost:$driverPort/user/$name"
        val timeout = AkkaUtils.lookupTimeout(conf)
        logInfo(s"Connecting to $name: $url")
        Await.result(actorSystem.actorSelection(url).resolveOne(timeout), timeout)
    }
}
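
One detail worth isolating is the by-name parameter `newActor: => Actor`: the `new BlockManagerMasterActor(...)` expression is only evaluated if the driver branch references it. A minimal sketch of that behavior, using a hypothetical `FakeActor` class and a construction counter instead of Akka:

```scala
object ByNameRegistrationSketch {
  // Counts how many times the stand-in "actor" was actually constructed.
  var constructed = 0

  final class FakeActor { constructed += 1 }

  // `newActor` is a by-name parameter: its expression runs only when referenced.
  def registerOrLookup(isDriver: Boolean, newActor: => FakeActor): Option[FakeActor] =
    if (isDriver) Some(newActor) // evaluated here, on the driver only
    else None                    // non-driver nodes never construct the actor
}
```

Calling it with `isDriver = false` leaves `constructed` at 0, even though the call site contains a `new FakeActor` expression.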

A key step during initialization is that each BlockManager registers itself with BlockManagerMaster.

1.3 Communication Layer

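BlockManager wraps BlockManagerMaster, and the information it sends is packaged as BlockManagerInfo. Spark creates a BlockManager on the Driver and on every Worker. They communicate through BlockManagerMaster, and all operations on the Storage module go through BlockManager.
In the RpcEnv-based version of the code, the BlockManager object is likewise created in SparkEnv.create: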

def registerOrLookupEndpoint(
        name: String, endpointCreator: => RpcEndpoint):
RpcEndpointRef = {
    if (isDriver) {
        logInfo("Registering " + name)
        rpcEnv.setupEndpoint(name, endpointCreator)
    } else {
        RpcUtils.makeDriverRef(name, conf, rpcEnv)
    }
}
…………
val blockManagerMaster = new BlockManagerMaster(registerOrLookupEndpoint(
        BlockManagerMaster.DRIVER_ENDPOINT_NAME,
        new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
        conf, isDriver)

// NB: blockManager is not valid until initialize() is called later.
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
        serializer, conf, mapOutputTracker, shuffleManager, blockTransferService,
        securityManager, numUsableCores)

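Before the endpoint is created, the code checks whether the current node is the Driver. If it is, the endpoint is created; otherwise, a reference to the Driver's endpoint is obtained.
After the BlockManager has been created, its initialize method is called. During initialization it calls BlockManagerMaster to register itself with the Driver, and the slave endpoint is started as part of that registration. It also registers the executor's configuration with the local shuffle service, if one exists.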

def initialize(appId: String): Unit = {
…………
    master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)

    // Register Executors' configuration with the local shuffle service, if one should exist.
    if (externalShuffleServiceEnabled && !blockManagerId.isDriver) {
        registerWithExternalShuffleServer()
    }
}

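BlockManagerMaster wraps the registration request in a RegisterBlockManager message and sends it to the Driver. The Driver's BlockManagerMasterEndpoint then calls its register method, which checks the blockManagerInfo map and records the new block manager.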

private def register(id: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef) {
    val time = System.currentTimeMillis()
    if (!blockManagerInfo.contains(id)) {
        blockManagerIdByExecutor.get(id.executorId) match {
            case Some(oldId) =>
                // A block manager of the same executor already exists, so remove it (assumed dead)
                logError("Got two different block manager registrations on same executor - "
                        + s" will replace old one $oldId with new one $id")
                removeExecutor(id.executorId)
            case None =>
        }
        logInfo("Registering block manager %s with %s RAM, %s".format(
                id.hostPort, Utils.bytesToString(maxMemSize), id))

        blockManagerIdByExecutor(id.executorId) = id

        blockManagerInfo(id) = new BlockManagerInfo(
                id, System.currentTimeMillis(), maxMemSize, slaveEndpoint)
    }
    listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxMemSize))
}

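As the code shows, each BlockManagerInfo object is stored in a map keyed by BlockManagerId.
In the communication layer, BlockManagerMaster controls where messages go. Dispatch is done by pattern matching, and all message types are defined in BlockManagerMessages.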
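
The pattern-matching dispatch can be sketched as follows. The three message classes here are a trimmed-down, hypothetical subset with simplified fields (string ids instead of BlockManagerId and BlockId), and `receive` is an illustrative handler rather than the endpoint's real method.

```scala
import scala.collection.mutable

object MessageDispatchSketch {
  // Hypothetical, simplified message set for illustration only.
  sealed trait ToMaster
  final case class RegisterBlockManager(executorId: String, maxMemSize: Long) extends ToMaster
  final case class UpdateBlockInfo(executorId: String, blockId: String) extends ToMaster
  final case class GetLocations(blockId: String) extends ToMaster

  private val blockManagerInfo = mutable.HashMap[String, Long]()      // executorId -> max memory
  private val blockLocations = mutable.HashMap[String, Set[String]]() // blockId -> executorIds

  // One handler routes every message by its shape.
  def receive(msg: ToMaster): Any = msg match {
    case RegisterBlockManager(executorId, maxMemSize) =>
      blockManagerInfo(executorId) = maxMemSize
      true
    case UpdateBlockInfo(executorId, blockId) =>
      blockLocations(blockId) = blockLocations.getOrElse(blockId, Set.empty[String]) + executorId
      true
    case GetLocations(blockId) =>
      blockLocations.getOrElse(blockId, Set.empty[String])
  }
}
```

Because the trait is sealed, the compiler can warn when a new message type is added without a matching case.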

1.4 Storage Layer

The smallest storage unit in Spark Storage is the block; every operation works on blocks.
The MemoryStore and DiskStore objects are created when the BlockManager is created:

val diskBlockManager = new DiskBlockManager(this, conf)
private[spark] val memoryStore = new MemoryStore(this, maxMemory)
private[spark] val diskStore = new DiskStore(this, diskBlockManager)

1.4.1 Disk Store

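The Spark version discussed here divides Disk Store's responsibilities more finely: file operations have been extracted into DiskBlockManager, and DiskStore is responsible only for storing and reading data.
Disk Store can be configured with multiple directories. Under each of them Spark creates a folder named spark-UUID (a random UUID). The folders are created when Disk Store stores data. Following the "high cohesion, low coupling" principle, this utility code lives in Utils (call path: DiskStore.putBytes -> DiskBlockManager.createLocalDirs -> Utils.createDirectory):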

def createDirectory(root: String, namePrefix: String = "spark"): File = {
    var attempts = 0
    val maxAttempts = MAX_DIR_CREATION_ATTEMPTS
    var dir: File = null
    while (dir == null) {
        attempts += 1
        if (attempts > maxAttempts) {
            throw new IOException("Failed to create a temp directory (under " + root + ") after " +
                    maxAttempts + " attempts!")
        }
        try {
            dir = new File(root, namePrefix + "-" + UUID.randomUUID.toString)
            if (dir.exists() || !dir.mkdirs()) {
                dir = null
            }
        } catch { case e: SecurityException => dir = null; }
    }

    dir.getCanonicalFile
}

In DiskBlockManager every block is stored as one file. A block is mapped to its file by hashing the block's name.

def getFile(filename: String): File = {
    // Figure out which local directory it hashes to, and which subdirectory in that
    val hash = Utils.nonNegativeHash(filename)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir

    // Create the subdirectory if it doesn't already exist
    val subDir = subDirs(dirId).synchronized {
        val old = subDirs(dirId)(subDirId)
        if (old != null) {
            old
        } else {
            val newDir = new File(localDirs(dirId), "%02x".format(subDirId))
            if (!newDir.exists() && !newDir.mkdir()) {
                throw new IOException(s"Failed to create local dir in $newDir.")
            }
            subDirs(dirId)(subDirId) = newDir
            newDir
        }
    }

    new File(subDir, filename)
}

def getFile(blockId: BlockId): File = getFile(blockId.name)

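Taking the hash modulo the directory counts yields dirId and subDirId. The subDir is then looked up in subDirs; if it does not exist yet, a new one is created. Finally, a File is constructed with subDir as its parent directory and the blockId's name as the file name. Note that new File only builds a path object; the file itself appears on disk when it is first written.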
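
The modulo arithmetic can be tried in isolation. The sketch below reimplements the hashing step on plain strings; `BlockFileLayoutSketch`, `locate`, and `relativePath` are hypothetical names, and `nonNegativeHash` is written here to follow the same idea as Utils.nonNegativeHash rather than copied from Spark.

```scala
object BlockFileLayoutSketch {
  // Map any hashCode into [0, Int.MaxValue]; Int.MinValue has no positive counterpart.
  def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h != Int.MinValue) math.abs(h) else 0
  }

  // Returns (dirId, subDirId) for a file name, given the directory layout.
  def locate(filename: String, numLocalDirs: Int, subDirsPerLocalDir: Int): (Int, Int) = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % numLocalDirs
    val subDirId = (hash / numLocalDirs) % subDirsPerLocalDir
    (dirId, subDirId)
  }

  // Relative path of the form "<dirId>/<subDirId as two hex digits>/<filename>".
  def relativePath(filename: String, numLocalDirs: Int, subDirsPerLocalDir: Int): String = {
    val (dirId, subDirId) = locate(filename, numLocalDirs, subDirsPerLocalDir)
    "%d/%02x/%s".format(dirId, subDirId, filename)
  }
}
```

The same file name always lands in the same directory, which is what lets a later read find the block without any index.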
Once the file has been located, DiskStore writes the corresponding block into it:

override def putBytes(blockId: BlockId, _bytes: ByteBuffer, level: StorageLevel): PutResult = {
    val bytes = _bytes.duplicate()
    logDebug(s"Attempting to put block $blockId")
    val startTime = System.currentTimeMillis
    val file = diskManager.getFile(blockId)
    val channel = new FileOutputStream(file).getChannel
    Utils.tryWithSafeFinally {
        while (bytes.remaining > 0) {
            channel.write(bytes)
        }
    } {
        channel.close()
    }
    val finishTime = System.currentTimeMillis
    logDebug("Block %s stored as %s file on disk in %d ms".format(
            file.getName, Utils.bytesToString(bytes.limit), finishTime - startTime))
    PutResult(bytes.limit(), Right(bytes.duplicate()))
}

Reading is simpler: DiskStore reads the contents of the file mapped to the blockId, after obtaining that file from DiskBlockManager.

private def getBytes(file: File, offset: Long, length: Long): Option[ByteBuffer] = {
    val channel = new RandomAccessFile(file, "r").getChannel
    Utils.tryWithSafeFinally {
        // For small files, directly read rather than memory map
        if (length < minMemoryMapBytes) {
            val buf = ByteBuffer.allocate(length.toInt)
            channel.position(offset)
            while (buf.remaining() != 0) {
                if (channel.read(buf) == -1) {
                    throw new IOException("Reached EOF before filling buffer\n" +
                            s"offset=$offset\nfile=${file.getAbsolutePath}\nbuf.remaining=${buf.remaining}")
                }
            }
            buf.flip()
            Some(buf)
        } else {
            Some(channel.map(MapMode.READ_ONLY, offset, length))
        }
    } {
        channel.close()
    }
}

override def getBytes(blockId: BlockId): Option[ByteBuffer] = {
    val file = diskManager.getFile(blockId.name)
    getBytes(file, 0, file.length)
}

1.4.2 Memory Store

Compared with Disk Store, Memory Store is much simpler. It manages blocks in a LinkedHashMap whose key is the blockId and whose value is a MemoryEntry case class holding the data.

private case class MemoryEntry(value: Any, size: Long, deserialized: Boolean)
private val entries = new LinkedHashMap[BlockId, MemoryEntry](32, 0.75f, true)
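
The third constructor argument (`true`) asks java.util.LinkedHashMap for access order, so iteration starts with the least recently accessed entry. A small sketch of that effect, using string keys and integer values in place of BlockId and MemoryEntry (`AccessOrderSketch` and `keysAfterTouching` are hypothetical names):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.collection.mutable.ArrayBuffer

object AccessOrderSketch {
  // Same constructor arguments as in the listing: capacity, load factor, accessOrder = true.
  def keysAfterTouching(touched: String): List[String] = {
    val entries = new JLinkedHashMap[String, Int](32, 0.75f, true)
    entries.put("rdd_0_0", 10)
    entries.put("rdd_0_1", 20)
    entries.put("rdd_0_2", 30)
    entries.get(touched) // a read moves the entry to the end of the iteration order

    val order = ArrayBuffer[String]()
    val it = entries.keySet().iterator()
    while (it.hasNext) order += it.next()
    order.toList // least recently used key first
  }
}
```

An eviction loop that walks the map from the front therefore meets the coldest blocks first.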

A block can be stored in MemoryStore only if there is currently enough memory for it. The check is made inside tryToPut.

def putBytes(blockId: BlockId, size: Long, _bytes: () => ByteBuffer): PutResult = {
    // Work on a duplicate - since the original input might be used elsewhere.
    lazy val bytes = _bytes().duplicate().rewind().asInstanceOf[ByteBuffer]
    val putAttempt = tryToPut(blockId, () => bytes, size, deserialized = false)
    val data =
    if (putAttempt.success) {
        assert(bytes.limit == size)
        Right(bytes.duplicate())
    } else {
        null
    }
    PutResult(size, data, putAttempt.droppedBlocks)
}

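tryToPut calls ensureFreeSpace to decide whether there is enough memory. If there is, the block is put into the LinkedHashMap. If not, BlockManager is told that memory is insufficient, and if the storage level allows disk, the block is placed on disk instead.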

private def tryToPut(blockId: BlockId, value: () => Any, size: Long, deserialized: Boolean): ResultWithDroppedBlocks = {
    var putSuccess = false
    val droppedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]

    accountingLock.synchronized {
        val freeSpaceResult = ensureFreeSpace(blockId, size)
        val enoughFreeSpace = freeSpaceResult.success
        droppedBlocks ++= freeSpaceResult.droppedBlocks

        if (enoughFreeSpace) {
            val entry = new MemoryEntry(value(), size, deserialized)
            entries.synchronized {
                entries.put(blockId, entry)
                currentMemory += size
            }
            val valuesOrBytes = if (deserialized) "values" else "bytes"
            logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(
                    blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(freeMemory)))
            putSuccess = true
        } else {
            lazy val data = if (deserialized) {
                Left(value().asInstanceOf[Array[Any]])
            } else {
                Right(value().asInstanceOf[ByteBuffer].duplicate())
            }
            val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)
            droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }
        }
        releasePendingUnrollMemoryForThisTask()
    }
    ResultWithDroppedBlocks(putSuccess, droppedBlocks)
}
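
The decision made in tryToPut can be reduced to a few lines. This sketch is deliberately simplified and is not Spark's implementation: it does not evict older blocks to make room (which ensureFreeSpace does), and `TinyStore`, `Location`, and the size bookkeeping are hypothetical.

```scala
import scala.collection.mutable

object MemoryFirstPutSketch {
  sealed trait Location
  case object InMemory extends Location
  case object OnDisk extends Location
  case object Dropped extends Location

  // Hypothetical store that tracks only block sizes, not block contents.
  final class TinyStore(maxMemory: Long, useDisk: Boolean) {
    private val memory = mutable.LinkedHashMap[String, Long]() // blockId -> size
    private val disk = mutable.Set[String]()
    private var currentMemory = 0L

    def put(blockId: String, size: Long): Location =
      if (currentMemory + size <= maxMemory) {
        memory(blockId) = size
        currentMemory += size
        InMemory
      } else if (useDisk) {
        disk += blockId // memory is full: fall back to disk if the level allows it
        OnDisk
      } else {
        Dropped
      }
  }
}
```

With a 100-byte budget, two 60-byte blocks end up in different places: the first in memory, the second on disk.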

Reading a block from Memory Store is just as simple: look up the blockId in the LinkedHashMap and return its value.

override def getValues(blockId: BlockId): Option[Iterator[Any]] = {
    val entry = entries.synchronized {
        entries.get(blockId)
    }
    if (entry == null) {
        None
    } else if (entry.deserialized) {
        Some(entry.value.asInstanceOf[Array[Any]].iterator)
    } else {
        val buffer = entry.value.asInstanceOf[ByteBuffer].duplicate() // Doesn't actually copy data
        Some(blockManager.dataDeserialize(blockId, buffer))
    }
}

1.5 Data Write Path

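The write path in brief:
1) RDD.iterator is the entry point into the storage subsystem.
2) CacheManager.getOrCompute calls BlockManager's put interface to write the data.
3) Data is written to MemoryStore (memory) first. If MemoryStore is full, the least recently used data is written out to disk.
4) BlockManagerMaster is notified that new data has been written and stores the metadata.
5) The written data is synchronized with other slave workers. Typically, data written locally is also backed up on one other machine, i.e. replica number = 1.
In practice, putting and getting a block is not that complicated for callers: BlockManager wraps the details above, so we only need to call its put and get functions.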

def putBytes(
           blockId: BlockId,
           bytes: ByteBuffer,
           level: StorageLevel,
           tellMaster: Boolean = true,
           effectiveStorageLevel: Option[StorageLevel] = None): Seq[(BlockId, BlockStatus)] = {
       require(bytes != null, "Bytes is null")
       doPut(blockId, ByteBufferValues(bytes), level, tellMaster, effectiveStorageLevel)
   }
   private def doPut(
           blockId: BlockId,
           data: BlockValues,
           level: StorageLevel,
           tellMaster: Boolean = true,
           effectiveStorageLevel: Option[StorageLevel] = None)
: Seq[(BlockId, BlockStatus)] = {

       require(blockId != null, "BlockId is null")
       require(level != null && level.isValid, "StorageLevel is null or invalid")
       effectiveStorageLevel.foreach { level =>
           require(level != null && level.isValid, "Effective StorageLevel is null or invalid")
       }

       val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]

       val putBlockInfo = {
           val tinfo = new BlockInfo(level, tellMaster)
           val oldBlockOpt = blockInfo.putIfAbsent(blockId, tinfo)
           if (oldBlockOpt.isDefined) {
               if (oldBlockOpt.get.waitForReady()) {
                   logWarning(s"Block $blockId already exists on this machine; not re-adding it")
                   return updatedBlocks
               }
               oldBlockOpt.get
           } else {
               tinfo
           }
       }

       val startTimeMs = System.currentTimeMillis

       var valuesAfterPut: Iterator[Any] = null

       var bytesAfterPut: ByteBuffer = null

       var size = 0L

       val putLevel = effectiveStorageLevel.getOrElse(level)

       val replicationFuture = data match {
           case b: ByteBufferValues if putLevel.replication > 1 =>
               // Duplicate doesn't copy the bytes, but just creates a wrapper
               val bufferView = b.buffer.duplicate()
               Future {
                   replicate(blockId, bufferView, putLevel)
               }(futureExecutionContext)
           case _ => null
       }

       putBlockInfo.synchronized {
           logTrace("Put for block %s took %s to get into synchronized block"
                   .format(blockId, Utils.getUsedTimeMs(startTimeMs)))

           var marked = false
           try {
               val (returnValues, blockStore: BlockStore) = {
                   if (putLevel.useMemory) {
                       (true, memoryStore)
                   } else if (putLevel.useOffHeap) {
                       (false, externalBlockStore)
                   } else if (putLevel.useDisk) {
                       (putLevel.replication > 1, diskStore)
                   } else {
                       assert(putLevel == StorageLevel.NONE)
                       throw new BlockException(
                               blockId, s"Attempted to put block $blockId without specifying storage level!")
                   }
               }

               val result = data match {
                   case IteratorValues(iterator) =>
                       blockStore.putIterator(blockId, iterator, putLevel, returnValues)
                   case ArrayValues(array) =>
                       blockStore.putArray(blockId, array, putLevel, returnValues)
                   case ByteBufferValues(bytes) =>
                       bytes.rewind()
                       blockStore.putBytes(blockId, bytes, putLevel)
               }
               size = result.size
               result.data match {
                   case Left (newIterator) if putLevel.useMemory => valuesAfterPut = newIterator
                   case Right (newBytes) => bytesAfterPut = newBytes
                   case _ =>
               }

               if (putLevel.useMemory) {
                   result.droppedBlocks.foreach { updatedBlocks += _ }
               }

               val putBlockStatus = getCurrentBlockStatus(blockId, putBlockInfo)
               if (putBlockStatus.storageLevel != StorageLevel.NONE) {
                   marked = true
                   putBlockInfo.markReady(size)
                   if (tellMaster) {
                       reportBlockStatus(blockId, putBlockInfo, putBlockStatus)
                   }
                   updatedBlocks += ((blockId, putBlockStatus))
               }
           } finally {
               if (!marked) {
                   blockInfo.remove(blockId)
                   putBlockInfo.markFailure()
                   logWarning(s"Putting block $blockId failed")
               }
           }
       }
       logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))

       if (putLevel.replication > 1) {
           data match {
               case ByteBufferValues(bytes) =>
                   if (replicationFuture != null) {
                       Await.ready(replicationFuture, Duration.Inf)
                   }
               case _ =>
                   val remoteStartTime = System.currentTimeMillis
                   if (bytesAfterPut == null) {
                       if (valuesAfterPut == null) {
                           throw new SparkException(
                                   "Underlying put returned neither an Iterator nor bytes! This shouldn't happen.")
                       }
                       bytesAfterPut = dataSerialize(blockId, valuesAfterPut)
                   }
                   replicate(blockId, bytesAfterPut, putLevel)
                   logDebug("Put block %s remotely took %s"
                           .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
           }
       }

       BlockManager.dispose(bytesAfterPut)

       if (putLevel.replication > 1) {
           logDebug("Putting block %s with replication took %s"
                   .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
       } else {
           logDebug("Putting block %s without replication took %s"
                   .format(blockId, Utils.getUsedTimeMs(startTimeMs)))
       }

       updatedBlocks
   }

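doPut does the following:
Creates a BlockInfo object to hold the block's metadata.
Locks the BlockInfo, then decides from the StorageLevel whether to store the block in memory or on disk. Once the block is ready to be read, the BlockInfo is marked ready (markReady), which releases any threads waiting on it.
Decides from the block's replication count whether to send replicas to remote nodes.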

1.5.1 Serialized or Not

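The content written can be either serialized bytes or unserialized values. Following this part of the code requires familiarity with Scala's Either type and its two cases, Left and Right. These are standard-library types, not language keywords.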
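
A short sketch of how Either carries "values or bytes" in one field. The `Payload` alias and `describe` function are hypothetical; they follow the convention visible in doPut, where Left holds an iterator of values and Right holds a ByteBuffer.

```scala
import java.nio.ByteBuffer

object EitherPayloadSketch {
  // Left = deserialized values, Right = serialized bytes.
  type Payload = Either[Iterator[Any], ByteBuffer]

  // Pattern matching forces the caller to handle both representations.
  def describe(data: Payload): String = data match {
    case Left(values) => s"values: ${values.size} elements"
    case Right(bytes) => s"bytes: ${bytes.remaining()} bytes"
  }
}
```

Note that calling `size` consumes the iterator, which is acceptable in this sketch because the payload is not used afterwards.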

1.6 Data Read Path

def get(blockId: BlockId): Option[Iterator[Any]] = {
    val local = getLocal(blockId)
    if (local.isDefined) {
        logInfo("Found block %s locally".format(blockId))
        return local
    }
    val remote = getRemote(blockId)
    if (remote.isDefined) {
        logInfo("Found block %s remotely".format(blockId))
        return remote
    }
    None
}

1.6.1 Local Reads

The local MemoryStore and DiskStore are checked first for the required block. If it is not found there, a remote fetch is started.
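
The local-then-remote order can be expressed compactly with Option.orElse. This is a sketch with hypothetical in-memory maps standing in for MemoryStore, DiskStore, and a remote peer; it is not how BlockManager.get is written.

```scala
object LocalThenRemoteSketch {
  // Hypothetical stand-ins for the two local stores and one remote peer.
  val memoryStore: Map[String, Seq[Int]] = Map("rdd_0_0" -> Seq(1, 2))
  val diskStore: Map[String, Seq[Int]] = Map("rdd_0_1" -> Seq(3, 4))
  val remotePeer: Map[String, Seq[Int]] = Map("rdd_0_2" -> Seq(5, 6))

  def getLocal(blockId: String): Option[Seq[Int]] =
    memoryStore.get(blockId).orElse(diskStore.get(blockId))

  def getRemote(blockId: String): Option[Seq[Int]] = remotePeer.get(blockId)

  // orElse takes its argument by name, so the remote lookup runs only on a local miss.
  def get(blockId: String): Option[Seq[Int]] =
    getLocal(blockId).orElse(getRemote(blockId))
}
```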

1.6.2 Remote Reads

The call path for a remote fetch is getRemote -> doGetRemote. The key step in doGetRemote is calling BlockManagerWorker.syncGetBlock to obtain the data from a remote node:

def syncGetBlock(msg: GetBlock, toConnManagerId: ConnectionManagerId): ByteBuffer = {
    val blockManager = blockManagerWorker.blockManager
    val connectionManager = blockManager.connectionManager
    val blockMessage = BlockMessage.fromGetBlock(msg)
    val blockMessageArray = new BlockMessageArray(blockMessage)
    val responseMessage = connectionManager.sendMessageReliablySync(
            toConnManagerId, blockMessageArray.toBufferMessage)
    responseMessage match {
        case Some(message) => {
            val bufferMessage = message.asInstanceOf[BufferMessage]
            logDebug("Response message received " + bufferMessage)
            BlockMessageArray.fromBufferMessage(bufferMessage).foreach(blockMessage => {
                logDebug("Found " + blockMessage)
                return blockMessage.getData
            })
        }
        case None => logDebug("No response message received")
    }
    null
}

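The most interesting part of this code is sendMessageReliablySync. Reading remote data is clearly an asynchronous I/O operation, so how can the code read as if it were synchronous? In other words, how does the caller learn that the peer has sent its response?
To find out, look at its asynchronous counterpart, sendMessageReliably, whose definition is shown here: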

def sendMessageReliably(connectionManagerId: ConnectionManagerId, message: Message)
  : Future[Option[Message]] = {
    val promise = Promise[Option[Message]]
    val status = new MessageStatus(
            message, connectionManagerId, s => promise.success(s.ackMessage))
    messageStatuses.synchronized {
        messageStatuses += ((message.id, status))
    }
    sendMessage(connectionManagerId, message)
    promise.future
}

The secret really is right here: notice Promise and Future (library types from scala.concurrent).
When the future completes, it yields s.ackMessage. So where is ackMessage written? See this fragment of ConnectionManager.handleMessage:

case bufferMessage: BufferMessage => {
    if (authEnabled) {
        val res = handleAuthentication(connection, bufferMessage)
        if (res == true) {
            // message was security negotiation so skip the rest
            logDebug("After handleAuth result was true, returning")
            return
        }
    }
    if (bufferMessage.hasAckId) {
        val sentMessageStatus = messageStatuses. synchronized {
            messageStatuses.get(bufferMessage.ackId) match {
                case Some(status) =>{
                    messageStatuses -= bufferMessage.ackId
                    status
                }
                case None =>{
                    throw new Exception("Could not find reference for received ack message " +
                            message.id)
                    null
                }
            }
        }
        sentMessageStatus. synchronized {
            sentMessageStatus.ackMessage = Some(message)
            sentMessageStatus.attempted = true
            sentMessageStatus.acked = true
            sentMessageStatus.markDone()
        }
    }
}

Note that the call to sentMessageStatus.markDone here invokes the promise.success callback defined in sendMessageReliably. The definition of MessageStatus makes this clear:

class MessageStatus(
        val message: Message,
        val connectionManagerId: ConnectionManagerId,
        completionHandler: MessageStatus => Unit) {

    var ackMessage: Option[Message] = None
    var attempted = false
    var acked = false

    def markDone() { completionHandler(this) }
}
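
The whole round trip (register a completion handler, complete the Promise when the ack arrives, block on the Future) fits in a short sketch. `PromiseAckSketch`, `Status`, `sendReliably`, `handleAck`, and `demo` are hypothetical names; messages are plain strings, the "network" is a second thread, and the 5 second timeout is arbitrary.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object PromiseAckSketch {
  // Minimal analogue of MessageStatus: it carries the ack and a completion handler.
  final class Status(completionHandler: Status => Unit) {
    var ackMessage: Option[String] = None
    def markDone(): Unit = completionHandler(this)
  }

  private val pending = mutable.HashMap[Int, Status]()

  // Asynchronous send: returns at once with a Future tied to a fresh Promise.
  def sendReliably(id: Int): Future[Option[String]] = {
    val promise = Promise[Option[String]]()
    pending.synchronized {
      pending(id) = new Status(s => promise.success(s.ackMessage))
    }
    promise.future
  }

  // Receive path: find the pending status, store the ack, fire the handler.
  def handleAck(id: Int, ack: String): Unit = {
    val status = pending.synchronized { pending.remove(id) }
    status.foreach { s =>
      s.ackMessage = Some(ack)
      s.markDone() // completes the Promise, which unblocks the waiting caller
    }
  }

  // Synchronous feel: the caller blocks until another thread delivers the ack.
  def demo(): Option[String] = {
    val future = sendReliably(1)
    val receiver = new Thread(new Runnable {
      def run(): Unit = handleAck(1, "block bytes")
    })
    receiver.start()
    Await.result(future, 5.seconds)
  }
}
```

The caller never polls: completing the Promise on the receive path is what wakes up the thread blocked in Await.result.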

1.7 How a Partition Becomes a Block

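Every operation in the storage module is expressed in terms of blocks, while every computation on an RDD is expressed in terms of partitions. How, then, do partitions map to blocks?
The core function of RDD computation is iterator():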

final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
    if (storageLevel != StorageLevel.NONE) {
        SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel)
    } else {
        computeOrReadCheckpoint(split, context)
    }
}

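If the RDD's storage level is not NONE, the RDD is meant to be stored in BlockManager, so CacheManager's getOrCompute() function is called to compute it. This is where partition and block meet:
First, a block id (rdd_xx_xx) is built from the RDD id and the partition index, and the corresponding block is looked up in BlockManager.
If the block exists, the partition was computed and stored in BlockManager earlier, so it is simply returned without recomputation.
If the block does not exist, the RDD's computeOrReadCheckpoint() function is called to compute it, and the result is stored in BlockManager.
Note that computing and storing a block is blocking: if another thread needs the same block, it must wait until the loading thread has finished.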
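
The blocking behavior relies on a shared set of keys being loaded plus wait/notifyAll. The sketch below isolates that pattern; `LoadingGuardSketch` and `Guard` are hypothetical names, and a plain HashMap stands in for BlockManager. `demo` starts two threads that ask for the same key and reports how many times the computation actually ran.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

object LoadingGuardSketch {
  final class Guard {
    private val loading = mutable.HashSet[String]()         // keys currently being computed
    private val cache = mutable.HashMap[String, Seq[Int]]() // stand-in for BlockManager

    def getOrCompute(key: String)(compute: => Seq[Int]): Seq[Int] = {
      val cached = loading.synchronized {
        // Wait while another thread is loading the same key.
        while (loading.contains(key)) loading.wait()
        val hit = cache.get(key)
        if (hit.isEmpty) loading.add(key) // we become the loader
        hit
      }
      cached.getOrElse {
        try {
          val values = compute
          loading.synchronized { cache(key) = values }
          values
        } finally {
          loading.synchronized {
            loading.remove(key)
            loading.notifyAll() // wake up any waiters, even if compute failed
          }
        }
      }
    }
  }

  // Two threads request the same partition; returns the number of computations.
  def demo(): Int = {
    val guard = new Guard
    val computations = new AtomicInteger(0)
    val threads = (1 to 2).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          guard.getOrCompute("rdd_0_0") {
            computations.incrementAndGet()
            Thread.sleep(50)
            Seq(1, 2, 3)
          }
          ()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    computations.get()
  }
}
```

Whichever thread arrives second either waits for the first to finish or finds the cached result, so the computation runs once.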

def getOrCompute[T](rdd: RDD[T], split: Partition, context: TaskContext,
        storageLevel: StorageLevel): Iterator[T] = {
    val key = "rdd_%d_%d".format(rdd.id, split.index)
    logDebug("Looking for partition " + key)
    blockManager.get(key) match {
    case Some(values) =>
        // Partition is already materialized, so just return its values
        return values.asInstanceOf[Iterator[T]]

    case None =>
        // Mark the split as loading (unless someone else marks it first)
        loading. synchronized {
        if (loading.contains(key)) {
            logInfo("Another thread is loading %s, waiting for it to finish...".format(key))
            while (loading.contains(key)) {
                try {
                    loading.wait()
                } catch {
                    case _: Throwable =>
                }
            }
            logInfo("Finished waiting for %s".format(key))
            // See whether someone else has successfully loaded it. The main way this would fail
            // is for the RDD-level cache eviction policy if someone else has loaded the same RDD
            // partition but we didn't want to make space for it. However, that case is unlikely
            // because it's unlikely that two threads would work on the same RDD partition. One
            // downside of the current code is that threads wait serially if this does happen.
            blockManager.get(key) match {
                case Some(values) =>
                    return values.asInstanceOf[Iterator[T]]
                case None =>
                    logInfo("Whoever was loading %s failed; we'll try it ourselves".format(key))
                    loading.add(key)
            }
        } else {
            loading.add(key)
        }
    }
    try {
        // If we got here, we have to load the split
        logInfo("Partition %s not found, computing it".format(key))
        val computedValues = rdd.computeOrReadCheckpoint(split, context)
        // Persist the result, so long as the task is not running locally
        if (context.runningLocally) {
            return computedValues
        }
        val elements = new ArrayBuffer[Any]
        elements ++= computedValues
        blockManager.put(key, elements, storageLevel, true)
        return elements.iterator.asInstanceOf[Iterator[T]]
    } finally {
        loading. synchronized {
            loading.remove(key)
            loading.notifyAll()
        }
    }
    }
}

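This is how RDD transformations and actions are tied to block data. Conceptually we operate on partitions, but each partition is ultimately mapped to a block, so in practice all of these operations process, read, and write blocks.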

1.8 Mapping Between Partitions and Blocks

In RDD, the core function is iterator:

final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
    if (storageLevel != StorageLevel.NONE) {
        SparkEnv.get.cacheManager.getOrCompute(this, split, context, storageLevel)
    } else {
        computeOrReadCheckpoint(split, context)
    }
}

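If the RDD's storage level is not NONE, the RDD is meant to be stored in BlockManager, so CacheManager's getOrCompute function is called to compute it. In this version of the code, that function makes the link between partition and block explicit:
getOrCompute first constructs an RDDBlockId, which ties the block to the partition. The name it produces is the BlockId's name property, in the form rdd_<rdd.id>_<partition.index>.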
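
The naming scheme is easy to reproduce. The sketch below is a simplified stand-in for Spark's BlockId hierarchy, not the real classes; the `parse` helper is a hypothetical addition that shows the mapping is reversible.

```scala
object RddBlockIdSketch {
  // Simplified stand-in for the BlockId hierarchy.
  sealed trait BlockId { def name: String }

  // One block per (RDD, partition) pair.
  final case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
    override def name: String = s"rdd_${rddId}_$splitIndex"
  }

  // Inverse mapping: recover (rddId, partition index) from a block name of the RDD form.
  private val RddName = "rdd_([0-9]+)_([0-9]+)".r

  def parse(name: String): Option[RDDBlockId] = name match {
    case RddName(rdd, split) => Some(RDDBlockId(rdd.toInt, split.toInt))
    case _ => None
  }
}
```

Partition 3 of the RDD with id 7 is therefore stored under the block name rdd_7_3.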

def getOrCompute[T](
        rdd: RDD[T],
        partition: Partition,
        context: TaskContext,
        storageLevel: StorageLevel): Iterator[T] = {

    val key = RDDBlockId(rdd.id, partition.index)
    logDebug(s"Looking for partition $key")
    blockManager.get(key) match {
        case Some(blockResult) =>
            val existingMetrics = context.taskMetrics
                    .getInputMetricsForReadMethod(blockResult.readMethod)
            existingMetrics.incBytesRead(blockResult.bytes)

            val iter = blockResult.data.asInstanceOf[Iterator[T]]
            new InterruptibleIterator[T](context, iter) {
                override def next(): T = {
                    existingMetrics.incRecordsRead(1)
                    delegate.next()
                }
            }
        case None =>
            val storedValues = acquireLockForPartition[T](key)
            if (storedValues.isDefined) {
                return new InterruptibleIterator[T](context, storedValues.get)
            }

            try {
                logInfo(s"Partition $key not found, computing it")
                val computedValues = rdd.computeOrReadCheckpoint(partition, context)

                if (context.isRunningLocally) {
                    return computedValues
                }

                val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
                val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
                val metrics = context.taskMetrics
                val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
                metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
                new InterruptibleIterator(context, cachedValues)

            } finally {
                loading.synchronized {
                    loading.remove(key)
                    loading.notifyAll()
                }
            }
    }
}

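getOrCompute also branches on whether the block exists:
If it does, the RDD partition was computed and stored in BlockManager earlier, so the cached values are returned without recomputation.
If it does not, the RDD's computeOrReadCheckpoint() function computes the partition, and the result is stored in BlockManager through putInBlockManager.
As before, computing and storing a block is blocking: a thread that needs the same block waits in acquireLockForPartition until the loading thread has finished.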

你可能感兴趣的:(大数据之Spark,spark,microsoft,大数据)

SAP之顾问篇 FF.5电子银行对账单 SAP圣父 SAP
直接上干货一.配置财务会计→银行会计核算→业务往来→支付交易→电子银行对账单→进行电子银行对账单的全局设置1.创建科目符号例:ZS012.对科目符号分配科目给ZS01设置总账科目3.创建过账规则码例:Z0014.定义过账规则给Z001设置借贷方过账代码，借贷方科目5.创建业务类型例:ZT016.对过账规则分配外部事务类型给ZT01设置外部交易码(※2)，设置过账规则:Z0017.对事务类型分配银行
尚硅谷电商数仓6.0，hive on spark,spark启动不了新时代赚钱战士 hive spark hadoop
在datagrip执行分区插入语句时报错[42000][40000]Errorwhilecompilingstatement:FAILED:SemanticExceptionFailedtogetasparksession:org.apache.hadoop.hive.ql.metadata.HiveException:FailedtocreateSparkclientforSparksessio
oracle基础知识之表的集合运算数字天下 oracle 数据库
一个查询就是一个集合：查询的结果集一条记录就是一个元素。集合运算是用来把两个或多个查询的结果集做并、交、查的集合运算，包含集合运算的查询称为复合查询。*Select基本语法如下：SELECTcolumn_1,column_2,…FROMtable_nameWHEREsearch_conditionORDERBYcolumn_1,column_2;2.常用集合运算方式的应用（1）联合运算：联合运算实
Elasticsearch 介绍：分布式搜索与分析引擎吱屋猪_ elasticsearch
在如今大数据时代，企业和开发者面临着前所未有的数据量和实时性要求。为了能够高效地处理、存储和查询这些数据，Elasticsearch作为一种强大的分布式搜索引擎，已经成为了很多组织和开发者的首选解决方案。1.什么是Elasticsearch？Elasticsearch是一个开源的、基于ApacheLucene构建的全文搜索引擎。它提供了高效的搜索功能，并且非常适合处理大量数据，尤其是在需要快速搜索
K8S学习之基础四十：配置altermanager发送告警到钉钉群云上艺旅 K8S学习 kubernetes 学习钉钉 prometheus 云原生容器
配置altermanager发送告警到钉钉群创建钉钉群，设置机器人助手(必须是管理员才能设置)，获取webhookwebhook：https://oapi.dingtalk.com/robot/send?access_token=25bed933a52d69f192347b5be4b2193bc0b257a6d9ae68d81619e3ae3d93f7c6#创建cm，配置钉钉群信息vialertm