DiskStore is responsible for storing Blocks on disk, and it depends on the services of DiskBlockManager. In Spark 1.x.x, BlockStore provided a unified interface over disk storage (DiskStore), memory storage (MemoryStore), and Tachyon storage (TachyonStore), with DiskStore, MemoryStore, and TachyonStore as its concrete implementations. Starting with Spark 2.0.0, however, TachyonStore was removed, and BlockStore is no longer used as a common interface; DiskStore and MemoryStore are now implemented independently.
DiskStore has the following properties:
- conf: the SparkConf.
- diskManager: a reference to DiskBlockManager, used to map a BlockId to its file on disk.
- minMemoryMapBytes: the threshold below which a file is read directly rather than memory-mapped, configured by spark.storage.memoryMapThreshold (2m by default).
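For reference, these properties appear in the Spark 2.x source roughly as follows (a sketch from memory, not a verbatim quote):
//org.apache.spark.storage.DiskStore
private[spark] class DiskStore(conf: SparkConf, diskManager: DiskBlockManager) extends Logging {
  //Files smaller than this threshold are read directly instead of memory-mapped
  private val minMemoryMapBytes = conf.getSizeAsBytes("spark.storage.memoryMapThreshold", "2m")
  ...
}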
Tip: What is FileChannel's memory-mapped file facility? In Java NIO, the map method of FileChannel provides a fast I/O technique: it maps all or part of the data in the file the channel is connected to directly into an in-memory Buffer. This Buffer is an image of the file's data, so writing to the Buffer (when mapped in read-write mode) modifies the underlying file. The Buffer is a MappedByteBuffer, i.e., a mapped "mirror" Buffer. Because the data is accessed through memory, processing is fast.
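The following is a minimal Scala sketch of using FileChannel.map (the file path is illustrative):
import java.io.RandomAccessFile
import java.nio.channels.FileChannel.MapMode

val channel = new RandomAccessFile("/tmp/example.dat", "r").getChannel
try {
  //Map the whole file into a MappedByteBuffer backed by the OS page cache
  val mapped = channel.map(MapMode.READ_ONLY, 0, channel.size())
  val firstByte = mapped.get(0) //random access without an explicit read() call
} finally {
  channel.close()
}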
Let's now look at DiskStore's methods one by one.
(1) getSize
This method returns the size, in bytes, of the Block corresponding to the given BlockId.
//org.apache.spark.storage.DiskStore
//diskManager: DiskBlockManager
def getSize(blockId: BlockId): Long = {
  diskManager.getFile(blockId.name).length
}
As we can see, DiskStore's getSize is essentially a call to DiskBlockManager's getFile: it obtains the File for the block and returns its length.
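For reference, DiskBlockManager.getFile locates the file by hashing the block's file name into one of the configured local directories and one of their sub-directories. A simplified sketch of that resolution logic (localDirs and subDirsPerLocalDir stand in for DiskBlockManager's fields):
import java.io.File

def getFileSketch(localDirs: Array[File], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Int.MaxValue                   //non-negative hash of the name
  val dirId = hash % localDirs.length                           //choose a root local directory
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir //choose a sub-directory
  new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
}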
(2) contains
This method checks whether a Block file for the given BlockId exists under the local disk storage path.
//org.apache.spark.storage.DiskStore
def contains(blockId: BlockId): Boolean = {
  val file = diskManager.getFile(blockId.name)
  file.exists()
}
(3) remove
This method deletes the Block file for the given BlockId, returning whether the deletion succeeded.
//org.apache.spark.storage.DiskStore
def remove(blockId: BlockId): Boolean = {
  val file = diskManager.getFile(blockId.name)
  if (file.exists()) {
    val ret = file.delete()
    if (!ret) {
      logWarning(s"Error deleting ${file.getPath()}")
    }
    ret
  } else {
    false
  }
}
(4) put
This method writes the Block for the given BlockId to disk. The caller supplies a writeFunc callback that performs the actual writing against the opened FileOutputStream; if the write throws, the partially written file is removed.
//org.apache.spark.storage.DiskStore
def put(blockId: BlockId)(writeFunc: FileOutputStream => Unit): Unit = {
  if (contains(blockId)) {
    throw new IllegalStateException(s"Block $blockId is already present in the disk store")
  }
  logDebug(s"Attempting to put block $blockId")
  val startTime = System.currentTimeMillis
  val file = diskManager.getFile(blockId)
  val fileOutputStream = new FileOutputStream(file)
  var threwException: Boolean = true
  try {
    writeFunc(fileOutputStream)
    threwException = false
  } finally {
    try {
      Closeables.close(fileOutputStream, threwException)
    } finally {
      if (threwException) {
        remove(blockId)
      }
    }
  }
  val finishTime = System.currentTimeMillis
  logDebug("Block %s stored as %s file on disk in %d ms".format(
    file.getName,
    Utils.bytesToString(file.length()),
    finishTime - startTime))
}
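As a usage example, a caller such as BlockManager can pass a callback that stream-serializes data into the file. The following is a hedged sketch; diskStore, serializerManager, blockId, and values are assumed to be in scope:
//Hypothetical caller: stream-serialize an iterator of values into the block file
diskStore.put(blockId) { fileOutputStream =>
  serializerManager.dataSerializeStream(blockId, fileOutputStream, values)
}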
(5) putBytes
This method also writes the Block for the given BlockId to disk, but the Block's contents have already been wrapped in a ChunkedByteBuffer.
//org.apache.spark.storage.DiskStore
def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
  put(blockId) { fileOutputStream =>
    val channel = fileOutputStream.getChannel
    Utils.tryWithSafeFinally {
      bytes.writeFully(channel)
    } {
      channel.close()
    }
  }
}
putBytes defines a write callback and passes it to put; as we saw in put's implementation, that callback is eventually invoked to do the writing. The callback writes the ChunkedByteBuffer fully to the file's channel, using the tryWithSafeFinally method of the Utils utility class to close the channel safely.
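tryWithSafeFinally behaves like an ordinary try/finally, except that when both the body and the finally block throw, the exception from the finally block is attached as a suppressed exception rather than masking the original one. A minimal sketch of that semantics (not the Spark source itself):
def tryWithSafeFinallySketch[T](block: => T)(finallyBlock: => Unit): T = {
  var original: Throwable = null
  try {
    block
  } catch {
    case t: Throwable =>
      original = t
      throw t
  } finally {
    try {
      finallyBlock
    } catch {
      case t: Throwable if original != null =>
        original.addSuppressed(t) //do not mask the original failure with a close() error
    }
  }
}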
(6) getBytes
This method reads the Block for the given BlockId and returns it wrapped in a ChunkedByteBuffer.
//org.apache.spark.storage.DiskStore
def getBytes(blockId: BlockId): ChunkedByteBuffer = {
  val file = diskManager.getFile(blockId.name)
  val channel = new RandomAccessFile(file, "r").getChannel
  Utils.tryWithSafeFinally {
    // For small files, directly read rather than memory map
    if (file.length < minMemoryMapBytes) {
      val buf = ByteBuffer.allocate(file.length.toInt)
      channel.position(0)
      while (buf.remaining() != 0) {
        if (channel.read(buf) == -1) {
          throw new IOException("Reached EOF before filling buffer\n" +
            s"offset=0\nfile=${file.getAbsolutePath}\nbuf.remaining=${buf.remaining}")
        }
      }
      buf.flip()
      new ChunkedByteBuffer(buf)
    } else {
      new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, file.length))
    }
  } {
    channel.close()
  }
}
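Whether getBytes reads the file directly or memory-maps it depends on the minMemoryMapBytes threshold introduced earlier. Putting the pieces together, a round trip through DiskStore might look like the following sketch (diskStore and blockId are assumed to be in scope):
//Hypothetical round trip: write bytes for a block, then read them back
import java.nio.ByteBuffer
import org.apache.spark.util.io.ChunkedByteBuffer

val data = new ChunkedByteBuffer(ByteBuffer.wrap("hello".getBytes("UTF-8")))
diskStore.putBytes(blockId, data)   //writes the chunks to the block file
assert(diskStore.contains(blockId)) //the file now exists on disk
val readBack = diskStore.getBytes(blockId)
assert(readBack.size == data.size)  //same number of bytes round-tripped
diskStore.remove(blockId)           //clean up the block file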