The Log abstraction on the Kafka broker consists of a number of LogSegments. Each LogSegment has a base offset, which is the offset of the first message in that segment. The server rolls a new LogSegment based on time or size limits.
A Log is the storage for one replica of one partition.
The logs stored on different servers are not necessarily identical: even replicas of the same partition of the same topic may not start at the same offset.
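The base-offset layout can be illustrated with a small sketch. Kafka keeps its segments in a map keyed by base offset (a `ConcurrentSkipListMap` in the broker source) and finds the segment containing a given offset with a floor lookup; the segment file names below are hypothetical:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: locating the segment that contains a given offset.
// Segments are keyed by base offset; floorEntry finds the segment whose
// base offset is the largest one <= the target offset.
public class SegmentLookup {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L, "00000000000000000000.log");
        segments.put(500L, "00000000000000000500.log");
        segments.put(1200L, "00000000000000001200.log");
        // Offset 800 falls in the segment whose base offset is 500.
        System.out.println(segments.floorEntry(800L).getValue());
    }
}
```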
Log maintenance is driven mainly by the following background threads:
##LogManager.scala
//Iterates over all Logs and cleans up non-compacted logs. Cleanup conditions:
//1. the log exceeds its retention time 2. the log exceeds its retention size
scheduler.schedule("kafka-log-retention",
cleanupLogs _,
delay = InitialTaskDelayMs,
period = retentionCheckMs,
TimeUnit.MILLISECONDS)
info("Starting log flusher with a default period of %d ms.".format(flushCheckMs))
//Flushes to disk any Log that has new data and has exceeded the flush interval.
//Calls force on Java NIO's FileChannel, which writes all of the channel's not-yet-persisted content to disk.
scheduler.schedule("kafka-log-flusher",
flushDirtyLogs _,
delay = InitialTaskDelayMs,
period = flushCheckMs,
TimeUnit.MILLISECONDS)
//Writes the current recovery point to each log directory, so a restart does not have to recover the entire log
scheduler.schedule("kafka-recovery-point-checkpoint",
checkpointLogRecoveryOffsets _,
delay = InitialTaskDelayMs,
period = flushRecoveryOffsetCheckpointMs,
TimeUnit.MILLISECONDS)
//Writes the current log start offset to each log directory, to avoid reading logs that have already been deleted
scheduler.schedule("kafka-log-start-offset-checkpoint",
checkpointLogStartOffsets _,
delay = InitialTaskDelayMs,
period = flushStartOffsetCheckpointMs,
TimeUnit.MILLISECONDS)
//Deletes logs that have been marked for deletion
scheduler.schedule("kafka-delete-logs",
deleteLogs _,
delay = InitialTaskDelayMs,
period = defaultConfig.fileDeleteDelayMs,
TimeUnit.MILLISECONDS)
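The scheduling pattern above can be sketched with the JDK's ScheduledExecutorService as a minimal stand-in for KafkaScheduler; the task body and the intervals here are made up:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the LogManager scheduling pattern: a fixed-rate
// background task with an initial delay, analogous to scheduler.schedule(
// "kafka-log-flusher", flushDirtyLogs _, delay, period, MILLISECONDS).
public class BackgroundTasks {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        Runnable flushDirtyLogs = () -> System.out.println("flush dirty logs");
        // delay plays the role of InitialTaskDelayMs, period of flushCheckMs
        scheduler.scheduleAtFixedRate(flushDirtyLogs, 10, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(200);
        scheduler.shutdown();
    }
}
```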
A Log flush is carried out as flushes of the individual segments:
##Log.scala
def flush(offset: Long) : Unit = {
maybeHandleIOException(s"Error while flushing log for $topicPartition in dir ${dir.getParent} with offset $offset") {
if (offset <= this.recoveryPoint)
return
debug("Flushing log '" + name + " up to offset " + offset + ", last flushed: " + lastFlushTime + " current time: " +
time.milliseconds + " unflushed = " + unflushedMessages)
for (segment <- logSegments(this.recoveryPoint, offset))
segment.flush()
lock synchronized {
checkIfMemoryMappedBufferClosed()
if (offset > this.recoveryPoint) {
this.recoveryPoint = offset
lastFlushedTime.set(time.milliseconds)
}
}
}
}
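The per-segment flush ultimately comes down to FileChannel.force. A minimal sketch of that call, using a hypothetical temp file rather than a real segment:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: segment.flush() ends up calling FileChannel.force,
// which asks the OS to write any dirty pages for this channel to disk.
public class FlushDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segment", ".log");
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap("record".getBytes()));
            channel.force(true); // true also forces file metadata to disk
        }
        System.out.println(Files.size(path)); // bytes persisted
        Files.delete(path);
    }
}
```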
The actual log file operations are implemented in FileRecords.java, which wraps the common operations of Java NIO's FileChannel.
Ultimately, channel.write writes the MemoryRecords to the on-disk file.
MemoryRecords is the in-memory representation of records in Kafka.
##MemoryRecords.java
public class MemoryRecords extends AbstractRecords {
//Wraps an NIO ByteBuffer
private final ByteBuffer buffer;
##FileRecords.java
public class FileRecords extends AbstractRecords implements Closeable {
//The channel used to access the file
private final FileChannel channel;
//When opening the file, a java.io File is used to initialize this instance's channel
public static FileRecords open(File file,
boolean mutable,
boolean fileAlreadyExists,
int initFileSize,
boolean preallocate) throws IOException {
FileChannel channel = openChannel(file, mutable, fileAlreadyExists, initFileSize, preallocate);
int end = (!fileAlreadyExists && preallocate) ? 0 : Integer.MAX_VALUE;
return new FileRecords(file, channel, 0, end, false);
}
//Appends in-memory records to the file
public int append(MemoryRecords records) throws IOException {
//The underlying implementation is channel.write(buffer)
int written = records.writeFullyTo(channel);
size.getAndAdd(written);
return written;
}
##MemoryRecords.java
public int writeFullyTo(GatheringByteChannel channel) throws IOException {
buffer.mark();
int written = 0;
while (written < sizeInBytes())
written += channel.write(buffer);
buffer.reset();
return written;
}
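The mark/write-loop/reset pattern in writeFullyTo can be tried standalone. This sketch mirrors that method but targets a hypothetical temp file and uses hasRemaining() in place of a byte count:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the writeFullyTo pattern: mark() saves the buffer
// position, the loop handles partial writes, and reset() restores the
// position so the same buffer can be written again (e.g. to another channel).
public class WriteFully {
    static int writeFullyTo(FileChannel channel, ByteBuffer buffer) throws IOException {
        buffer.mark();
        int written = 0;
        while (buffer.hasRemaining())
            written += channel.write(buffer);
        buffer.reset(); // buffer is reusable after the write
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("records", ".log");
        ByteBuffer buffer = ByteBuffer.wrap("hello".getBytes());
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE)) {
            System.out.println(writeFullyTo(channel, buffer));
        }
        Files.delete(path);
    }
}
```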
FileRecords provides two ways to read the log file:
+ Use NIO's channel.read to read the contents into an NIO ByteBuffer.
+ Use NIO's fileChannel.transferTo to zero-copy the contents directly into a SocketChannel. Note that in this step the broker does not decompress the data; it sends the compressed data to the client as-is and lets the client do the decompression.
The first way is as follows:
##FileRecords.java
public ByteBuffer readInto(ByteBuffer buffer, int position) throws IOException {
Utils.readFully(channel, buffer, position + this.start);
//Switches the buffer from write mode to read mode, then returns it
buffer.flip();
return buffer;
}
##Utils.java
public static void readFully(FileChannel channel, ByteBuffer destinationBuffer, long position) throws IOException {
if (position < 0) {
throw new IllegalArgumentException("The file channel position cannot be negative, but it is " + position);
}
long currentPosition = position;
int bytesRead;
do {
bytesRead = channel.read(destinationBuffer, currentPosition);
currentPosition += bytesRead;
} while (bytesRead != -1 && destinationBuffer.hasRemaining());
}
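The read-then-flip sequence used by readInto and readFully can be sketched standalone; the file name and contents here are made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the readInto pattern: read until the buffer is
// full (or EOF), then flip() so the caller can consume what was read.
public class ReadIntoDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("records", ".log");
        Files.write(path, "kafka".getBytes(StandardCharsets.UTF_8));
        ByteBuffer buffer = ByteBuffer.allocate(5);
        try (FileChannel channel = FileChannel.open(path)) {
            long position = 0;
            int bytesRead;
            do {
                bytesRead = channel.read(buffer, position);
                if (bytesRead > 0) position += bytesRead;
            } while (bytesRead != -1 && buffer.hasRemaining());
        }
        buffer.flip(); // switch from writing into the buffer to reading from it
        System.out.println(StandardCharsets.UTF_8.decode(buffer).toString());
        Files.delete(path);
    }
}
```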
The second way is as follows:
##FileRecords.java
@Override
public long writeTo(GatheringByteChannel destChannel, long offset, int length) throws IOException {
//position and count are computed earlier in the method (abridged here)
long position = start + offset;
int count = Math.min(length, sizeInBytes());
final long bytesTransferred;
if (destChannel instanceof TransportLayer) {
//Write into the transport layer's socket
TransportLayer tl = (TransportLayer) destChannel;
bytesTransferred = tl.transferFrom(channel, position, count);
} else {
bytesTransferred = channel.transferTo(position, count, destChannel);
}
return bytesTransferred;
}
##PlaintextTransportLayer.java
public class PlaintextTransportLayer implements TransportLayer {
//The actual socket held by this instance
private final SocketChannel socketChannel;
@Override
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
//NIO method: zero-copy from the FileChannel to the SocketChannel
return fileChannel.transferTo(position, count, socketChannel);
}
When the operating system supports it, this transfer does not require copying the data from kernel space into user space and then back into kernel space.
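The zero-copy path can be demonstrated with FileChannel.transferTo. Since showing sendfile proper would require a socket peer, this sketch copies file-to-file, which exercises the same API; all file names and contents are hypothetical:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the zero-copy path: transferTo hands the copy to
// the OS (sendfile on Linux when the target is a socket), so the data need
// not pass through a user-space buffer.
public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".log");
        Path dst = Files.createTempFile("copy", ".log");
        Files.write(src, "compressed-batch".getBytes());
        try (FileChannel in = FileChannel.open(src);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long transferred = in.transferTo(0, in.size(), out);
            System.out.println(transferred); // bytes moved without a user-space copy
        }
        Files.delete(src);
        Files.delete(dst);
    }
}
```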