1. Kafka "High Throughput": Sequential Access and Zero Copy
https://cloud.tencent.com/developer/article/1476649
2. How Kafka Achieves Efficient Data Transfer Through Zero Copy
https://blog.csdn.net/lxlmycsdnfree/article/details/78973864
3. Kafka's Zero-Copy Technique
https://www.jianshu.com/p/835ec2d4c170
4. What Is "Zero Copy"?
https://baijiahao.baidu.com/s?id=1648595456047501430&wfr=spider&for=pc
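Kafka uses zero copy when it transfers data, and this greatly improves its throughput. Let's look at how zero copy is implemented in Kafka.
Many web applications serve a large amount of static content, which amounts to reading data from disk and writing exactly the same data back to the response socket. This might seem to need relatively little CPU, but it is somewhat inefficient: the kernel reads the data from disk and pushes it across the kernel-user boundary to the application, and the application then pushes it back across the kernel-user boundary to be written out to the socket. In effect, the application acts as an inefficient intermediary that moves data from the disk file to the socket.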
[Figure: the traditional copy path from the disk file to the socket]
The code looks like this:
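// Pseudocode: read the file into a user-space buffer, then send that buffer
File.read(fileDesc, buf, len);   // kernel -> user-space copy
Socket.send(socket, buf, len);   // user-space -> kernel copy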
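This copy requires four context switches between user mode and kernel mode, and the data is copied four times before the operation completes. The data moves from the file to the socket as follows: DMA copies it from disk into the kernel's read buffer (the page cache), the CPU copies it into the application's user-space buffer, the CPU copies it again into the kernel's socket buffer, and DMA finally copies it to the network card.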
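For comparison with the zero-copy version, here is a minimal Java sketch of the traditional read-then-write loop described above. The class name `TraditionalCopy`, the method name `copy`, and the 8 KB buffer size are assumptions chosen for illustration; they do not come from Kafka or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class TraditionalCopy {

    /**
     * Copies a file to an output stream through a user-space buffer.
     * Every read() moves data from the kernel into buf, and every
     * write() moves it back into the kernel.
     */
    static long copy(Path file, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // user-space buffer; the size is an arbitrary choice
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) { // kernel -> user-space copy
                out.write(buf, 0, n);          // user-space -> kernel copy
                total += n;
            }
        }
        return total;
    }
}
```

To send the file over the network, pass `socket.getOutputStream()` as the `out` argument. The application never inspects the bytes, yet every byte still passes through `buf`.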
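An application that uses zero copy asks the kernel to copy the data directly from the disk file to the socket, without passing it through the application. Zero copy greatly improves application performance and reduces the number of context switches between kernel mode and user mode (from four to two when a single sendfile call does the work). In this case only two data copies are needed, both done by DMA, provided the network card supports gather operations; without that support, one CPU copy inside the kernel remains.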
[Figure: the zero-copy path from the disk file to the socket]
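In Java, this is exposed through the transferTo method of java.nio.channels.FileChannel. Let's look at how it is implemented.
FileChannel is an abstract class, and transferTo is declared as follows: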
/**
* Transfers bytes from this channel's file to the given writable byte
* channel.
*
* An attempt is made to read up to count bytes starting at
* the given position in this channel's file and write them to the
* target channel. An invocation of this method may or may not transfer
* all of the requested bytes; whether or not it does so depends upon the
* natures and states of the channels. Fewer than the requested number of
* bytes are transferred if this channel's file contains fewer than
* count bytes starting at the given position, or if the
* target channel is non-blocking and it has fewer than count
* bytes free in its output buffer.
*
* This method does not modify this channel's position. If the given
* position is greater than the file's current size then no bytes are
* transferred. If the target channel has a position then bytes are
* written starting at that position and then the position is incremented
* by the number of bytes written.
*
* This method is potentially much more efficient than a simple loop
* that reads from this channel and writes to the target channel. Many
* operating systems can transfer bytes directly from the filesystem cache
* to the target channel without actually copying them.
*
* @param position
* The position within the file at which the transfer is to begin;
* must be non-negative
*
* @param count
* The maximum number of bytes to be transferred; must be
* non-negative
*
* @param target
* The target channel
*
* @return The number of bytes, possibly zero,
* that were actually transferred
*
* @throws IllegalArgumentException
* If the preconditions on the parameters do not hold
*
* @throws NonReadableChannelException
* If this channel was not opened for reading
*
* @throws NonWritableChannelException
* If the target channel was not opened for writing
*
* @throws ClosedChannelException
* If either this channel or the target channel is closed
*
* @throws AsynchronousCloseException
* If another thread closes either channel
* while the transfer is in progress
*
* @throws ClosedByInterruptException
* If another thread interrupts the current thread while the
* transfer is in progress, thereby closing both channels and
* setting the current thread's interrupt status
*
* @throws IOException
* If some other I/O error occurs
*/
public abstract long transferTo(long position, long count,
                                WritableByteChannel target)
    throws IOException;
The key point is the last paragraph of the description: this method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.
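As a usage sketch of the method documented above, the following helper sends a whole file to a channel with `FileChannel.transferTo`. The class name `ZeroCopySend` and the method name `sendWholeFile` are hypothetical, and the loop assumes a blocking target channel. Because the Javadoc says a single call may transfer fewer bytes than requested, the call is repeated until the end of the file is reached.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class ZeroCopySend {

    /**
     * Sends the whole file to target using FileChannel.transferTo.
     * Assumes target is in blocking mode; with a non-blocking target
     * a call may return 0 and this loop would spin.
     */
    static long sendWholeFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            // Re-read the size on every pass so the loop ends if the file shrinks.
            while (position < source.size()) {
                long remaining = source.size() - position;
                // May transfer fewer bytes than requested, so advance by the returned count.
                position += source.transferTo(position, remaining, target);
            }
            return position;
        }
    }
}
```

When `target` is a `SocketChannel`, the JDK can take the operating system's direct transfer path where one is available; for other channel types it falls back to copying through a buffer, so the result is the same but the performance benefit is not guaranteed.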
Now look at the concrete implementation class, sun.nio.ch.FileChannelImpl (in IntelliJ IDEA, press Ctrl + Alt + B to jump to the implementation):
public long transferTo(long var1, long var3, WritableByteChannel var5) throws IOException {
    // Decompiled names: var1 = position, var3 = count, var5 = target
    this.ensureOpen();
    if (!var5.isOpen()) {
        throw new ClosedChannelException();
    } else if (!this.readable) {
        throw new NonReadableChannelException();
    } else if (var5 instanceof FileChannelImpl && !((FileChannelImpl) var5).writable) {
        throw new NonWritableChannelException();
    } else if (var1 >= 0L && var3 >= 0L) {
        long var6 = this.size();
        if (var1 > var6) {
            return 0L; // position is past the end of the file
        } else {
            // Cap one transfer at Integer.MAX_VALUE bytes and at the bytes left in the file
            int var8 = (int) Math.min(var3, 2147483647L);
            if (var6 - var1 < (long) var8) {
                var8 = (int) (var6 - var1);
            }
            long var9;
            // Try the direct path first; fall back when a path returns a negative value
            return (var9 = this.transferToDirectly(var1, var8, var5)) >= 0L
                    ? var9
                    : ((var9 = this.transferToTrustedChannel(var1, (long) var8, var5)) >= 0L
                            ? var9
                            : this.transferToArbitraryChannel(var1, var8, var5));
        }
    } else {
        throw new IllegalArgumentException();
    }
}
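The return statement tries three paths in order and uses the first one that does not return a negative value. To make the last fallback concrete, here is a simplified sketch of a buffer-based transfer, which is the general idea behind copying to an arbitrary channel. It is not the JDK source: the class name `BufferedTransfer`, the method name `transfer`, and the 8 KB buffer size are assumptions for illustration, and the target is assumed to be in blocking mode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, for illustration only; not the JDK implementation.
final class BufferedTransfer {

    /** Copies up to count bytes, starting at position, through a user-space buffer. */
    static long transfer(FileChannel source, long position, long count,
                         WritableByteChannel target) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8192); // size chosen for illustration
        long transferred = 0;
        while (transferred < count) {
            buf.clear();
            // Never read more than the caller asked for.
            buf.limit((int) Math.min(buf.capacity(), count - transferred));
            // Positional read: does not change the channel's own position.
            int read = source.read(buf, position + transferred);
            if (read <= 0) {
                break; // end of file
            }
            buf.flip();
            while (buf.hasRemaining()) {
                target.write(buf);
            }
            transferred += read;
        }
        return transferred;
    }
}
```

This path gives the same result as the direct one but moves every byte through user space, which is exactly the cost that the direct path avoids.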
Following the call chain from transferToDirectly eventually leads to this native method in sun.nio.ch.FileChannelImpl:
private native long transferTo0(int var1, long var2, long var4, int var6);
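On Linux, this ultimately calls the sendfile system call:
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
in_fd: the file descriptor opened for reading.
out_fd: the file descriptor opened for writing.
offset: a pointer to the file offset at which reading from in_fd starts.
count: the number of bytes to move between the two file descriptors.
With the sendfile system call, zero copy avoids the copies into and out of user space, together with the extra context switches they require. It also uses direct memory access (DMA) to perform the I/O, which avoids copying data between kernel buffers.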
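With zero copy, the data in the disk file is copied into the page cache only once and is then sent from the page cache straight to the network (the same page cache can be reused when sending to different subscribers), which avoids repeated copying.
With 10 consumers, the traditional approach copies the data 4 × 10 = 40 times, whereas zero copy needs only 1 + 10 = 11 copies: one from disk into the page cache, plus one read of the page cache for each of the 10 consumers.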
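The same count can be written for any number of consumers. The symbols below are introduced here only for illustration: N is the number of consumers and C is the number of data copies under the counting used above.

```latex
\begin{aligned}
C_{\text{traditional}}(N) &= 4N, & C_{\text{zero-copy}}(N) &= 1 + N, \\
C_{\text{traditional}}(10) &= 40, & C_{\text{zero-copy}}(10) &= 11.
\end{aligned}
```

Under this counting, the ratio 4N / (1 + N) approaches 4 as N grows, so the saving is larger the more consumers read the same data.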