zero copy解析,通过sendfile分析


To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:

  1. The operating system reads data from the disk into pagecache in kernel space
  2. The application reads the data from kernel space into a user-space buffer
  3. The application writes the data back into kernel space into a socket buffer
  4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network

This is clearly inefficient, there are four copies and two system calls. Using sendfile, this re-copying is avoided by allowing the OS to send the data from pagecache to the network directly. So in this optimized path, only the final copy to the NIC buffer is needed.

NDFILE(2) Linux Programmer's Manual SENDFILE(2)

NAME         top

       sendfile - transfer data between file descriptors

SYNOPSIS         top

       #include <sys/sendfile.h>

       ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

DESCRIPTION         top

       sendfile() copies data between one file descriptor and another.
       Because this copying is done within the kernel, sendfile() is more
       efficient than the combination of read(2) and write(2), which would
       require transferring data to and from user space.

       in_fd should be a file descriptor opened for reading and out_fd
       should be a descriptor opened for writing.

       If offset is not NULL, then it points to a variable holding the file
       offset from which sendfile() will start reading data from in_fd.
       When sendfile() returns, this variable will be set to the offset of
       the byte following the last byte that was read.  If offset is not
       NULL, then sendfile() does not modify the current file offset of
       in_fd; otherwise the current file offset is adjusted to reflect the
       number of bytes read from in_fd.

       If offset is NULL, then data will be read from in_fd starting at the
       current file offset, and the file offset will be updated by the call.

       count is the number of bytes to copy between the file descriptors.

       The in_fd argument must correspond to a file which supports
       mmap(2)-like operations (i.e., it cannot be a socket).

       In Linux kernels before 2.6.33, out_fd must refer to a socket.  Since
       Linux 2.6.33 it can be any file.  If it is a regular file, then
       sendfile() changes the file offset appropriately.

RETURN VALUE         top

       If the transfer was successful, the number of bytes written to out_fd
       is returned.  On error, -1 is returned, and errno is set
       appropriately.

ERRORS         top

       EAGAIN Nonblocking I/O has been selected using O_NONBLOCK and the
              write would block.

       EBADF  The input file was not opened for reading or the output file
              was not opened for writing.

       EFAULT Bad address.

       EINVAL Descriptor is not valid or locked, or an mmap(2)-like
              operation is not available for in_fd.

       EIO    Unspecified error while reading from in_fd.

       ENOMEM Insufficient memory to read from in_fd.

VERSIONS         top

       sendfile() is a new feature in Linux 2.2.  The include file
       <sys/sendfile.h> is present since glibc 2.1.

CONFORMING TO         top

       Not specified in POSIX.1-2001, or other standards.

       Other UNIX systems implement sendfile() with different semantics and
       prototypes.  It should not be used in portable programs.

NOTES         top

       If you plan to use sendfile() for sending files to a TCP socket, but
       need to send some header data in front of the file contents, you will
       find it useful to employ the TCP_CORK option, described in tcp(7), to
       minimize the number of packets and to tune performance.

       In Linux 2.4 and earlier, out_fd could also refer to a regular file,
       and sendfile() changed the current offset of that file.

       The original Linux sendfile() system call was not designed to handle
       large file offsets.  Consequently, Linux 2.4 added sendfile64(), with
       a wider type for the offset argument.  The glibc sendfile() wrapper
       function transparently deals with the kernel differences.

       Applications may wish to fall back to read(2)/write(2) in the case
       where sendfile() fails with EINVAL or ENOSYS.

       The Linux-specific splice(2) call supports transferring data between
       arbitrary files (e.g., a pair of sockets).

SEE ALSO         top

       mmap(2), open(2), socket(2), splice(2)

COLOPHON         top

       This page is part of release 3.54 of the Linux man-pages project.  A
       description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.


你可能感兴趣的:(linux,kernel,Sockets)