14-Advanced I/O Functions

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1

Welcome to my github: https://github.com/gaoxiangnumber1

14.1 Introduction

14.2 Socket Timeouts

  • There are three ways to place a timeout on an I/O operation involving a socket:
    1. Call alarm, which generates the SIGALRM signal when the specified time has expired.
    2. Block waiting for I/O in select, which has a built-in time limit, instead of blocking in a call to read or write.
    3. Use the SO_RCVTIMEO and SO_SNDTIMEO socket options.
  • All three techniques work with input and output operations, but we would also like a technique that we can use with connect, since a TCP connect can take a long time to time out (typically 75 seconds). select can be used to place a timeout on connect only when the socket is in nonblocking mode (Section 16.3), and the two socket options do not work with connect. The first two techniques work with any descriptor, while the third works only with socket descriptors.

connect with a Timeout Using SIGALRM

  • Figure 14.1 shows a function that calls connect with an upper limit specified by the caller. Its first three arguments are the three required by connect, and the fourth is the number of seconds to wait.

Establish signal handler 8

  • A signal handler is established for SIGALRM. The current signal handler (if any) is saved so that we can restore it at the end of the function.

Set alarm 9-10

  • The alarm clock for the process is set to the number of seconds specified by the caller. The return value from alarm is the number of seconds currently remaining in the process's alarm clock (if the process has already set one) or 0 (if there is no current alarm). In the former case we print a warning message, since we are wiping out the previously set alarm (see Exercise 14.2).

Call connect 11-15

  • connect is called, and if the call is interrupted (EINTR), we set errno to ETIMEDOUT instead. The socket is closed to prevent the three-way handshake from continuing.

Turn off alarm and restore any previous signal handler 16-18

  • The alarm is turned off by setting it to 0, and the previous signal handler (if any) is restored.

Handle SIGALRM 20-24

  • The signal handler just returns, assuming this return will interrupt the pending connect, causing connect to return an error of EINTR. Recall that our signal function (Figure 5.6) does not set the SA_RESTART flag when the signal being caught is SIGALRM.
  • We can always reduce the timeout period for a connect using this technique, but we cannot extend the kernel’s existing timeout. On Berkeley-derived kernels the timeout for a connect is 75 seconds. We can specify a value less than or equal to 75 for our function, but if we specify a value greater than 75, the connect itself will still time out after 75 seconds.
  • We use the interruptibility of the system call (connect) to return before the kernel’s time limit expires. This is fine when we perform the system call ourselves and can handle the EINTR error return. But in Section 29.7 we use a library function that performs the system call, and that library function reissues the system call when EINTR is returned. We will see in Figure 29.10 that we have to use SIGALRM along with sigsetjmp and siglongjmp to get around the library’s ignoring of EINTR.
  • Since signals are difficult to use correctly in multithreaded programs (Chapter 26), the technique shown here is recommended only for single-threaded programs.

recvfrom with a Timeout Using SIGALRM

  • Figure 14.2 redoes our dg_cli function (Figure 8.8), adding a call to alarm to interrupt the recvfrom if a reply is not received within five seconds.

Handle timeout from recvfrom 8-22

  • We establish a signal handler for SIGALRM and then call alarm for a five-second timeout before each call to recvfrom. If recvfrom is interrupted by our signal handler, we print a message and continue. If a line is read from the server, we turn off the pending alarm and print the reply.

SIGALRM signal handler 24-28

  • Our signal handler returns to interrupt the blocked recvfrom. This example works correctly because we are reading only one reply each time we establish an alarm. In Section 20.4, we will use the same technique, but since we are reading multiple replies for a given alarm, a race condition exists that we must handle.

recvfrom with a Timeout Using select

  • Figure 14.3 shows the readable_timeo function, which waits up to a specified number of seconds for a descriptor to become readable.

Prepare arguments for select 7-10

  • The bit corresponding to the descriptor is turned on in the read descriptor set. A timeval structure is set to the number of seconds that the caller wants to wait.

Block in select 11-12

  • select waits for the descriptor to become readable, or for the timeout to expire. The return value of this function is the return value of select: -1 on an error, 0 if a timeout occurs, or a positive value specifying the number of ready descriptors.
  • This function does not read; it only waits for the descriptor to become readable. Therefore it can be used with any type of socket, TCP or UDP.
  • We use this function in Figure 14.4, a redo of the dg_cli function from Figure 8.8. This new version calls recvfrom only when our readable_timeo function returns a positive value.

  • We do not call recvfrom until the function readable_timeo tells us that the descriptor is readable. This guarantees that recvfrom will not block.

recvfrom with a Timeout Using the SO_RCVTIMEO Socket Option

  • We set SO_RCVTIMEO option once for a descriptor, specifying the timeout value, and this timeout then applies to all read operations on that descriptor.
  • Advantage: we set the option only once; the previous two methods require doing something before every operation on which we want to place a time limit.
    Disadvantage: SO_RCVTIMEO applies only to read operations and SO_SNDTIMEO only to write operations; neither socket option can be used to set a timeout for a connect.
  • Figure 14.5 shows a version of dg_cli that uses the SO_RCVTIMEO socket option.

Set socket option 8-10

  • The fourth argument to setsockopt is a pointer to a timeval structure that is filled in with the desired timeout.

Test for timeout 15-17

  • If the I/O operation times out, recvfrom returns -1 with errno set to EWOULDBLOCK.
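  • A sketch of the SO_RCVTIMEO approach, assuming a POSIX socket layer. set_rcvtimeo and recvfrom_rcvtimeo are hypothetical helpers; the second maps the EWOULDBLOCK (or EAGAIN, the same value on many systems) returned by a timed-out read to ETIMEDOUT so that callers can test for a single value.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Set the option once; it then applies to every read on sockfd. */
int
set_rcvtimeo(int sockfd, int sec)
{
    struct timeval tv;

    tv.tv_sec = sec;        /* must be > 0: a zero timeval means "no timeout" */
    tv.tv_usec = 0;
    return setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Receive one datagram; a timeout comes back as -1 with errno ETIMEDOUT. */
ssize_t
recvfrom_rcvtimeo(int sockfd, void *buf, size_t len,
                  struct sockaddr *from, socklen_t *fromlen)
{
    ssize_t n = recvfrom(sockfd, buf, len, 0, from, fromlen);

    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        errno = ETIMEDOUT;  /* the SO_RCVTIMEO timer expired */
    return n;
}
```

The same setsockopt call with SO_SNDTIMEO would bound write operations instead.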

14.3 ‘recv’ and ‘send’ Functions

#include <sys/types.h>
#include <sys/socket.h>
ssize_t recv(int sockfd, void *buff, size_t nbytes, int flags);
ssize_t send(int sockfd, const void *buff, size_t nbytes, int flags);
Both return: number of bytes read or written if OK, -1 on error
  • The first three arguments to recv and send are the same as the first three arguments to read and write.
  • flags is either 0 or formed by logically ORing one or more of the constants in Figure 14.6.
  • Design problem: flags is passed by value, not as a value-result argument, so it can be used only to pass flags from the process to the kernel; the kernel cannot pass flags back to the process. When the OSI protocols were added in 4.3BSD Reno, the need arose to return MSG_EOR to the process with an input operation. The decision was therefore made with 4.3BSD Reno to leave the arguments to the commonly used input functions (recv and recvfrom) as-is and to change the msghdr structure that is used with recvmsg and sendmsg. As Section 14.5 describes, an integer msg_flags member was added to this structure, and since the structure is passed by reference, the kernel can modify these flags on return. If a process needs to have the flags updated by the kernel, it must call recvmsg instead of recv or recvfrom.
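  • The practical consequence can be sketched as follows, assuming we want to know whether a datagram was larger than our buffer: recv and recvfrom have no way to report it, but recvmsg sets MSG_TRUNC in msg_flags. The helper name recv_dgram_checked is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive one datagram into buf; *truncated is set to 1 if the kernel
 * reported MSG_TRUNC in msg_flags (the datagram did not fit), else 0. */
ssize_t
recv_dgram_checked(int sockfd, void *buf, size_t len, int flags,
                   int *truncated)
{
    struct msghdr msg;
    struct iovec iov;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));   /* no address, no ancillary data */
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(sockfd, &msg, flags);   /* flags: value in */
    if (n >= 0)
        *truncated = (msg.msg_flags & MSG_TRUNC) != 0;   /* msg_flags: out */
    return n;
}
```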

14.4 ‘readv’ and ‘writev’ Functions

#include <sys/uio.h>
ssize_t readv(int filedes, const struct iovec *iov, int iovcnt);
ssize_t writev(int filedes, const struct iovec *iov, int iovcnt);
Both return: number of bytes read or written, -1 on error
  • readv and writev let us read into or write from one or more buffers with a single function call. These operations are called scatter read(since the input data is scattered into multiple application buffers) and gather write(since multiple buffers are gathered for a single output operation).
  • iov is a pointer to an array of iovec structures, defined by including the <sys/uio.h> header:
struct iovec
{
    void *iov_base; // starting address of buffer
    size_t iov_len; // size of buffer
};
  • There is a limit to the number of elements in the array of iovec structures that an implementation allows. POSIX calls this limit IOV_MAX and requires its value to be at least 16.
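  • A minimal sketch of a gather write and the matching scatter read, assuming a fixed-size header followed by a payload; the helper names write_hdr_body and read_hdr_body are hypothetical. Because iov_base is a plain void *, the const buffers passed to writev have their qualifier cast away, which is safe because writev only reads them.

```c
#include <sys/uio.h>
#include <unistd.h>

/* Gather write: header and body leave in a single writev() call. */
ssize_t
write_hdr_body(int fd, const void *hdr, size_t hdrlen,
               const void *body, size_t bodylen)
{
    struct iovec iov[2];

    /* iov_base is void *, so const is cast away; writev() only reads it. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len = hdrlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = bodylen;
    return writev(fd, iov, 2);
}

/* Scatter read: the first hdrlen bytes land in hdr, the rest in body.
 * On a stream descriptor this can still be a short read. */
ssize_t
read_hdr_body(int fd, void *hdr, size_t hdrlen, void *body, size_t bodylen)
{
    struct iovec iov[2];

    iov[0].iov_base = hdr;
    iov[0].iov_len = hdrlen;
    iov[1].iov_base = body;
    iov[1].iov_len = bodylen;
    return readv(fd, iov, 2);
}
```

Sending both pieces with one system call also means that, on a TCP socket, the header and payload are not handed to TCP as two separate small writes, which is the pattern that interacts badly with the Nagle algorithm (Section 7.9).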

14.5 ‘recvmsg’ and ‘sendmsg’ Functions

#include <sys/types.h>
#include <sys/socket.h>
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);
ssize_t sendmsg(int sockfd, struct msghdr *msg, int flags);
Both return: number of bytes read or written if OK, -1 on error
  • Both functions package most arguments into a msghdr structure.
struct msghdr
{
    void*           msg_name;       // protocol address
    socklen_t       msg_namelen;    // size of protocol address
    struct iovec*       msg_iov;        // scatter/gather array
    int             msg_iovlen;     // # elements in msg_iov
    void*           msg_control;    // ancillary data (cmsghdr struct)
    socklen_t       msg_controllen; // length of ancillary data
    int             msg_flags;      // flags returned by recvmsg()
};
  • msg_name and msg_namelen are used when the socket is not connected (e.g., an unconnected UDP socket). msg_name points to a socket address structure in which the caller stores the destination’s protocol address for sendmsg, or in which recvmsg stores the sender’s protocol address. If a protocol address does not need to be specified (e.g., a TCP socket or a connected UDP socket), msg_name should be set to a null pointer. msg_namelen is a value argument for sendmsg, but a value-result argument for recvmsg.
  • msg_iov and msg_iovlen specify the array of input or output buffers (the array of iovec structures).
  • msg_control and msg_controllen specify the location and size of the optional ancillary data (Section 14.6). msg_controllen is a value-result argument for recvmsg.
  • With recvmsg and sendmsg, we must distinguish between two flag variables: the flags argument, which is passed by value, and the msg_flags member of the msghdr structure, which is passed by reference (since the address of the structure is passed to the function).
  • The msg_flags member is used only by recvmsg. When recvmsg is called, the flags argument is copied into the msg_flags member, and the kernel uses this value to drive its receive processing. This value is then updated based on the result of recvmsg.
  • The msg_flags member is ignored by sendmsg because this function uses the flags argument to drive its output processing. This means that if we want to set the MSG_DONTWAIT flag in a call to sendmsg, we set the flags argument to this value; setting the msg_flags member to this value has no effect.
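  • A sketch of sendmsg on an unconnected UDP socket, assuming IPv4: msg_name carries the destination (a value argument here), two iovec entries are gathered into one datagram, there is no ancillary data, and MSG_DONTWAIT is passed in the flags argument because sendmsg ignores msg_flags. sendto_gather is a hypothetical name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send hdr followed by body as ONE datagram to dest. */
ssize_t
sendto_gather(int sockfd, const struct sockaddr_in *dest,
              const void *hdr, size_t hdrlen,
              const void *body, size_t bodylen)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)hdr;      /* only read by sendmsg() */
    iov[0].iov_len = hdrlen;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = bodylen;

    memset(&msg, 0, sizeof(msg));       /* msg_control = NULL: no ancillary data */
    msg.msg_name = (void *)dest;        /* destination protocol address */
    msg.msg_namelen = sizeof(*dest);    /* value argument for sendmsg */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    /* MSG_DONTWAIT goes in flags; setting msg.msg_flags would be ignored. */
    return sendmsg(sockfd, &msg, MSG_DONTWAIT);
}
```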

  • Figure 14.7 summarizes the flags that are examined by the kernel for both the input and output functions, as well as the msg_flags that might be returned by recvmsg.
  • There is no column for sendmsg msg_flags because it is not used. The first four flags are only examined and never returned; the next two are both examined and returned; and the last four are only returned. The following comments apply to the six flags returned by recvmsg.
  • Implementations might return some of the input flags in the msg_flags member, so we should examine only those flag values we are interested in (e.g., the last six in Figure 14.7).

  • Figure 14.8 shows a msghdr structure and the various information it points to. Assume the process is about to call recvmsg for a UDP socket.
  • Sixteen bytes are allocated for the protocol address and 20 bytes for the ancillary data. An array of three iovec structures is initialized: the first specifies a 100-byte buffer, the second a 60-byte buffer, and the third an 80-byte buffer. Assume the IP_RECVDSTADDR socket option has been set for the socket, so that the destination IP address of a received UDP datagram is returned as ancillary data.
  • We next assume that a 170-byte UDP datagram arrives from 192.6.38.100, port 2000, destined for our UDP socket with a destination IP address of 206.168.112.96. Figure 14.9 shows all the information in the msghdr structure when recvmsg returns. The shaded fields are modified by recvmsg.

  • The following items have changed from Figure 14.8 to Figure 14.9:
    1. The buffer pointed to by msg_name has been filled in as an Internet socket address structure, containing the source IP address and source UDP port from the received datagram.
    2. msg_namelen, a value-result argument, is updated with the amount of data stored in msg_name. Nothing changes since its value before the call was 16 and its value when recvmsg returns is also 16.
    3. The first 100 bytes of data are stored in the first buffer; the next 60 bytes are stored in the second buffer; and the final 10 bytes are stored in the third buffer.
    4. The last 70 bytes of the final buffer are not modified. The return value of the recvmsg function is the size of the datagram, 170.
    5. The buffer pointed to by msg_control is filled in as a cmsghdr structure. (We will say more about ancillary data in Section 14.6 and more about this particular socket option in Section 22.2.) The cmsg_len is 16; cmsg_level is IPPROTO_IP; cmsg_type is IP_RECVDSTADDR; and the next 4 bytes contain the destination IP address from the received UDP datagram. The final 4 bytes of the 20-byte buffer we supplied to hold the ancillary data are not modified.
    6. The msg_controllen member is updated with the actual amount of ancillary data that was stored. It is also a value-result argument and its result on return is 16.
    7. The msg_flags member is updated by recvmsg, but there are no flags to return to the process.
  • Figure 14.10 summarizes the differences among the five groups of I/O functions we described.
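  • The scatter behaviour in Figures 14.8 and 14.9 can be reproduced with a sketch like the one below. It assumes a connected datagram socket (so msg_name is a null pointer) and requests no ancillary data; recv_into_three is a hypothetical helper whose buffers have the 100, 60 and 80 byte sizes from the figure. A 170-byte datagram should again fill the first two buffers completely and the first 10 bytes of the third, leaving its last 70 bytes untouched.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Scatter one datagram over three buffers sized like Figure 14.8.
 * Returns the datagram size from recvmsg(); *msg_flags gets the
 * kernel-returned flags if msg_flags is not NULL. */
ssize_t
recv_into_three(int sockfd, char b1[100], char b2[60], char b3[80],
                int *msg_flags)
{
    struct iovec iov[3];
    struct msghdr msg;
    ssize_t n;

    iov[0].iov_base = b1;
    iov[0].iov_len = 100;
    iov[1].iov_base = b2;
    iov[1].iov_len = 60;
    iov[2].iov_base = b3;
    iov[2].iov_len = 80;

    memset(&msg, 0, sizeof(msg));   /* connected socket: msg_name = NULL */
    msg.msg_iov = iov;
    msg.msg_iovlen = 3;

    n = recvmsg(sockfd, &msg, 0);
    if (n >= 0 && msg_flags != NULL)
        *msg_flags = msg.msg_flags;
    return n;
}
```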

14.6 Ancillary Data

  • Ancillary data is also called control information. It can be sent and received using msg_control and msg_controllen members of msghdr structure with sendmsg and recvmsg functions.

  • Figure 14.11 is a summary of the uses of ancillary data we cover in this text.
  • Ancillary data consists of one or more ancillary data objects, each one beginning with a cmsghdr structure, defined by including the <sys/socket.h> header:
struct cmsghdr
{
    socklen_t   cmsg_len;   // length in bytes, including this structure
    int         cmsg_level; // originating protocol
    int         cmsg_type;  // protocol-specific type
    // followed by unsigned char cmsg_data[]
};
  • The ancillary data pointed to by msg_control must be suitably aligned for a cmsghdr structure. We will show one way to do this in Figure 15.11.

  • Figure 14.12 shows an example of two ancillary data objects in the control buffer. msg_control points to the first ancillary data object, and the total length of the ancillary data is specified by msg_controllen. Each object is preceded by a cmsghdr structure that describes the object. There can be padding between the cmsg_type member and the actual data, and there can also be padding at the end of the data, before the next ancillary data object. The five CMSG_xxx macros we describe shortly account for this possible padding.
  • Not all implementations support multiple ancillary data objects in the control buffer. Figure 14.13 shows the format of the cmsghdr structure when used with a Unix domain socket for descriptor passing (Section 15.7) or credential passing (Section 15.8).

  • In this figure, we assume each of the three members of the cmsghdr structure occupies four bytes and there is no padding between the cmsghdr structure and the actual data. When descriptors are passed, the contents of the cmsg_data array are the actual descriptor values. In this figure, we show only one descriptor being passed, but in general more than one can be passed (in which case the cmsg_len value will be 12 plus 4 times the number of descriptors, assuming each descriptor occupies 4 bytes).
  • Since the ancillary data returned by recvmsg can contain any number of ancillary data objects, and to hide the possible padding from the application, the following five macros are defined by including the <sys/socket.h> header:
#include <sys/socket.h>
#include <sys/param.h> /* for ALIGN macro on many implementations */
struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *mhdrptr) ;
Returns: pointer to first cmsghdr structure or NULL if no ancillary data

struct cmsghdr *CMSG_NXTHDR(struct msghdr *mhdrptr, struct cmsghdr *cmsgptr) ;
Returns: pointer to next cmsghdr structure or NULL if no more ancillary data objects

unsigned char *CMSG_DATA(struct cmsghdr *cmsgptr) ;
Returns: pointer to first byte of data associated with cmsghdr structure

#include <sys/socket.h>
unsigned int CMSG_LEN(unsigned int length) ;
Returns: value to store in cmsg_len given the amount of data

unsigned int CMSG_SPACE(unsigned int length) ;
Returns: total size of an ancillary data object given the amount of data
  • POSIX defines the first three macros; RFC 3542 defines the last two.
  • These macros would be used in the following pseudocode:
struct msghdr msg;
struct cmsghdr *cmsgptr;
/* fill in msg structure */
/* call recvmsg() */
for(cmsgptr = CMSG_FIRSTHDR(&msg); cmsgptr != NULL;
        cmsgptr = CMSG_NXTHDR(&msg, cmsgptr))
{
    if (cmsgptr->cmsg_level == ... && cmsgptr->cmsg_type == ... )
    {
        u_char *ptr;
        ptr = CMSG_DATA(cmsgptr);
        /* process data pointed to by ptr */
    }
}
  • CMSG_FIRSTHDR returns a pointer to the first ancillary data object, or a null pointer if there is no ancillary data in the msghdr structure (either msg_control is a null pointer or msg_controllen is less than the size of a cmsghdr structure). CMSG_NXTHDR returns a null pointer when there is not another ancillary data object in the control buffer.
  • Many existing implementations of CMSG_FIRSTHDR never look at msg_controllen and just return the value of msg_control. In Figure 22.2, we will test the value of msg_controllen before calling this macro.
  • The difference between CMSG_LEN and CMSG_SPACE is that the former does not account for any padding following the data portion of the ancillary data object and is therefore the value to store in cmsg_len, while the latter accounts for the padding at the end and is therefore the value to use if dynamically allocating space for the ancillary data object.
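  • A sketch that exercises all five macros, assuming a Unix domain socket and anticipating the descriptor passing of Section 15.7: CMSG_SPACE sizes the control buffer, CMSG_LEN fills in cmsg_len, and the receiver walks the buffer with CMSG_FIRSTHDR and CMSG_NXTHDR and reads the payload through CMSG_DATA. Wrapping the buffer in a union with a struct cmsghdr is one way to obtain the alignment mentioned above. send_fd and recv_fd are hypothetical names, and one byte of normal data is sent along with the ancillary data, as is common practice.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send descriptor fd_to_send across Unix domain socket sockfd.
 * Returns 0 on success, -1 on error. */
int
send_fd(int sockfd, int fd_to_send)
{
    union {                             /* union aligns the buffer */
        struct cmsghdr cm;
        char control[CMSG_SPACE(sizeof(int))];   /* space to allocate */
    } control_un;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmptr;
    char c = 0;                         /* one byte of normal data */

    memset(&msg, 0, sizeof(msg));
    memset(&control_un, 0, sizeof(control_un));
    iov.iov_base = &c;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);

    cmptr = CMSG_FIRSTHDR(&msg);
    cmptr->cmsg_len = CMSG_LEN(sizeof(int));      /* value for cmsg_len */
    cmptr->cmsg_level = SOL_SOCKET;
    cmptr->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmptr), &fd_to_send, sizeof(int));

    return sendmsg(sockfd, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1. */
int
recv_fd(int sockfd)
{
    union {
        struct cmsghdr cm;
        char control[CMSG_SPACE(sizeof(int))];
    } control_un;
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmptr;
    char c;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &c;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);

    if (recvmsg(sockfd, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC))
        return -1;

    /* Walk every ancillary data object; keep the SCM_RIGHTS one. */
    for (cmptr = CMSG_FIRSTHDR(&msg); cmptr != NULL;
         cmptr = CMSG_NXTHDR(&msg, cmptr)) {
        if (cmptr->cmsg_level == SOL_SOCKET &&
            cmptr->cmsg_type == SCM_RIGHTS &&
            cmptr->cmsg_len == CMSG_LEN(sizeof(int)))
            memcpy(&fd, CMSG_DATA(cmptr), sizeof(int));
    }
    return fd;
}
```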

14.7 How Much Data Is Queued?

  • There are times when we want to see how much data is queued to be read on a socket, without reading the data. Three techniques are available:
    1. If the goal is to avoid blocking in the kernel because we have something else to do when nothing is ready to be read, nonblocking I/O can be used (Chapter 16).
    2. If we want to examine the data but still leave it on the receive queue for some other part of our process to read, we can use the MSG_PEEK flag (Figure 14.6). If we want to do this but are not sure that something is ready to be read, we can use this flag with a nonblocking socket or combine it with the MSG_DONTWAIT flag.
       Note that the amount of data on the receive queue can change between two successive calls to recv for a stream socket. For example, assume we call recv for a TCP socket specifying a buffer length of 1024 along with the MSG_PEEK flag, and the return value is 100. If we then call recv again, it is possible for more than 100 bytes to be returned (assuming we specify a buffer length greater than 100), because TCP can receive more data between our two calls.
       For a UDP socket with a datagram on the receive queue, if we call recvfrom specifying MSG_PEEK, followed by another call without MSG_PEEK, the return values from both calls (the datagram size, its contents, and the sender’s address) will be the same, even if more datagrams are added to the socket receive buffer between the two calls. (We are assuming that no other process is sharing the same descriptor and reading from this socket at the same time.)
    3. Some implementations support the FIONREAD command of ioctl. The third argument to ioctl is a pointer to an integer, and the value returned in that integer is the current number of bytes on the socket’s receive queue. This value is the total number of bytes queued, which for a UDP socket includes all queued datagrams. Note that the count returned for a UDP socket by Berkeley-derived implementations includes the space required for the socket address structure containing the sender’s IP address and port for each datagram (16 bytes for IPv4; 24 bytes for IPv6).
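  • Sketches of techniques 2 and 3, assuming a system that provides MSG_DONTWAIT (not every implementation does) and FIONREAD through <sys/ioctl.h>; peek_nonblock and bytes_queued are hypothetical names. What FIONREAD counts also differs between systems: on Linux, for example, a UDP socket reports the size of the next pending datagram rather than the total queued.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Technique 2: look at queued data without consuming it. With
 * MSG_DONTWAIT the call fails (errno EAGAIN or EWOULDBLOCK) instead
 * of blocking when nothing is queued. */
ssize_t
peek_nonblock(int sockfd, void *buf, size_t len)
{
    return recv(sockfd, buf, len, MSG_PEEK | MSG_DONTWAIT);
}

/* Technique 3: ask the kernel how many bytes are queued.
 * Returns the count, or -1 on error. */
int
bytes_queued(int sockfd)
{
    int n;

    if (ioctl(sockfd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```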

14.8 Sockets and Standard I/O

  • One kind of I/O is Unix I/O: the read and write functions and their variants (recv, send, etc.). These functions work with descriptors and are implemented as system calls within the Unix kernel.
  • Another is the standard I/O library. The standard I/O library handles some of the details that we must worry about ourselves when using the Unix I/O functions, such as automatically buffering the input and output streams.
  • The term stream is used with the standard I/O library, as in “we open an input stream” or “we flush the output stream.”
  • The standard I/O library can be used with sockets, but there are a few items to consider:
    1. A standard I/O stream can be created from any descriptor by calling the fdopen function. Conversely, given a standard I/O stream, we can obtain the corresponding descriptor by calling fileno.
    2. TCP and UDP sockets are full-duplex. Standard I/O streams can also be full-duplex: we open the stream with a type of r+, which means read-write. But on such a stream, an output function cannot be followed by an input function without an intervening call to fflush, fseek, fsetpos, or rewind. Likewise, an input function cannot be followed by an output function without an intervening call to fseek, fsetpos, or rewind, unless the input function encounters an EOF. The problem with these latter three functions is that they all call lseek, which fails on a socket.
    3. The easiest way to handle this read-write problem is to open two standard I/O streams for a given socket: one for reading and one for writing.

Example: str_echo Function Using Standard I/O

  • Figure 14.14 is a version of str_echo (Figure 5.3) that uses standard I/O.

Convert descriptor into input stream and output stream 7-10

  • Two standard I/O streams are created by fdopen: one for input and one for output. The calls to read and writen are replaced with calls to fgets and fputs.
$ tcpcli02 206.168.112.96
hello, world                    #we type this line, but nothing is echoed
and hi                      #and this one, still no echo
hello??                     #and this one, still no echo
^D                          #and our EOF character
hello, world                    #and then the three echoed lines are output
and hi
hello??
  • There is a buffering problem because nothing is echoed by the server until we enter our EOF character. The following steps take place:
    1. We type the first line of input and it is sent to the server.
    2. The server reads the line with fgets and echoes it with fputs. (The server’s standard I/O stream is fully buffered by the standard I/O library. This means the library copies the echoed line into its standard I/O buffer for this stream, but does not write the buffer to the descriptor because the buffer is not full.)
    3. We type the second line of input and it is sent to the server.
    4. The server reads the line with fgets and echoes it with fputs. (Again, the server’s standard I/O library just copies the line into its buffer, but does not write the buffer because it is still not full.)
    5. The same scenario happens with the third line of input that we enter.
    6. We type our EOF character, and our str_cli function (Figure 6.13) calls shutdown, sending a FIN to the server.
    7. The server TCP receives the FIN, which fgets reads, causing fgets to return a null pointer.
    8. The str_echo function returns to the server main function (Figure 5.12), and the child terminates by calling exit.
    9. exit calls the standard I/O cleanup function. The output buffer that was partially filled by our calls to fputs is now output.
    10. The server child process terminates, causing its connected socket to be closed, sending a FIN to the client, completing the TCP four-packet termination sequence.
    11. The three echoed lines are received by our str_cli function and output.
    12. str_cli receives an EOF on its socket, and the client terminates.
  • The problem is the buffering performed automatically by the standard I/O library on the server. The standard I/O library performs three types of buffering:
    1. Fully buffered: I/O takes place only when the buffer is full, the process explicitly calls fflush, or the process terminates by calling exit. A common size for a standard I/O buffer is 8192 bytes.
    2. Line buffered: I/O takes place when a newline is encountered, when the process calls fflush, or when the process terminates by calling exit.
    3. Unbuffered: I/O takes place each time a standard I/O output function is called.
  • Most Unix implementations of the standard I/O library use the following rules:
    1. Standard error is unbuffered.
    2. Standard input and standard output are fully buffered, unless they refer to a terminal device (line buffered).
    3. All other streams are fully buffered unless they refer to a terminal device (line buffered).
  • Since a socket is not a terminal device, the output stream (fpout) is fully buffered. Solutions:
    1. Force the output stream to be line buffered by calling setvbuf.
    2. Force each echoed line to be output by calling fflush after each call to fputs.
  • In practice, both solutions are error-prone and may interact badly with the Nagle algorithm (Section 7.9). The best solution is to avoid using the standard I/O library altogether for sockets and to operate on buffers instead of lines, as described in Section 3.9.

14.9 Advanced Polling (Not supported on Linux)

14.10 Summary

Exercises (Redo)
