APUE-CH3 File I/O (2)

read Function

Data is read from an open file with the read function.

#include <unistd.h>
ssize_t read(int fd, void *buf, size_t nbytes);
            Returns: number of bytes read, 0 if end of file, -1 on error

If the read is successful, the number of bytes read is returned. If the end of file is encountered, 0 is returned.

There are several cases in which the number of bytes actually read is less than the amount requested:
- When reading from a regular file, if the end of file is reached before the requested number of bytes has been read. For example, if 30 bytes remain until the end of file and we try to read 100 bytes, read returns 30. The next time we call read, it will return 0 (end of file).
- When reading from a terminal device. Normally, up to one line is read at a time. (We’ll see how to change this default in Chapter 18.)
- When reading from a network. Buffering within the network may cause less than the requested amount to be returned.
- When reading from a pipe or FIFO. If the pipe contains fewer bytes than requested, read will return only what is available.
- When reading from a record-oriented device. Some record-oriented devices, such as magnetic tape, can return up to a single record at a time.
- When interrupted by a signal and a partial amount of data has already been read. We discuss this further in Section 10.5.
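
Because any of the cases above can produce a short count, code that needs exactly nbytes usually calls read in a loop. The following is a minimal sketch, not code from the text: read_full is a hypothetical helper name, and the choice to retry on EINTR and to return -1 on any other error (even if some bytes were already read) is an assumption made for this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): keep calling
 * read until nbytes have arrived, end of file is reached, or an error
 * occurs. Returns the number of bytes read, which is less than nbytes
 * only when end of file was hit, or -1 on error.
 */
ssize_t
read_full(int fd, void *buf, size_t nbytes)
{
    size_t  nleft = nbytes;
    char   *ptr = buf;

    while (nleft > 0) {
        ssize_t nread = read(fd, ptr, nleft);

        if (nread < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* any other error */
        }
        if (nread == 0)
            break;              /* end of file */
        nleft -= (size_t)nread; /* short read: ask for the remainder */
        ptr += nread;
    }
    return (ssize_t)(nbytes - nleft);
}
```

A caller that gets back fewer bytes than it asked for knows that end of file was reached, rather than having to guess whether the source was merely slow.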

The read operation starts at the file’s current offset. Before a successful return, the offset is incremented by the number of bytes actually read.
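
One way to observe the offset rule is to ask for the current offset right after a read, using lseek with a zero displacement from SEEK_CUR. This is a small sketch under stated assumptions: read_and_tell is a hypothetical helper name, the descriptor is assumed to refer to a seekable file, and the 4096-byte scratch buffer is an arbitrary choice.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): read up to
 * nbytes from fd into a scratch buffer, store read's return value in
 * *nread, and return the file offset afterward. lseek returns -1 if
 * fd is not seekable (a pipe, for example).
 */
off_t
read_and_tell(int fd, size_t nbytes, ssize_t *nread)
{
    char buf[4096];

    if (nbytes > sizeof(buf))
        nbytes = sizeof(buf);   /* never read past the scratch buffer */
    *nread = read(fd, buf, nbytes);
    return lseek(fd, 0, SEEK_CUR);  /* offset after the read */
}
```

For a 30-byte file positioned at offset 0, a request for 100 bytes should store 30 in *nread and return an offset of 30; a second call should store 0 (end of file) and leave the offset at 30.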

POSIX.1 changed the prototype for this function in several ways. The classic definition is

int read(int fd, char *buf, unsigned nbytes);

  • First, the second argument was changed from char * to void * to be consistent with ISO C: the type void * is used for generic pointers.
  • Next, the return value was required to be a signed integer (ssize_t) to return a positive byte count, 0 (for the end of file), or -1 (for an error).
  • Finally, the third argument historically has been an unsigned integer, to allow a 16-bit implementation to read or write up to 65,534 bytes at a time. With the 1990 POSIX.1 standard, the primitive system data type ssize_t was introduced to provide the signed return value, and the unsigned size_t was used for the third argument. (Recall the SSIZE_MAX constant from Section 2.5.2.)

write Function

Data is written to an open file with the write function.

#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t nbytes);
            Returns: number of bytes written if OK, -1 on error

The return value is usually equal to the nbytes argument; otherwise, an error has occurred. A common cause for a write error is either filling up a disk or exceeding the file size limit for a given process (Section 7.11 and Exercise 10.11).
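
A caller can either treat any count other than nbytes as a failure, as the text describes, or retry the remainder so that the next write reports the underlying error through errno. The sketch below takes the second approach. It is an illustration, not code from the text: write_all is a hypothetical helper name, and retrying on EINTR is an assumption of this example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): call write
 * until all nbytes have been written or an error occurs. Returns the
 * number of bytes written, or -1 on error with errno set by write.
 */
ssize_t
write_all(int fd, const void *buf, size_t nbytes)
{
    size_t      nleft = nbytes;
    const char *ptr = buf;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, ptr, nleft);

        if (nwritten < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* e.g. disk full or file size limit */
        }
        if (nwritten == 0)
            break;              /* no progress: stop instead of spinning */
        nleft -= (size_t)nwritten;
        ptr += nwritten;
    }
    return (ssize_t)(nbytes - nleft);
}
```

With this helper, a return value different from nbytes still signals a problem, but errno from the failing call says which one.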

For a regular file, the write operation starts at the file’s current offset. If the O_APPEND option was specified when the file was opened, the file’s offset is set to the current end of file before each write operation. After a successful write, the file’s offset is incremented by the number of bytes actually written.
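
The effect of O_APPEND can be checked by seeking to the start of a file and then writing: the data should still land at the end. The following is a minimal sketch; append_bytes is a hypothetical helper name, and the file at path is assumed to exist already.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): append nbytes
 * to an existing file. Returns the file offset after the write, which
 * is the new end of file, or -1 on error.
 */
off_t
append_bytes(const char *path, const void *buf, size_t nbytes)
{
    off_t   end;
    int     fd = open(path, O_WRONLY | O_APPEND);

    if (fd < 0)
        return -1;

    /*
     * Deliberately seek to the beginning. With O_APPEND the offset is
     * moved to the end of file before each write, so this seek does
     * not change where the data lands.
     */
    if (lseek(fd, 0, SEEK_SET) < 0 ||
        write(fd, buf, nbytes) != (ssize_t)nbytes) {
        close(fd);
        return -1;
    }
    end = lseek(fd, 0, SEEK_CUR);   /* offset after the write */
    close(fd);
    return end;
}
```

For a file that holds 3 bytes, appending 2 more should return 5, and the original 3 bytes should be untouched.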

I/O Efficiency

The program in Figure 3.5 copies a file, using only the read and write functions.

#include "apue.h"

#define BUFFSIZE 4096

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

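    /* Copy standard input to standard output, one buffer at a time. */
    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)      /* the loop ended because read returned -1 */
        err_sys("read error");

    exit(0);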
}

The following caveats apply to this program:
- It reads from standard input and writes to standard output, assuming that these have been set up by the shell before this program is executed. Indeed, all normal UNIX system shells provide a way to open a file for reading on standard input and to create (or rewrite) a file on standard output. This prevents the program from having to open the input and output files, and allows the user to take advantage of the shell’s I/O redirection facilities.
- The program doesn’t close the input file or output file. Instead, the program uses the feature of the UNIX kernel that closes all open file descriptors in a process when that process terminates.
- This example works for both text files and binary files, since there is no difference between the two to the UNIX kernel.

One question we haven’t answered, however, is how we chose the BUFFSIZE value. Before answering that, let’s run the program using different values for BUFFSIZE. Figure 3.6 shows the results for reading a 516,581,760-byte file, using 20 different buffer sizes.
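
To try different buffer sizes without recompiling, the copy loop can take the size as a parameter. This is a sketch of one way to do that, not the program used for the measurements: copy_fd is a hypothetical helper name, and allocating the buffer with malloc is an assumption of this example.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): copy
 * everything from infd to outfd using a buffer of bufsize bytes.
 * Returns the total number of bytes copied, or -1 on error.
 */
long long
copy_fd(int infd, int outfd, size_t bufsize)
{
    char       *buf = malloc(bufsize);
    long long   total = 0;
    ssize_t     n;

    if (buf == NULL)
        return -1;

    while ((n = read(infd, buf, bufsize)) > 0) {
        if (write(outfd, buf, (size_t)n) != n) {
            total = -1;         /* write error or short write */
            break;
        }
        total += n;
    }
    if (n < 0)
        total = -1;             /* read error */

    free(buf);
    return total;
}
```

Calling copy_fd(STDIN_FILENO, STDOUT_FILENO, size) for a range of sizes, with the time command wrapped around each run, gives a rough way to repeat the experiment on another system.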

The file was read using the program shown in Figure 3.5, with standard output redirected to /dev/null. The file system used for this test was the Linux ext4 file system with 4096-byte blocks. (The st_blksize value, which we describe in Section 4.12, is 4096.) This accounts for the minimum in the system time occurring at the few timing measurements starting around a BUFFSIZE of 4096. Increasing the buffer size beyond this limit has little positive effect.
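
The st_blksize value for a given file can be obtained with fstat. A minimal sketch follows; preferred_blksize is a hypothetical helper name, and the value it returns depends on the file system, so 4096 should not be assumed elsewhere.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper (an assumption for illustration): return the
 * preferred I/O block size that the file system reports for fd,
 * or -1 if fstat fails.
 */
long
preferred_blksize(int fd)
{
    struct stat sb;

    if (fstat(fd, &sb) < 0)
        return -1;
    return (long)sb.st_blksize;
}
```

A program could use this value as a starting point for its buffer size instead of hard-coding one.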

==Most file systems support some kind of read-ahead to improve performance.== When sequential reads are detected, the system tries to read in more data than an application requests, assuming that the application will read it shortly. The effect of read-ahead can be seen in Figure 3.6, where the elapsed time for buffer sizes as small as 32 bytes is as good as the elapsed time for larger buffer sizes.

We’ll return to this timing example later in the text. In Section 3.14, we show the effect of synchronous writes; in Section 5.8, we compare these unbuffered I/O times with the standard I/O library.

Beware when trying to measure the performance of programs that read and write files. The operating system will try to cache the file incore, so if you measure the performance of the program repeatedly, the successive timings will likely be better than the first. This improvement occurs because the first run causes the file to be entered into the system’s cache, and successive runs access the file from the system’s cache instead of from the disk. (The term incore means in main memory. Back in the day, a computer’s main memory was built out of ferrite core. This is where the phrase “core dump” comes from: the main memory image of a program stored in a file on disk for diagnosis.)

In the test reported in Figure 3.6, each run with a different buffer size was made using a different copy of the file so that the current run didn’t find the data in the cache from the previous run. The files are large enough that they don’t all remain in the cache (the test system was configured with 6 GB of RAM).

File Sharing

The UNIX System supports the sharing of open files among different processes. Before describing the dup function, we need to describe this sharing. To do this, we’ll examine the data structures used by the kernel for all I/O.

The following description is conceptual; it may or may not match a particular implementation. Refer to Bach [1986] for a discussion of these structures in System V. McKusick et al. [1996] describe these stru
