[Solo APUE Series] Chapter 5: Standard I/O Library

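Streams and FILE Objects

Anyone who has learned C already knows something about the standard I/O library. The library is specified by the ISO C standard and, as mentioned earlier, ISO C is included in the SUS, which extends it with additional interfaces.
The biggest advantage of the standard I/O library is that you no longer deal with the underlying system calls directly, which makes portable code much easier to write. The previous chapters showed how fiddly raw I/O can be; standard I/O is far more convenient. Still, if you do not understand what happens underneath, you will run into problems when using it.
In Chapter 3 everything revolved around file descriptors, starting from the open function; the standard I/O library instead revolves around streams.
Characters come in single-byte and multibyte ("wide") forms, so a standard I/O stream is used either for single-byte or for wide-character I/O; this is called the stream's orientation. It is like cars and trains: cars only run on roads and trains only run on rails. A freshly created stream has no orientation; the first I/O operation performed on it decides whether it becomes byte-oriented or wide-oriented.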

int fwide(FILE *stream, int mode);

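If the stream's orientation is already set, fwide does not change it; otherwise fwide tries to set it.
If mode is less than 0, the stream is made byte-oriented; if mode is greater than 0, it is made wide-oriented; if mode is 0, the orientation is left unchanged.
In every case the return value reports the stream's orientation: positive for wide-oriented, negative for byte-oriented, and 0 for a stream with no orientation yet.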
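
The rules for fwide can be checked with a small sketch. The helper names fresh_stream_orientation and orientation_is_sticky are made up for this illustration, and tmpfile is used only as a convenient way to get a brand-new stream.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: orientation of a freshly created stream.
 * Returns 1 (wide), -1 (byte), 0 (none yet), or -2 if no stream could be made. */
int fresh_stream_orientation(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -2;
    int r = fwide(fp, 0);          /* mode 0 only queries, it never sets */
    fclose(fp);
    return (r > 0) - (r < 0);      /* normalize to -1, 0 or 1 */
}

/* Hypothetical helper: returns 1 if a byte-oriented stream refuses to
 * become wide-oriented afterwards, 0 otherwise. */
int orientation_is_sticky(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;
    int first = fwide(fp, -1);     /* no orientation yet: becomes byte-oriented */
    int second = fwide(fp, 1);     /* already oriented: the request is ignored */
    fclose(fp);
    return first < 0 && second < 0;
}
```

Calling fresh_stream_orientation on a new stream should report 0, and orientation_is_sticky should report 1, matching the description above.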

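Standard Input, Standard Output, and Standard Error

As mentioned before, a process automatically starts with three open file descriptors, each associated with a file. Just as the file descriptors STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO are predefined, the standard library provides the predefined file pointers stdin, stdout, and stderr.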

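Buffering

As noted earlier, the I/O functions that Unix itself provides are unbuffered. The point of buffering is to reduce the number of read and write calls. The standard library manages a buffer for each stream automatically, so developers need not worry about choosing a buffer size or about differences between platforms. Keep in mind, though, that the standard I/O library was designed many years ago and has barely changed since, so buffering is also the part that causes the most confusion.
As also noted earlier, Unix kernels maintain a buffer cache through which most disk I/O passes, and the system provides synchronization functions (such as sync and fsync) to keep the on-disk file system consistent. Standard I/O adds its own buffering on top, in three types: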

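Fully buffered, also called block buffered. The standard I/O functions usually obtain a buffer with malloc on the first I/O operation, and actual I/O takes place only when the buffer fills up. You can also call fflush to force the buffer out early. Remember that fflush is a standard C library function: it hands the buffered data to the kernel, which is a different thing from the Unix disk synchronization system calls mentioned above.
Line buffered. Actual I/O is performed when a newline character is encountered on input or output. Because the buffer has a fixed size, I/O also happens once the buffer is full, even if no newline has been seen.
Unbuffered. As with the write function provided by Unix, the standard library does not buffer the characters at all.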

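Note that stderr is normally unbuffered, because error messages should appear as soon as possible.
The ISO C standard only requires the following:

Standard input and standard output are fully buffered if and only if they do not refer to an interactive device.
Standard error is never fully buffered.

This definition is rather vague, so developers still have to check each platform's implementation. In practice, most systems follow the same conventions:

Standard error is unbuffered.
Streams that refer to a terminal are line buffered; all other streams are fully buffered.

The Mac OS X manual page says the same thing: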

Three types of buffering are available: unbuffered, block buffered, and line buffered. When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered, many characters are saved up and written as a block; when it is line buffered,
characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin). The function fflush(3) may be used to force the block out early. (See fclose(3).)

Normally, all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called and an optimally-sized buffer is obtained. If a stream refers to a terminal (as stdout normally does), it is line buffered. The standard error stream stderr is always unbuffered.

The following functions change the buffering type:

void setbuf(FILE *restrict stream, char *restrict buf);
int setvbuf(FILE *restrict stream, char *restrict buf, int type, size_t size);

setbuf turns buffering on or off. Passing a buffer of length BUFSIZ enables buffering, and passing NULL disables it.
setvbuf is more powerful and lets us specify the buffering type exactly:

_IONBF: unbuffered
_IOLBF: line buffered
_IOFBF: fully buffered

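For an unbuffered stream, the buf and size arguments are ignored. If buffering is requested and buf is NULL, the library allocates a buffer automatically.
Besides these two functions there are two more, but the book does not cover them, so they are not discussed here either.
Apart from having no return value, setbuf is equivalent to setvbuf(stream, buf, buf ? _IOFBF : _IONBF, BUFSIZ);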
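
As a rough sketch of how setvbuf is typically called, the two helpers below (make_line_buffered and make_unbuffered, names invented for this example) switch a stream's buffering type. They assume the stream has just been opened and no other I/O has been performed on it yet, which is what the standard requires.

```c
#include <stdio.h>

/* Hypothetical helper: line buffering with a library-allocated buffer.
 * Returns 0 on success, nonzero on failure, like setvbuf itself. */
int make_line_buffered(FILE *fp)
{
    /* buf == NULL lets the library allocate a buffer; BUFSIZ is a size hint */
    return setvbuf(fp, NULL, _IOLBF, BUFSIZ);
}

/* Hypothetical helper: no buffering at all; buf and size are ignored. */
int make_unbuffered(FILE *fp)
{
    return setvbuf(fp, NULL, _IONBF, 0);
}
```

A typical use would be to call make_line_buffered right after fopen on a log file, so that each complete line reaches the kernel promptly.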

int fflush(FILE *stream);

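This function forces any unwritten data in the stream's buffer to be passed to the kernel. If stream is NULL, all output streams are flushed. There is also an fpurge function, which is not covered here.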

Opening a Stream

FILE *fopen(const char *restrict filename, const char *restrict mode);
FILE *freopen(const char *restrict filename, const char *restrict mode, FILE *restrict stream);
FILE *fdopen(int fildes, const char *mode);

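The names say most of it. fopen opens a file. freopen reopens a file on a specified stream and is typically used to open a file as one of the predefined streams. fdopen associates a stream with an existing file descriptor and is mainly used with pipes and network connections.
The mode argument specifies how the stream is read and written; you have probably met it already when learning C.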

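mode	Description	open(2) flags
r/rb	open for reading	O_RDONLY
w/wb	truncate or create for writing	O_WRONLY | O_CREAT | O_TRUNC
a/ab	append (write at end of file, create if needed)	O_WRONLY | O_CREAT | O_APPEND
r+/r+b/rb+	open for reading and writing	O_RDWR
w+/w+b/wb+	truncate or create for reading and writing	O_RDWR | O_CREAT | O_TRUNC
a+/a+b/ab+	open or create for reading and writing at end of file	O_RDWR | O_CREAT | O_APPEND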

The manual page also has a passage about mode:

The argument mode points to a string beginning with one of the following sequences (Additional characters may follow these sequences.):

``r'' Open text file for reading. The stream is positioned at the beginning of the file.

``r+'' Open for reading and writing. The stream is positioned at the beginning of the file.

``w'' Truncate to zero length or create text file for writing. The stream is positioned at the beginning of the file.

``w+'' Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.

``a'' Open for writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.

``a+'' Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.

The mode string can also include the letter ``b'' either as last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with ISO/IEC 9899:1990 (``ISO C90'') and has no effect; the ``b'' is ignored.

Finally, as an extension to the standards (and thus may not be portable), mode string may end with the letter ``x'', which insists on creating a new file when used with ``w'' or ``a''. If path exists, then an error is returned (this is the equivalent of specifying O_EXCL with open(2)).

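The only addition here is the x flag, which is equivalent to O_EXCL for open. Also, a Unix environment makes no distinction between binary and text files, so the b flag is ignored.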

The fdopen() function associates a stream with the existing file descriptor, fildes. The mode of the stream must be compatible with the mode of the file descriptor. When the stream is closed via fclose(3), fildes is closed also.

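The book's explanation of fdopen is rather long-winded. In short: because the file descriptor is already open, the stream's mode must be compatible with the descriptor's mode, and when the stream is closed with fclose the file descriptor is closed as well.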
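
Since fdopen is mostly used with descriptors that open cannot produce, here is a minimal sketch that wraps both ends of a pipe in streams. The function name pipe_roundtrip is invented for this example, and it assumes the message is short enough to fit in the pipe's kernel buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a pipe using streams on both
 * ends and read one line back into out. Returns 0 on success, -1 on error. */
int pipe_roundtrip(const char *msg, char *out, int outsize)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    FILE *rd = fdopen(fds[0], "r");   /* mode must match the read end */
    FILE *wr = fdopen(fds[1], "w");   /* mode must match the write end */
    if (rd == NULL || wr == NULL) {
        /* close whichever side was wrapped with fclose, the other with close */
        if (rd != NULL) fclose(rd); else close(fds[0]);
        if (wr != NULL) fclose(wr); else close(fds[1]);
        return -1;
    }

    int ok = fputs(msg, wr) != EOF;
    if (fclose(wr) == EOF)            /* flushes the data and closes fds[1] too */
        ok = 0;
    if (ok && fgets(out, outsize, rd) == NULL)
        ok = 0;
    fclose(rd);                       /* closes fds[0] as well */
    return ok ? 0 : -1;
}
```

Note that neither descriptor is closed with close on the success path; fclose takes care of it, as the manual text above describes.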

Any created files will have mode "S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH" (0666), as modified by the process' umask value (see umask(2)).

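Recall that open and creat let you specify a file's permissions. The standard library offers only one option: files are created with mode 0666, from which the bits set in the process's umask are then removed.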

int fclose(FILE *stream);

Simple enough: it flushes any buffered output, closes the file, and, if the buffer was allocated automatically, releases it.
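
To make the difference between the w and a modes concrete, the sketch below writes some text with a given mode and then reports the resulting file size. The helper name write_and_size is made up for this example, and the path is whatever scratch file the caller chooses.

```c
#include <stdio.h>

/* Hypothetical helper: open path with the given mode, write text, close,
 * then return the file's size in bytes (or -1 on any error). */
long write_and_size(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) == EOF)        /* fclose flushes; a failure here means lost data */
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);         /* offset at end of file equals its size */
    fclose(fp);
    return size;
}
```

Writing "hello" with mode "w" and then "!!" with mode "a" should leave a 7-byte file, while a further write with "w" truncates it again.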

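Reading and Writing a Stream

C textbooks usually teach input and output through the formatted functions scanf and printf, but there are also three kinds of unformatted I/O:

Character-at-a-time I/O
Line-at-a-time I/O
Direct I/O, also called binary I/O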

int getc(FILE *stream);
int fgetc(FILE *stream);
int getchar(void);

The fgetc() function obtains the next input character (if present) from the stream pointed at by stream, or the next character pushed back on the stream via ungetc(3).

The getc() function acts essentially identically to fgetc(), but is a macro that expands in-line.

The getchar() function is equivalent to getc(stdin).

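There are actually three more read functions, but they are outside the scope of this article. The passage above covers all three: fgetc obtains the next input character; getc is equivalent to fgetc but may be implemented as a macro that expands inline; getchar is the same as getc(stdin). In real code, keep the difference between fgetc and getc in mind: because getc may be a macro, its argument should not be an expression with side effects, whereas fgetc is guaranteed to be a function whose address can be taken.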

If successful, these routines return the next requested object from the stream. Character values are returned as an unsigned char converted to an int. If the stream is at end-of-file or a read error occurs, the routines return EOF. The routines feof(3) and ferror(3) must be used to distinguish between end-of-file and error. If an error occurs, the global variable errno is set to indicate the error. The end-of-file condition is remembered, even on a terminal, and all subsequent attempts to read will return EOF until the condition is cleared with clearerr(3).

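The book's wording is fairly long-winded, so I have copied the manual text above; if the book's explanation is hard to follow, the passage above should make it clear.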

int ferror(FILE *stream);
int feof(FILE *stream);

void clearerr(FILE *stream);

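The first two functions distinguish end-of-file from an error. Note that each takes a stream's FILE pointer, which suggests that the error and end-of-file indicators are stored in the FILE structure. I did not find two such fields when reading the structure definition (they may be kept as bits in a flags word), so treat that as a guess; the book, however, states explicitly that most implementations maintain these two flags for each stream. The last function, clearerr, clears both flags.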
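
A short sketch of the usual read loop may help here. The function name count_bytes is invented for this example; the important details are that the result of getc is stored in an int, and that ferror is consulted once EOF is returned.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream.
 * Returns the count, or -1 if the loop ended because of a read error. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                        /* int, not char, so EOF stays distinguishable */

    while ((c = getc(fp)) != EOF)
        n++;

    if (ferror(fp))               /* EOF was returned because of an error */
        return -1;
    return n;                     /* otherwise the end-of-file flag is set */
}
```

After count_bytes returns normally, feof(fp) is true and stays true until clearerr (or a repositioning call such as rewind) resets it.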

int ungetc(int c, FILE *stream);

This function pushes a character that has already been read back onto the stream.
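
A common use of ungetc is peeking at the next character without consuming it. The helper below, peek_char, is a made-up name for this sketch; it relies only on the one character of pushback that the standard guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character without consuming it,
 * or EOF if nothing can be read. */
int peek_char(FILE *fp)
{
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);            /* push it back so the next read sees it again */
    return c;
}
```

A parser might call peek_char to decide whether the next token is a number or a word before reading it properly.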

int putc(int c, FILE *stream);
int fputc(int c, FILE *stream);
int putchar(int c);

These mirror the input functions above, so the details are not repeated here.

Line-at-a-Time I/O

char *fgets(char * restrict str, int size, FILE * restrict stream);
char *gets(char *str);

The fgets() function reads at most one less than the number of characters specified by size from the given stream and stores them in the string str. Reading stops when a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a `\0' character is appended to end the string.

The gets() function is equivalent to fgets() with an infinite size and a stream of stdin, except that the newline character (if any) is not stored in the string. It is the caller's responsibility to ensure that the input line, if any, is sufficiently short to fit in the string.

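fgets reads at most size - 1 characters from the given stream and stores them in str. Reading stops at a newline, at end-of-file, or on error. When at least one character has been read without error, the result is terminated with a null byte, which is why the longest string that can be read is size - 1 characters.
gets is equivalent to fgets with an unlimited size reading from standard input, except that the newline is not stored in str. The caller has to guarantee that the input line is short enough to fit in str. Because this can overflow the buffer, gets should not be used; ISO C11 removed it from the standard.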

int fputs(const char *restrict s, FILE *restrict stream);
int puts(const char *s);

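puts is not unsafe in the way gets is, but it is still best used sparingly, because it writes an extra newline after the string. With fgets and fputs we handle the newline ourselves, which keeps input and output symmetric.

Binary I/O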
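
Because fgets keeps the newline, callers often strip it themselves. The following sketch shows one way to do that; read_line is a hypothetical helper name, and strcspn is used so that a last line without a trailing newline is handled too.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf and drop the trailing
 * newline, if any. Returns buf, or NULL at end-of-file or on error. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the first newline, if present */
    return buf;
}
```

If a line is longer than size - 1 characters, read_line returns it in several pieces, which is the same behavior fgets itself has.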

size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
size_t fwrite(const void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);

The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
The function fwrite() writes nitems objects, each size bytes long, to the stream pointed to by stream, obtaining them from the location given by ptr.

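fread reads nitems objects, each size bytes long, from stream and stores them at ptr; fwrite writes nitems objects, each size bytes long, to stream, taking them from ptr. Two sentences are enough to describe what these functions do.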

The functions fread() and fwrite() advance the file position indicator for the stream by the number of bytes read or written. They return the number of objects read or written. If an error occurs, or the end-of-file is reached, the return value is a short object count (or zero).
The function fread() does not distinguish between end-of-file and error; callers must use feof(3) and ferror(3) to determine which occurred. The function fwrite() returns a value less than nitems only if a write error has occurred.

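The return value is the number of objects read or written. At end-of-file or on error it is a short count, possibly 0; for fread, use feof and ferror to tell the two cases apart.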
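
As an illustration of binary I/O, the sketch below writes an array of structures to a temporary stream and reads it back. The struct point type and the function roundtrip_points are invented for this example. Keep in mind that data written this way is only portable between programs that share the same structure layout and byte order.

```c
#include <stdio.h>

struct point {                    /* hypothetical record type for the example */
    int x;
    int y;
};

/* Hypothetical helper: write n points to a temporary file, rewind, and
 * read them back into out. Returns the number of points read back. */
size_t roundtrip_points(const struct point *in, struct point *out, size_t n)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;

    size_t got = 0;
    if (fwrite(in, sizeof *in, n, fp) == n) {   /* a short count means a write error */
        rewind(fp);
        got = fread(out, sizeof *out, n, fp);   /* may be short at EOF or on error */
    }
    fclose(fp);
    return got;
}
```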

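Positioning a Stream

Like the unbuffered I/O provided by Unix, the standard C library offers functions for positioning a stream:

ftell and fseek. Very old functions that store the offset in a long; best avoided for large files.
ftello and fseeko. The same, except that the file offset type is off_t instead of long.
fgetpos and fsetpos. Introduced by ISO C; they record the position in an abstract fpos_t object and are the recommended choice for portable code.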

long ftell(FILE *stream);
int fseek(FILE *stream, long offset, int whence);
void rewind(FILE *stream);

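For these functions the offset is measured in bytes, and whence has the same meaning as in the Unix lseek function (SEEK_SET, SEEK_CUR, SEEK_END). rewind sets the stream back to the beginning of the file.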

off_t ftello(FILE *stream);
int fseeko(FILE *stream, off_t offset, int whence);

Apart from the offset type, these behave exactly like ftell and fseek.

int fgetpos(FILE *restrict stream, fpos_t *restrict pos);
int fsetpos(FILE *stream, const fpos_t *pos);
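
fgetpos and fsetpos are used as a pair: save a position, do some reading, then jump back. A minimal sketch, with the made-up helper name read_line_twice:

```c
#include <stdio.h>

/* Hypothetical helper: read the next line into a, return to where the
 * line started, and read it again into b. Returns 0 on success, -1 on error. */
int read_line_twice(FILE *fp, char *a, char *b, int size)
{
    fpos_t pos;                   /* opaque position; do not treat it as a number */

    if (fgetpos(fp, &pos) != 0)
        return -1;
    if (fgets(a, size, fp) == NULL)
        return -1;
    if (fsetpos(fp, &pos) != 0)   /* go back to the saved position */
        return -1;
    if (fgets(b, size, fp) == NULL)
        return -1;
    return 0;
}
```

The same pattern works with ftello and fseeko when a numeric offset is needed, for example to compute a file's size.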

Formatted I/O

The formatted I/O functions are probably the ones we use most and know best. There are just five output functions:

int printf(const char * restrict format, ...);
int fprintf(FILE * restrict stream, const char * restrict format, ...);
int dprintf(int fd, const char * restrict format, ...);
int sprintf(char * restrict str, const char * restrict format, ...);
int snprintf(char * restrict str, size_t size, const char * restrict format, ...);

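There are other output functions as well, but they are not covered here. printf writes to standard output, fprintf writes to the given stream, dprintf writes to a file descriptor, and sprintf and snprintf both write to a character array. snprintf takes a size argument that bounds the output, whereas sprintf can overflow the buffer and is therefore discouraged.
How to write the format string is explained in the book and in great detail in the manual; it is too long to reproduce here.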
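
One detail of snprintf worth showing is how to detect truncation: it returns the length the full output would have had, so a return value greater than or equal to size means the result was cut short. The helper name format_pair below is invented for this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: format "name=value" into buf.
 * Returns 1 if the whole string fit, 0 if it was truncated or an error occurred. */
int format_pair(char *buf, size_t size, const char *name, int value)
{
    int n = snprintf(buf, size, "%s=%d", name, value);
    /* n is the length the complete output needs, not counting the null byte */
    return n >= 0 && (size_t)n < size;
}
```

Even when format_pair reports truncation, buf still holds a properly terminated string of at most size - 1 characters (for size greater than 0).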
The printf family also has the following variants:

int vprintf(const char * restrict format, va_list ap);
int vfprintf(FILE * restrict stream, const char * restrict format, va_list ap);
int vsprintf(char * restrict str, const char * restrict format, va_list ap);
int vsnprintf(char * restrict str, size_t size, const char * restrict format, va_list ap);
int vdprintf(int fd, const char * restrict format, va_list ap);

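These functions are the same as the ones above, except that the variable argument list is replaced by a va_list (a type declared in <stdarg.h>).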
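
The v variants exist so that you can write your own printf-style wrappers. The sketch below assumes a hypothetical helper, format_warning, that prepends a fixed "warn: " prefix and forwards the remaining arguments to vsnprintf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "warn: " followed by the formatted message
 * into buf. Returns the length the full message needs, or -1 on error. */
int format_warning(char *buf, size_t size, const char *fmt, ...)
{
    int n = snprintf(buf, size, "warn: ");
    if (n < 0 || (size_t)n >= size)
        return -1;                /* not even the prefix fits */

    va_list ap;
    va_start(ap, fmt);            /* collect the caller's variable arguments */
    int m = vsnprintf(buf + n, size - (size_t)n, fmt, ap);
    va_end(ap);

    return m < 0 ? -1 : n + m;
}
```

For example, format_warning(buf, sizeof buf, "%d files", 3) should leave "warn: 3 files" in buf, provided the buffer is large enough.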

Formatted Input

int scanf(const char *restrict format, ...);
int fscanf(FILE *restrict stream, const char *restrict format, ...);
int sscanf(const char *restrict s, const char *restrict format, ...);

As with output, the input functions have corresponding variants:

int vscanf(const char *restrict format, va_list arg);
int vfscanf(FILE *restrict stream, const char *restrict format, va_list arg);
int vsscanf(const char *restrict s, const char *restrict format, va_list arg);

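Implementation Details

On Unix systems the standard C library ultimately calls the interfaces provided by the system, which is why the FILE structure contains a file descriptor: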

int fileno(FILE *stream);

Although fileno lives in the standard I/O library, it is specified by POSIX rather than by ISO C.
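
fileno is what you reach for when a descriptor-based call has to be applied to a stream. As a small sketch, the hypothetical helper stream_is_tty below combines it with isatty, which is the kind of check a library can use to choose between line buffering and full buffering.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the stream refers to a terminal, else 0. */
int stream_is_tty(FILE *fp)
{
    int fd = fileno(fp);          /* descriptor underlying the stream */
    if (fd < 0)
        return 0;
    return isatty(fd) ? 1 : 0;
}
```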
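When learning Unix programming, it never hurts to consult the manual pages and to read the source code when in doubt. There are also a few small tricks that help track down problems: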

> cc -E xxx.c

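For example, the command above makes the C compiler run only the preprocessor and print the preprocessed source, which is much easier to read than jumping back and forth between header files.

Temporary Files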

char *tmpnam(char *s);
FILE *tmpfile(void);

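tmpnam generates a string that is a valid pathname. If s is NULL, the pathname is stored in a static area and a pointer to it is returned; the next call overwrites that static area. If s is not NULL, the pathname is stored in s (which must point to at least L_tmpnam bytes), and s is also returned as the function value.
tmpfile creates a temporary file that is deleted automatically when it is closed. Since all open files are closed when a process terminates, the file is also removed when the process ends.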

#include "include/apue.h"

int main(int argc, char *argv[])
{
    char name[L_tmpnam], line[MAXLINE];
    FILE *fp;

    /* First name: stored in tmpnam's internal static area */
    printf("%s\n", tmpnam(NULL));

    /* Second name: stored in our own buffer */
    tmpnam(name);
    printf("%s\n", name);

    /* Create a temporary file, write one line, then read it back */
    if ((fp = tmpfile()) == NULL)
        err_sys("tmpfile error");
    fputs("one line of output\n", fp);
    rewind(fp);
    if (fgets(line, sizeof(line), fp) == NULL)
        err_sys("fgets error");
    fputs(line, stdout);

    exit(0);
}

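The usual technique is to call tmpnam to generate a unique pathname, create the file with that pathname, and then unlink it immediately. As described earlier, unlinking a file that is still open does not delete it right away; the file is removed only when it is finally closed. Note that there is a window between generating the name and creating the file in which another process can create a file of the same name, which is the race that mkstemp avoids.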

char *mkdtemp(char *template);
int mkstemp(char *template);

The mkstemp() function makes the same replacement to the template and creates the template file, mode 0600, returning a file descriptor opened for reading and writing. This avoids the race between testing for a file's existence and opening it for use.

The mkdtemp() function makes the same replacement to the template as in mktemp() and creates the template directory, mode 0700.

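The template string looks like /tmp/temp.XXXXXX; the last six characters are replaced. Also, do not use a string constant as the template, because the functions modify it in place.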
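
Putting mkstemp together with unlink and fdopen gives roughly the behavior of tmpfile while keeping control over the location. This is only a sketch: the helper name open_scratch_file and the /tmp/scratch.XXXXXX template are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create an unnamed scratch file and return a
 * read/write stream for it, or NULL on failure. */
FILE *open_scratch_file(void)
{
    char path[] = "/tmp/scratch.XXXXXX";  /* an array, not a string literal */

    int fd = mkstemp(path);       /* replaces XXXXXX and creates the file, mode 0600 */
    if (fd < 0)
        return NULL;
    unlink(path);                 /* name disappears now; data lives until close */

    FILE *fp = fdopen(fd, "w+");  /* compatible with the read/write descriptor */
    if (fp == NULL)
        close(fd);
    return fp;
}
```

The file vanishes automatically once the returned stream is closed with fclose or the process exits.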

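Memory Streams

An ordinary stream is associated with an actual file on disk. Is there a kind of virtual file that lets us read and write a region of memory as if it were a file? There is. The book describes three library functions for creating memory streams (fmemopen, open_memstream, and open_wmemstream). Unfortunately, at the time of writing they were effectively available only with glibc; on systems without them you have to implement something similar yourself. Memory streams are therefore not covered here; interested readers can study them using the Linux manual pages and the book.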

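Alternatives to Standard I/O

As mentioned at the start of this article, the standard I/O library has not changed for a long time, and along the way we have seen that it has quite a few shortcomings. Many details behave differently on different systems, to the point where it is hard to say what the standard behavior actually is. That is why a number of alternatives have been proposed to make developers' lives easier. But remember: however the libraries change, the system calls underneath stay the same.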