A file is an area of memory in which information is stored. Normally, a file is kept in some sort of permanent memory, such as a hard disk, a USB flash drive, or an optical disc like a DVD.
C has many library functions for opening, reading, writing, and closing files. On one level, it can deal with files by using the basic file tools of the host operating system. This is called low-level I/O. Because of the many differences among computer systems, it is impossible to create a standard library of universal low-level I/O functions, and ANSI C does not attempt to do so; however, C also deals with files on a second level called the standard I/O package. This involves creating a standard model and a standard set of I/O functions for dealing with files. At this higher level, differences between systems are handled by specific C implementations so that you deal with a uniform interface.
Different systems, for example, store files differently. Some store the file contents in one place and information about the file elsewhere. Some build a description of the file into the file itself. In dealing with text, some systems use a single newline character to mark the end of a line. Others might use the combination of the carriage return and linefeed characters to represent the end of a line. Some systems measure file sizes to the nearest byte; some measure in blocks of bytes.
When you use the standard I/O package, you are shielded from these differences. Therefore, to check for a newline, you can use if (ch == '\n'). If the system actually uses the carriage-return/linefeed combination, the I/O functions automatically translate back and forth between the two representations.
Conceptually, the C program deals with a stream instead of directly with a file. A stream is an idealized flow of data to which the actual input or output is mapped. That means various kinds of input with differing properties are represented by streams with more uniform properties. The process of opening a file then becomes one of associating a stream with the file, and reading and writing take place via the stream.
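As a rough illustration of the stream idea, the sketch below counts lines by comparing each character with '\n'. The function name count_lines is made up for this example. It takes a FILE *, the standard I/O type that represents a stream, and it relies on the EOF return value described later in this section. It also assumes the stream was opened in text mode, so that any system-specific line ending has already been translated to a single '\n'.

```c
#include <stdio.h>

/* Hypothetical helper: count the newline characters arriving on a stream.
   The stream may be connected to a file, the keyboard, or another device;
   the function does not need to know which. */
long count_lines(FILE *in)
{
    int ch;            /* int, not char, so that EOF stays distinguishable */
    long lines = 0;

    while ((ch = getc(in)) != EOF) {
        if (ch == '\n')    /* the same test works on every system */
            lines++;
    }
    return lines;
}
```

Calling count_lines(stdin) would count the lines arriving on standard input, whether they are typed at the keyboard or supplied some other way.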
C treats input and output devices the same as it treats regular files on storage devices. In particular, the keyboard and the display device are treated as files opened automatically by every C program. Keyboard input is represented by a stream called stdin, and output to the screen (or teletype or other output device) is represented by a stream called stdout. The getchar(), putchar(), printf(), and scanf() functions are all members of the standard I/O package, and they deal with these two streams.
You can use the same techniques with keyboard input as you do with files. For example, a program reading a file needs a way to detect the end of the file so that it knows where to stop reading. Therefore, C input functions come equipped with a built-in end-of-file detector. Because keyboard input is treated like a file, you should be able to use that end-of-file detector to terminate keyboard input, too. Let’s see how this is done, beginning with files.
One method to detect the end of a file is to place a special character in the file to mark the end. This is the method once used, for example, in CP/M, IBM-DOS, and MS-DOS text files. Today, these operating systems may use an embedded Ctrl+Z character to mark the ends of files. At one time, this was the sole means these operating systems used, but there are other options now, such as keeping track of the file size. So a modern text file may or may not have an embedded Ctrl+Z, but if it does, the operating system will treat it as an end-of-file marker.
A second approach is for the operating system to store information on the size of the file. If a file has 3000 bytes and a program has read 3000 bytes, the program has reached the end. MS-DOS and its relatives use this approach for binary files because this method allows the files to hold all characters, including Ctrl+Z. Newer versions of DOS also use this approach for text files. Unix uses this approach for all files.
C handles this variety of methods by having the getchar() function return a special value when the end of a file is reached, regardless of how the operating system actually detects the end of file. The name given to this value is EOF (end of file). Therefore, the return value for getchar() when it detects an end of file is EOF. The scanf() function also returns EOF on detecting the end of a file. Typically, EOF is defined in the stdio.h file as follows:
#define EOF (-1)
Normally, getchar() returns a value in the range 0 through 127, because those are the values corresponding to the standard character set, but it might return values from 0 through 255 if the system recognizes an extended character set. In either case, the value -1 does not correspond to any character, so it can be used to signal the end of a file.
Some systems may define EOF to be a value other than -1 , but the definition is always different from a return value produced by a legitimate input character. If you include the stdio.h file and use the EOF symbol, you don’t have to worry about the numeric definition. The important point is that EOF represents a value that signals the end of a file was detected; it is not a symbol actually found in the file.
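The scanf() family reports the end of input in the same way. Here is a minimal sketch, assuming the input consists of whitespace-separated integers. The name sum_integers is hypothetical, and fscanf() is the standard variant of scanf() that reads from a stream passed as an argument.

```c
#include <stdio.h>

/* Hypothetical helper: add up whitespace-separated integers from a stream.
   Stores the sum in *total and returns 1 if reading stopped because
   fscanf() returned EOF (end of input or a read error), or 0 if it
   stopped at text that is not an integer. */
int sum_integers(FILE *in, long *total)
{
    int value;
    int status;

    *total = 0;
    while ((status = fscanf(in, "%d", &value)) == 1)
        *total += value;

    return status == EOF;
}
```

Comparing the status with the symbol EOF, rather than with a literal -1, keeps the code independent of the numeric definition.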
Okay, how can you use EOF in a program? Compare the return value of getchar() with EOF. If they are different, you have not yet reached the end of a file. In other words, you can use an expression like this:
while ((ch = getchar()) != EOF)
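The same comparison works on any stream, not only on standard input. The sketch below uses getc() and putc(), the standard functions that take the stream as an argument (getchar() reads from stdin and putchar() writes to stdout). The name copy_stream is invented for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from one stream to another
   until getc() reports EOF. Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;
    long copied = 0;

    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) would behave like a loop built from getchar() and putchar().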
What if you are reading keyboard input and not a file? Most systems (but not all) have a way to simulate an end-of-file condition from the keyboard.
Program example:
The following program reads input from the keyboard and then prints what it read.
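#include <stdio.h>

int main(void)
{
    int ch;    /* int, so that it can hold EOF as well as any character */

    /* echo each character until end of file is detected */
    while ((ch = getchar()) != EOF)
    {
        putchar(ch);
    }
    return 0;
}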
Sample run:
ok
ok
hello
hello
^Z
Pressing Ctrl+Z simulates the end of file and ends the keyboard input.
The concept of simulated EOF arose in a command-line environment using a text interface. In such an environment, the user interacts with a program through keystrokes, and the operating system generates the EOF signal. Some practices don’t translate particularly well to graphical interfaces, such as Windows and the Macintosh, with more complex user interfaces that incorporate mouse movement and button clicks. The program behavior on encountering a simulated EOF depends on the compiler and project type.
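Here ch must be declared as type int, because getchar() returns an int: the return type has to hold every possible character value plus the separate value EOF. Whether plain char is signed or unsigned is implementation-defined. If char is unsigned, a char variable can never compare equal to EOF (typically -1); if char is signed, a legitimate character with the value 255 can be mistaken for EOF.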
The fact that getchar() is type int is why some compilers warn of possible data loss if you assign the getchar() return value to a type char variable.
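To make the hazard concrete, here is a small sketch with two hypothetical counting functions. The first stores the return value of getc() in an int and counts every byte. The second stores it in a signed char, so on typical two's-complement implementations a byte with the value 255 is converted to -1 and mistaken for EOF. The result of that conversion is implementation-defined, so the exact misbehavior may differ from system to system; with an unsigned character type, the comparison would never equal EOF and the loop would not terminate.

```c
#include <stdio.h>

/* Correct: an int can hold every character value and EOF. */
long count_bytes(FILE *in)
{
    int ch;
    long n = 0;

    while ((ch = getc(in)) != EOF)
        n++;
    return n;
}

/* Deliberately flawed, for illustration only: the narrowing conversion
   can turn the byte 255 into -1, which compares equal to EOF on
   implementations where EOF is -1. */
long count_bytes_flawed(FILE *in)
{
    signed char ch;
    long n = 0;

    while ((ch = (signed char)getc(in)) != EOF)
        n++;
    return n;
}
```

On a stream holding the three bytes 'a', 255, and 'b', the first function counts all three, while the flawed one would typically stop after the first.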
Input and output involve functions, data, and devices.
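By default, a C program using the standard I/O package looks to the standard input as its source of input. This is the stdin stream; the input device is the keyboard, and the input data stream consists of characters.
The program does not care where the data comes from. These devices are represented by a more abstract and uniform concept, the stream. I/O functions interact with streams, not with a particular hardware device or a particular file.
A program can therefore take its input from magnetic tape, punched cards, a teletype, or a file.
A program can work with files in two ways. The first is to explicitly use functions that open files, close files, read files, write files, and so on. The second is to redirect the program so that its input comes from a file or its output goes to a file.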
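The first approach can be sketched as follows, using the standard fopen() and fclose() functions. The function name count_chars_in_file and the convention of returning -1 on failure are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: open a file explicitly, count its characters,
   and close it again. Returns -1 if the file cannot be opened. */
long count_chars_in_file(const char *path)
{
    FILE *fp = fopen(path, "r");   /* associate a stream with the file */
    int ch;
    long n = 0;

    if (fp == NULL)
        return -1;

    while ((ch = getc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

With this approach, the file name is chosen inside the program (or passed to it), whereas redirection leaves the choice to the command line.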
One major problem with redirection is that it is associated with the operating system, not C. However, many C environments, including Unix, Linux, and the Windows Command Prompt mode, feature redirection, and some C implementations simulate it on systems lacking the feature.
Program:
#include <stdio.h>

int main(void)
{
    int ch;

    /* copy standard input to standard output until end of file */
    while ((ch = getchar()) != EOF)
    {
        putchar(ch);
    }
    return 0;
}
Program name: echo_eof.c
Compiling the program produces the executable file echo_eof.exe.
Typing echo_eof at the command line runs the program.
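Now suppose you want to redirect the program's input so that it comes from a file named words. First create the text file words with this content: hello world!
Note that the words file has no extension.
On the command line, redirect the input from the words file:
echo_eof < words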
The < symbol is a Unix, Linux, and DOS/Windows redirection operator. It causes the words file to be associated with the stdin stream, channeling the file contents into the echo_eof program. The echo_eof program itself doesn’t know (or care) that the input is coming from a file instead of the keyboard. All it knows is that a stream of characters is being fed to it, so it reads them and prints them one character at a time until the end of file shows up. Because C puts files and I/O devices on the same footing, the file is now the I/O device.
With Unix, Linux, and Windows Command Prompt, the spaces on either side of the < are optional.
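You can also redirect the output of echo_eof to a file, so that the program reads input from the keyboard and saves it in the file:
echo_eof > mywords
Only a Ctrl+Z typed at the beginning of a line ends the input.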
The > is a second redirection operator. It causes a new file called mywords to be created for your use, and then it redirects the output of echo_eof (that is, a copy of the characters you type) to that file. The redirection reassigns stdout from the display device (your screen) to the mywords file instead. If you already have a file with the name mywords , normally it would be erased and then replaced by the new one. (Many operating systems, however, give you the option of protecting existing files by making them read-only.) All that appears on your screen are the letters as you type them, and the copies go to the file instead. To end the program, press Ctrl+D (Unix) or Ctrl+Z (DOS) at the beginning of a line.
Now suppose you want to make a copy of the file mywords and call it savewords . Just issue this next command:
echo_eof < mywords > savewords
and the deed is done. The following command would have worked as well, because the order of redirection operations doesn’t matter:
echo_eof > savewords < mywords
Beware: Don’t use the same file for both input and output to the same command.
echo_eof < mywords > mywords // wrong
The reason is that > mywords causes the original mywords to be truncated to zero length before it is ever used as input.
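The effect can be reproduced from inside a C program, because opening a file for writing with fopen() mode "w" also truncates it. The sketch below is only an analogy for what the command interpreter does before the command starts; both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the number of characters in a file,
   or -1 if it cannot be opened. */
long file_length(const char *path)
{
    FILE *fp = fopen(path, "r");
    int ch;
    long n = 0;

    if (fp == NULL)
        return -1;
    while ((ch = getc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Open the file for output in truncating mode, write nothing,
   and report how much of the old content is left to read. */
long length_after_opening_for_output(const char *path)
{
    FILE *out = fopen(path, "w");   /* truncates an existing file */

    if (out == NULL)
        return -1;
    fclose(out);
    return file_length(path);
}
```

After the file has been opened for output, nothing of its old content remains, so a reader attached to the same file reaches the end of file immediately.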
In brief, here are the rules governing the use of the two redirection operators ( < and > ) with Unix, Linux, or Windows/DOS:
■ A redirection operator connects an executable program (including standard operating system commands) with a data file. It cannot be used to connect one data file to another, nor can it be used to connect one program to another program.
■ Input cannot be taken from more than one file, nor can output be directed to more than one file by using these operators.
■ Normally, spaces between the names and operators are optional, except occasionally when some characters with special meaning to the Unix shell or Linux shell or the Windows Command Prompt mode are used.
Unix, Linux, and Windows/DOS also feature the >> operator, which enables you to add data to the end of an existing file, and the pipe operator ( | ), which enables you to connect the output of one program to the input of a second program.
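Within a C program, the closest standard counterpart to >> is opening a file in append mode. Here is a minimal sketch, assuming the hypothetical helper name append_line. The fopen() mode "a" creates the file if it does not exist and places every write at the end of the existing content.

```c
#include <stdio.h>

/* Hypothetical helper: add one line of text to the end of a file.
   Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");   /* existing content is preserved */

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF || fputc('\n', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling append_line() twice on the same path leaves both lines in the file, in the order they were written.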