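Here a "large file" means a file too big for a 32-bit offset. Strictly, that is more than 2 GiB (2^31 - 1 bytes) for a signed 32-bit long or off_t, though the limit is often loosely quoted as 4 GB. Programs on 32-bit machines run into problems with such files, as explained below.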
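First, on a 32-bit machine, the trouble does not come from opening a large file with fopen/fclose or from reading and writing it sequentially, as in while (!feof(fp)) { fread / fgets / fscanf } or while (1) { fwrite / fputs / fprintf }. One caveat applies: on Linux, a 32-bit program built without large-file support may already have the underlying open(2) call rejected with EOVERFLOW for files over 2 GiB, as the open(2) man page describes. The real trouble is random access. Because long is 32 bits on a 32-bit machine, fseek(FILE *stream, long offset, int whence) and long ftell(FILE *stream) cannot address positions beyond that range. Use fseeko(FILE *stream, off_t offset, int whence) and off_t ftello(FILE *stream) instead of fseek and ftell. The offset passed to fseeko must be declared with a 64-bit type: off_t built with 64-bit offsets (see below), long on a 64-bit machine, long long on a 32-bit machine, or int64_t. With such a type, files larger than 4 GB can be handled. The type must be signed, because offsets relative to SEEK_CUR or SEEK_END may be negative.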
fseeko and ftello are described in detail in the "ftello & fseeko" man page excerpt at the end of this article.
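Note: open() returns a file descriptor; it does not load the file into memory. File contents are brought into memory only when read() is called, and only the part that is actually read.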
The type off_t is defined in <sys/types.h>:
```c
# ifndef __USE_FILE_OFFSET64
typedef __off_t off_t;
# else
typedef __off64_t off_t;
# endif
```
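off_t is 32 bits on a 32-bit machine and 64 bits on a 64-bit machine. On a 32-bit machine there are two ways to enable 64-bit offsets. One is to add #define _FILE_OFFSET_BITS 64 before any #include. The other is to pass -D_FILE_OFFSET_BITS=64 when compiling; note the "=", because with a space the compiler would treat 64 as a separate input file. Either way, glibc switches to 64-bit file offsets: off_t becomes __off64_t, and calls such as fopen, stat, fseeko and ftello are mapped to their 64-bit variants. Then replace ftell and fseek with ftello and fseeko, and large files can be handled.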
Some English-language material describes essentially the same solution; it is quoted here for reference:
In a nutshell for using LFS you can choose either of the following:
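Test setup: first create a 100 GiB sparse test file named tt. With count=0, dd copies no data; seek=100 with bs=1G only sets the file length to 100 × 1 GiB = 107374182400 bytes:

```sh
dd if=/dev/zero of=tt bs=1G seek=100 count=0
```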
Test program 1 (using fseeko/ftello):

```c
/* Test program 1: seek and tell with fseeko/ftello. */
#include <stdio.h>
#include <stdlib.h>      /* system() */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <assert.h>

#define FILENAME "tt"
#define READBUFSIZE 100

int main(void)
{
    struct stat buf;
    /* "rw" is not a valid fopen mode; read-only is all this test needs. */
    FILE *stream = fopen(FILENAME, "r");
    char readbuf[READBUFSIZE];
    size_t ret = 0;

    printf("\nfollowing messages present system info.\n");
    printf("sizeof(size_t) = %zu, sizeof(off_t) = %zu\n", sizeof(size_t), sizeof(off_t));
    system("getconf LONG_BIT");
    system("uname -a");

    printf("\nfollowing messages present file info.\n\n");
    if (stat(FILENAME, &buf) != 0) {
        perror("stat:");
        return -1;
    }
    /* off_t may be 32 or 64 bits, so cast before printing with %lld. */
    printf("file size is %lld Byte\n", (long long)buf.st_size);

    if (!stream) {
        perror("fopen:");
        return -1;
    }

    /* Read the first 100 bytes. */
    ret = fread(readbuf, READBUFSIZE, 1, stream);
    printf("fread:%zu Byte\n", ret * READBUFSIZE);
    assert(ret == 1);
    printf("current pos is %lld Byte\n", (long long)ftello(stream));

    /* Move to 100 bytes before end-of-file and read the last 100 bytes. */
    if (fseeko(stream, -READBUFSIZE, SEEK_END) != 0) {
        perror("fseeko:");
        return -1;
    }
    ret = fread(readbuf, READBUFSIZE, 1, stream);
    printf("fread:%zu Byte\n", ret * READBUFSIZE);
    assert(ret == 1);
    printf("after read last %d Byte ,cur pos is %lld Byte\n", READBUFSIZE,
           (long long)ftello(stream));

    fclose(stream);
    return 0;
}
```

Test program 2 (the same program using fseek/ftell):
```c
/* Test program 2: identical to test program 1, but with fseek/ftell. */
#include <stdio.h>
#include <stdlib.h>      /* system() */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <assert.h>

#define FILENAME "tt"
#define READBUFSIZE 100

int main(void)
{
    struct stat buf;
    FILE *stream = fopen(FILENAME, "r");   /* "rw" is not a valid mode */
    char readbuf[READBUFSIZE];
    size_t ret = 0;

    printf("\nfollowing messages present system info.\n");
    printf("sizeof(size_t) = %zu, sizeof(off_t) = %zu\n", sizeof(size_t), sizeof(off_t));
    system("getconf LONG_BIT");
    system("uname -a");

    printf("\nfollowing messages present file info.\n\n");
    if (stat(FILENAME, &buf) != 0) {
        perror("stat:");
        return -1;
    }
    printf("file size is %lld Byte\n", (long long)buf.st_size);

    if (!stream) {
        perror("fopen:");
        return -1;
    }

    ret = fread(readbuf, READBUFSIZE, 1, stream);
    printf("fread:%zu Byte\n", ret * READBUFSIZE);
    assert(ret == 1);
    /* NOTE: ftell() returns long, so printing it with %lld (kept from the
       original test) is a format mismatch and undefined behavior on a
       32-bit machine; see the discussion of the results below. */
    printf("current pos is %lld Byte\n", ftell(stream));

    if (fseek(stream, -READBUFSIZE, SEEK_END) != 0) {
        perror("fseek:");
        return -1;
    }
    ret = fread(readbuf, READBUFSIZE, 1, stream);
    printf("fread:%zu Byte\n", ret * READBUFSIZE);
    assert(ret == 1);
    printf("after read last %d Byte ,cur pos is %lld Byte\n", READBUFSIZE, ftell(stream));

    fclose(stream);
    return 0;
}
```
Results of test program 1 on a 64-bit machine:

```
following messages present system info.
sizeof(size_t) = 8, sizeof(off_t) = 8
64
Linux SPA 2.6.18-194.17.1.b1.05 #3 SMP Fri Jan 25 15:14:45 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
following messages present file info.
file size is 107374182400 Byte
fread:100 Byte
current pos is 100 Byte
fread:100 Byte
after read last 100 Byte ,cur pos is 107374182400 Byte
```
On a 32-bit machine, compiled without -D_FILE_OFFSET_BITS=64 (off_t is 4 bytes, so stat() already fails with EOVERFLOW):

```
following messages present system info.
sizeof(size_t) = 4, sizeof(off_t) = 4
32
Linux localhost.localdomain 2.6.32-220.el6.i686 #1 SMP Tue Dec 6 16:15:40 GMT 2011 i686 i686 i386 GNU/Linux
following messages present file info.
stat:: Value too large for defined data type
```
On the same 32-bit machine, compiled with -D_FILE_OFFSET_BITS=64:

```
following messages present system info.
sizeof(size_t) = 4, sizeof(off_t) = 8
32
Linux localhost.localdomain 2.6.32-220.el6.i686 #1 SMP Tue Dec 6 16:15:40 GMT 2011 i686 i686 i386 GNU/Linux
following messages present file info.
file size is 107374182400 Byte
fread:100 Byte
current pos is 100 Byte
fread:100 Byte
after read last 100 Byte ,cur pos is 107374182400 Byte
```
Test program 2 produced the same results as test program 1.
The results show no difference between ftell and ftello: with -D_FILE_OFFSET_BITS=64, both programs run correctly. But how can ftell, whose return type is long, print such a large value?
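The problem lies in how the result was printed: the return value of ftell was printed with %lld. On a 32-bit machine that is a mismatch. ftell returns a 4-byte long, while %lld makes printf read 8 bytes. This is undefined behavior, so the printed number is no evidence that ftell worked. When every ftell output was changed to %ld, the program still ran, but some of the printed ftell values overflowed and were wrong.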
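So the APIs only appeared to work because the test itself was flawed and needs to be improved. In particular, fseek(stream, -READBUFSIZE, SEEK_END) passes an offset of just -100, which fits in a long. Whenever the file position exceeds what a 32-bit long can represent, fseek and ftell do not work correctly. An absolute offset beyond 2 GiB cannot even be passed to fseek, and POSIX specifies that ftell fails with EOVERFLOW when the current position does not fit in a long.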
Excerpt from the Linux man page for ftello:
NAME
fseeko, ftello - seek to or report file position
SYNOPSIS
#include <stdio.h>
int fseeko(FILE *stream, off_t offset, int whence);
off_t ftello(FILE *stream);
DESCRIPTION
The fseeko() and ftello() functions are identical to fseek() and ftell() (see fseek(3)),
respectively, except that the offset argument of fseeko() and the return value of ftello() is of type
off_t instead of long.
On many architectures both off_t and long are 32-bit types, but compilation with
#define _FILE_OFFSET_BITS 64
will turn off_t into a 64-bit type.
RETURN VALUE
On successful completion, fseeko() returns 0, while ftello() returns the current offset.
Otherwise, -1 is returned and errno is set to indicate the error.
Excerpt from the Linux man page for fseek:
NAME
fgetpos, fseek, fsetpos, ftell, rewind - reposition a stream
SYNOPSIS
#include <stdio.h>
int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);
void rewind(FILE *stream);
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, fpos_t *pos);
DESCRIPTION
The fseek() function sets the file position indicator for the stream pointed to by stream. The new
position, measured in bytes, is obtained by adding offset bytes to the position specified by whence.
If whence is set to SEEK_SET, SEEK_CUR, or SEEK_END, the offset is relative to the start of the file,
the current position indicator, or end-of-file, respectively. A successful call to the fseek()
function clears the end-of-file indicator for the stream and undoes any effects of the ungetc(3) function
on the same stream.
The ftell() function obtains the current value of the file position indicator for the stream pointed
to by stream.
The rewind() function sets the file position indicator for the stream pointed to by stream to the
beginning of the file. It is equivalent to:
(void) fseek(stream, 0L, SEEK_SET)
except that the error indicator for the stream is also cleared (see clearerr(3)).
The fgetpos() and fsetpos() functions are alternate interfaces equivalent to ftell() and fseek()
(with whence set to SEEK_SET), setting and storing the current value of the file offset into or from
the object referenced by pos. On some non-Unix systems an fpos_t object may be a complex object and
these routines may be the only way to portably reposition a text stream.
RETURN VALUE
The rewind() function returns no value. Upon successful completion, fgetpos(), fseek(), fsetpos()
return 0, and ftell() returns the current offset. Otherwise, -1 is returned and errno is set to
indicate the error.
ERRORS
EBADF The stream specified is not a seekable stream.
EINVAL The whence argument to fseek() was not SEEK_SET, SEEK_END, or SEEK_CUR.
Excerpt from the Linux man page for stat/fstat:
NAME
stat, fstat, lstat - get file status
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int stat(const char *path, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *path, struct stat *buf);
DESCRIPTION
These functions return information about a file. No permissions are required on the file itself,
but (in the case of stat() and lstat()) execute (search) permission is required on all of the
directories in path that lead to the file.
stat() stats the file pointed to by path and fills in buf.
lstat() is identical to stat(), except that if path is a symbolic link, then the link itself is
stat-ed, not the file that it refers to.
fstat() is identical to stat(), except that the file to be stat-ed is specified by the file
descriptor filedes.
All of these system calls return a stat structure, which contains the following fields:
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of hard links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special file) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for filesystem I/O */
blkcnt_t st_blocks; /* number of blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last status change */
};
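The stat excerpt is included to show that st_size, the file size returned by stat, is also of type off_t. This explains the 32-bit build without -D_FILE_OFFSET_BITS=64, where stat failed with "Value too large for defined data type" (EOVERFLOW): the 100 GiB size does not fit in a 32-bit off_t.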