APUE Chapter 4 - Files and Directories

4.1. Introduction

4.2. stat, fstat, and lstat Functions

#include <sys/stat.h>
int stat(const char *restrict pathname, struct stat *restrict buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *restrict pathname, struct stat *restrict buf);
// All three return: 0 if OK, -1 on error

Given a pathname, the stat function returns a structure of information about the named file. The fstat function obtains information about the file that is already open on the descriptor filedes. The lstat function is similar to stat, but when the named file is a symbolic link, lstat returns information about the symbolic link, not the file referenced by the symbolic link.

The second argument is a pointer to a structure that we must supply. The function fills in the structure pointed to by buf. The definition of the structure can differ among implementations, but it could look like this:

struct stat {
    mode_t    st_mode;     /* file type & mode (permissions) */
    ino_t     st_ino;      /* i-node number (serial number) */
    dev_t     st_dev;      /* device number (file system) */
    dev_t     st_rdev;     /* device number for special files */
    nlink_t   st_nlink;    /* number of links */
    uid_t     st_uid;      /* user ID of owner */
    gid_t     st_gid;      /* group ID of owner */
    off_t     st_size;     /* size in bytes, for regular files */
    time_t    st_atime;    /* time of last access */
    time_t    st_mtime;    /* time of last modification */
    time_t    st_ctime;    /* time of last file status change */
    blksize_t st_blksize;  /* best I/O block size */
    blkcnt_t  st_blocks;   /* number of disk blocks allocated */
};
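A minimal sketch of the three calls, assuming a POSIX system. The helper names symlink_to_regular and open_file_size are hypothetical. lstat describes a symbolic link itself, stat follows it, and fstat starts from an open descriptor instead of a pathname:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if lstat() sees path as a symbolic link
 * while stat(), which follows the link, sees a regular file; 0 otherwise;
 * -1 if either call fails (a dangling link makes stat() fail). */
static int symlink_to_regular(const char *path)
{
    struct stat lsb, sb;

    if (lstat(path, &lsb) < 0 || stat(path, &sb) < 0)
        return -1;
    return S_ISLNK(lsb.st_mode) && S_ISREG(sb.st_mode);
}

/* Hypothetical helper: size in bytes of the file open on fd, via fstat(). */
static long long open_file_size(int fd)
{
    struct stat sb;

    if (fstat(fd, &sb) < 0)
        return -1;
    return (long long)sb.st_size;
}
```

Passing the name of a dangling link to symlink_to_regular returns -1: lstat still succeeds on the link, but stat has nothing to follow.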

4.3. File Types

We've talked about two different types of files so far: regular files and directories. Most files on a UNIX system are either regular files or directories, but there are additional types of files. The types are:

1. Regular file. The most common type of file, which contains data of some form. There is no distinction to the UNIX kernel whether this data is text or binary. Any interpretation of the contents of a regular file is left to the application processing the file. One notable exception to this is with binary executable files. To execute a program, the kernel must understand its format. All binary executable files conform to a format that allows the kernel to identify where to load a program's text and data.
2. Directory file. A file that contains the names of other files and pointers to information on these files. Any process that has read permission for a directory file can read the contents of the directory, but only the kernel can write directly to a directory file. Processes must use the functions described in this chapter to make changes to a directory.
3. Block special file. A type of file providing buffered I/O access in fixed-size units to devices such as disk drives.
4. Character special file. A type of file providing unbuffered I/O access in variable-sized units to devices. All devices on a system are either block special files or character special files.
5. FIFO. A type of file used for communication between processes. It's sometimes called a named pipe.
6. Socket. A type of file used for network communication between processes. A socket can also be used for non-network communication between processes on a single host.
7. Symbolic link. A type of file that points to another file.
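The file type is encoded in the st_mode member, and the S_ISxxx macros from <sys/stat.h> test for each of the types listed above. A minimal sketch, assuming a POSIX system; file_type is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: name the file type of path using the S_ISxxx
 * macros on st_mode. lstat() is used so that a symbolic link is reported
 * as a link rather than as the type of the file it points to.
 * Returns NULL if lstat() fails. */
static const char *file_type(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) < 0)
        return NULL;
    if (S_ISREG(sb.st_mode))  return "regular";
    if (S_ISDIR(sb.st_mode))  return "directory";
    if (S_ISBLK(sb.st_mode))  return "block special";
    if (S_ISCHR(sb.st_mode))  return "character special";
    if (S_ISFIFO(sb.st_mode)) return "FIFO";
    if (S_ISLNK(sb.st_mode))  return "symbolic link";
    if (S_ISSOCK(sb.st_mode)) return "socket";
    return "unknown";
}
```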

 

4.4. Set-User-ID and Set-Group-ID

4.5. File Access Permissions

Whenever we want to open any type of file by name, we must have execute permission in each directory mentioned in the name, including the current directory, if it is implied. This is why the execute permission bit for a directory is often called the search bit.

For example, to open the file "/usr/include/stdio.h", we need execute permission in the directory "/", execute permission in the directory "/usr", and execute permission in the directory "/usr/include". We then need appropriate permission for the file itself, depending on how we're trying to open it: read-only, read-write, and so on.

If the current directory is "/usr/include", then we need execute permission in the current directory to open the file "stdio.h". This is an example of the current directory being implied, not specifically mentioned. It is identical to our opening the file "./stdio.h".

Note that read permission for a directory and execute permission for a directory mean different things. Read permission lets us read the directory, obtaining a list of all the filenames in the directory. Execute permission lets us pass through the directory when it is a component of a pathname that we are trying to access. (We need to search the directory to look for a specific filename.)
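The search-bit rule can be sketched by calling access with X_OK on every directory prefix of a pathname, including the implied "." for a relative name. check_search_path is a hypothetical helper; since access tests the real user and group IDs (and the superuser passes most checks), treat it as an illustration of the rule rather than a replacement for simply attempting the open:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: check search (execute) permission on each
 * directory component of path: "/", "/usr/", "/usr/include/" for
 * "/usr/include/stdio.h", or "." for a relative name such as "stdio.h".
 * The last component (the file itself) is not checked.
 * Returns 0 if every directory is searchable, else -1 with errno set. */
static int check_search_path(const char *path)
{
    char buf[PATH_MAX];
    size_t len = strlen(path);

    if (len >= sizeof(buf)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    if (buf[0] != '/' && access(".", X_OK) < 0)   /* implied current directory */
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '/')
            continue;
        char saved = buf[i + 1];
        buf[i + 1] = '\0';              /* cut just after this slash */
        int r = access(buf, X_OK);
        buf[i + 1] = saved;             /* restore the rest of the path */
        if (r < 0)
            return -1;
    }
    return 0;
}
```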

4.6. Ownership of New Files and Directories

4.7. access Function

4.8. umask Function

4.9. chmod and fchmod Functions

The chmod function updates only the time that the i-node was last changed (st_ctime). By default, "ls -l" lists the time when the contents of the file were last modified (st_mtime), so running chmod on a file does not change the time that "ls -l" shows.
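To see that chmod leaves st_mtime alone, a minimal sketch can first backdate the file with utime (covered in 4.19) and then check that chmod did not move the modification time. chmod_keeps_mtime is a hypothetical helper name, and the fixed timestamp is an arbitrary past value chosen for the demo:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <utime.h>

/* Hypothetical demo helper: backdate path's access and modification times,
 * then chmod() it. Returns 1 if st_mtime still holds the backdated value
 * while st_ctime reflects a recent i-node change (both utime() and chmod()
 * set st_ctime to the current time), 0 if not, -1 on error. */
static int chmod_keeps_mtime(const char *path, mode_t mode)
{
    struct utimbuf past;
    struct stat sb;

    past.actime = past.modtime = 1000000000;   /* arbitrary time in 2001 */
    if (utime(path, &past) < 0)
        return -1;
    if (chmod(path, mode) < 0 || stat(path, &sb) < 0)
        return -1;
    return sb.st_mtime == past.modtime && sb.st_ctime > past.modtime;
}
```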

4.10. Sticky Bit

4.11. chown, fchown, and lchown Functions

4.12. File Size

4.13. File Truncation

4.14. File Systems

We can think of a disk drive as being divided into one or more partitions, each of which can contain a file system.
The i-nodes are fixed-length entries that contain most of the information about a file.

4.15. link, unlink, remove and rename Functions

#include <unistd.h>
int link(const char *existingpath, const char *newpath); // Returns: 0 if OK, -1 on error

This function creates a new directory entry, newpath, that references the existing file existingpath. If the newpath already exists, an error is returned. Only the last component of the newpath is created. The rest of the path must already exist.
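Because link only adds a directory entry, both names end up referring to the same i-node and the link count goes up by one. A minimal sketch, assuming a POSIX system; link_and_count is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create newpath as a hard link to existingpath and
 * return the file's link count afterwards, or -1 on error with errno set
 * (EEXIST if newpath already exists). */
static long link_and_count(const char *existingpath, const char *newpath)
{
    struct stat sb;

    if (link(existingpath, newpath) < 0)
        return -1;
    if (stat(newpath, &sb) < 0)
        return -1;
    return (long)sb.st_nlink;   /* same i-node as existingpath */
}
```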

To remove an existing directory entry, we call the unlink function.

#include <unistd.h>
int unlink(const char *pathname); // Returns: 0 if OK, -1 on error

This function removes the directory entry and decrements the link count of the file referenced by pathname. If there are other links to the file, the data in the file is still accessible through the other links. The file is not changed if an error occurs.

Only when the link count reaches 0 can the contents of the file be deleted. One other condition prevents the contents of a file from being deleted: as long as some process has the file open, its contents will not be deleted. When a file is closed, the kernel first checks the count of the number of processes that have the file open. If this count has reached 0, the kernel then checks the link count; if it is 0, the file's contents are deleted.
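The two conditions can be observed from a single process: after unlink, fstat on a still-open descriptor reports a link count of 0, yet the data remains readable until the descriptor is closed. A minimal sketch, assuming a POSIX system; read_after_unlink is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open path, unlink it, then report the link count
 * seen through the open descriptor and read the data back.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_after_unlink(const char *path, char *buf, size_t size, long *nlink)
{
    struct stat sb;
    ssize_t n = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (unlink(path) == 0 && fstat(fd, &sb) == 0) {
        *nlink = (long)sb.st_nlink;   /* 0: no directory entry names it now */
        n = read(fd, buf, size);      /* but the open file is still readable */
    }
    close(fd);                        /* last close: contents may now be freed */
    return n;
}
```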

The program shown below opens a file and then unlinks it. The program then goes to sleep for 15 seconds before terminating.

#include "apue.h"
#include <fcntl.h>

int main(void)
{
    if (open("tempfile", O_RDWR) < 0)
        err_sys("open error");
    if (unlink("tempfile") < 0)
        err_sys("unlink error");
    printf("file unlinked\n");
    sleep(15);
    printf("done\n");
    exit(0);
}

Running this program gives us

$ ls -l tempfile                                     // look at how big the file is
-rw-r----- 1 sar 413265408 Jan 21 07:14 tempfile
$ df /home // check how much free space is available
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda4 11021440 1956332 9065108 18% /home
$ ./a.out & // run the program in the background
1364 // the shell prints its process ID
$ file unlinked // the file is unlinked
ls -l tempfile // see if the filename is still there
ls: tempfile: No such file or directory // the directory entry is gone
$ df /home // see if the space is available yet
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda4 11021440 1956332 9065108 18% /home
$ done // the program is done, all open files are closed
df /home // now the disk space should be available
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda4 11021440 1552352 9469088 15% /home // now the 394.1 MB of disk space are available

This property of unlink is often used by a program to ensure that a temporary file it creates won't be left around in case the program crashes. The process creates a file using either open or creat and then immediately calls unlink. The file is not deleted, however, because it is still open. Only when the process either closes the file or terminates, which causes the kernel to close all its open files, is the file deleted.
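The create-then-unlink idiom can be wrapped in a small helper. This is a minimal sketch, assuming a POSIX system; anon_tempfile is a hypothetical name, and it uses mkstemp (rather than open or creat) to pick a unique name so that two processes cannot collide:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a uniquely named file in dir, then unlink
 * it at once. The returned descriptor is the only way to reach the file;
 * the kernel frees it on the last close, including the implicit close
 * when the process exits or crashes. Returns -1 on error. */
static int anon_tempfile(const char *dir)
{
    char path[PATH_MAX];
    int fd, saved;

    if (snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir) >= (int)sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if ((fd = mkstemp(path)) < 0)     /* creates and opens a unique file */
        return -1;
    if (unlink(path) < 0) {           /* remove the name, keep the open file */
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The caller then reads and writes the descriptor as usual, for example writing scratch data, calling lseek back to offset 0, and reading it again.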

If pathname is a symbolic link, unlink removes the symbolic link, not the file referenced by the link. There is no function to remove the file referenced by a symbolic link given the name of the link.

The superuser can call unlink with pathname specifying a directory, but the function rmdir should be used instead to unlink a directory.

We can also unlink a file or a directory with the remove function. For a file, "remove" is identical to "unlink". For a directory, "remove" is identical to "rmdir".

#include <stdio.h>
int remove(const char *pathname); // Returns: 0 if OK, -1 on error

A file or a directory is renamed with the rename function.

#include <stdio.h>
int rename(const char *oldname, const char *newname); // Returns: 0 if OK, -1 on error

There are several conditions to describe, depending on whether oldname refers to a file, a directory, or a symbolic link. We must also describe what happens if newname already exists.
1. If oldname specifies a file that is not a directory, then we are renaming a file or a symbolic link. In this case, if newname exists, it cannot refer to a directory. If newname exists and is not a directory, it is removed, and oldname is renamed to newname. We must have write permission for the directory containing oldname and for the directory containing newname, since we are changing both directories.
2. If oldname specifies a directory, then we are renaming a directory. If newname exists, it must refer to a directory, and that directory must be empty. (When we say that a directory is empty, we mean that the only entries in the directory are dot and dot-dot.) If newname exists and is an empty directory, it is removed, and oldname is renamed to newname. Additionally, when we're renaming a directory, newname cannot contain a path prefix that names oldname. For example, we can't rename /usr/foo to /usr/foo/testdir, since the old name (/usr/foo) is a path prefix of the new name and cannot be removed.
3. If either oldname or newname refers to a symbolic link, then the link itself is processed, not the file to which it resolves.
4. As a special case, if the oldname and newname refer to the same file, the function returns successfully without changing anything.

If newname already exists, we need permissions as if we were deleting it. Also, because we're removing the directory entry for oldname and possibly creating a directory entry for newname, we need write permission and execute permission in the directory containing oldname and in the directory containing newname.
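Case 1 is what makes a common update pattern work: write new contents under a temporary name, then rename it over the existing file, which is removed and replaced. A minimal sketch, assuming a POSIX system; replace_file is a hypothetical helper, and the test also checks the case 2 rule that a directory cannot be moved beneath itself:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write data to tmppath, then rename() it to target.
 * If target exists and is not a directory, rename() removes it, so target
 * ends up holding the new contents. Returns 0 if OK, -1 on error. */
static int replace_file(const char *target, const char *tmppath, const char *data)
{
    size_t len = strlen(data);
    int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        unlink(tmppath);    /* don't leave a partial temporary file behind */
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return rename(tmppath, target);
}
```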

 

4.16. Symbolic Links

4.17. symlink and readlink Functions

A symbolic link is created with the symlink function.

#include <unistd.h>
int symlink(const char *actualpath, const char *sympath); // Returns: 0 if OK, -1 on error

A new directory entry, sympath, is created that points to actualpath. It is not required that actualpath exist when the symbolic link is created. Also, actualpath and sympath need not reside in the same file system.

Because the open function follows a symbolic link, we need a way to open the link itself and read the name in the link. The readlink function does this.

#include <unistd.h>
ssize_t readlink(const char *restrict pathname, char *restrict buf, size_t bufsize); // Returns: number of bytes read if OK, -1 on error

This function combines the actions of open, read, and close. If the function is successful, it returns the number of bytes placed into buf. The contents of the symbolic link that are returned in buf are not null-terminated.
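Because readlink does not terminate the result, a caller that wants a C string must reserve a byte and add the null itself. A minimal sketch, assuming a POSIX system; read_link_string is a hypothetical helper, and treating a completely filled buffer as "possibly truncated" is a design choice of this sketch:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read the contents of symbolic link path into buf
 * as a null-terminated string. We ask readlink() for at most size - 1
 * bytes so there is room for the terminator. If that many bytes come
 * back, the name may have been cut short, so this sketch fails with
 * ENAMETOOLONG and lets the caller retry with a larger buffer.
 * Returns the length of the string, or -1 on error. */
static ssize_t read_link_string(const char *path, char *buf, size_t size)
{
    ssize_t n;

    if (size < 2) {
        errno = EINVAL;
        return -1;
    }
    if ((n = readlink(path, buf, size - 1)) < 0)
        return -1;
    if ((size_t)n == size - 1) {     /* possibly truncated */
        errno = ENAMETOOLONG;
        return -1;
    }
    buf[n] = '\0';                   /* readlink() did not do this */
    return n;
}
```

The test exercises a dangling link on purpose: as noted above, actualpath need not exist when the link is created.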

4.18. File Times

The three time values associated with each file:

Field      Description                            Example        ls option
st_atime   last-access time of file data          read           -u
st_mtime   last-modification time of file data    write          default
st_ctime   last-change time of i-node status      chmod, chown   -c

Note the difference between the modification time (st_mtime) and the changed-status time (st_ctime). The modification time is when the contents of the file were last modified. The changed-status time is when the i-node of the file was last modified. In this chapter, we've described many operations that affect the i-node without changing the actual contents of the file: changing the file access permissions, changing the user ID, changing the number of links, and so on. Because all the information in the i-node is stored separately from the actual contents of the file, we need the changed-status time, in addition to the modification time.

Note that the system does not maintain the last-access time for an i-node. This is why the functions access and stat, for example, don't change any of the three times.

The access time is often used by system administrators to delete files that have not been accessed for a certain amount of time. The classic example is the removal of files named a.out or core that haven't been accessed in the past week. The "find" command is often used for this type of operation. The modification time and the changed-status time can be used to archive only those files that have had their contents modified or their i-node modified.
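The kind of test that find performs for such cleanups can be sketched by comparing st_atime with the current time. not_accessed_for and backdate are hypothetical helpers; backdate uses utime (covered in 4.19) only so the check can be tried without waiting a week:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/* Hypothetical helper in the spirit of find's -atime test: returns 1 if
 * path was last accessed more than `days` days before `now`, 0 if not,
 * -1 on error. Calling stat() does not itself change st_atime. */
static int not_accessed_for(const char *path, int days, time_t now)
{
    struct stat sb;

    if (stat(path, &sb) < 0)
        return -1;
    return difftime(now, sb.st_atime) > days * 86400.0;
}

/* Hypothetical test aid: set both the access and modification times. */
static int backdate(const char *path, time_t when)
{
    struct utimbuf t;

    t.actime = t.modtime = when;
    return utime(path, &t);
}
```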

The "ls" command displays or sorts on only one of the three time values. By default, when invoked with either the -l or the -t option, it uses the modification time of a file. The -u option causes it to use the access time, and the -c option causes it to use the changed-status time.

4.19. utime Function

The access time and the modification time of a file can be changed with the utime function.

#include <utime.h>
int utime(const char *pathname, const struct utimbuf *times); // Returns: 0 if OK, -1 on error

The structure used by this function is

struct utimbuf {
    time_t actime;   /* access time */
    time_t modtime;  /* modification time */
};

The program shown below truncates files to zero length using the "O_TRUNC" option of the "open" function, but does not change their access time or modification time. To do this, the program first obtains the times with the "stat" function, truncates the file, and then resets the times with the "utime" function.

#include "apue.h"
#include <fcntl.h>
#include <utime.h>

int main(int argc, char *argv[])
{
    int            i, fd;
    struct stat    statbuf;
    struct utimbuf timebuf;

    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &statbuf) < 0) {    /* fetch current times */
            err_ret("%s: stat error", argv[i]);
            continue;
        }

        if ((fd = open(argv[i], O_RDWR | O_TRUNC)) < 0) {    /* truncate */
            err_ret("%s: open error", argv[i]);
            continue;
        }

        close(fd);

        timebuf.actime  = statbuf.st_atime;
        timebuf.modtime = statbuf.st_mtime;

        if (utime(argv[i], &timebuf) < 0) {    /* reset times */
            err_ret("%s: utime error", argv[i]);
            continue;
        }
    }

    exit(0);
}

We can demonstrate the program with the following script:

$ ls -l changemod times       // look at sizes and last-modification times
-rwxrwxr-x 1 sar 15019 Nov 18 18:53 changemod
-rwxrwxr-x 1 sar 16172 Nov 19 20:05 times
$ ls -lu changemod times   // look at last-access times
-rwxrwxr-x 1 sar 15019 Nov 18 18:53 changemod
-rwxrwxr-x 1 sar 16172 Nov 19 20:05 times
$ date      // print today's date
Thu Jan 22 06:55:17 EST 2004
$ ./a.out changemod times  // run the program above
$ ls -l changemod times   // and check the results
-rwxrwxr-x 1 sar 0 Nov 18 18:53 changemod
-rwxrwxr-x 1 sar 0 Nov 19 20:05 times
$ ls -lu changemod times  // check the last-access times also
-rwxrwxr-x 1 sar 0 Nov 18 18:53 changemod
-rwxrwxr-x 1 sar 0 Nov 19 20:05 times
$ ls -lc changemod times  // and the changed-status times
-rwxrwxr-x 1 sar 0 Jan 22 06:55 changemod
-rwxrwxr-x 1 sar 0 Jan 22 06:55 times

As we expect, the last-modification times and the last-access times are not changed. The changed-status times, however, are changed to the time that the program was run.

4.20. mkdir and rmdir Functions

Directories are created with the "mkdir" function and deleted with the "rmdir" function.

#include <sys/stat.h>
int mkdir(const char *pathname, mode_t mode); // Returns: 0 if OK, -1 on error

This function creates a new, empty directory. The entries for dot and dot-dot are automatically created. The specified file access permissions, mode, are modified by the file mode creation mask of the process.
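The effect of the file mode creation mask on mkdir can be checked directly: any bit set in the umask is cleared from the requested mode. A minimal sketch, assuming a POSIX system; mkdir_with_mask is a hypothetical helper that sets a mask temporarily and restores the old one:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create directory path with the requested mode
 * while `mask` is the process umask, and store the permission bits the
 * directory actually received in *got. The previous umask is restored
 * before returning. Returns 0 if OK, -1 on error. */
static int mkdir_with_mask(const char *path, mode_t mode, mode_t mask, mode_t *got)
{
    struct stat sb;
    mode_t old = umask(mask);
    int r = mkdir(path, mode);

    umask(old);
    if (r < 0 || stat(path, &sb) < 0)
        return -1;
    *got = sb.st_mode & 0777;   /* e.g. 0777 requested, 022 masked: 0755 */
    return 0;
}
```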

An empty directory is deleted with the "rmdir" function. Recall that an empty directory is one that contains entries only for dot and dot-dot.

#include <unistd.h>
int rmdir(const char *pathname); // Returns: 0 if OK, -1 on error

If the link count of the directory becomes 0 with this call, and if no other process has the directory open, then the space occupied by the directory is freed. If one or more processes have the directory open when the link count reaches 0, the last link is removed and the dot and dot-dot entries are removed before this function returns. Additionally, no new files can be created in the directory. The directory is not freed, however, until the last process closes it. (Even though some other process has the directory open, it can't be doing much in the directory, as the directory had to be empty for the rmdir function to succeed.)
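Because rmdir only removes empty directories, a caller usually needs to tell "still has entries" apart from other failures. POSIX allows either ENOTEMPTY or EEXIST for that case. A minimal sketch; try_rmdir is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to remove a directory and classify the result.
 * Returns 0 if it was removed, 1 if it still contained entries other
 * than dot and dot-dot, or -1 for any other error (errno is left as
 * rmdir() set it). */
static int try_rmdir(const char *path)
{
    if (rmdir(path) == 0)
        return 0;
    return (errno == ENOTEMPTY || errno == EEXIST) ? 1 : -1;
}
```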

4.21. Reading Directories

#include <dirent.h>
DIR *opendir(const char *pathname); // Returns: pointer if OK, NULL on error
struct dirent *readdir(DIR *dp); // Returns: pointer if OK, NULL at end of directory or error
void rewinddir(DIR *dp);
int closedir(DIR *dp); // Returns: 0 if OK, -1 on error
long telldir(DIR *dp); // Returns: current location in directory associated with dp
void seekdir(DIR *dp, long loc);

The dirent structure defined in the file <dirent.h> is implementation dependent. Implementations define the structure to contain at least the following two members:

struct dirent {
    ino_t d_ino;                 /* i-node number */
    char  d_name[NAME_MAX + 1];  /* null-terminated filename */
};

The "DIR" structure is an internal structure used by these six functions to maintain information about the directory being read. The purpose of the DIR structure is similar to that of the "FILE" structure maintained by the standard I/O library.

The pointer to a DIR structure that is returned by "opendir" is then used with the other five functions. The opendir function initializes things so that the first "readdir" reads the first entry in the directory. The ordering of entries within the directory is implementation dependent and is usually not alphabetical.
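The basic opendir, readdir, closedir loop can be sketched as a function that counts a directory's entries. count_entries is a hypothetical helper name. Since readdir returns NULL both at the end of the directory and on error, the sketch clears errno before the loop and checks it afterwards to tell the two apart:

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count the entries in directory path, skipping
 * dot and dot-dot. Returns the count, or -1 on error. */
static long count_entries(const char *path)
{
    DIR *dp;
    struct dirent *dirp;
    long n = 0;
    int err;

    if ((dp = opendir(path)) == NULL)
        return -1;
    errno = 0;
    while ((dirp = readdir(dp)) != NULL) {
        if (strcmp(dirp->d_name, ".") == 0 || strcmp(dirp->d_name, "..") == 0)
            continue;                 /* ignore dot and dot-dot */
        n++;
    }
    err = errno;                      /* nonzero: readdir() failed */
    closedir(dp);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```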

We'll use these directory routines to write a program that traverses a file hierarchy. The program below recursively descends a directory hierarchy, counting file types.

#include "apue.h"
#include <dirent.h>
#include <limits.h>

/* function type that is called for each filename */
typedef int Myfunc(const char *, const struct stat *, int);

static Myfunc myfunc;
static int myftw(char *, Myfunc *);
static int dopath(Myfunc *);
static long nreg, ndir, nblk, nchr, nfifo, nslink, nsock, ntot;

int main(int argc, char *argv[])
{
    int ret;

    if (argc != 2)
        err_quit("usage: ftw <starting-pathname>");

    ret = myftw(argv[1], myfunc);    /* does it all */

    ntot = nreg + ndir + nblk + nchr + nfifo + nslink + nsock;

    if (ntot == 0)
        ntot = 1;    /* avoid divide by 0; print 0 for all counts */

    printf("regular files = %7ld, %5.2f %%\n", nreg, nreg*100.0/ntot);
    printf("directories = %7ld, %5.2f %%\n", ndir, ndir*100.0/ntot);
    printf("block special = %7ld, %5.2f %%\n", nblk, nblk*100.0/ntot);
    printf("char special = %7ld, %5.2f %%\n", nchr, nchr*100.0/ntot);
    printf("FIFOs = %7ld, %5.2f %%\n", nfifo, nfifo*100.0/ntot);
    printf("symbolic links = %7ld, %5.2f %%\n", nslink, nslink*100.0/ntot);
    printf("sockets = %7ld, %5.2f %%\n", nsock, nsock*100.0/ntot);

    exit(ret);
}

/*
 * Descend through the hierarchy, starting at "pathname".
 * The caller's func() is called for every file.
 */

#define FTW_F   1    /* file other than directory */
#define FTW_D   2    /* directory */
#define FTW_DNR 3    /* directory that can't be read */
#define FTW_NS  4    /* file that we can't stat */

static char *fullpath;    /* contains full pathname for every file */

/* we return whatever func() returns */
static int myftw(char *pathname, Myfunc *func)
{
    int len;

    fullpath = path_alloc(&len);         /* malloc's for PATH_MAX+1 bytes */
    strncpy(fullpath, pathname, len);    /* protect against */
    fullpath[len-1] = 0;                 /* buffer overrun */

    return(dopath(func));
}

/*
 * Descend through the hierarchy, starting at "fullpath".
 * If "fullpath" is anything other than a directory, we lstat() it,
 * call func(), and return. For a directory, we call ourself
 * recursively for each name in the directory.
 */

/* we return whatever func() returns */
static int dopath(Myfunc *func)
{
    struct stat   statbuf;
    struct dirent *dirp;
    DIR           *dp;
    int           ret;
    char          *ptr;

    if (lstat(fullpath, &statbuf) < 0)    /* stat error */
        return(func(fullpath, &statbuf, FTW_NS));
    if (S_ISDIR(statbuf.st_mode) == 0)    /* not a directory */
        return(func(fullpath, &statbuf, FTW_F));

    /*
     * It's a directory. First call func() for the directory,
     * then process each filename in the directory.
     */

    if ((ret = func(fullpath, &statbuf, FTW_D)) != 0)
        return(ret);

    ptr = fullpath + strlen(fullpath);    /* point to end of fullpath */
    *ptr++ = '/';
    *ptr = 0;

    if ((dp = opendir(fullpath)) == NULL)    /* can't read directory */
        return(func(fullpath, &statbuf, FTW_DNR));

    while ((dirp = readdir(dp)) != NULL) {
        if (strcmp(dirp->d_name, ".") == 0 || strcmp(dirp->d_name, "..") == 0)
            continue;    /* ignore dot and dot-dot */

        strcpy(ptr, dirp->d_name);    /* append name after slash */
        if ((ret = dopath(func)) != 0)    /* recursive */
            break;    /* time to leave */
    }

    ptr[-1] = 0;    /* erase everything from slash onwards */

    if (closedir(dp) < 0)
        err_ret("can't close directory %s", fullpath);

    return(ret);
}

static int myfunc(const char *pathname, const struct stat *statptr, int type)
{
    switch (type) {
    case FTW_F:
        switch (statptr->st_mode & S_IFMT) {
        case S_IFREG:  nreg++;   break;
        case S_IFBLK:  nblk++;   break;
        case S_IFCHR:  nchr++;   break;
        case S_IFIFO:  nfifo++;  break;
        case S_IFLNK:  nslink++; break;
        case S_IFSOCK: nsock++;  break;
        case S_IFDIR:  err_dump("for S_IFDIR for %s", pathname);
            /* directories should have type = FTW_D */
        }
        break;

    case FTW_D:
        ndir++;
        break;

    case FTW_DNR:
        err_ret("can't read directory %s", pathname);
        break;

    case FTW_NS:
        err_ret("stat error for %s", pathname);
        break;

    default:
        err_dump("unknown type %d for pathname %s", type, pathname);
    }

    return(0);
}
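For comparison, the Single UNIX Specification provides nftw, which performs the same kind of walk and calls back once per file. The sketch below is a swapped-in counterpart, not the program above: it assumes a system with <ftw.h>, uses hypothetical names (count_one, count_tree, and the n_* counters), and only distinguishes regular files, directories, and symbolic links. The FTW_PHYS flag makes nftw report symbolic links instead of following them, much as dopath uses lstat:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static long n_reg, n_dir, n_slink, n_other;   /* hypothetical counters */

/* Called by nftw() once per file, like myfunc() above. */
static int count_one(const char *path, const struct stat *sb, int type,
                     struct FTW *ftwbuf)
{
    (void)path;
    (void)ftwbuf;
    if (type == FTW_D)
        n_dir++;
    else if (type == FTW_SL)              /* reported because of FTW_PHYS */
        n_slink++;
    else if (type == FTW_F && S_ISREG(sb->st_mode))
        n_reg++;
    else
        n_other++;                        /* FIFOs, devices, sockets, FTW_DNR, FTW_NS */
    return 0;                             /* nonzero would stop the walk */
}

/* Walk the tree rooted at path; returns nftw()'s result (0 on success). */
static int count_tree(const char *path)
{
    n_reg = n_dir = n_slink = n_other = 0;
    return nftw(path, count_one, 16, FTW_PHYS);   /* use at most 16 descriptors */
}
```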

4.22. chdir, fchdir, and getcwd Functions

We can change the current working directory of the calling process by calling the chdir or fchdir functions.

#include <unistd.h>
int chdir(const char *pathname); // Returns: 0 if OK, -1 on error
int fchdir(int filedes); // Returns: 0 if OK, -1 on error

To obtain the absolute pathname of the current working directory, what we need is a function that starts at the current working directory (dot) and works its way up the directory hierarchy, using dot-dot to move up one level. At each level, the function reads the entries of the parent directory until it finds the name that corresponds to the i-node of the directory it just came from. Repeating this procedure until the root is reached yields the entire absolute pathname of the current working directory. Fortunately, a function is already provided for us that does this task.

#include <unistd.h>
char *getcwd(char *buf, size_t size); // Returns: buf if OK, NULL on error

We must pass to this function the address of a buffer, buf, and its size (in bytes). The buffer must be large enough to accommodate the absolute pathname plus a terminating null byte, or an error is returned.
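For illustration only, the walk-up procedure described above can be sketched without ever calling chdir, by building relative paths such as "./.." and "./../..". my_getcwd is a hypothetical name, and real programs should call getcwd. The sketch compares (st_dev, st_ino) pairs obtained with lstat rather than the d_ino field from readdir, so that a directory that is a mount point still matches, and it stops when ".." is the same directory as ".", which happens only at the root:

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch of the walk described above.
 * Returns 0 and fills out on success, -1 on error. */
static int my_getcwd(char *out, size_t outsize)
{
    char here[PATH_MAX] = ".";      /* relative path to the current level */
    char result[PATH_MAX] = "";     /* absolute path, built right to left */
    char parent[PATH_MAX], child[PATH_MAX], tmp[PATH_MAX];
    struct stat cur, par, sb;

    if (stat(here, &cur) < 0)
        return -1;
    for (;;) {
        if (snprintf(parent, sizeof(parent), "%s/..", here) >= (int)sizeof(parent))
            return -1;
        if (stat(parent, &par) < 0)
            return -1;
        if (par.st_dev == cur.st_dev && par.st_ino == cur.st_ino)
            break;                  /* ".." is "." only at the root */

        DIR *dp = opendir(parent);
        struct dirent *dirp;
        int found = 0;

        if (dp == NULL)
            return -1;
        while (!found && (dirp = readdir(dp)) != NULL) {
            if (strcmp(dirp->d_name, ".") == 0 || strcmp(dirp->d_name, "..") == 0)
                continue;
            snprintf(child, sizeof(child), "%s/%s", parent, dirp->d_name);
            if (lstat(child, &sb) == 0 &&
                sb.st_dev == cur.st_dev && sb.st_ino == cur.st_ino) {
                /* prepend "/name" to what we have built so far */
                if (snprintf(tmp, sizeof(tmp), "/%s%s", dirp->d_name, result)
                        >= (int)sizeof(tmp)) {
                    closedir(dp);
                    return -1;
                }
                strcpy(result, tmp);
                found = 1;
            }
        }
        closedir(dp);
        if (!found)
            return -1;
        strcpy(here, parent);       /* move up one level */
        cur = par;
    }
    if (result[0] == '\0')
        strcpy(result, "/");        /* the working directory is the root */
    if (strlen(result) >= outsize) {
        errno = ERANGE;             /* getcwd() reports a short buffer this way */
        return -1;
    }
    strcpy(out, result);
    return 0;
}
```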

The program below changes to a specific directory and then calls getcwd to print the working directory.

#include "apue.h"

int main(void)
{
    char *ptr;
    int  size;

    if (chdir("/usr/spool/uucppublic") < 0)
        err_sys("chdir failed");

    ptr = path_alloc(&size);    /* our own function */
    if (getcwd(ptr, size) == NULL)
        err_sys("getcwd failed");

    printf("cwd = %s\n", ptr);
    exit(0);
}

If we run the program, we get

$ ./a.out
cwd = /var/spool/uucppublic
$ ls -l /usr/spool
lrwxrwxrwx 1 root 12 Jan 31 07:57 /usr/spool -> ../var/spool

Note that chdir follows the symbolic link, as we expect it to, but when getcwd goes up the directory tree, it has no idea when it hits the /var/spool directory that the directory is pointed to by the symbolic link /usr/spool. This is a characteristic of symbolic links.
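The same effect can be reproduced without /usr/spool by creating a link to a scratch directory. A minimal sketch, assuming a POSIX system; cwd_after_chdir is a hypothetical helper that also uses fchdir to return to the original working directory afterwards:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: chdir() to path (following any symbolic links in
 * it), record what getcwd() reports, then go back to the original working
 * directory with fchdir(). Returns 0 on success, -1 on error. */
static int cwd_after_chdir(const char *path, char *buf, size_t size)
{
    int saved = open(".", O_RDONLY);   /* descriptor for the current directory */
    int r = -1;

    if (saved < 0)
        return -1;
    if (chdir(path) == 0 && getcwd(buf, size) != NULL)
        r = 0;
    if (fchdir(saved) < 0)             /* restore the working directory */
        r = -1;
    close(saved);
    return r;
}
```

Calling it with ".../link", where link points to "real", yields a pathname ending in "/real": getcwd reports the directory chdir actually reached, not the name of the link used to get there.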

4.23. Device Special Files

4.24. Summary of File Access Permission Bits
