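The Virtual Filesystem (VFS) is the kernel subsystem that gives user-space programs a uniform interface to files and filesystems. Thanks to the VFS, filesystems of different types (for example NTFS and ext3) can interoperate: files can be moved or copied between them.
The VFS is object-oriented: its data structures hold both data and pointers to the functions that operate on that data. Although they are implemented as plain C structures, the design follows the same ideas as object-oriented programming.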
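The "data plus function pointers" idea can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: the toy_fs and toy_fs_ops types, the two toy filesystems, and their block sizes are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs;                      /* hypothetical object type */

/* Method table: each kind of filesystem supplies its own copy. */
struct toy_fs_ops {
    const char *(*type_name)(const struct toy_fs *fs);
    size_t (*block_size)(const struct toy_fs *fs);
};

/* The object carries its data plus a pointer to its method table. */
struct toy_fs {
    const struct toy_fs_ops *ops;
    size_t blocks;
};

static const char *ext_name(const struct toy_fs *fs) { (void)fs; return "toy-ext"; }
static size_t ext_bsize(const struct toy_fs *fs) { (void)fs; return 4096; }
static const char *fat_name(const struct toy_fs *fs) { (void)fs; return "toy-fat"; }
static size_t fat_bsize(const struct toy_fs *fs) { (void)fs; return 512; }

static const struct toy_fs_ops ext_ops = { ext_name, ext_bsize };
static const struct toy_fs_ops fat_ops = { fat_name, fat_bsize };

/* Generic code: works on any toy_fs without knowing its concrete type. */
size_t toy_fs_capacity(const struct toy_fs *fs)
{
    assert(fs != NULL && fs->ops != NULL);
    return fs->blocks * fs->ops->block_size(fs);
}

const char *toy_fs_name(const struct toy_fs *fs)
{
    assert(fs != NULL && fs->ops != NULL);
    return fs->ops->type_name(fs);
}
```

A caller only ever uses toy_fs_capacity() and toy_fs_name(); which implementation runs is decided by the ops pointer stored in the object, which is the same dispatch style the VFS structures below rely on.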
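The VFS common data model consists of four primary object types: superblock, inode, dentry, and file.
Each object type has a corresponding table of operations (the equivalent of the object's methods).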
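Every filesystem type must implement the superblock object, which stores information describing the filesystem. The superblock object usually corresponds to the filesystem superblock or filesystem control block on disk. Filesystems that are not disk-based (for example sysfs, an in-memory filesystem) generate the superblock object on the fly and keep it in memory.
The code that creates, manages, and destroys superblock objects lives in fs/super.c.
The VFS stores a superblock object in a super_block structure. The alloc_super() function creates and initializes a superblock object. When a filesystem is mounted, it calls alloc_super(), reads its superblock from disk, and fills in the super_block structure.
The super_block structure looks like this: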
struct super_block
{
struct list_head s_list; /* list of all superblocks */
dev_t s_dev; /* identifier */
unsigned long s_blocksize; /* block size in bytes */
unsigned char s_blocksize_bits; /* block size in bits */
unsigned char s_dirt; /* dirty flag */
unsigned long long s_maxbytes; /* max file size */
struct file_system_type *s_type; /* filesystem type */
struct super_operations *s_op; /* superblock methods */
struct dquot_operations *dq_op; /* quota methods */
struct quotactl_ops *s_qcop; /* quota control methods */
struct export_operations *s_export_op; /* export methods */
unsigned long s_flags; /* mount flags */
unsigned long s_magic; /* filesystem’s magic number */
struct dentry *s_root; /* directory mount point */
struct rw_semaphore s_umount; /* unmount semaphore */
struct semaphore s_lock; /* superblock semaphore */
int s_count; /* superblock ref count */
int s_need_sync; /* not-yet-synced flag */
atomic_t s_active; /* active reference count */
void *s_security; /* security module */
struct xattr_handler **s_xattr; /* extended attribute handlers */
struct list_head s_inodes; /* list of inodes */
struct list_head s_dirty; /* list of dirty inodes */
struct list_head s_io; /* list of writebacks */
struct list_head s_more_io; /* list of more writeback */
struct hlist_head s_anon; /* anonymous dentries */
struct list_head s_files; /* list of assigned files */
struct list_head s_dentry_lru; /* list of unused dentries */
int s_nr_dentry_unused; /* number of dentries on list */
struct block_device *s_bdev; /* associated block device */
struct mtd_info *s_mtd; /* memory disk information */
struct list_head s_instances; /* instances of this fs */
struct quota_info s_dquot; /* quota-specific options */
int s_frozen; /* frozen status */
wait_queue_head_t s_wait_unfrozen; /* wait queue on freeze */
char s_id[32]; /* text name */
void *s_fs_info; /* filesystem-specific info */
fmode_t s_mode; /* mount permissions */
struct semaphore s_vfs_rename_sem; /* rename semaphore */
u32 s_time_gran; /* granularity of timestamps */
char *s_subtype; /* subtype name */
char *s_options; /* saved mount options */
};
The most important member of the superblock object is the s_op pointer, which points to the superblock operations table, super_operations:
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
void (*destroy_inode)(struct inode *);
void (*dirty_inode) (struct inode *);
int (*write_inode) (struct inode *, int);
void (*drop_inode) (struct inode *);
void (*delete_inode) (struct inode *);
void (*put_super) (struct super_block *);
void (*write_super) (struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
int (*freeze_fs) (struct super_block *);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
void (*clear_inode) (struct inode *);
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
};
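This is a function table: each pointer refers to a function that operates on a superblock object (creating and destroying superblocks is not included; that is handled in fs/super.c). These functions perform low-level operations on the filesystem and its inodes. When a filesystem needs to invoke one of these methods, for example to write its superblock, it follows the superblock pointer sb and calls sb->s_op->write_super(sb). The sb pointer has to be passed explicitly because C has no object-oriented support (there is no equivalent of the C++ this pointer).
Some of the functions in the table are optional. A filesystem may leave such a pointer NULL, in which case the VFS either calls a generic function or does nothing, depending on the operation.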
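The handling of optional methods can be sketched as follows. This is a simplified user-space model, assuming a hypothetical toy_sb type with only two methods; generic_sync() stands in for "a generic function" and is not a real kernel symbol.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sb;

struct toy_sb_ops {
    int (*sync_fs)(struct toy_sb *sb);       /* optional: may be NULL */
    void (*write_super)(struct toy_sb *sb);  /* optional: may be NULL */
};

struct toy_sb {
    const struct toy_sb_ops *s_op;
    int dirty;
    int writes;   /* how many times the superblock was "written" */
};

/* Hypothetical generic fallback, used when sync_fs is not provided. */
static int generic_sync(struct toy_sb *sb)
{
    sb->dirty = 0;
    return 0;
}

/* NULL method: fall back to the generic implementation. */
int toy_sync(struct toy_sb *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->sync_fs)
        return sb->s_op->sync_fs(sb);
    return generic_sync(sb);
}

/* NULL method: do nothing at all. */
void toy_write_super(struct toy_sb *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->write_super)
        sb->s_op->write_super(sb);   /* object passed explicitly, like "this" */
}

static void disk_write_super(struct toy_sb *sb)
{
    sb->writes++;
    sb->dirty = 0;
}

static const struct toy_sb_ops disk_ops = { NULL, disk_write_super };
static const struct toy_sb_ops mem_ops = { NULL, NULL };
```

With disk_ops, toy_write_super() reaches disk_write_super(); with mem_ops it is a no-op, while toy_sync() falls back to generic_sync() for both tables.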
Descriptions of some of these functions are excerpted below.
struct inode *(*alloc_inode)(struct super_block *sb)
Creates and initializes a new inode object under the given superblock.
void (*destroy_inode)(struct inode *)
Deallocates the given inode.
int (*write_inode)(struct inode *, int)
Writes the given inode to disk.
void (*delete_inode)(struct inode *)
Deletes the given inode from the disk.
void (*put_super)(struct super_block *)
Called by the VFS on unmount to release the given superblock object.
void (*write_super)(struct super_block *)
Updates the on-disk superblock with the specified superblock.
int (*sync_fs)(struct super_block *sb, int wait)
Synchronizes filesystem metadata with the on-disk filesystem.
int (*statfs)(struct dentry *, struct kstatfs *)
Called by the VFS to obtain filesystem statistics.
void (*clear_inode)(struct inode *)
Called by the VFS to release the inode and clear any pages containing related data.
void (*umount_begin)(struct super_block *)
Called by the VFS to interrupt a mount operation. It is used by network filesystems, such as NFS.
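The inode object contains all the information the kernel needs to manipulate a file or directory. For Unix-style filesystems this information is simply read from the on-disk inode. Filesystems that have no inodes must derive the information from whatever data they store on disk and fill in the in-memory inode object.
The inode object is represented by the inode structure: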
struct inode
{
struct hlist_node i_hash; /* hash list */
struct list_head i_list; /* list of inodes */
struct list_head i_sb_list; /* list of superblocks */
struct list_head i_dentry; /* list of dentries */
unsigned long i_ino; /* inode number */
atomic_t i_count; /* reference counter */
unsigned int i_nlink; /* number of hard links */
uid_t i_uid; /* user id of owner */
gid_t i_gid; /* group id of owner */
dev_t i_rdev; /* real device node */
u64 i_version; /* versioning number */
loff_t i_size; /* file size in bytes */
seqcount_t i_size_seqcount; /* serializer for i_size */
struct timespec i_atime; /* last access time */
struct timespec i_mtime; /* last modify time */
struct timespec i_ctime; /* last change time */
unsigned int i_blkbits; /* block size in bits */
blkcnt_t i_blocks; /* file size in blocks */
unsigned short i_bytes; /* bytes consumed */
umode_t i_mode; /* access permissions */
spinlock_t i_lock; /* spinlock */
struct rw_semaphore i_alloc_sem; /* nests inside of i_sem */
struct semaphore i_sem; /* inode semaphore */
struct inode_operations *i_op; /* inode ops table */
struct file_operations *i_fop; /* default inode ops */
struct super_block *i_sb; /* associated superblock */
struct file_lock *i_flock; /* file lock list */
struct address_space *i_mapping; /* associated mapping */
struct address_space i_data; /* mapping for device */
struct dquot *i_dquot[MAXQUOTAS]; /* disk quotas for inode */
struct list_head i_devices; /* list of block devices */
union
{
struct pipe_inode_info *i_pipe; /* pipe information */
struct block_device *i_bdev; /* block device driver */
struct cdev *i_cdev; /* character device driver */
};
unsigned long i_dnotify_mask; /* directory notify mask */
struct dnotify_struct *i_dnotify; /* dnotify */
struct list_head inotify_watches; /* inotify watches */
struct mutex inotify_mutex; /* protects inotify_watches */
unsigned long i_state; /* state flags */
unsigned long dirtied_when; /* first dirtying time */
unsigned int i_flags; /* filesystem flags */
atomic_t i_writecount; /* count of writers */
void *i_security; /* security module */
void *i_private; /* fs private pointer */
};
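Every file in a filesystem can be represented by an inode object, but the inode object is only constructed in memory when the file is accessed. Some fields of the inode object relate to special files: i_pipe points to a named pipe data structure, i_bdev points to a block device structure, and i_cdev points to a character device structure. These three pointers are stored in a union because a given inode refers to at most one of these structures (zero or one).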
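The space-saving effect of that union can be shown with a small user-space sketch. The toy_* types and the explicit kind tag are assumptions made for illustration; the real inode identifies the file type differently.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_REGULAR, TOY_PIPE, TOY_BLOCK, TOY_CHAR };

struct toy_pipe { int readers; };
struct toy_bdev { int sector_size; };
struct toy_cdev { int minor; };

struct toy_inode {
    enum toy_kind kind;            /* tag: which union member is valid */
    union {
        struct toy_pipe *i_pipe;
        struct toy_bdev *i_bdev;
        struct toy_cdev *i_cdev;
    } u;                           /* one pointer's worth of storage */
};

/* Returns the sector size for block-device inodes, or -1 otherwise. */
int toy_inode_sector_size(const struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->kind != TOY_BLOCK || inode->u.i_bdev == NULL)
        return -1;
    return inode->u.i_bdev->sector_size;
}
```

Because only one member is meaningful at a time, the three pointers share the storage of a single pointer instead of occupying three slots in every inode.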
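A filesystem may be unable to support some of the properties in the inode object; some filesystems, for example, do not record an access timestamp. In that case the filesystem is free to decide how to implement the feature (for example by setting the timestamp to zero).
The i_op pointer in the inode points to the table of inode operations: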
struct inode_operations
{
int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
int (*link) (struct dentry *,struct inode *,struct dentry *);
int (*unlink) (struct inode *,struct dentry *);
int (*symlink) (struct inode *,struct dentry *,const char *);
int (*mkdir) (struct inode *,struct dentry *,int);
int (*rmdir) (struct inode *,struct dentry *);
int (*mknod) (struct inode *,struct dentry *,int,dev_t);
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *);
int (*readlink) (struct dentry *, char __user *,int);
void * (*follow_link) (struct dentry *, struct nameidata *);
void (*put_link) (struct dentry *, struct nameidata *, void *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int);
int (*setattr) (struct dentry *, struct iattr *);
int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*removexattr) (struct dentry *, const char *);
void (*truncate_range)(struct inode *, loff_t, loff_t);
long (*fallocate)(struct inode *inode, int mode, loff_t offset,
loff_t len);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len);
};
Some of these functions are described below.
int create(struct inode *dir, struct dentry *dentry, int mode)
The VFS calls this function from the creat() and open() system calls to create a new inode associated with the given dentry object with the specified initial access mode.
struct dentry* lookup(struct inode *dir, struct dentry *dentry)
This function searches a directory for an inode corresponding to a filename specified in the given dentry.
int link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
Invoked by the link() system call to create a hard link of the file old_dentry in the directory dir with the new filename dentry.
int unlink(struct inode *dir, struct dentry *dentry)
Called from the unlink() system call to remove the inode specified by the directory entry dentry from the directory dir.
void *follow_link(struct dentry *dentry, struct nameidata *nd)
Called by the VFS to translate a symbolic link to the inode to which it points.
int permission(struct inode *inode, int mask)
Checks whether the specified access mode is allowed for the file referenced by inode
dentry is short for directory entry. A dentry is a specific component in a path, and every component of a path is a dentry. The path /bin/vi.txt, for example, contains three dentries: /, bin, and vi.txt.
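To make the counting rule concrete, here is a small user-space sketch that counts the components of a path the way the example above does, with the root directory counted as its own component. The function name toy_count_components is hypothetical, and this is not how the kernel performs path lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Counts path components; a leading "/" counts as the root component. */
size_t toy_count_components(const char *path)
{
    size_t n = 0;

    assert(path != NULL);
    if (*path == '/')
        n++;                              /* the root directory itself */
    while (*path) {
        while (*path == '/')              /* skip one or more separators */
            path++;
        if (*path == '\0')
            break;
        n++;                              /* start of a named component */
        while (*path && *path != '/')     /* skip over the name */
            path++;
    }
    return n;
}
```

For "/bin/vi.txt" the function counts /, bin, and vi.txt; repeated or trailing slashes do not add components.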
A dentry object is represented by the dentry structure:
struct dentry
{
atomic_t d_count; /* usage count */
unsigned int d_flags; /* dentry flags */
spinlock_t d_lock; /* per-dentry lock */
int d_mounted; /* is this a mount point? */
struct inode *d_inode; /* associated inode */
struct hlist_node d_hash; /* list of hash table entries */
struct dentry *d_parent; /* dentry object of parent */
struct qstr d_name; /* dentry name */
struct list_head d_lru; /* unused list */
union
{
struct list_head d_child; /* list of dentries within */
struct rcu_head d_rcu; /* RCU locking */
} d_u;
struct list_head d_subdirs; /* subdirectories */
struct list_head d_alias; /* list of alias inodes */
unsigned long d_time; /* revalidate time */
struct dentry_operations *d_op; /* dentry operations table */
struct super_block *d_sb; /* superblock of file */
void *d_fsdata; /* filesystem-specific data */
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* short name */
};
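Because a dentry object has no physical representation on disk, the dentry structure has no field marking the object as modified (there is no need to track whether it is dirty and must be written back to disk).
A dentry is in one of three states: used, unused, or negative.
used:
The dentry corresponds to a valid inode (its d_inode field points to a valid inode) and d_count is positive, meaning one or more users are using the dentry.
unused:
The dentry corresponds to a valid inode (d_inode points to a valid inode) but d_count is zero, meaning the VFS is not currently using it. Because the dentry still points to a valid inode object, it is kept in the dentry cache in case it is needed again.
negative:
The dentry is not associated with a valid inode (d_inode is NULL), either because the inode object was destroyed or because the looked-up path name was incorrect. The dentry is still kept in the cache so that later lookups of the same path resolve quickly (directly from the dentry cache).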
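The three states can be expressed as a simple classification over d_inode and d_count. The sketch below is a user-space simplification with hypothetical toy_* types; it checks the inode pointer first, so a dentry without an inode is always reported as negative.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dentry_state { TOY_USED, TOY_UNUSED, TOY_NEGATIVE };

struct toy_inode { int i_count; };

struct toy_dentry {
    int d_count;                 /* usage count */
    struct toy_inode *d_inode;   /* NULL for a negative dentry */
};

enum toy_dentry_state toy_dentry_state(const struct toy_dentry *d)
{
    assert(d != NULL);
    if (d->d_inode == NULL)
        return TOY_NEGATIVE;     /* no valid inode behind this name */
    if (d->d_count > 0)
        return TOY_USED;         /* valid inode and active users */
    return TOY_UNUSED;           /* valid inode, kept only in the cache */
}
```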
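The dentry cache mechanism consists of three parts.
While a dentry is cached, its existence keeps the usage count of the associated inode above zero, so the dentry pins the inode in memory. As long as a dentry is cached, the corresponding inode is cached too (in the inode cache, or icache). Consequently, when the path lookup function hits in the dentry cache, the associated inode is guaranteed to be in memory as well.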
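The pinning effect is ordinary reference counting. A minimal sketch, assuming hypothetical toy_d_attach() and toy_d_detach() helpers (these are not kernel functions) and an inode that may only be evicted when its count is zero:

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode { int i_count; };
struct toy_dentry { struct toy_inode *d_inode; };

/* Attaching a dentry takes a reference on the inode. */
void toy_d_attach(struct toy_dentry *d, struct toy_inode *inode)
{
    assert(d != NULL && inode != NULL && d->d_inode == NULL);
    inode->i_count++;
    d->d_inode = inode;
}

/* Detaching drops that reference again. */
void toy_d_detach(struct toy_dentry *d)
{
    assert(d != NULL);
    if (d->d_inode != NULL) {
        d->d_inode->i_count--;
        d->d_inode = NULL;
    }
}

/* The inode cache may only drop inodes nobody references. */
int toy_inode_can_evict(const struct toy_inode *inode)
{
    return inode->i_count == 0;
}
```

As long as one cached dentry is attached, toy_inode_can_evict() stays false, which is why a dentry cache hit implies the inode is still in memory.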
The d_op pointer in the dentry structure points to the table of dentry operations:
struct dentry_operations
{
int (*d_revalidate) (struct dentry *, struct nameidata *);
int (*d_hash) (struct dentry *, struct qstr *);
int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
int (*d_delete) (struct dentry *);
void (*d_release) (struct dentry *);
void (*d_iput) (struct dentry *, struct inode *);
char *(*d_dname) (struct dentry *, char *, int);
};
Some of these functions are described below.
int d_revalidate(struct dentry *dentry, struct nameidata *)
Determines whether the given dentry object is valid. The VFS calls this function whenever it is preparing to use a dentry from the dcache. Most filesystems set this method to NULL because their dentry objects in the dcache are always valid.
int d_hash(struct dentry *dentry, struct qstr *name)
Creates a hash value from the given dentry.
int d_compare(struct dentry *dentry, struct qstr *name1, struct qstr *name2)
Called by the VFS to compare two filenames, name1 and name2. Most filesystems leave this at the VFS default, which is a simple string compare.
int d_delete (struct dentry *dentry)
Called by the VFS when the specified dentry object’s d_count reaches zero. This function requires the dcache_lock and the dentry’s d_lock.
void d_release(struct dentry *dentry)
Called by the VFS when the specified dentry is going to be freed. The default function does nothing.
void d_iput(struct dentry *dentry, struct inode *inode)
Called by the VFS when a dentry object loses its associated inode (say, because the entry was deleted from the disk). By default, the VFS simply calls the iput() function to release the inode.
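As an illustration of why a filesystem might override d_compare, the sketch below contrasts a default byte-for-byte comparison with a case-insensitive one. It is a user-space model with hypothetical toy_* names; it assumes the convention that 0 means the names match, and it drops the dentry argument for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct toy_qstr { const char *name; size_t len; };

typedef int (*toy_compare_fn)(const struct toy_qstr *, const struct toy_qstr *);

/* Default: exact comparison. Returns 0 when the names are equal. */
static int toy_default_compare(const struct toy_qstr *a, const struct toy_qstr *b)
{
    if (a->len != b->len)
        return 1;
    return memcmp(a->name, b->name, a->len) != 0;
}

/* Filesystem-specific override: ASCII case-insensitive comparison. */
int toy_ci_compare(const struct toy_qstr *a, const struct toy_qstr *b)
{
    size_t i;

    if (a->len != b->len)
        return 1;
    for (i = 0; i < a->len; i++)
        if (tolower((unsigned char)a->name[i]) !=
            tolower((unsigned char)b->name[i]))
            return 1;
    return 0;
}

/* Use the filesystem's method if set, otherwise the default. */
int toy_names_differ(toy_compare_fn d_compare,
                     const struct toy_qstr *a, const struct toy_qstr *b)
{
    assert(a != NULL && b != NULL);
    return d_compare ? d_compare(a, b) : toy_default_compare(a, b);
}
```

Passing NULL models a filesystem that leaves the method unset; passing toy_ci_compare models one where README and readme name the same file.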
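The file object is the in-memory representation of an open file and is how a process sees that file. Processes deal directly with file objects, not with superblocks, inodes, or dentries. Several processes can have the same file open at once, so one file may correspond to multiple file objects in memory, whereas the associated inode and dentry objects are unique.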
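The "many file objects, one inode" relationship can be sketched in user space as follows. The toy_* types are hypothetical, and the file object here points straight at the inode for brevity rather than going through a dentry.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode { long i_size; };

struct toy_file {
    struct toy_inode *f_inode;   /* shared by every open of the file */
    long f_pos;                  /* private to this particular open */
};

/* Each open yields a fresh file object that starts at offset 0. */
struct toy_file toy_open(struct toy_inode *inode)
{
    struct toy_file f;

    assert(inode != NULL);
    f.f_inode = inode;
    f.f_pos = 0;
    return f;
}

/* "Reads" up to count bytes: returns the amount consumed, advances f_pos. */
long toy_read(struct toy_file *f, long count)
{
    long left;

    assert(f != NULL && f->f_inode != NULL);
    left = f->f_inode->i_size - f->f_pos;
    if (count > left)
        count = left;
    if (count < 0)
        count = 0;
    f->f_pos += count;
    return count;
}
```

Two opens of the same inode advance their offsets independently, which is the reason the offset lives in the file object and not in the inode.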
The file object is represented by the file structure:
struct file
{
union
{
struct list_head fu_list; /* list of file objects */
struct rcu_head fu_rcuhead; /* RCU list after freeing */
} f_u;
struct path f_path; /* contains the dentry */
struct file_operations *f_op; /* file operations table */
spinlock_t f_lock; /* per-file struct lock */
atomic_t f_count; /* file object’s usage count */
unsigned int f_flags; /* flags specified on open */
mode_t f_mode; /* file access mode */
loff_t f_pos; /* file offset (file pointer) */
struct fown_struct f_owner; /* owner data for signals */
const struct cred *f_cred; /* file credentials */
struct file_ra_state f_ra; /* read-ahead state */
u64 f_version; /* version number */
void *f_security; /* security module */
void *private_data; /* tty driver hook */
struct list_head f_ep_links; /* list of epoll links */
spinlock_t f_ep_lock; /* epoll lock */
struct address_space *f_mapping; /* page cache mapping */
unsigned long f_mnt_write_state; /* debugging state */
};
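Like the dentry object, the file object has no corresponding data on disk, so it carries no flag indicating whether it is dirty. The file object points to its dentry (through f_path in the structure above, historically accessed as f_dentry), the dentry in turn points to its inode, and the inode records whether the file itself is dirty.
The f_op pointer in the file structure points to the table of file operations: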
struct file_operations
{
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int);
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
ssize_t (*aio_read) (struct kiocb *, const struct iovec *,
unsigned long, loff_t);
ssize_t (*aio_write) (struct kiocb *, const struct iovec *,
unsigned long, loff_t);
int (*readdir) (struct file *, void *, filldir_t);
unsigned int (*poll) (struct file *, struct poll_table_struct *);
int (*ioctl) (struct inode *, struct file *, unsigned int,
unsigned long);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
int (*release) (struct inode *, struct file *);
int (*fsync) (struct file *, struct dentry *, int datasync);
int (*aio_fsync) (struct kiocb *, int datasync);
int (*fasync) (int, struct file *, int);
int (*lock) (struct file *, int, struct file_lock *);
ssize_t (*sendpage) (struct file *, struct page *,
int, size_t, loff_t *, int);
unsigned long (*get_unmapped_area) (struct file *,
unsigned long,
unsigned long,
unsigned long,
unsigned long);
int (*check_flags) (int);
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write) (struct pipe_inode_info *,
struct file *,
loff_t *,
size_t,
unsigned int);
ssize_t (*splice_read) (struct file *,
loff_t *,
struct pipe_inode_info *,
size_t,
unsigned int);
int (*setlease) (struct file *, long, struct file_lock **);
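};
Filesystems can implement their own file operations or use the generic ones. The generic operations normally work fine on standard Unix-based filesystems.
Some of these functions are described below.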
int open(struct inode *inode, struct file *file)
Creates a new file object and links it to the corresponding inode object. It is called by the open() system call.
loff_t llseek(struct file *file, loff_t offset, int origin)
Updates the file pointer to the given offset. It is called via the llseek() system call.
ssize_t read(struct file *file, char *buf, size_t count, loff_t *offset)
Reads count bytes from the given file at position offset into buf. The file pointer is then updated. This function is called by the read() system call.
ssize_t aio_read(struct kiocb *iocb, char *buf, size_t count, loff_t offset)
Begins an asynchronous read of count bytes into buf of the file described in iocb. This function is called by the aio_read() system call.
ssize_t write(struct file *file, const char *buf, size_t count, loff_t *offset)
Writes count bytes from buf into the given file at position offset. The file pointer is then updated. This function is called by the write() system call.
int readdir(struct file *file, void *dirent, filldir_t filldir)
Returns the next directory in a directory listing. This function is called by the readdir() system call.
unsigned int poll(struct file *file, struct poll_table_struct *poll_table)
Sleeps, waiting for activity on the given file. It is called by the poll() system call.
int ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
Sends a command and argument pair to a device. It is used when the file is an open device node. This function is called from the ioctl() system call. Callers must hold the BKL.
int mmap(struct file *file, struct vm_area_struct *vma)
Memory maps the given file onto the given address space and is called by the mmap() system call.
int flush(struct file *file)
Called by the VFS whenever the reference count of an open file decreases. Its purpose is filesystem-dependent.
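The kernel uses two data structures to manage filesystem-related data: file_system_type describes a kind of filesystem, and vfsmount describes a mounted filesystem instance.
Because Linux supports so many different filesystems, the kernel needs a dedicated structure to describe the capabilities and behavior of each one; that is the role of file_system_type.
The file_system_type structure: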
struct file_system_type
{
const char *name; /* filesystem’s name */
int fs_flags; /* filesystem type flags */
struct super_block *(*get_sb) (struct file_system_type *, int, char *, void *);
void (*kill_sb) (struct super_block *);
struct module *owner; /* module owning the filesystem */
struct file_system_type *next; /* next file_system_type in list */
struct list_head fs_supers; /* list of superblock objects */
struct lock_class_key s_lock_key;
struct lock_class_key s_umount_key;
struct lock_class_key i_lock_key;
struct lock_class_key i_mutex_key;
struct lock_class_key i_mutex_dir_key;
struct lock_class_key i_alloc_sem_key;
};
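The get_sb() function reads the superblock from disk when the filesystem is mounted and uses that data to populate the in-memory superblock object. There is exactly one file_system_type per filesystem type, no matter how many instances of it are mounted (even zero).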
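The "exactly one per type" rule can be modelled as a registry keyed by name, chained through a next pointer like the one in the structure above. This is a user-space sketch with hypothetical toy_* names and a simplified -1 error return; it is not the kernel's registration code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;
    int nr_mounted;              /* mounted instances; may be 0 */
    struct toy_fs_type *next;    /* next registered type */
};

static struct toy_fs_type *toy_fs_list;   /* head of the registry */

/* Registers a type. Returns 0, or -1 if the name is already taken. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    for (p = &toy_fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;           /* only one entry per filesystem type */
    fs->next = NULL;
    *p = fs;                     /* append at the tail */
    return 0;
}

/* Looks a type up by name. Returns NULL if it is not registered. */
struct toy_fs_type *toy_find_fs(const char *name)
{
    struct toy_fs_type *fs;

    assert(name != NULL);
    for (fs = toy_fs_list; fs != NULL; fs = fs->next)
        if (strcmp(fs->name, name) == 0)
            return fs;
    return NULL;
}
```

A type stays registered with nr_mounted equal to zero; mounting would create further per-instance objects without ever adding a second registry entry.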
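The vfsmount structure is created when a filesystem is mounted; it represents one specific filesystem instance (a mount point).
The vfsmount structure is defined as follows: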
struct vfsmount
{
struct list_head mnt_hash; /* hash table list */
struct vfsmount *mnt_parent; /* parent filesystem */
struct dentry *mnt_mountpoint; /* dentry of this mount point */
struct dentry *mnt_root; /* dentry of root of this fs */
struct super_block *mnt_sb; /* superblock of this filesystem */
struct list_head mnt_mounts; /* list of children */
struct list_head mnt_child; /* entry in parent’s list of children */
int mnt_flags; /* mount flags */
char *mnt_devname; /* device file name */
struct list_head mnt_list; /* list of descriptors */
struct list_head mnt_expire; /* entry in expiry list */
struct list_head mnt_share; /* entry in shared mounts list */
struct list_head mnt_slave_list; /* list of slave mounts */
struct list_head mnt_slave; /* entry in slave list */
struct vfsmount *mnt_master; /* slave’s master */
struct mnt_namespace *mnt_namespace; /* associated namespace */
int mnt_id; /* mount identifier */
int mnt_group_id; /* peer group identifier */
atomic_t mnt_count; /* usage count */
int mnt_expiry_mark; /* is marked for expiration */
int mnt_pinned; /* pinned count */
int mnt_ghosts; /* ghosts count */
atomic_t __mnt_writers; /* writers count */
};
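The vfsmount holds a pointer to the superblock object of the filesystem instance.
Three data structures, files_struct, fs_struct, and mnt_namespace, tie a process to the VFS layer; they record the list of open files, the process's root filesystem, its current working directory, and so on.
The files pointer in the process descriptor points to the files_struct: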
struct files_struct
{
atomic_t count; /* usage count */
struct fdtable *fdt; /* pointer to other fd table */
struct fdtable fdtab; /* base fd table */
spinlock_t file_lock; /* per-file lock */
int next_fd; /* cache of next available fd */
struct embedded_fd_set close_on_exec_init; /* list of close-on-exec fds */
struct embedded_fd_set open_fds_init; /* list of open fds */
struct file *fd_array[NR_OPEN_DEFAULT]; /* base files array */
};
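fd_array points to the list of open files: fd_array[i] points to the file object for file descriptor i. NR_OPEN_DEFAULT is a constant, equal to 64 on 64-bit machines. When a process opens more files than that, the kernel allocates a new fdtable and makes fdt point to it.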
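The switch from the embedded array to a larger table can be sketched in user space. Everything below is a simplified model with hypothetical toy_* names: the table is a bare pointer array, and doubling on overflow is an assumption of this sketch, not a statement about the kernel's growth policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_OPEN_DEFAULT 64   /* mirrors the 64-bit value quoted above */

struct toy_file { int id; };

struct toy_files {
    struct toy_file **fd;        /* table in use: fd_array or a heap table */
    size_t max_fds;              /* capacity of the table in use */
    struct toy_file *fd_array[TOY_NR_OPEN_DEFAULT];  /* embedded base table */
};

void toy_files_init(struct toy_files *f)
{
    size_t i;

    for (i = 0; i < TOY_NR_OPEN_DEFAULT; i++)
        f->fd_array[i] = NULL;
    f->fd = f->fd_array;
    f->max_fds = TOY_NR_OPEN_DEFAULT;
}

/* Doubles the table and copies the old entries. Returns 0, or -1. */
static int toy_expand(struct toy_files *f)
{
    size_t i, new_max = f->max_fds * 2;
    struct toy_file **bigger = malloc(new_max * sizeof(*bigger));

    if (bigger == NULL)
        return -1;
    for (i = 0; i < new_max; i++)
        bigger[i] = i < f->max_fds ? f->fd[i] : NULL;
    if (f->fd != f->fd_array)
        free(f->fd);             /* only heap tables are freed */
    f->fd = bigger;
    f->max_fds = new_max;
    return 0;
}

/* Installs file in the lowest free slot; returns the descriptor or -1. */
int toy_fd_install(struct toy_files *f, struct toy_file *file)
{
    size_t i;

    assert(f != NULL && file != NULL);
    for (i = 0; i < f->max_fds; i++)
        if (f->fd[i] == NULL)
            break;
    if (i == f->max_fds && toy_expand(f) != 0)
        return -1;
    f->fd[i] = file;
    return (int)i;
}

void toy_files_destroy(struct toy_files *f)
{
    if (f->fd != f->fd_array)
        free(f->fd);
    f->fd = f->fd_array;
    f->max_fds = TOY_NR_OPEN_DEFAULT;
}
```

The first 64 descriptors are served from the embedded array with no allocation; only the 65th open pays for a heap table, which is the point of embedding a small default array in the structure.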
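The fs_struct structure holds filesystem information related to a process. The fs pointer in the process descriptor points to the process's fs_struct.
The fs_struct structure: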
struct fs_struct
{
int users; /* user count */
rwlock_t lock; /* per-structure lock */
int umask; /* umask */
int in_exec; /* currently executing a file */
struct path root; /* root directory */
struct path pwd; /* current working directory */
};
root holds the process's root directory, and pwd holds its current working directory.
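mnt_namespace gives each process its own view of the mounted filesystems. The mnt_namespace field in the process descriptor points to the process's mnt_namespace structure.
By default, all processes on Linux share the same namespace. A new namespace is created only when the CLONE_NEWNS flag is passed to clone().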
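The share-by-default behaviour amounts to a reference-count bump versus a fresh copy. The following user-space sketch uses hypothetical toy_* types and a made-up TOY_CLONE_NEWNS flag value; it does not call clone() and only models the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CLONE_NEWNS 0x1u     /* hypothetical flag value for this sketch */

struct toy_mnt_ns {
    int count;                   /* how many tasks share this namespace */
    int nr_mounts;               /* stand-in for the list of mounts */
};

struct toy_task { struct toy_mnt_ns *ns; };

/* Gives child a namespace. Returns 0, or -1 if allocation fails. */
int toy_copy_namespace(unsigned flags, const struct toy_task *parent,
                       struct toy_task *child)
{
    struct toy_mnt_ns *ns;

    assert(parent != NULL && parent->ns != NULL && child != NULL);
    if (!(flags & TOY_CLONE_NEWNS)) {
        parent->ns->count++;     /* default: share the parent's namespace */
        child->ns = parent->ns;
        return 0;
    }
    ns = malloc(sizeof(*ns));    /* flag set: private copy for the child */
    if (ns == NULL)
        return -1;
    ns->count = 1;
    ns->nr_mounts = parent->ns->nr_mounts;
    child->ns = ns;
    return 0;
}
```

Without the flag, parent and child point at the same object and see each other's mounts; with it, the child starts from a copy that it alone references.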
The mnt_namespace structure:
struct mnt_namespace
{
atomic_t count; /* usage count */
struct vfsmount *root; /* root directory */
struct list_head list; /* list of mount points */
wait_queue_head_t poll; /* polling waitqueue */
int event; /* event count */
};
list is a doubly linked list that ties together all the mounted filesystems that make up the namespace.
Linux Kernel Development, 3rd Edition
Understanding the Linux Kernel, 3rd Edition