文件系统除了需要记录一个文件有哪些数据块之外,还需要记录一个目录有哪些文件,提供通过文件名索引到文件的手段。在f2fs中,这些都是通过f2fs directory 相关的数据结构去组织的。
f2fs的目录结构
了解了f2fs的目录结构,才能理解f2fs如何索引文件的。其中关键是掌握下面几个概念: directory entry,bucket,directroy block.
directory entry
每个目录entry包括hash/ino/len/type四个成员,占用11bytes。每个entry代表一个子目录、符号链接或者普通文件。
A directory entry occupies 11 bytes, which consists of the following attributes.
- hash hash value of the file name
- ino inode number
- len the length of file name
- type file type such as directory, symlink, etc
directory block
专门存储directory entry的block叫做diretory block。
A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
used to represent whether each dentry is valid or not. A dentry block occupies
4KB with the following composition.
Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
dentries(11 * 214 bytes) + file name (8 * 214 bytes)
+--------+----------+----------+------------+
| bitmap | reserved | dentries | file names |
+--------+----------+----------+------------+
[Dentry Block: 4KB] . .
. .
. .
+------+------+-----+------+
| hash | ino | len | type |
+------+------+-----+------+
[Dentry Structure: 11 bytes]
bucket
一个bucket 根据它的目录深度,可以有2个、4个、8个... dentry block。
[Bucket]
+--------------------------------+
|dentry block 1 | dentry block 2 |
+--------------------------------+
The number of blocks and buckets are determined by,
# of blocks in level #n = |
`- 4, Otherwise
,- 2^(n + dir_level),
| if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
索引方法
不同深度的目录具有不同数量的bucket。
of buckets in level #n = |
`- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
Otherwise
,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
F2FS implements multi-level hash tables for directory structure. Each level has
a hash table with dedicated number of hash buckets as shown below. Note that
"A(2B)" means a bucket includes 2 data blocks.
----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------
|
level #1 | A(2B) - A(2B)
|
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
When F2FS finds a file name in a directory, at first a hash value of the file
name is calculated. Then, F2FS scans the hash table in level #0 to find the
dentry consisting of the file name and its inode number. If not found, F2FS
scans the next hash table in level #1. In this way, F2FS scans hash tables in
each levels incrementally from 1 to N. In each levels F2FS needs to scan only
one bucket determined by the following equation, which shows O(log(# of files))
complexity.
bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
In the case of file creation, F2FS finds empty consecutive slots that cover the
file name. F2FS searches the empty slots in the hash tables of whole levels from
1 to N in the same way as the lookup operation.
The following figure shows an example of two cases holding children.
--------------> Dir <--------------
| |
child child
child - child [hole] - child
child - child - child [hole] - [hole] - child
Case 1: Case 2:
Number of children = 6, Number of children = 3,
File size = 7 File size = 7
inode number和file name如何关联
计算出hash 值之后,根据 inode 去读对应的block:
这里除了比较hash值之外,还会比较file name,所以可以避免hash冲突。
hash 函数在 hash.c 里面: fs/f2fs/hash.c
- f2fs_dentry_block:
/* 4KB-sized directory entry block */ struct f2fs_dentry_block { /* validity bitmap for directory entries in each block */ __u8 dentry_bitmap[SIZE_OF_DENTRY_BITMAP]; __u8 reserved[SIZE_OF_RESERVED]; struct f2fs_dir_entry dentry[NR_DENTRY_IN_BLOCK]; __u8 filename[NR_DENTRY_IN_BLOCK][F2FS_SLOT_LEN]; } __packed;
具体的hash比较过程如下:
struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
f2fs_hash_t namehash, int *max_slots,
struct f2fs_dentry_ptr *d)
{
struct f2fs_dir_entry *de;
unsigned long bit_pos = 0;
int max_len = 0;
if (max_slots)
*max_slots = 0;
while (bit_pos < d->max) {
if (!test_bit_le(bit_pos, d->bitmap)) {
bit_pos++;
max_len++;
continue;
}
de = &d->dentry[bit_pos];
if (unlikely(!de->name_len)) {
bit_pos++;
continue;
}
if (**de->hash_code == namehash &&
fscrypt_match_name(fname, d->filename[bit_pos],
le16_to_cpu(de->name_len)))**
goto found;
if (max_slots && max_len > *max_slots)
*max_slots = max_len;
max_len = 0;
bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
}
上面函数的被调用链如下:
called in find_in_block()
called in find_in_level()
called in __f2fs_find_entry()
called by f2fs_add_regular_entry()
由此看见,添加一个entry的时候就会根据dentry filename计算hash,如果找到已经存在的并且文件名相同,就说明它真的已经存在。