http://lxr.oss.org.cn/source/Documentation/filesystems/fiemap.txt
https://www.kernel.org/doc/Documentation/filesystems/fiemap.txt
============
Fiemap Ioctl
============

The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.

Request Basics
--------------

A fiemap request is encoded within struct fiemap:

struct fiemap {
        __u64 fm_start;          /* logical offset (inclusive) at
                                  * which to start mapping (in) */
        __u64 fm_length;         /* logical length of mapping which
                                  * userspace cares about (in) */
        __u32 fm_flags;          /* FIEMAP_FLAG_* flags for request (in/out) */
        __u32 fm_mapped_extents; /* number of extents that were
                                  * mapped (out) */
        __u32 fm_extent_count;   /* size of fm_extents array (in) */
        __u32 fm_reserved;
        struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
};


fm_start and fm_length specify the logical range within the file
which the process would like mappings for. Extents returned mirror
those on disk - that is, the logical offset of the 1st returned extent
may start before fm_start, and the range covered by the last returned
extent may end after the end of the requested range (fm_start +
fm_length). All offsets and lengths are in bytes.

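The following is a minimal userspace sketch of building a request. It assumes a Linux system whose uapi header <linux/fiemap.h> provides struct fiemap and FIEMAP_MAX_OFFSET. The helper name fiemap_alloc is hypothetical. It shows one thing only: fm_extents[] lives in the same allocation as the request header, and its capacity is what fm_extent_count reports.

```c
#include <assert.h>
#include <stdlib.h>
#include <linux/fiemap.h>

/*
 * Hypothetical helper: allocate a struct fiemap followed by room for
 * `count` struct fiemap_extent entries and fill in the request fields.
 * calloc() leaves fm_flags, fm_mapped_extents and fm_reserved at zero.
 * The caller frees the result.
 */
static struct fiemap *fiemap_alloc(__u64 start, __u64 length, __u32 count)
{
    size_t size = sizeof(struct fiemap) +
                  (size_t)count * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);

    if (fm == NULL)
        return NULL;
    fm->fm_start = start;          /* first logical byte, inclusive (in) */
    fm->fm_length = length;        /* bytes of the file we care about (in) */
    fm->fm_extent_count = count;   /* capacity of fm_extents[] (in) */
    return fm;
}
```

For a whole-file query, a common choice is to pass FIEMAP_MAX_OFFSET (all bits set) as the length.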
Certain flags to modify the way in which mappings are looked up can be
set in fm_flags. If the kernel doesn't understand some particular
flags, it will return EBADR and the contents of fm_flags will contain
the set of flags which caused the error. If the kernel is compatible
with all flags passed, the contents of fm_flags will be unmodified.
It is up to userspace to determine whether rejection of a particular
flag is fatal to its operation. This scheme is intended to allow the
fiemap interface to grow in the future but without losing
compatibility with old software.

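The EBADR negotiation described above can be sketched in userspace like this. The sketch assumes Linux with FS_IOC_FIEMAP from <linux/fs.h> and the flag constants from <linux/fiemap.h>. The names fiemap_accepted_flags and fiemap_ioctl_negotiate are hypothetical. The retry policy is also this sketch's own choice: the caller lists in `optional` the flags it can live without, and any other rejected flag is treated as fatal.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_* */

/*
 * Given the flags originally requested and the fm_flags value the kernel
 * left behind on EBADR (the rejected set), return the flags it accepted.
 */
static __u32 fiemap_accepted_flags(__u32 requested, __u32 rejected)
{
    return requested & ~rejected;
}

/*
 * Issue FS_IOC_FIEMAP. On EBADR, if every rejected flag is in `optional`,
 * retry once without the rejected flags. Returns 0 or a negative errno.
 */
static int fiemap_ioctl_negotiate(int fd, struct fiemap *fm, __u32 optional)
{
    __u32 requested = fm->fm_flags;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0)
        return 0;
    if (errno != EBADR)
        return -errno;
    /* fm_flags now holds the flags that caused the error. */
    if (fm->fm_flags & ~optional)
        return -EBADR;              /* a required flag was rejected */
    fm->fm_flags = fiemap_accepted_flags(requested, fm->fm_flags);
    return ioctl(fd, FS_IOC_FIEMAP, fm) == 0 ? 0 : -errno;
}
```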
fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents. If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping. Note that there is
nothing to prevent the file from changing between calls to FIEMAP.

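Below is a sketch of the count-only query, under the same assumptions as above (Linux, <linux/fs.h> and <linux/fiemap.h>). The function name fiemap_count_extents is hypothetical. Because fm_extent_count is zero, no extent array is needed, so a plain struct fiemap on the stack is enough.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

/*
 * Ask how many extents are needed to map [start, start + length) by
 * passing fm_extent_count == 0. On success store the count in *out and
 * return 0; otherwise return a negative errno and leave *out unchanged.
 * The count is only a snapshot: the file may change before the next call.
 */
static int fiemap_count_extents(int fd, __u64 start, __u64 length, __u32 *out)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = start;
    fm.fm_length = length;
    fm.fm_extent_count = 0;          /* fm_extents[] is ignored */
    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -errno;
    *out = fm.fm_mapped_extents;
    return 0;
}
```

A caller would typically allocate that many extents for a second call. Because the file can change in between, the second call must still check FIEMAP_EXTENT_LAST rather than trust the earlier count.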
The following flags can be set in fm_flags:

* FIEMAP_FLAG_SYNC
If this flag is set, the kernel will sync the file before mapping extents.

* FIEMAP_FLAG_XATTR
If this flag is set, the extents returned will describe the inode's
extended attribute lookup tree, instead of its data tree.


Extent Mapping
--------------

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure. The
number of elements in the fm_extents[] array should be passed via
fm_extent_count. The number of extents mapped by the kernel will be
returned via fm_mapped_extents. If the number of fiemap_extents
allocated is less than would be required to map the requested range,
the maximum number of extents that can be mapped in the fm_extents[]
array will be returned and fm_mapped_extents will be equal to
fm_extent_count. In that case, the last extent in the array will not
complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).

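When the array fills up, userspace has to issue another call. A possible way to decide when to stop, and where to resume, is sketched below. The function name fiemap_mapping_done is hypothetical. The sketch assumes the caller resumes by setting fm_start to the returned offset. For a bounded request, the caller would also shrink fm_length accordingly.

```c
#include <assert.h>
#include <stdlib.h>
#include <linux/fiemap.h>

/*
 * After a successful FS_IOC_FIEMAP call, return 1 if no further call is
 * needed: nothing was mapped, the last returned extent carries
 * FIEMAP_EXTENT_LAST, or the array was not filled (so the requested range
 * is covered). Otherwise return 0 and store in *next_start the logical
 * offset just past the last returned extent, to use as the next fm_start.
 */
static int fiemap_mapping_done(const struct fiemap *fm, __u64 *next_start)
{
    const struct fiemap_extent *last;

    if (fm->fm_mapped_extents == 0)
        return 1;                               /* empty range */
    last = &fm->fm_extents[fm->fm_mapped_extents - 1];
    if (last->fe_flags & FIEMAP_EXTENT_LAST)
        return 1;                               /* end of file reached */
    if (fm->fm_mapped_extents < fm->fm_extent_count)
        return 1;                               /* array not full */
    *next_start = last->fe_logical + last->fe_length;
    return 0;
}
```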
Each extent is described by a single fiemap_extent structure as
returned in fm_extents.

struct fiemap_extent {
        __u64 fe_logical;  /* logical offset in bytes for the start of
                            * the extent */
        __u64 fe_physical; /* physical offset in bytes for the start
                            * of the extent */
        __u64 fe_length;   /* length in bytes for the extent */
        __u64 fe_reserved64[2];
        __u32 fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
        __u32 fe_reserved[3];
};

All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extent's logical offset to start before the request or its logical
length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
returned, fe_logical, fe_physical, and fe_length will be aligned to the
block size of the file system. With the exception of extents flagged as
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

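Returned extents may overhang the requested range. A caller that only wants the part inside its range can clip each extent, as in the sketch below. The helper name fiemap_clip is hypothetical. The request end is saturated at the maximum 64-bit value, so a whole-file request with FIEMAP_MAX_OFFSET does not overflow.

```c
#include <assert.h>
#include <linux/fiemap.h>

/*
 * Compute the part of extent `fe` that overlaps the request
 * [req_start, req_start + req_len). Returns the overlap length in bytes
 * and stores its logical start in *clip_start, or returns 0 (leaving
 * *clip_start untouched) when the extent lies outside the request.
 */
static __u64 fiemap_clip(const struct fiemap_extent *fe,
                         __u64 req_start, __u64 req_len, __u64 *clip_start)
{
    /* Saturate instead of wrapping when req_len is FIEMAP_MAX_OFFSET. */
    __u64 req_end = (req_len > ~0ULL - req_start) ? ~0ULL : req_start + req_len;
    __u64 ext_end = fe->fe_logical + fe->fe_length;
    __u64 lo = fe->fe_logical > req_start ? fe->fe_logical : req_start;
    __u64 hi = ext_end < req_end ? ext_end : req_end;

    if (hi <= lo)
        return 0;
    *clip_start = lo;
    return hi - lo;
}
```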
The fe_flags field contains flags which describe the extent returned.
A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
the file so that the process making fiemap calls can determine when no
more extents are available, without having to call the ioctl again.

Some flags are intentionally vague and will always be set in the
presence of other more specific flags. This way a program looking for
a general property does not have to know all existing and future flags
which imply that property.

For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
for inline or tail-packed data can key on the specific flag. Software
which simply cares not to try operating on non-aligned extents
however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.

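The difference between keying on a general flag and keying on a specific one is illustrated by the small sketch below. The flag constants come from <linux/fiemap.h>. The predicate names are hypothetical. The third predicate relies on the rule, described in the flag list, that FIEMAP_EXTENT_DELALLOC also sets FIEMAP_EXTENT_UNKNOWN.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/fiemap.h>

/*
 * General property: keying on FIEMAP_EXTENT_NOT_ALIGNED also covers
 * DATA_INLINE, DATA_TAIL and any future flag that implies unaligned data.
 */
static bool extent_is_block_aligned(__u32 fe_flags)
{
    return !(fe_flags & FIEMAP_EXTENT_NOT_ALIGNED);
}

/* Specific property: a program that wants inline data keys on that flag. */
static bool extent_is_inline(__u32 fe_flags)
{
    return (fe_flags & FIEMAP_EXTENT_DATA_INLINE) != 0;
}

/* General property: FIEMAP_EXTENT_UNKNOWN accompanies DELALLOC. */
static bool extent_location_known(__u32 fe_flags)
{
    return !(fe_flags & FIEMAP_EXTENT_UNKNOWN);
}
```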
* FIEMAP_EXTENT_LAST
This is the last extent in the file. A mapping attempt past this
extent will return nothing.

* FIEMAP_EXTENT_UNKNOWN
The location of this extent is currently unknown. This may indicate
the data is stored on an inaccessible volume or that no storage has
been allocated for the file yet.

* FIEMAP_EXTENT_DELALLOC
  - This will also set FIEMAP_EXTENT_UNKNOWN.
Delayed allocation - while there is data for this extent, its
physical location has not been allocated yet.

* FIEMAP_EXTENT_ENCODED
This extent does not consist of plain filesystem blocks but is
encoded (e.g. encrypted or compressed). Reading the data in this
extent via I/O to the block device will have undefined results.

Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the
assistance of the filesystem, or to access the data using the
information returned by the FIEMAP interface while the filesystem
is mounted. In other words, user applications may only read the
extent data via I/O to the block device while the filesystem is
unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances.

* FIEMAP_EXTENT_DATA_ENCRYPTED
  - This will also set FIEMAP_EXTENT_ENCODED.
The data in this extent has been encrypted by the file system.

* FIEMAP_EXTENT_NOT_ALIGNED
Extent offsets and length are not guaranteed to be block aligned.

* FIEMAP_EXTENT_DATA_INLINE
  - This will also set FIEMAP_EXTENT_NOT_ALIGNED.
Data is located within a meta data block.

* FIEMAP_EXTENT_DATA_TAIL
  - This will also set FIEMAP_EXTENT_NOT_ALIGNED.
Data is packed into a block with data from other files.

* FIEMAP_EXTENT_UNWRITTEN
Unwritten extent - the extent is allocated but its data has not been
initialized. This indicates the extent's data will be all zero if read
through the filesystem but the contents are undefined if read directly from
the device.

* FIEMAP_EXTENT_MERGED
This will be set when a file does not support extents, i.e., it uses a block
based addressing scheme. Since returning an extent for each block back to
userspace would be highly inefficient, the kernel will try to merge most
adjacent blocks into 'extents'.


VFS -> File System Implementation
---------------------------------

File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
each discovered extent:

struct inode_operations {
        ...

        int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                      u64 len);
        ...
};

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request:

struct fiemap_extent_info {
        unsigned int fi_flags;          /* Flags as passed from user */
        unsigned int fi_extents_mapped; /* Number of mapped extents */
        unsigned int fi_extents_max;    /* Size of fiemap_extent array */
        struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
};

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant of signals and
return -EINTR once a fatal signal is received.

Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_check_flags() helper:

int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);

The struct fieinfo should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via fs_flags. If
fiemap_check_flags finds invalid user flags, it will place the bad values in
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from
fiemap_check_flags(), it should immediately exit, returning that error back to
ioctl_fiemap().

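To make the documented contract concrete, here is a userspace model of it. It is not the kernel's code: fiemap_check_flags() runs inside the kernel, and the names model_fiemap_extent_info and model_check_flags are hypothetical stand-ins. Only the behaviour described above is modelled. User flags outside fs_flags are treated as invalid, those bits replace fi_flags, and -EBADR is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <linux/fiemap.h>   /* FIEMAP_FLAG_* values */

/* Hypothetical userspace stand-in for struct fiemap_extent_info. */
struct model_fiemap_extent_info {
    unsigned int fi_flags;          /* flags as passed from user */
    unsigned int fi_extents_mapped;
    unsigned int fi_extents_max;
};

/*
 * Model of the documented fiemap_check_flags() behaviour: any user flag
 * not in fs_flags is invalid; the invalid bits replace fi_flags and
 * -EBADR is returned. Otherwise fi_flags is untouched and 0 is returned.
 */
static int model_check_flags(struct model_fiemap_extent_info *fieinfo,
                             uint32_t fs_flags)
{
    uint32_t bad = fieinfo->fi_flags & ~fs_flags;

    if (bad) {
        fieinfo->fi_flags = bad;
        return -EBADR;   /* the ->fiemap callback should return this at once */
    }
    return 0;
}
```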
For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent():

int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
                            u64 phys, u64 len, u32 flags, u32 dev);

fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
automatically be set from specific flags on behalf of the calling file
system so that the userspace API is not broken.

fiemap_fill_next_extent() returns 0 on success, and 1 when the
user-supplied fm_extents array is full. If an error is encountered
while copying the extent to user memory, -EFAULT will be returned.
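The sketch below is a userspace model of the calling pattern described in this section. It is not kernel code: every model_* name is hypothetical, and the extent table stands in for a real file system's metadata. The model fill helper follows the documented contract. It fills the next free slot, sets the general flags implied by specific ones (per the flag list above), returns 0 on success and returns 1 once the array is full. This model reports "full" as soon as the last free slot is used. It does not model the copy to user memory (so no -EFAULT) or the `dev` argument. A real ->fiemap would also call fiemap_check_flags() first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/fiemap.h>

/* Hypothetical stand-in for struct fiemap_extent_info. */
struct model_info {
    unsigned int fi_extents_mapped;
    unsigned int fi_extents_max;             /* 0 means count only */
    struct fiemap_extent *fi_extents_start;
};

/* Hypothetical on-disk extent record of a model file system. */
struct model_extent { uint64_t logical, phys, len; uint32_t flags; };

static int model_fill_next_extent(struct model_info *info, uint64_t logical,
                                  uint64_t phys, uint64_t len, uint32_t flags)
{
    struct fiemap_extent *fe;

    if (info->fi_extents_max == 0) {         /* fm_extent_count == 0 */
        info->fi_extents_mapped++;
        return 0;
    }
    if (info->fi_extents_mapped >= info->fi_extents_max)
        return 1;                            /* no free slot left */

    /* Set general flags implied by specific ones. */
    if (flags & FIEMAP_EXTENT_DELALLOC)
        flags |= FIEMAP_EXTENT_UNKNOWN;
    if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED)
        flags |= FIEMAP_EXTENT_ENCODED;
    if (flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL))
        flags |= FIEMAP_EXTENT_NOT_ALIGNED;

    fe = &info->fi_extents_start[info->fi_extents_mapped];
    memset(fe, 0, sizeof(*fe));
    fe->fe_logical = logical;
    fe->fe_physical = phys;
    fe->fe_length = len;
    fe->fe_flags = flags;
    info->fi_extents_mapped++;
    return info->fi_extents_mapped == info->fi_extents_max ? 1 : 0;
}

/*
 * Shape of a ->fiemap body: walk the extents overlapping [start,
 * start + len), mark the file's final extent LAST, and stop when the
 * helper reports a full array (1) or an error (< 0).
 */
static int model_fs_fiemap(struct model_info *info, const struct model_extent *tab,
                           int n, uint64_t start, uint64_t len)
{
    for (int i = 0; i < n; i++) {
        const struct model_extent *e = &tab[i];
        uint32_t flags = e->flags;
        int ret;

        if (e->logical + e->len <= start)
            continue;                        /* ends before the request */
        if (e->logical >= start && e->logical - start >= len)
            continue;                        /* starts after the request */
        if (i == n - 1)
            flags |= FIEMAP_EXTENT_LAST;     /* last extent in the file */
        ret = model_fill_next_extent(info, e->logical, e->phys, e->len, flags);
        if (ret < 0)
            return ret;
        if (ret == 1)
            break;                           /* user array is full */
    }
    return 0;
}
```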