O_DIRECT alignment

These are my notes on an answer posted on Quora.

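O_DIRECT transfers data directly between user-space memory and the storage device, without going through the operating system's file cache. An ordinary file write takes the path "user space -> kernel space -> storage"; with O_DIRECT the intermediate copy into kernel buffers is skipped, so the path becomes "user space -> storage". The call still enters the kernel; what disappears is the buffering. The catch is that everything the kernel used to take care of now falls to the user, and the most important of these chores is alignment, a detail the kernel normally handles for you: the user buffer must start on an aligned address and the I/O size must be a multiple of the alignment unit.

The alignment unit is the sector size of the underlying storage (before the Linux 2.6 kernel it was the filesystem's logical block size). A sector is the smallest unit the disk driver can operate on, while a block is the smallest unit the filesystem reads and writes. A sector is a physical concept, the finest granularity of the physical medium; a block is a logical concept that the filesystem builds from a number of contiguous sectors, which is why it is usually called a "filesystem logical block". The sector size differs between storage media (512 bytes on most hard drives, 2048 bytes on CD-ROMs), so the alignment O_DIRECT needs is not a fixed constant and has to be determined from the actual device.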
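To make the alignment chore concrete, here is a minimal sketch in C of what the user has to do by hand: round the transfer length up to a multiple of the alignment unit, allocate a buffer whose address is aligned, and zero the slack at the end. The helper names (is_aligned, round_up, alloc_direct_buffer) are my own, and the sketch assumes the alignment unit is a power of two, as 512 and 2048 are.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero when value is a multiple of align.
 * Assumption: align is a power of two (e.g. 512 or 2048). */
static int is_aligned(uintptr_t value, size_t align)
{
    return (value & (align - 1)) == 0;
}

/* Rounds len up to the next multiple of align (power of two). */
static size_t round_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/* Allocates a buffer for payload_len bytes whose address and size are
 * both multiples of align. The whole buffer is zeroed, so the slack
 * between payload_len and the padded size is already zero.
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_direct_buffer(size_t payload_len, size_t align,
                                 size_t *padded_len)
{
    void *buf = NULL;
    size_t padded = round_up(payload_len, align);

    if (posix_memalign(&buf, align, padded) != 0)
        return NULL;

    memset(buf, 0, padded);
    *padded_len = padded;
    return buf;
}
```

With a 512-byte unit, a 1000-byte payload ends up in a 1024-byte buffer; the caller copies the payload to the front and hands the whole padded length to write().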

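The sector size is obtained with the BLKSSZGET ioctl (BLocK Sector SiZe GET), and the filesystem logical block size with the BLKBSZGET ioctl (BLocK Block SiZe GET). Both are ioctl requests issued on an open block device, not standalone tools.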
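A rough sketch of issuing the two requests from C follows. The function name query_block_sizes is hypothetical, and the device path is whatever block device backs the file (something like /dev/sda, which is only an example). Reading both results into an int is my assumption about what the kernel stores for these requests.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET, BLKBSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Queries the sector size and the logical block size of a block device.
 * Returns 0 on success, -1 on failure with errno set.
 * Assumption: both requests store their result as an int. */
static int query_block_sizes(const char *dev_path,
                             int *sector_size, int *block_size)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (ioctl(fd, BLKSSZGET, sector_size) < 0 ||
        ioctl(fd, BLKBSZGET, block_size) < 0)
        rc = -1;

    /* close() may overwrite errno, so keep the ioctl error for the caller. */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}
```

Opening a block device normally needs elevated privileges, and the call fails on anything that is not a block device, so callers should check the return value rather than fall back to a hard-coded 512.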

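O_DIRECT alignment is easier to understand alongside the alignment the CPU requires for memory accesses. For CPU memory-access alignment, see the post "Memory alignment".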

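The difference between O_DIRECT and O_SYNC is that O_SYNC goes through the complete "user space -> kernel space -> storage" chain, and each write returns only after the data has been flushed from the file cache to the disk. O_DIRECT skips the file cache, so there is no cached copy to flush. That is not the same guarantee, though: according to the Linux open(2) man page, O_DIRECT on its own only makes an effort to transfer data synchronously, and O_SYNC has to be used in addition to O_DIRECT when the data and the necessary metadata must be on storage by the time the write returns.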
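The two flags can also be combined in one open() call. The sketch below assumes Linux with glibc, where O_DIRECT is only visible when _GNU_SOURCE is defined, and relies on the open(2) man page's note that O_DIRECT alone does not give the guarantees of O_SYNC. The helper names direct_write_flags and open_direct are hypothetical.

```c
#define _GNU_SOURCE      /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>

/* Builds open(2) flags for a write that bypasses the file cache.
 * A nonzero durable adds O_SYNC, so each write also waits for the
 * data and the necessary metadata to reach storage. */
static int direct_write_flags(int durable)
{
    int flags = O_WRONLY | O_CREAT | O_DIRECT;

    if (durable)
        flags |= O_SYNC;
    return flags;
}

/* Opens path for direct writes.
 * Returns a file descriptor, or -1 with errno set. */
static int open_direct(const char *path, int durable)
{
    return open(path, direct_write_flags(durable), 0644);
}
```

Not every filesystem accepts O_DIRECT, so the open() call can fail even when the path is valid; the alignment rules described earlier still apply to every write on the returned descriptor.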

Original text and link:

https://www.quora.com/Why-does-O_DIRECT-require-I-O-to-be-512-byte-aligned

O_DIRECT requires that I/O occur in multiples of 512 bytes and from memory aligned on a 512-byte boundary because O_DIRECT performs direct memory access (DMA) straight to the backing store, bypassing any intermediate buffers. Performing unaligned transfers would require "fixing up" the I/O by explicitly aligning the user-space buffer and zeroing out the slack (the space between the end of the buffer and the next 512 byte multiple), obviating the benefits of O_DIRECT.

Here's another way to look at it: The "beauty" of O_DIRECT (such as it is) is that it cuts out the VM from the I/O process. No copying from user to kernel-space, no page cache, no pages period. But this means all the little things that the kernel handles for you—alignment being the biggest—you, the user, now need to handle. The underlying backing store expects everything in sectors, so you need to talk in sectors, too.

That is why the magic number isn't always 512 bytes. It is the sector size of the underlying block device. This number is almost always 512 bytes for a standard hard drive, but it can vary. For example, ISO 9660, the standard format of CD-ROMs, sports 2048 byte sectors. You can use the BLKSSZGET ioctl to obtain a backing store's sector size. Portable code should use the value returned from this ioctl and not hard-code 512 bytes.

(As an historic aside, the use of the sector size as the magic number is new in version 2.6 of the Linux kernel. Prior, O_DIRECT calls needed to be aligned to the filesystem's logical block size, which is usually larger than the sector size and 4KB. You can obtain a filesystem's logical block size with the BLKBSZGET ioctl.)
