How Linux does I/O


1. By default, the write() system call returns once the data has been copied from the user-space buffer into the kernel-space buffers (the page cache).

There is no guarantee that the data has actually reached physical storage.


2. The fsync() call is our friend here. It blocks and returns only after the data and metadata (e.g. file size, last update time) have been completely transferred to the actual physical storage.

In other words, fsync() ensures that all modified content of the file referred to by fd has been synchronized to disk; the call blocks until the device reports that the I/O is complete.
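
The write-then-fsync pattern described above can be sketched in Python as follows. This is only a minimal illustration: the helper name durable_write and the file name are made up for the example, and os.write() / os.fsync() are thin wrappers around the system calls of the same name.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> int:
    """Write payload to path, then block until it has been flushed to storage.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # write() may accept fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # At this point the data may still live only in kernel buffers.
        # fsync() blocks until data and metadata are reported as flushed.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.dat")
        print(durable_write(target, b"hello, disk\n"))
```

Without the os.fsync() call the function would return as soon as the kernel had accepted the data, which is exactly the default behavior described in point 1.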


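3. There is also fdatasync(), which is similar to fsync() but only guarantees that the data portion of the file is transferred, so it should be faster. fsync(), in addition to the data, also synchronizes the file's attributes (metadata).

    There are several options we can specify when opening a file that modify the behavior of write():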


a. O_SYNC

In this case, the write() system call still writes data to the kernel-space buffers, but it blocks until the data has actually been transferred from those buffers to the physical storage. There is no need to call fsync() afterwards.


b. O_DIRECT

This completely bypasses the kernel-space buffers, but requires that writes be aligned to, and a multiple of, the underlying storage block size (usually 512 bytes or 4 KiB).

By itself, it does not guarantee that the data has been completely transferred to the device when the call returns.


c. O_SYNC + O_DIRECT

As stated above, we would need to use both options together to guarantee true synchronous I/O.
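
The open-time options above can be tried out with a short Python sketch. The function names and the BLOCK constant are assumptions made for this example (4096 bytes is assumed as the block size; real code should query the device), and os.O_DIRECT is only available on Linux. An anonymous mmap is used as the write buffer because O_DIRECT also expects the memory buffer to be aligned, and anonymous mappings are page-aligned.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed block size for this sketch


def write_o_sync(path: str, payload: bytes) -> int:
    """Open with O_SYNC: write() returns only after the data reached storage."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        # Short writes are ignored here to keep the sketch small.
        return os.write(fd, payload)
    finally:
        os.close(fd)


def write_o_direct_sync(path: str, payload: bytes) -> int:
    """Open with O_DIRECT | O_SYNC: bypass kernel buffers and wait for the device.

    The payload is zero-padded up to a whole number of blocks, so the file
    ends up with the padded size, which is also the value returned.
    """
    blocks = max(1, -(-len(payload) // BLOCK))  # ceiling division
    buf = mmap.mmap(-1, blocks * BLOCK)  # page-aligned, zero-filled buffer
    buf.write(payload)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT | os.O_SYNC
    try:
        fd = os.open(path, flags, 0o644)
        try:
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_o_sync(os.path.join(tmp, "sync.dat"), b"x" * 100))
        try:
            print(write_o_direct_sync(os.path.join(tmp, "direct.dat"), b"y" * 100))
        except OSError as exc:
            # Some filesystems reject O_DIRECT altogether.
            print("O_DIRECT not supported here:", exc)
```

Some filesystems do not support O_DIRECT and fail with an error at open() or write() time, which is why the usage part catches OSError.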



Relation to InnoDB flushing (the InnoDB engine's flush-to-disk strategy)


1. fsync: InnoDB uses the fsync() system call to flush both the data and log files. fsync is the default setting (innodb_flush_method=NULL).


2. O_DSYNC: InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. InnoDB does not use O_DSYNC directly because there have been problems with it on many varieties of Unix.


3. nosync: This option is used for internal performance testing and is currently unsupported. Use at your own risk.


4. O_DIRECT: InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files.


5. O_DIRECT_NO_FSYNC: InnoDB uses O_DIRECT to open the data files (that is, O_DIRECT is used during flushing I/O) but skips the fsync() system call afterwards. This option is not suitable for the XFS file system.



https://www.pythian.com/blog/innodb-flushing-linux-io/

https://www.percona.com/doc/percona-server/5.6/scalability/innodb_io.html#innodb_flush_method

http://www.gpfeng.com/?p=474