Linux Large-File Write Tests, Part 3: POSIX write() with the O_DIRECT Option
 
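    While reading man 2 open I came across the O_DIRECT option. With O_DIRECT, writes bypass the page cache and go straight to the device. When writing huge amounts of data, skipping the cache seemed like it ought to be faster, so I tried writing a file with O_DIRECT as well. Getting an O_DIRECT write to work was harder than expected. Memory allocated with new or malloc is not guaranteed to be suitably aligned, and write() then fails (typically with EINVAL). You have to allocate with posix_memalign instead (or valloc/memalign, both of which are marked obsolete).
      My guess (I never studied the kernel, so this is only a guess) was that high-performance I/O requires the user-space buffer to match the memory pages the kernel uses, so that the kernel can process the data quickly. More precisely, with O_DIRECT the buffer address, the transfer size and the file offset generally have to be multiples of the device's logical block size (often 512 bytes, sometimes 4096). posix_memalign returns memory aligned to whatever power-of-two boundary you request (the page size here), which satisfies the address requirement. It does not, however, round the size for you.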

     The following code writes the file with the O_DIRECT option:
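// Writing 200MB took 3100994 microseconds, about 3.1 seconds: slower, and much slower at that!!!
/*
Large-data write performance test, test_io_4.cpp: POSIX functions with the O_DIRECT option
*/

#ifndef _GNU_SOURCE
#define _GNU_SOURCE     // O_DIRECT is a Linux extension; g++ defines this by default, gcc does not
#endif
#include <stdio.h>      // printf, fprintf, perror
#include <stdlib.h>     // posix_memalign, free
#include <string.h>     // memset, strerror
#include <unistd.h>     // write, close, getpagesize
#include <fcntl.h>      // open, O_DIRECT
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>   // gettimeofday, struct timeval
#include <time.h>       // localtime_r, strftime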

static int operator-(struct timeval& lsh, struct timeval& rsh)
{
    if (lsh.tv_sec==rsh.tv_sec)
    {
        return lsh.tv_usec - rsh.tv_usec;
    }
    else
    {
        return (lsh.tv_sec-rsh.tv_sec)*1000000 + (lsh.tv_usec - rsh.tv_usec);
    }
}

void test()
{
    struct timeval start;
    struct timeval end;
    const int DATA_LEN = 1024*1024*200; //200MB
    char* pData = NULL;
    printf("page size=%d\n", getpagesize());
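    int nTemp = posix_memalign((void**)&pData, getpagesize(), DATA_LEN);
    if (0!=nTemp)
    {
        // posix_memalign returns the error code instead of setting errno
        fprintf(stderr, "posix_memalign error: %s\n", strerror(nTemp));
        return;
    }
    // Fill the buffer so we do not write uninitialized memory, and so the page
    // faults from first touching it happen before the timed section
    memset(pData, 'a', DATA_LEN);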
    gettimeofday(&start, NULL);
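    // O_CREAT requires a third argument with the new file's permissions
    int fd = open("write_direct.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd<0)
    {
        perror("open error");
        free(pData);
        return;
    }
    ssize_t nLen = write(fd, pData, DATA_LEN);
    if (nLen != DATA_LEN)
    {
        if (nLen < 0)
            perror("write error");  // EINVAL usually means a misaligned buffer, size or offset
        else
            fprintf(stderr, "short write: %zd of %d bytes\n", nLen, DATA_LEN);
        close(fd);
        free(pData);
        return;
    }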
    close(fd);
    fd = -1;

    gettimeofday(&end, NULL);
    free(pData);
    pData = NULL;
    // Print the elapsed time
    struct tm stTime;
    localtime_r(&start.tv_sec, &stTime);
    char strTemp[40];
    strftime(strTemp, sizeof(strTemp)-1, "%Y-%m-%d %H:%M:%S", &stTime);
    printf("start=%s.%06ld\n", strTemp, (long)start.tv_usec);  // tv_usec has 6 digits
    //
    localtime_r(&end.tv_sec, &stTime);
    strftime(strTemp, sizeof(strTemp)-1, "%Y-%m-%d %H:%M:%S", &stTime);
    printf("end  =%s.%06ld\n", strTemp, (long)end.tv_usec);
    printf("spend=%d microseconds\n", end-start);
}

int main()
{
    test();
    return 0;
}

    With O_DIRECT, writing the file actually got slower. I was completely puzzled until I finally found this article online:
"Linus Really Isn't a Fan of O_DIRECT, Heh"
http://www.linux-ren.org/modules/newbb/viewtopic.php?topic_id=2722

-------------------------------------------

A thread on the lkml began with a query about using O_DIRECT when opening a file. An early white paper written by Andrea Arcangeli to describe the O_DIRECT patch before it was merged into the 2.4 kernel explains, "with O_DIRECT the kernel will do DMA directly from/to the physical memory pointed [to] by the userspace buffer passed as [a] parameter to the read/write syscalls. So there will be no CPU and memory bandwidth spent in the copies between userspace memory and kernel cache, and there will be no CPU time spent in kernel in the management of the cache (like cache lookups, per-page locks etc..)." Linux creator Linus Torvalds was quick to reply that despite all the claims there is no good reason for mounting files with O_DIRECT, suggesting that interfaces like madvise() and posix_fadvise() should be used instead.

From: Aubrey [email blocked]
To: "Hua Zhong" [email blocked]
Subject: O_DIRECT question
Date: Thu, 11 Jan 2007 10:57:06 +0800

Hi all,

Opening file with O_DIRECT flag can do the un-buffered read/write access.
So if I need un-buffered access, I have to change all of my
applications to add this flag. What's more, Some scripts like "cp
oldfile newfile" still use pagecache and buffer.
Now, my question is, is there a existing way to mount a filesystem
with O_DIRECT flag? so that I don't need to change anything in my
system. If there is no option so far, What is the right way to achieve
my purpose?

Thanks a lot.
-Aubrey



From: Linus Torvalds [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 19:05:30 -0800 (PST)



On Thu, 11 Jan 2007, Aubrey wrote:
>
> Now, my question is, is there a existing way to mount a filesystem
> with O_DIRECT flag? so that I don't need to change anything in my
> system. If there is no option so far, What is the right way to achieve
> my purpose?

The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.

This is your brain: O
This is your brain on O_DIRECT: .

Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.

Linus



From: Linus Torvalds [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 19:15:48 -0800 (PST)



On Wed, 10 Jan 2007, Linus Torvalds wrote:
>
> So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
> instead.

Side note: the only reason O_DIRECT exists is because database people are
too used to it, because other OS's haven't had enough taste to tell them
to do it right, so they've historically hacked their OS to get out of the
way.

As a result, our madvise and/or posix_fadvise interfaces may not be all
that strong, because people sadly don't use them that much. It's a sad
example of a totally broken interface (O_DIRECT) resulting in better
interfaces not getting used, and then not getting as much development
effort put into them.

So O_DIRECT not only is a total disaster from a design standpoint (just
look at all the crap it results in), it also indirectly has hurt better
interfaces. For example, POSIX_FADV_NOREUSE (which _could_ be a useful and
clean interface to make sure we don't pollute memory unnecessarily with
cached pages after they are all done) ends up being a no-op ;/

Sad. And it's one of those self-fulfilling prophecies. Still, I hope some
day we can just rip the damn disaster out.

Linus



From: Andrew Morton [email blocked]
Subject: Re: O_DIRECT question
Date: Wed, 10 Jan 2007 20:51:57 -0800

On Thu, 11 Jan 2007 10:57:06 +0800
Aubrey [email blocked] wrote:

> Hi all,
>
> Opening file with O_DIRECT flag can do the un-buffered read/write access.
> So if I need un-buffered access, I have to change all of my
> applications to add this flag. What's more, Some scripts like "cp
> oldfile newfile" still use pagecache and buffer.
> Now, my question is, is there a existing way to mount a filesystem
> with O_DIRECT flag? so that I don't need to change anything in my
> system. If there is no option so far, What is the right way to achieve
> my purpose?

Not possible, basically.

O_DIRECT reads and writes must be aligned to the device's block size
(usually 512 bytes) in memory addresses, file offsets and read/write request
sizes. Very few applications will bother to do that and will hence fail if
their files are automagically opened with O_DIRECT.
----------------------------------------------------------------------------------

    Heh, note this line: The whole notion of "direct IO" is totally braindamaged. (In other words, the very idea of "direct IO" is brain-damaged!) Well, it seems O_DIRECT has long been frowned upon, at least by Linus.

NE:

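With O_DIRECT, I/O has to be done in aligned units: the device's logical block size, typically 512 bytes or 4 KiB, so using the page size is a safe choice. There is no way around this, because the device itself is a block device. You can add an intermediate layer that computes the aligned file offset itself, allocates an aligned buffer with posix_memalign, performs the I/O, and then copies the contents of that buffer into the caller's buffer.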