Original: http://weibo.com/p/1001603862394207076573?sudaref=weibo.com
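epoll has two modes for triggering I/O events: ET (Edge Triggered) and LT (Level Triggered).
The trigger mode is really an attribute of events: it is a flag that sits alongside POLLIN, POLLOUT and the like, and is combined with them in the same mask. Since events are in turn attached to an fd (file descriptor), one can also say that the trigger mode is an attribute of the fd the events are attached to. The reason for distinguishing events from the fd is that the kernel implementation draws the same line: events are represented by eppoll_entry, and the fd by epitem.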
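First, a brief look at the two modes.
In LT mode, if a reported event has not been fully handled, the event is still there on the next call and can be handled further. In ET mode, in the same situation, the event is gone on the next call: it is notified only once, with no second chance. LT is epoll's default mode; after all, select/poll support only LT.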
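The difference can be observed from user space. The following is a minimal sketch, assuming Linux and Python's `select.epoll` wrapper around the epoll system calls; the function name `events_after_partial_read` is made up for this illustration. It writes 10 bytes into a pipe, reads only 4 of them, and counts how many events epoll reports before and after the partial read.

```python
import os
import select


def events_after_partial_read(edge_triggered: bool) -> tuple:
    """Return (events seen before the partial read, events seen after it)."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    ep = select.epoll()
    try:
        # The trigger mode is just one more flag in the events mask.
        mask = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
        ep.register(r, mask)
        os.write(w, b"0123456789")   # 10 bytes become readable
        first = len(ep.poll(0))      # reported in both modes
        os.read(r, 4)                # handle only part of the data
        second = len(ep.poll(0))     # LT reports again, ET stays silent
        return first, second
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

Calling it with `edge_triggered=False` should show the event on both polls, while `edge_triggered=True` should show it only on the first one, even though 6 bytes are still sitting in the pipe.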
ET mode first.
The basic description of ET mode, from the epoll(7) man page:
An application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve a task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered (EPOLLET) interface is as follows:
1)with nonblocking file descriptors; and
2)by waiting for an event only after read(2) or write(2) return EAGAIN.
By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2), and can be used wherever the latter is used since it shares the same semantics.
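So in ET mode the fd must be nonblocking, and events must be handled exhaustively (until the long-awaited EAGAIN return value shows up). Let us look more closely at how events are handled: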
Q:Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior) ?
A:Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.
For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.
For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)
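In other words, there are two ways to tell that the I/O space is exhausted: check for the EAGAIN return value, or compare the requested count with the actual count.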
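Both stopping rules can be sketched in one small helper. This is only an illustration, assuming Python on a POSIX system, where a nonblocking `os.read` that would return EAGAIN raises `BlockingIOError`; the name `drain` and the `stop_on_short_read` parameter are hypothetical.

```python
import os


def drain(fd: int, chunk: int = 4096, stop_on_short_read: bool = False) -> bytes:
    """Read a nonblocking fd until its read space is exhausted."""
    data = bytearray()
    while True:
        try:
            piece = os.read(fd, chunk)
        except BlockingIOError:      # EAGAIN/EWOULDBLOCK: nothing left to read
            break
        if not piece:                # b"" means EOF (writer closed), not EAGAIN
            break
        data += piece
        if stop_on_short_read and len(piece) < chunk:
            break                    # stream fds only: short count means "empty"
    return bytes(data)
```

With `stop_on_short_read=True` the loop saves the final `read` call that would only return EAGAIN, but as the man page warns, that shortcut is safe only when the fd is known to be stream-oriented.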
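One thing to note about ET mode: what sits in the ready queue of the eventpoll (which represents an epoll instance) is the fd the events are attached to, not the events themselves. This leads to a piggyback effect: a POLLIN event fires once and then does not show up again, even though the user has not finished handling it. But when a new POLLOUT event arrives, the old POLLIN event is carried out along with it. (By analogy, if the fd is a person and old and new debts are different events: an old debt is no problem as long as the two parties never meet. The trouble is a new debt, because then they must meet, and the new and old debts are settled together.)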
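The piggyback effect can be reproduced from user space. The sketch below assumes Linux and Python's `select.epoll`, and it swaps the roles used in the text because a new EPOLLIN edge is easier to produce than a new EPOLLOUT edge: here an already reported EPOLLOUT rides along with a newly arrived EPOLLIN. The function name `piggyback_masks` is made up for this illustration.

```python
import select
import socket


def piggyback_masks() -> list:
    """Return the event masks reported for socket `a` at three points in time."""
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        a.setblocking(False)
        ep.register(a.fileno(), select.EPOLLIN | select.EPOLLOUT | select.EPOLLET)

        def poll_mask() -> int:
            # Merge the masks of all returned (fd, mask) pairs; 0 means no event.
            mask = 0
            for _fd, ev in ep.poll(0):
                mask |= ev
            return mask

        seen = [poll_mask()]      # a starts out writable: EPOLLOUT
        seen.append(poll_mask())  # nothing changed: edge-triggered stays silent
        b.send(b"ping")           # new edge: a becomes readable
        seen.append(poll_mask())  # EPOLLIN arrives and EPOLLOUT rides along
        return seen
    finally:
        ep.close()
        a.close()
        b.close()
```

The third mask contains EPOLLOUT although the socket's writability has not changed since the first poll, which fits the explanation that the ready queue holds the fd and all of its currently true events are reported once it is queued again.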
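In our view ET mode is quite reasonable. It closely resembles the NAPI packet-reading scheme of network drivers: both have a polling flavor (efficient: for epoll, it saves many extra epoll system calls).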
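Now LT mode.
Why ET first and LT second? Because, judging from the code, we believe epoll supported ET first and added LT support afterwards.
Supporting LT is simple; it takes only a few extra lines on top of the existing ET implementation:
After the reported events have been processed, they are put back on the ready queue of the eventpoll (which represents an epoll instance). That is, the events (actually the epitem, i.e. the file descriptor the events are attached to) are directly given a second chance to be handled (ET mode gives no such chance), even though that chance may not be needed, because the user may already have handled the event exhaustively the first time. LT mode is therefore somewhat inefficient (especially when the number of fds is large).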
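The re-queueing step can be illustrated with a toy model. This is not kernel code: `ToyItem` and `ToyEventpoll` are hypothetical stand-ins for epitem and eventpoll, and only the ready queue is modelled. The single LT-specific line is the one that appends the item again after it has been reported.

```python
from collections import deque

POLLIN = 0x001  # same numeric value as EPOLLIN on Linux


class ToyItem:
    """Stand-in for an epitem: one monitored fd and its trigger mode."""

    def __init__(self, name, edge_triggered):
        self.name = name
        self.edge_triggered = edge_triggered
        self.pending = 0          # bytes the fake fd currently has to read

    def poll(self):
        # Stand-in for asking the file which events are true right now.
        return POLLIN if self.pending else 0


class ToyEventpoll:
    """Stand-in for the eventpoll: only the ready queue is modelled."""

    def __init__(self):
        self.ready = deque()

    def data_arrived(self, item, nbytes):
        # Stand-in for the wakeup callback: it queues the fd, not the event.
        item.pending += nbytes
        if item not in self.ready:
            self.ready.append(item)

    def wait(self):
        reported = []
        for _ in range(len(self.ready)):   # visit each queued item once
            item = self.ready.popleft()
            revents = item.poll()
            if not revents:
                continue                   # nothing left to report: drop it
            reported.append((item.name, revents))
            if not item.edge_triggered:
                self.ready.append(item)    # LT only: queue it again
        return reported
```

In this model an LT item keeps being reported on every `wait()` until a later poll finds nothing to report, so each LT fd costs one extra pass through the queue even when the user drained it the first time.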
To sum up, epoll implements two modes of one I/O event notification mechanism, LT and ET; in terms of efficiency, ET should have the edge.