1. How to use select

fd_set fd_in, fd_out;
struct timeval tv;
 
// Reset the sets
FD_ZERO( &fd_in );
FD_ZERO( &fd_out );
 
// Monitor sock1 for input events
FD_SET( sock1, &fd_in );
 
// Monitor sock2 for output events
FD_SET( sock2, &fd_out );
 
// Find out which socket has the largest numeric value as select requires it
int largest_sock = sock1 > sock2 ? sock1 : sock2;
 
// Wait up to 10 seconds
tv.tv_sec = 10;
tv.tv_usec = 0;
 
// Call the select
int ret = select( largest_sock + 1, &fd_in, &fd_out, NULL, &tv );
 
// Check whether select actually succeeded
if ( ret == -1 )
{
    // report error and abort
}
else if ( ret == 0 )
{
    // timeout; no event detected
}
else
{
    if ( FD_ISSET( sock1, &fd_in ) )
    {
        // input event on sock1
    }

    if ( FD_ISSET( sock2, &fd_out ) )
    {
        // output event on sock2
    }
}


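Drawbacks of select:

1. The fd_set passed to select cannot be reused, because select modifies it. Even if the connections you monitor have not changed (none added or removed), you must rebuild the fd_set before every call to select().

One workaround is to keep a master copy locally and, before each call to select(), copy it into the fd_set you actually pass to the function.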

memcpy(m_pFDRead.get(),m_pFDReadSave.get(),sizeof(fd_set)); // m_pFDReadSave is the saved master copy
memcpy(m_pFDWrite.get(),m_pFDWriteSave.get(),sizeof(fd_set));
memcpy(m_pFDError.get(),m_pFDErrorSave.get(),sizeof(fd_set));

int res = select( m_nMaxSocketID,  m_pFDRead.get() , m_pFDWrite.get(), 
                   m_pFDError.get(), ptv);

Original:

  • select modifies the passed fd_sets so none of them can be reused. Even if you don’t need to change anything – such as if one of descriptors received data and needs to receive more data – a whole set has to be either recreated again (argh!) or restored from a backup copy via FD_COPY. And this has to be done each time the select is called.


2. You have to scan the whole set. Even if you monitor 2,000 sockets and only the last one is ready, you still have to check all 2,000.

Original:

  • To find out which descriptors raised the events you have to manually iterate through all the descriptors in the set and call FD_ISSET on each one of them. When you have 2,000 of those descriptors and only one of them is active – and, likely, the last one – you’re wasting CPU cycles each time you wait.


3. select can monitor at most 1024 socket descriptors. Redefining the limit does not help: Linux simply ignores it.

Original:

  • Did I just mention 2,000 descriptors? Well, select cannot support that many. At least on Linux. The maximum number of the supported descriptors is defined by the FD_SETSIZE constant, which Linux happily defines as 1024. And while some operating systems allow you to hack this restriction by redefining the FD_SETSIZE before including the sys/select.h, this is not portable. Indeed, Linux would just ignore this hack and the limit will stay the same.


4. You cannot modify the set of sockets that select() is monitoring from another thread. For example, closing one of those sockets in another thread leads to unspecified behavior.

If a file descriptor being monitored by select() is closed in another thread, the result is unspecified.


Original:

  • You cannot modify the descriptor set from a different thread while waiting. Suppose a thread is executing the code above. Now suppose you have a housekeeping thread which decided that sock1 has been waiting too long for the input data, and it is time to cut the cord. Since this socket could be reused to serve another paying working client, the housekeeping thread wants to close the socket. However the socket is in the fd_set which select is waiting for.
    Now what happens when this socket is closed? man select has the answer, and you won’t like it. The answer is, “If a file descriptor being monitored by select() is closed in another thread, the result is unspecified”.


5. Some events cannot be detected with select, such as whether the remote end has closed the socket. If you are reading, you can check whether read returns 0. If you are only writing to the socket, you are out of luck: there is no way to tell.

Original:

  • The choice of the events to wait for is limited; for example, to detect whether the remote socket is closed you have to a) monitor it for input and b) actually attempt to read the data from socket to detect the closure (read will return 0). Which is fine if you want to read from this socket, but what if you’re sending a file and do not care about any input right now?


6. You have to work out the largest descriptor number yourself and pass it to select.

  • select puts extra burden on you when filling up the descriptor list to calculate the largest descriptor number and provide it as a function parameter.

7. Another thread cannot start monitoring a new event on a socket until select returns.

  • Same problem arises if another thread suddenly decides to send something via sock1. It is not possible to start monitoring the socket for the output event until select returns.
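Several of the drawbacks listed above (the set being overwritten, the FD_SETSIZE limit, tracking the largest descriptor, the linear scan) can be hidden behind a small wrapper. The following is only a minimal sketch in C, assuming that just read events matter; `struct fd_watch` and the `watch_*` functions are hypothetical names made up for this illustration, not part of any library.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper type: a master copy of the read set plus the
   largest descriptor added so far. */
struct fd_watch {
    fd_set master;
    int    max_fd;
};

static void watch_init(struct fd_watch *w)
{
    FD_ZERO(&w->master);
    w->max_fd = -1;
}

/* Returns 0 on success, -1 if fd does not fit into an fd_set. */
static int watch_add(struct fd_watch *w, int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, &w->master);
    if (fd > w->max_fd)
        w->max_fd = fd;
    return 0;
}

/* Waits up to timeout_ms for readable descriptors. Stores at most cap
   ready descriptors in out and returns how many were stored,
   0 on timeout, -1 on error. */
static int watch_wait_readable(const struct fd_watch *w, int timeout_ms,
                               int *out, int cap)
{
    fd_set working = w->master;      /* select() overwrites its argument */
    struct timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ret = select(w->max_fd + 1, &working, NULL, NULL, &tv);
    if (ret <= 0)
        return ret;

    int n = 0;
    for (int fd = 0; fd <= w->max_fd && n < cap; fd++) {  /* linear scan */
        if (FD_ISSET(fd, &working))
            out[n++] = fd;
    }
    return n;
}
```

Because the master set is copied by plain struct assignment, the caller can call `watch_wait_readable` repeatedly without rebuilding anything. The scan over every descriptor up to `max_fd` is still there; the wrapper only hides it.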



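So should select be abandoned altogether? No. There are two reasons why you may still need it:

1. Portability. Every platform has select, but not necessarily poll. On Windows, poll is only available from Vista onward.

2. Timeout precision. In theory select can specify timeouts down to the nanosecond, whereas poll and epoll are limited to milliseconds. (Strictly speaking, select's struct timeval has microsecond resolution; nanosecond resolution needs pselect and its struct timespec.) Ordinary programs do not care, but some embedded programs that need high precision might, for example in a nuclear reactor.


But if your program only handles a small number of connections, say fewer than 200, there is no real performance difference between poll and select. Use whichever you prefer.


Original: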

Of course the operating system developers recognized those drawbacks and addressed most of them when designing the poll method. Therefore you may ask, is there any reason to use select at all? Why not just store it on the shelf of the Computer Science Museum? Then you may be pleased to know that yes, there are two reasons, which may be either very important to you or not important at all.

The first reason is portability. select has been around for ages, and you can be sure that every single platform around which has network support and nonblocking sockets will have a working select implementation while it might not have poll at all. And unfortunately I’m not talking about the tubes and ENIAC here; poll is only available on Windows Vista and above, which excludes Windows XP – still used by a whopping 34% of users as of Sep 2013 despite the Microsoft pressure. Another option would be to still use poll on those platforms and emulate it with select on those which do not have it; it is up to you whether you consider it a reasonable investment.

The second reason is more exotic, and is related to the fact that select can – theoretically – handle the timeouts within one nanosecond precision, while both poll and epoll can only handle one millisecond precision. This is not likely to be a concern on a desktop or server system, whose clocks don’t even run with such precision, but it may be necessary on a realtime embedded platform while interacting with some hardware components. Such as lowering control rods to shut down a nuclear reactor – in this case, please, use select to make sure we all stay safe!

The case above would probably be the only case where you would have to use select and could not use anything else. However if you are writing an application which would never have to handle more than a handful of sockets (like, 200), the difference between using poll and select would not be based on performance, but more on personal preference or other factors.
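The timeout difference shows up directly in the argument types: select takes a struct timeval (seconds plus microseconds), while poll and epoll_wait take an int number of milliseconds. A minimal sketch of converting one sub-millisecond timeout for both APIs follows; the function names are hypothetical, chosen only for this example.

```c
#include <sys/time.h>

/* Convert a timeout in microseconds to the struct timeval select() expects. */
static struct timeval usec_to_timeval(long long usec)
{
    struct timeval tv;
    tv.tv_sec  = (time_t)(usec / 1000000);
    tv.tv_usec = (suseconds_t)(usec % 1000000);
    return tv;
}

/* Convert the same timeout to the whole milliseconds that poll() and
   epoll_wait() accept, rounding up so the wait is never shorter than
   requested. */
static int usec_to_poll_ms(long long usec)
{
    return (int)((usec + 999) / 1000);
}
```

A 250 microsecond timeout survives intact in the timeval but becomes a full millisecond for poll, which is the loss of precision the passage above refers to.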




Polling with poll()


How to use poll:

// The structure for two events
struct pollfd fds[2];
 
// Monitor sock1 for input
fds[0].fd = sock1;
fds[0].events = POLLIN;
 
// Monitor sock2 for output
fds[1].fd = sock2;
fds[1].events = POLLOUT;
 
// Wait up to 10 seconds (the array name already decays to struct pollfd *)
int ret = poll( fds, 2, 10000 );

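poll fixes most of the shortcomings of select.

poll has the following advantages:

1. There is no hard limit on the number of sockets it can monitor, so the 1024 limit no longer applies.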

  • There is no hard limit on the number of descriptors poll can monitor, so the limit of 1024 does not apply here.


2. poll does not change the fd and events fields of the pollfd structures you pass in (it only writes revents), so the pollfd array can be reused.

  • It does not modify the data passed in the struct pollfd data. Therefore it could be reused between the poll() calls as long as you set to zero the revents member for those descriptors which generated the events. The IEEE specification states that “In each pollfd structure, poll() shall clear the revents member, except that where the application requested a report on a condition by setting one of the bits of events listed above, poll() shall set the corresponding bit in revents if the requested condition is true”. However in my experience at least one platform did not follow this recommendation, and man 2 poll on Linux does not make such guarantee either (man 3p poll does though).


3. It can detect more fine-grained socket events. For example, it can tell that the remote socket has been closed without having to call read as you would with select.

  • It allows more fine-grained control of events comparing to select. For example, it can detect remote peer shutdown without monitoring for read events.


poll still has some drawbacks:

1. Like select, you still have to scan the whole list.

  • Like select, it is still not possible to find out which descriptors have the events triggered without iterating through the whole list and checking the revents. Worse, the same happens in the kernel space as well, as the kernel has to iterate through the list of file descriptors to find out which sockets are monitored, and iterate through the whole list again to set up the events.


2. Like select, you cannot dynamically modify the monitored set or close a socket that is currently being polled.

  • Like select, it is not possible to dynamically modify the set or close the socket which is being polled (see above).
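The scan over revents described in the first point, together with resetting revents so the array can be reused, might look like the sketch below. `ready_cb` and `dispatch_ready` are hypothetical names introduced for this example.

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once per descriptor that has events. */
typedef void (*ready_cb)(int fd, short revents, void *ctx);

/* Walks the whole array, calls cb for every entry whose revents is
   non-zero, clears revents so the array can be passed to poll() again,
   and returns the number of entries that had events. */
static int dispatch_ready(struct pollfd *fds, nfds_t nfds, ready_cb cb, void *ctx)
{
    int handled = 0;
    for (nfds_t i = 0; i < nfds; i++) {
        if (fds[i].revents == 0)
            continue;                /* nothing happened on this one */
        if (cb != NULL)
            cb(fds[i].fd, fds[i].revents, ctx);
        fds[i].revents = 0;          /* do not rely on poll() clearing it */
        handled++;
    }
    return handled;
}
```

The loop always visits every entry, however few are ready, which is the cost that epoll avoids.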



Please keep in mind, however, that those issues might be considered unimportant for most client networking applications – the only exception would be client software such as P2P which may require handling of thousands of open connections. Those issues might not be important even for some server applications. Therefore poll should be your default choice over select unless you have specific reasons mentioned above. Moreover, poll should be your preferred method even over epoll if the following is true:

In the following situations poll is the better choice than epoll:


1. You need to run on more than just Linux and do not want to write or use a wrapper, because epoll is Linux only.

  • You need to support more than just Linux, and do not want to use epoll wrappers such as libevent (epoll is Linux only);


2. You monitor fewer than 1000 connections.

  • Your application needs to monitor less than 1000 sockets at a time (you are not likely to see any benefits from using epoll);


3. You monitor more than 1000 connections, but they are short-lived: each sends a small amount of data and then disconnects.

  • Your application needs to monitor more than 1000 sockets at a time, but the connections are very short-lived (this is a close case, but in this scenario you are not likely to see any benefits from using epoll because the speedup in event waiting would be wasted on adding those new descriptors into the set – see below)


4. No thread other than the monitoring thread changes the sockets that are being monitored.

  • Your application is not designed the way that it changes the events while another thread is waiting for them (i.e. you’re not porting an app using kqueue or IO Completion Ports).


Polling with epoll()

epoll is the latest, greatest, newest polling method in Linux (and only Linux). Well, it was actually added to kernel in 2002, so it is not so new. It differs both from poll and select in such a way that it keeps the information about the currently monitored descriptors and associated events inside the kernel, and exports the API to add/remove/modify those.

To use epoll, much more preparation is needed. A developer needs to:

  • Create the epoll descriptor by calling epoll_create;

  • Initialize the struct epoll_event structure with the wanted events and the context data pointer. Context could be anything; epoll passes this value directly to the returned events structure. We store there a pointer to our Connection class.

  • Call epoll_ctl( … EPOLL_CTL_ADD ) to add the descriptor into the monitoring set

  • Call epoll_wait() to wait for 20 events for which we reserve the storage space. Unlike previous methods, this call receives an empty structure, and fills it up only with the triggered events. For example, if there are 200 descriptors and 5 of them have events pending, the epoll_wait will return 5, and only the first five members of the pevents structure will be initialized. If 50 descriptors have events pending, the first 20 would be copied and 30 would be left in queue, they won’t get lost.

  • Iterate through the returned items. This will be a short iteration since the only events returned are those which are triggered.


A typical workflow looks like this:

// Create the epoll descriptor. Only one is needed per app, and is used to monitor all sockets.
// The size argument is ignored by modern kernels but must still be greater than zero
int pollingfd = epoll_create( 0xCAFE );

if ( pollingfd < 0 )
{
    // report error
}

// Zero-initialize the epoll structure in case more members are added in future
struct epoll_event ev = { 0 };

// Associate the connection class instance with the event. You can associate anything
// you want, epoll does not use this information. We store a connection class pointer, pConnection1
ev.data.ptr = pConnection1;

// Monitor for input, and do not automatically rearm the descriptor after the event
ev.events = EPOLLIN | EPOLLONESHOT;

// Add the descriptor into the monitoring list. We can do it even if another thread is
// waiting in epoll_wait - the descriptor will be properly added
if ( epoll_ctl( pollingfd, EPOLL_CTL_ADD, pConnection1->getSocket(), &ev ) != 0 )
{
    // report error
}

// Wait for up to 20 events (assuming we have added maybe 200 sockets before that it may happen)
struct epoll_event pevents[ 20 ];

// Wait for up to 10 seconds
int ret = epoll_wait( pollingfd, pevents, 20, 10000 );

// Check whether epoll_wait actually succeeded
if ( ret == -1 )
{
    // report error and abort
}
else if ( ret == 0 )
{
    // timeout; no event detected
}
else
{
    // Handle every returned event
    for ( int i = 0; i < ret; i++ )
    {
        if ( pevents[i].events & EPOLLIN )
        {
            // Get back our connection pointer
            Connection * c = (Connection*) pevents[i].data.ptr;
            c->handleReadEvent();
        }
    }
}

From this we can see the drawback of epoll: it is more complex, needs more code, and makes more system API calls.

Original:

Just looking at the implementation alone should give you the hint of what are the disadvantages of epoll, which we will mention first. It is more complex to use, and requires you to write more code, and it requires more library calls compared to other polling methods.


However, epoll clearly does better than select/poll in both performance and functionality:


1. No scanning is needed. You no longer walk through ten thousand descriptors to find the one that fired.

  • epoll returns only the list of descriptors which triggered the events. No need to iterate through 10,000 descriptors anymore to find that one which triggered the event!


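2. When you add a socket to the monitoring set you can attach custom data alongside it, for example a pointer to an action object whose handler is called when the event fires.

       struct epoll_event tmp_pfd;
        memset( &tmp_pfd, 0, sizeof(tmp_pfd) );
        int nOldEvents = static_cast<int>( pAction->getExtValue() );

        // data is a union, so data.fd and data.ptr share storage: store only the pointer
        tmp_pfd.data.ptr = pAction;

Original: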

  • You can attach meaningful context to the monitored event instead of socket file descriptors. In our example we attached the class pointers which could be called directly, saving you another lookup.


3. It supports multithreading. While one thread is inside epoll_wait, other threads can add, remove or modify descriptors, and everything keeps working.


  • You can add sockets or remove them from monitoring anytime, even if another thread is in the epoll_wait function. You can even modify the descriptor events. Everything will work properly, and this behavior is supported and documented. This gives you much more flexibility in implementation.


4. The kernel records ready events even when you are not calling epoll_wait. This is what makes features such as edge triggering possible.

  • Since the kernel knows all the monitoring descriptors, it can register the events happening on them even when nobody is calling epoll_wait. This allows implementing interesting features such as edge triggering, which will be described in a separate article.


5. Several threads can call epoll_wait on the same set, which is the recommended approach in edge-triggered mode. This is not possible with poll or select.

  • It is possible to have the multiple threads waiting on the same epoll queue with epoll_wait(), something you cannot do with select/poll. In fact it is not only possible with epoll, but the recommended method in the edge triggering mode.



However, epoll is not simply a better poll; compared with poll it has drawbacks of its own:

However you need to keep in mind that epoll is not a “better poll”, and it also has disadvantages when comparing to poll:


1. Switching between read and write events requires the epoll_ctl system call, whereas with poll it is just a bit set in user space. Switching 5,000 sockets from read to write monitoring takes 5,000 epoll_ctl calls and as many transitions into the kernel, while with poll it is a simple loop over the pollfd array.

  • Changing the event flags (i.e. from READ to WRITE) requires the epoll_ctl syscall, while when using poll this is a simple bitmask operation done entirely in userspace. Switching 5,000 sockets from reading to writing with epoll would require 5,000 syscalls and hence context switches (as of 2014 calls to epoll_ctl still could not be batched, and each descriptor must be changed separately), while in poll it would require a single loop over the pollfd structure.


2. Each new connection costs two system calls, epoll_ctl and epoll_wait, whereas poll needs only poll(). If your program serves many short-lived connections that send or receive little data, epoll will be slower than poll.

  • Each accept()ed socket needs to be added to the set, and same as above, with epoll it has to be done by calling epoll_ctl – which means there are two required syscalls per new connection socket instead of one for poll. If your server has many short-lived connections which send or receive little traffic, epoll will likely take longer than poll to serve them.



3. epoll is only supported on Linux.

  • epoll is exclusively Linux domain, and while other platforms have similar mechanisms, they are not exactly the same – edge triggering, for example, is pretty unique (FreeBSD’s kqueue supports it too though).


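4. High-performance logic is more complex and harder to debug, especially in edge-triggered mode, where forgetting a read or write easily leads to a deadlock.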

  • High performance processing logic is more complex and hence more difficult to debug, especially for edge triggering which is prone to deadlocks if you miss extra read/write.



Therefore epoll is only recommended in the following situations:

Therefore you should only use epoll if all of the following is true:

1. Your program is multithreaded. In a single-threaded program epoll will most likely not outperform poll.

  • Your application runs a thread pool which handles many network connections by a handful of threads. You would lose most of epoll benefits in a single-threaded application, and most likely it won’t outperform poll.


2. Your program handles a large number of connections, at least 1,000. With fewer connections poll may even perform better than epoll.

  • You expect to have a reasonably large number of sockets to monitor (at least 1,000); with a smaller number epoll is not likely to have any performance benefits over poll and may actually worsen performance;


3. Your connections are long-lived. Each new connection costs epoll an extra system call, so epoll is slower for short-lived connections.

  • Your connections are relatively long-lived; as stated above epoll will be slower than poll in a situation when a new connection sends a few bytes of data and immediately disconnects because of extra system call required to add the descriptor into epoll set;


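4. You already depend on other Linux-specific features, or you can provide wrappers for other platforms.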

  • Your app depends on other Linux-specific features (so in case portability question would suddenly pop up, epoll wouldn’t be the only roadblock), or you can provide wrappers for other supported systems. In the last case you should strongly consider libevent.