See http://www.9say.com/2009/01/high-performence-linux-socket-server-api-epoll/
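In principle, epoll works much like select, but with one major improvement:
When the state of a descriptor in the monitored set changes, select has to loop over the entire set to find out which file descriptor changed before it can act on it.
With epoll, a callback is attached to each file descriptor when it is added to the set. When that descriptor's state changes, the kernel places it on a ready list, so there is no need to scan the whole set. The cost of finding a ready descriptor therefore drops from O(n) to O(1), and performance improves substantially, especially with a large number of connections.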
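The difference described above can be seen in a small sketch. Both helper functions below (`ready_via_select` and `ready_via_epoll`, names invented for this illustration) answer the same question, "which of these descriptors are readable right now?". With select the caller tests every descriptor after the call returns, while epoll_wait hands back only the ready entries. This is a minimal sketch assuming Linux; it creates a throwaway epoll instance on each call, which a real server would not do.

```c
#include <string.h>
#include <sys/epoll.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: find readable descriptors among fds[0..n-1] with
 * select(). Stores up to max of them in out and returns how many it found. */
int ready_via_select(const int *fds, int n, int *out, int max)
{
    fd_set rset;
    struct timeval tv = {0, 0};          /* poll once, do not block */
    int maxfd = -1, found = 0;

    FD_ZERO(&rset);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (select(maxfd + 1, &rset, NULL, NULL, &tv) < 0)
        return -1;

    /* select() only returns a count, so every descriptor is tested: O(n). */
    for (int i = 0; i < n && found < max; i++) {
        if (FD_ISSET(fds[i], &rset))
            out[found++] = fds[i];
    }
    return found;
}

/* Hypothetical helper: the same question answered with epoll. */
int ready_via_epoll(const int *fds, int n, int *out, int max)
{
    struct epoll_event ev, events[64];
    int epfd, nready;

    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];             /* handed back by epoll_wait() */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    if (max > 64)
        max = 64;

    /* The kernel fills events[] with ready descriptors only: no scan. */
    nready = epoll_wait(epfd, events, max, 0);
    for (int i = 0; i < nready; i++)
        out[i] = events[i].data.fd;
    close(epfd);
    return nready;
}
```

In a long-running server the registration loop runs once per connection rather than once per wait, which is where the saving over select comes from.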
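libevent is a cross-platform, high-performance library. On Linux it uses epoll, on FreeBSD it uses kqueue, and on Windows it can use IOCP; the underlying principle is the same in each case. The official libevent home page has a chart that compares the performance of select with that of epoll/kqueue.
[Figure: libevent benchmark chart comparing select with epoll/kqueue]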
epoll has three main functions:
epoll_create
epoll_ctl
epoll_wait
NAME
epoll_create – open an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_create(int size)
DESCRIPTION
Open an epoll file descriptor by requesting the kernel allocate an event backing store dimensioned for size descriptors. The size is not the maximum size of the backing store but just a hint to the kernel about how to dimension internal structures. The returned file descriptor will be used for all the subsequent calls to the epoll interface. The file descriptor returned by epoll_create(2) must be closed by using close(2).
RETURN VALUE
When successful, epoll_create(2) returns a non-negative integer identifying the descriptor. When an error occurs, epoll_create(2) returns -1 and errno is set appropriately.
ERRORS
EINVAL size is not positive.
ENFILE The system limit on the total number of open files has been reached.
ENOMEM There was insufficient memory to create the kernel object.
CONFORMING TO
epoll_create(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
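As a minimal sketch of the page above, the wrapper below (`open_epoll` is a name invented here, not part of the epoll API) creates an epoll descriptor and reports the failure reason while preserving errno for the caller. It assumes a Linux system where a non-positive size is rejected with EINVAL, as the ERRORS section states.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper: create an epoll descriptor, or print why it failed.
 * Returns the descriptor, or -1 with errno left as epoll_create() set it. */
int open_epoll(int size_hint)
{
    int epfd = epoll_create(size_hint);

    if (epfd == -1) {
        int saved = errno;               /* fprintf() may change errno */
        fprintf(stderr, "epoll_create(%d): %s\n", size_hint, strerror(saved));
        errno = saved;
    }
    return epfd;
}
```

The returned descriptor is an ordinary file descriptor, so the caller releases it with close(2) once it is no longer needed.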
NAME
epoll_ctl – control interface for an epoll descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event)
DESCRIPTION
Control an epoll descriptor, epfd, by requesting that the operation op be performed on the target file descriptor, fd. The event describes the object linked to the file descriptor fd. The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;   /* Epoll events */
    epoll_data_t data;   /* User data variable */
};
The events member is a bit set composed using the following available event types :
EPOLLIN
The associated file is available for read(2) operations.
EPOLLOUT
The associated file is available for write(2) operations.
EPOLLPRI
There is urgent data available for read(2) operations.
EPOLLERR
Error condition happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events.
EPOLLHUP
Hang up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events.
EPOLLET
Sets the Edge Triggered behaviour for the associated file descriptor. The default behaviour for epoll is Level Triggered. See epoll(4) for more detailed information about Edge and Level Triggered event distribution architectures.
EPOLLONESHOT
Sets the one-shot behaviour for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl(2) with EPOLL_CTL_MOD to re-enable the file descriptor with a new event mask.
The epoll interface supports all file descriptors that support poll(2). Valid values for the op parameter are :
EPOLL_CTL_ADD
Add the target file descriptor fd to the epoll descriptor epfd and associate the event event with the internal file linked to fd.
EPOLL_CTL_MOD
Change the event event associated with the target file descriptor fd.
EPOLL_CTL_DEL
Remove the target file descriptor fd from the epoll file descriptor, epfd. The event is ignored and can be NULL (but see BUGS below).
RETURN VALUE
When successful, epoll_ctl(2) returns zero. When an error occurs, epoll_ctl(2) returns -1 and errno is set appropriately.
ERRORS
EBADF epfd is not a valid file descriptor.
EEXIST op was EPOLL_CTL_ADD, and the supplied file descriptor fd is already in epfd.
EINVAL epfd is not an epoll file descriptor, or fd is the same as epfd, or the requested operation op is not supported by this interface.
ENOENT op was EPOLL_CTL_MOD or EPOLL_CTL_DEL, and fd is not in epfd.
ENOMEM There was insufficient memory to handle the requested op control operation.
EPERM The target file fd does not support epoll.
CONFORMING TO
epoll_ctl(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
BUGS
In kernel versions before 2.6.9, the EPOLL_CTL_DEL operation required a non-NULL pointer in event, even though this argument is ignored. Since kernel 2.6.9, event can be specified as NULL when using EPOLL_CTL_DEL.
NAME
epoll_wait – wait for an I/O event on an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout)
DESCRIPTION
Wait for events on the epoll file descriptor epfd for a maximum time of timeout milliseconds. The memory area pointed to by events will contain the events that will be available for the caller. Up to maxevents are returned by epoll_wait(2). The maxevents parameter must be greater than zero. Specifying a timeout of -1 makes epoll_wait(2) wait indefinitely, while specifying a timeout equal to zero makes epoll_wait(2) return immediately even if no events are available (return code equal to zero). The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;   /* Epoll events */
    epoll_data_t data;   /* User data variable */
};
The data of each returned structure will contain the same data the user set with an epoll_ctl(2) call (EPOLL_CTL_ADD, EPOLL_CTL_MOD), while the events member will contain the returned event bit field.
RETURN VALUE
When successful, epoll_wait(2) returns the number of file descriptors ready for the requested I/O, or zero if no file descriptor became ready during the requested timeout milliseconds. When an error occurs, epoll_wait(2) returns -1 and errno is set appropriately.
ERRORS
EBADF epfd is not a valid file descriptor.
EFAULT The memory area pointed to by events is not accessible with write permissions.
EINTR The call was interrupted by a signal handler before any of the requested events occurred or the timeout expired.
EINVAL epfd is not an epoll file descriptor, or maxevents is less than or equal to zero.
CONFORMING TO
epoll_wait(2) is a new API introduced in Linux kernel 2.5.44. The interface should be finalized by Linux kernel 2.5.66.
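A small experiment can make the timeout and trigger modes concrete. The function below (`count_wakeups`, a name invented for this sketch) registers the read end of a pipe, writes one byte, and then calls epoll_wait twice with a timeout of zero, without ever reading the byte. It returns how many of the two calls reported the descriptor. This is only an illustration under the assumption of a Linux system, not a pattern for production code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: returns how many of two back-to-back
 * epoll_wait() calls report unread data, or -1 if the setup fails. */
int count_wakeups(uint32_t extra_flags)
{
    int p[2], epfd, reported = 0;
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | extra_flags;   /* e.g. 0, EPOLLET, EPOLLONESHOT */
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1 ||
        write(p[1], "x", 1) != 1) {
        reported = -1;
    } else {
        for (int i = 0; i < 2; i++) {
            /* timeout 0: return at once, even if nothing is ready */
            if (epoll_wait(epfd, &out, 1, 0) == 1 && out.data.fd == p[0])
                reported++;
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return reported;
}
```

With the default level-triggered behaviour, `count_wakeups(0)` returns 2, because the unread byte keeps the descriptor ready. With `EPOLLET` or `EPOLLONESHOT` it returns 1: the event is delivered once, which is why edge-triggered code must keep reading until read(2) or recv(2) fails with EAGAIN.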
Linux network programming step by step: code that uses epoll to handle a massive number of connections at once
/*
 * @file: epol.c
 * An epoll-based TCP server that prints whatever its clients send.
 * Usage: ./epol [port] [backlog] [bind-address]
 */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/time.h>
#include <sys/resource.h>

#define MAXBUF 1024
#define MAXEPOLLSIZE 10000

/*
 * setnonblocking - put a descriptor into non-blocking mode
 */
int setnonblocking(int sockfd)
{
    /* The status flags are read with F_GETFL (F_GETFD reads descriptor flags). */
    int flags = fcntl(sockfd, F_GETFL, 0);

    if (flags == -1 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) == -1) {
        return -1;
    }
    return 0;
}

/*
 * handle_message - receive the data pending on one socket
 * Returns 0 while the connection is still usable, or -1 if the peer closed
 * it or an error occurred. The caller closes the socket.
 */
int handle_message(int new_fd)
{
    char buf[MAXBUF + 1];
    ssize_t len;

    /* Edge-triggered mode reports readiness only once, so drain the socket. */
    for (;;) {
        len = recv(new_fd, buf, MAXBUF, 0);
        if (len > 0) {
            buf[len] = '\0';
            printf("%d: message received: '%s', %zd bytes in total\n",
                   new_fd, buf, len);
        } else if (len == 0) {
            return -1;                   /* peer closed the connection */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;                    /* nothing more to read for now */
        } else if (errno != EINTR) {
            printf("Receive failed! Error code %d, error message '%s'\n",
                   errno, strerror(errno));
            return -1;
        }
    }
}

int main(int argc, char **argv)
{
    int listener, new_fd, kdpfd, nfds, n, curfds;
    socklen_t len;
    struct sockaddr_in my_addr, their_addr;
    unsigned int myport, lisnum;
    struct epoll_event ev;
    struct epoll_event events[MAXEPOLLSIZE];
    struct rlimit rt;

    myport = (argc > 1) ? atoi(argv[1]) : 7838;   /* listening port */
    lisnum = (argc > 2) ? atoi(argv[2]) : 2;      /* listen() backlog */

    /* Set the maximum number of files this process may open. */
    rt.rlim_max = rt.rlim_cur = MAXEPOLLSIZE;
    if (setrlimit(RLIMIT_NOFILE, &rt) == -1) {
        perror("setrlimit");
        exit(1);
    }
    printf("System resource limit set successfully!\n");

    /* Open the listening socket. */
    if ((listener = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }
    printf("Socket created successfully!\n");
    setnonblocking(listener);

    memset(&my_addr, 0, sizeof(my_addr));
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(myport);
    if (argc > 3) {
        my_addr.sin_addr.s_addr = inet_addr(argv[3]);
    } else {
        my_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    }
    if (bind(listener, (struct sockaddr *)&my_addr, sizeof(my_addr)) == -1) {
        perror("bind");
        exit(1);
    }
    printf("IP address and port bound successfully\n");

    if (listen(listener, lisnum) == -1) {
        perror("listen");
        exit(1);
    }
    printf("Service started successfully!\n");

    /* Create the epoll handle and add the listening socket to the epoll set. */
    kdpfd = epoll_create(MAXEPOLLSIZE);
    if (kdpfd == -1) {
        perror("epoll_create");
        exit(1);
    }
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = listener;
    if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, listener, &ev) < 0) {
        fprintf(stderr, "epoll set insertion error: fd=%d\n", listener);
        return -1;
    }
    printf("Listening socket added to epoll successfully!\n");
    curfds = 1;

    while (1) {
        /* Wait for events. */
        nfds = epoll_wait(kdpfd, events, curfds, -1);
        if (nfds == -1) {
            if (errno == EINTR) {
                continue;                /* interrupted by a signal: retry */
            }
            perror("epoll_wait");
            break;
        }
        /* Handle every reported event. */
        for (n = 0; n < nfds; ++n) {
            if (events[n].data.fd == listener) {
                /* Edge-triggered: accept until the pending queue is empty. */
                for (;;) {
                    len = sizeof(their_addr);
                    new_fd = accept(listener, (struct sockaddr *)&their_addr, &len);
                    if (new_fd < 0) {
                        if (errno != EAGAIN && errno != EWOULDBLOCK) {
                            perror("accept");
                        }
                        break;
                    }
                    printf("Connection from %s:%d, assigned socket: %d\n",
                           inet_ntoa(their_addr.sin_addr),
                           ntohs(their_addr.sin_port), new_fd);
                    setnonblocking(new_fd);
                    ev.events = EPOLLIN | EPOLLET;
                    ev.data.fd = new_fd;
                    if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, new_fd, &ev) < 0) {
                        fprintf(stderr, "Failed to add socket '%d' to epoll! %s\n",
                                new_fd, strerror(errno));
                        return -1;
                    }
                    curfds++;
                }
            } else if (handle_message(events[n].data.fd) < 0) {
                /* Remove the finished connection from the set, then close it. */
                epoll_ctl(kdpfd, EPOLL_CTL_DEL, events[n].data.fd, &ev);
                close(events[n].data.fd);
                curfds--;
            }
        }
    }
    close(kdpfd);
    close(listener);
    return 0;
}