How to use epoll? A complete example in C

Thursday, 2 June 2011 @ 1238 GMT by Mukund Sivaraman

Network servers are traditionally implemented using a separate process or thread per connection. For high performance applications that need to handle a very large number of clients simultaneously, this approach doesn't work well, because per-connection resource usage and context-switching time limit how many clients can be served at once. An alternative is to perform non-blocking I/O in a single thread, along with a readiness notification method which tells you when you can read or write more data on a socket.

This article is an introduction to Linux's epoll(7) facility, which is the best readiness notification facility in Linux. We will write sample code for a complete TCP server implementation in C. I assume you have C programming experience, know how to compile and run programs on Linux, and can read manpages of the various C functions that are used.

epoll was introduced in Linux 2.6, and is not available in other UNIX-like operating systems. It provides a facility similar to the select(2) and poll(2) functions:

  • select(2) can monitor at most FD_SETSIZE descriptors at a time, typically a small number (1024 with glibc) fixed when libc is compiled.
  • poll(2) has no fixed limit on the number of descriptors it can monitor, but among other drawbacks, we have to perform a linear scan of all the passed descriptors after every call to find the ones that are ready, which is O(n) and slow.

epoll has no such fixed limits, and does not perform any linear scans. Hence it is able to perform better and handle a larger number of events.
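For comparison, here is a minimal sketch of the linear scan that poll(2) leaves to the caller. The helper name first_readable() is hypothetical and not part of the server built in this article; it polls without blocking and then checks every entry in turn.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, not from the article: polls nfds descriptors
   without blocking and returns the index of the first readable one,
   or -1 if none is readable (or poll() failed). */
static int
first_readable (struct pollfd *fds, nfds_t nfds)
{
  nfds_t i;

  assert (fds != NULL);

  /* A timeout of 0 makes poll() return immediately */
  if (poll (fds, nfds, 0) <= 0)
    return -1;

  /* poll() only says how many descriptors are ready; finding out
     which ones means looking at every entry: the O(n) scan. */
  for (i = 0; i < nfds; i++)
    {
      if (fds[i].revents & POLLIN)
        return (int) i;
    }

  return -1;
}
```

With thousands of mostly idle connections, this loop runs over the whole array after every wakeup, which is the cost epoll is designed to avoid.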

An epoll instance is created by epoll_create(2) or epoll_create1(2) (they take different arguments), both of which return a file descriptor referring to the new instance. epoll_ctl(2) is used to add descriptors to, or remove them from, the set watched by the epoll instance. To wait for events on the watched set, epoll_wait(2) is used, which blocks until events are available. Please see their manpages for more info.
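The three calls fit together roughly as in the sketch below. It watches the read end of a pipe rather than a socket so that it stays self-contained; the helper name wait_on_pipe() is hypothetical, and a timeout of 0 is passed to epoll_wait(2) so the example returns immediately instead of blocking.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not from the article: watches the read end of
   a fresh pipe and returns what a single epoll_wait() call reports.
   write_first must be 0 or 1; if it is 1, one byte is written to the
   pipe before waiting.  Returns the number of ready descriptors, or
   -1 on error. */
static int
wait_on_pipe (int write_first)
{
  int fds[2], efd, n = -1;
  struct epoll_event event = { 0 };
  struct epoll_event ready[1];

  assert (write_first == 0 || write_first == 1);

  if (pipe (fds) == -1)
    return -1;

  efd = epoll_create1 (0);      /* Create the epoll instance */
  if (efd == -1)
    goto out_pipe;

  event.events = EPOLLIN;       /* We care about readability */
  event.data.fd = fds[0];       /* User data: the descriptor itself */
  if (epoll_ctl (efd, EPOLL_CTL_ADD, fds[0], &event) == -1)
    goto out_epoll;

  if (write_first && write (fds[1], "x", 1) != 1)
    goto out_epoll;

  /* Timeout 0: report what is ready right now, don't block */
  n = epoll_wait (efd, ready, 1, 0);
  if (n == 1 && ready[0].data.fd != fds[0])
    n = -1;

out_epoll:
  close (efd);
out_pipe:
  close (fds[0]);
  close (fds[1]);
  return n;
}
```

Calling wait_on_pipe (1) reports one ready descriptor, while wait_on_pipe (0) reports none because nothing was written. A real server would pass -1 as the timeout to block until something happens.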

When descriptors are added to an epoll instance, they can be added in two modes: level triggered and edge triggered. In level triggered mode, epoll_wait(2) reports a descriptor as ready for as long as data is available for reading. If you don't read the data completely, and call epoll_wait(2) on the epoll instance watching the descriptor again, it will return again with a ready event because data is still available. In edge triggered mode, you get a readiness notification only once, when the descriptor becomes ready. If you don't read the data fully, and call epoll_wait(2) on the epoll instance watching the descriptor again, it will block until new data arrives, because the readiness event was already delivered.
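The difference between the two modes can be observed with a small experiment, sketched below on a pipe. The helper count_notifications() is hypothetical; it writes data, deliberately never reads it, and asks epoll_wait(2) twice with a timeout of 0 so that nothing blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not from the article: registers a pipe's read
   end with EPOLLIN plus extra_flags (0 for level triggered, EPOLLET
   for edge triggered), writes data that is never read, and calls
   epoll_wait() twice.  Returns how many of the two calls reported
   the descriptor as ready, or -1 on error. */
static int
count_notifications (uint32_t extra_flags)
{
  int fds[2], efd, i, n, count = 0;
  struct epoll_event event = { 0 };
  struct epoll_event ready;

  assert (extra_flags == 0 || extra_flags == EPOLLET);

  if (pipe (fds) == -1)
    return -1;

  efd = epoll_create1 (0);
  if (efd == -1)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  event.events = EPOLLIN | extra_flags;
  event.data.fd = fds[0];
  if (epoll_ctl (efd, EPOLL_CTL_ADD, fds[0], &event) == -1
      || write (fds[1], "hello", 5) != 5)
    count = -1;

  /* The data stays unread between the two calls */
  for (i = 0; count != -1 && i < 2; i++)
    {
      n = epoll_wait (efd, &ready, 1, 0);
      if (n == -1)
        count = -1;
      else
        count += n;
    }

  close (efd);
  close (fds[0]);
  close (fds[1]);
  return count;
}
```

In level triggered mode both calls report the descriptor, so the result is 2. With EPOLLET only the first call does, so the result is 1. This is why edge triggered code must keep reading until read(2) fails with EAGAIN.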

The epoll event structure that you pass to epoll_ctl(2) is shown below. With every descriptor being watched, you can associate an integer or a pointer as user data.

typedef union epoll_data
{
  void        *ptr;
  int          fd;
  __uint32_t   u32;
  __uint64_t   u64;
} epoll_data_t;

struct epoll_event
{
  __uint32_t   events; /* Epoll events */
  epoll_data_t data;   /* User data variable */
};
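Since epoll_data_t is a union, only one of its members can be used per registration. A common pattern is to store a pointer to per-connection state in data.ptr and get it back from epoll_wait(2). The sketch below shows this under assumed names: struct conn_state, watch_with_state() and next_ready_state() are hypothetical and not used elsewhere in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state, for illustration only */
struct conn_state
{
  int         fd;
  const char *label;
};

/* Registers state->fd for read readiness on efd and stores the state
   pointer as user data.  Returns 0 on success, -1 on error. */
static int
watch_with_state (int efd, struct conn_state *state)
{
  struct epoll_event event = { 0 };

  assert (state != NULL);

  event.events = EPOLLIN;
  event.data.ptr = state;   /* data.fd is unusable now: same union */
  return epoll_ctl (efd, EPOLL_CTL_ADD, state->fd, &event);
}

/* Waits up to timeout_ms milliseconds for one event and returns the
   state pointer stored at registration time, or NULL if nothing was
   ready or epoll_wait() failed. */
static struct conn_state *
next_ready_state (int efd, int timeout_ms)
{
  struct epoll_event ready;

  if (epoll_wait (efd, &ready, 1, timeout_ms) != 1)
    return NULL;

  return ready.data.ptr;
}
```

Because the descriptor is kept inside the state structure, nothing is lost by giving up data.fd.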

Let's write code now. We'll implement a tiny TCP server that prints everything sent to the socket on standard output. We'll begin by writing a function create_and_bind() which creates and binds a TCP socket:

static int
create_and_bind (char *port)
{
  struct addrinfo hints;
  struct addrinfo *result, *rp;
  int s, sfd;

  memset (&hints, 0, sizeof (struct addrinfo));
  hints.ai_family = AF_UNSPEC;     /* Return IPv4 and IPv6 choices */
  hints.ai_socktype = SOCK_STREAM; /* We want a TCP socket */
  hints.ai_flags = AI_PASSIVE;     /* All interfaces */

  s = getaddrinfo (NULL, port, &hints, &result);
  if (s != 0)
    {
      fprintf (stderr, "getaddrinfo: %s\n", gai_strerror (s));
      return -1;
    }

  for (rp = result; rp != NULL; rp = rp->ai_next)
    {
      sfd = socket (rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      if (sfd == -1)
        continue;

      s = bind (sfd, rp->ai_addr, rp->ai_addrlen);
      if (s == 0)
        {
          /* We managed to bind successfully! */
          break;
        }

      close (sfd);
    }

  if (rp == NULL)
    {
      fprintf (stderr, "Could not bind\n");
      freeaddrinfo (result); /* Don't leak the list on failure */
      return -1;
    }

  freeaddrinfo (result);

  return sfd;
}

create_and_bind() contains a standard code block for a portable way of getting an IPv4 or IPv6 socket. It accepts the port as a string, so argv[1] can be passed directly. The getaddrinfo(3) function returns a linked list of addrinfo structures in result, each of which is compatible with the hints passed in the hints argument. The addrinfo struct looks like this:

struct addrinfo
{
  int              ai_flags;
  int              ai_family;
  int              ai_socktype;
  int              ai_protocol;
  size_t           ai_addrlen;
  struct sockaddr *ai_addr;
  char            *ai_canonname;
  struct addrinfo *ai_next;
};

We walk through the structures one by one and try creating sockets using them, until we are able to both create and bind a socket. If we were successful, create_and_bind() returns the socket descriptor. If unsuccessful, it returns -1.
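To see what the list looks like before any socket is created, the walk over ai_next can be tried on its own. The following is a minimal sketch using the same hints as above; the helper name count_candidates() is hypothetical.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not from the article: counts the candidate
   addresses getaddrinfo() offers for a passive TCP socket on the
   given port.  Returns -1 if the lookup fails. */
static int
count_candidates (const char *port)
{
  struct addrinfo hints;
  struct addrinfo *result, *rp;
  int count = 0;

  assert (port != NULL);

  memset (&hints, 0, sizeof (struct addrinfo));
  hints.ai_family = AF_UNSPEC;     /* IPv4 and IPv6 */
  hints.ai_socktype = SOCK_STREAM; /* TCP */
  hints.ai_flags = AI_PASSIVE;     /* Wildcard address, for bind() */

  if (getaddrinfo (NULL, port, &hints, &result) != 0)
    return -1;

  /* Follow ai_next until the end of the linked list */
  for (rp = result; rp != NULL; rp = rp->ai_next)
    count++;

  freeaddrinfo (result);

  return count;
}
```

How many entries come back depends on the system's configuration, which is why create_and_bind() tries each one in turn instead of assuming the first will work.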

Next, let's write a function to make a socket non-blocking. make_socket_non_blocking() sets the O_NONBLOCK flag on the descriptor passed in the sfd argument:

static int
make_socket_non_blocking (int sfd)
{
  int flags, s;

  flags = fcntl (sfd, F_GETFL, 0);
  if (flags == -1)
    {
      perror ("fcntl");
      return -1;
    }

  flags |= O_NONBLOCK;
  s = fcntl (sfd, F_SETFL, flags);
  if (s == -1)
    {
      perror ("fcntl");
      return -1;
    }

  return 0;
}
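The effect of O_NONBLOCK is easiest to see on a read with no data available: instead of blocking, read(2) fails and sets errno to EAGAIN (or EWOULDBLOCK). The sketch below checks for this. The helper name read_would_block() is hypothetical, and it assumes the descriptor passed in has already been made non-blocking, for example by make_socket_non_blocking().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not from the article: attempts one read on a
   descriptor that is assumed to be non-blocking.  Returns 1 if the
   read failed only because no data was available yet, 0 otherwise
   (data was read, end of file, or a different error).  Any data
   read is discarded. */
static int
read_would_block (int fd)
{
  char buf[16];
  ssize_t n;

  assert (fd >= 0);

  n = read (fd, buf, sizeof buf);
  if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
    return 1;

  return 0;
}
```

On a blocking descriptor the same read(2) call would simply not return until data arrived, stalling every other connection handled by the thread.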
