IO multiplexing (select, poll, epoll)

Keywords: C C++ Linux Back-end

The TCP server can be explained with the following analogy:
The greeter at the door is equivalent to listenfd, which keeps listening for connection requests from clients. When a customer arrives, the greeter hands the customer over to a waiter in the lobby, and at that point a connection is established. The lobby waiter is equivalent to connfd, and all further interaction with that client goes through connfd.

The three-way handshake that establishes a connection is completed inside the kernel's protocol stack. It does not take place in any API call made by the server and is not controlled by the application.
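
A minimal sketch of the listenfd / connfd split described above. The helper names make_listener, listener_port and connect_loopback are hypothetical, and the loopback address and backlog of 16 are assumptions made only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill addr with 127.0.0.1:port (port in host byte order). */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* The "greeter": create listenfd on 127.0.0.1. Port 0 lets the kernel
 * pick a free port. Returns listenfd, or -1 on error. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenfd, 16) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}

/* Port that listenfd is bound to (host byte order), or -1 on error. */
int listener_port(int listenfd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    if (getsockname(listenfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* The "customer": connect to 127.0.0.1:port. connect() returns once the
 * kernel has finished the handshake, whether or not the server has
 * called accept() yet. Returns the client descriptor, or -1 on error. */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A client created with connect_loopback(listener_port(listenfd)) should connect successfully even before the server calls accept(listenfd, NULL, NULL). The descriptor that accept then returns is the connfd used for all traffic with that client.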

IO multiplexing

Three implementation modes of IO multiplexing: select, poll and epoll

select function

Definition: this function allows the process to instruct the kernel to wait for any one of multiple events to occur, and to wake the process up only when one or more of these events occur or when a specified amount of time has passed.

For example, call select to tell the kernel to return only when:

  • Any descriptor in the set {1,4,5} is ready to read
  • Any descriptor in the set {2,7} is ready to write
  • Any descriptor in the set {1,4} has an exception condition pending
  • The set timeout has been exceeded

In other words, calling select tells the kernel which file descriptors we are interested in (for reading, writing, or exception conditions) and how long to wait.
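
As a small sketch, the three sets from the example can be built with the FD_ZERO and FD_SET macros described later in this section. The helper names add_fds and build_example_sets are hypothetical. select itself is not called here, because descriptors such as 4, 5 and 7 are not necessarily open in a real process.

```c
#include <sys/select.h>

/* Add fds[0..n-1] to set and return the largest descriptor seen so far
 * (prev_max if none of them is larger). */
int add_fds(fd_set *set, const int *fds, int n, int prev_max)
{
    int maxfd = prev_max;
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd;
}

/* Build the read, write and exception sets from the example and return
 * the value to pass as select()'s first argument (largest fd + 1). */
int build_example_sets(fd_set *rset, fd_set *wset, fd_set *eset)
{
    static const int rd[] = {1, 4, 5};
    static const int wr[] = {2, 7};
    static const int ex[] = {1, 4};
    int maxfd = -1;

    FD_ZERO(rset);
    FD_ZERO(wset);
    FD_ZERO(eset);
    maxfd = add_fds(rset, rd, 3, maxfd);
    maxfd = add_fds(wset, wr, 2, maxfd);
    maxfd = add_fds(eset, ex, 2, maxfd);
    return maxfd + 1;
}
```

For these sets the largest descriptor is 7, so the first argument to select would be 8.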

#include <sys/select.h>
#include <sys/time.h>

int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset,
           struct timeval *timeout);

Function parameters (5 parameters in total)

  1. timeout – tells the kernel how long to wait for any of the specified descriptors to become ready.
    The timeval structure specifies this period in seconds and microseconds:
    struct timeval {
        long tv_sec;   /* seconds */
        long tv_usec;  /* microseconds */
    };
    There are three possibilities for this parameter:
    (1) Wait forever (blocking): return only when a descriptor is ready for I/O (read / write). Set this parameter to a null pointer.
    (2) Wait for a fixed period of time: return when a descriptor is ready for I/O, but wait no longer than the time given in the timeval structure.
    (3) Don't wait at all: return immediately after checking the descriptors, which is called polling. Set both tv_sec and tv_usec to 0.
    On Linux, select may modify the timeval structure to reflect the time not slept, so initialize it again before each call.
  2. readset, writeset and exceptset – the three parameters in the middle.
    They specify the sets of descriptors that we want the kernel to test for read, write and exception conditions. If you are not interested in a condition, pass a null pointer for that argument.
    We allocate a descriptor set of type fd_set and set or test each bit in the set with the following macros:
void FD_ZERO(fd_set *fdset);          // Clear all bits in fdset
void FD_SET(int fd, fd_set *fdset);   // Add fd to fdset
void FD_CLR(int fd, fd_set *fdset);   // Remove fd from fdset
int  FD_ISSET(int fd, fd_set *fdset); // Nonzero if fd is in fdset
  3. maxfdp1 – specifies the number of descriptors to be tested.
    Its value is the largest file descriptor to be tested, plus 1.
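
The three timeout modes can be sketched with one small wrapper. The names wait_readable, read_with_timeout and WAIT_TIMED_OUT are hypothetical, and using milliseconds as the unit is an assumption made for convenience.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

enum { WAIT_TIMED_OUT = -2 };   /* hypothetical marker, not part of select() */

/* Wait until fd is readable.
 *   timeout_ms < 0  : wait forever (null timeout pointer)
 *   timeout_ms == 0 : polling, return immediately
 *   timeout_ms > 0  : wait at most that many milliseconds
 * Returns select()'s result: 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    struct timeval *tvp = NULL;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    return select(fd + 1, &rset, NULL, NULL, tvp);
}

/* Read up to len bytes, giving up after timeout_ms.
 * Returns the byte count from read(), WAIT_TIMED_OUT, or -1 on error. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return WAIT_TIMED_OUT;
    return read(fd, buf, len);
}
```

Because the timeval is rebuilt on every call, it does not matter whether the kernel modifies it.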

The select function modifies the descriptor sets pointed to by readset, writeset and exceptset, so these three parameters are value-result parameters.

When we call this function, we specify the descriptors we care about. When the function returns, the sets indicate which of those descriptors are ready.
After the function returns, use the FD_ISSET macro to test the descriptors in each fd_set. The bit of every descriptor that is not ready is cleared to 0 on return. Therefore, each time select is called again, the bits of interest in all descriptor sets must be set to 1 again.
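
One common way to deal with the value-result behaviour is to keep a master set and hand select a scratch copy on every call. The sketch below assumes that approach; the function name read_ready_once is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* One polling round (zero timeout) over the descriptors in *master.
 * Reads one chunk from every ready descriptor and returns the total
 * number of bytes read, or -1 if select() fails. */
long read_ready_once(const fd_set *master, int maxfd)
{
    fd_set rset = *master;          /* scratch copy: select() overwrites it */
    struct timeval tv = {0, 0};
    char buf[256];
    long total = 0;

    int nready = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (nready < 0)
        return -1;

    for (int fd = 0; fd <= maxfd && nready > 0; fd++) {
        if (!FD_ISSET(fd, &rset))
            continue;               /* bit was cleared: not ready */
        nready--;
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            total += n;
    }
    return total;
}
```

Since only the copy is modified, the master set can be reused unchanged for the next call.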

Return value
Positive integer: the total number of ready bits across all descriptor sets.
0: the timer expired before any descriptor became ready.
-1: an error occurred (errno is set).

//todo: I'd like to supplement the code when I'm free
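
A minimal sketch of a select-based echo server built from the pieces above. It is not the author's code: the struct select_server and both function names are hypothetical, the server listens only on the loopback address, and partial writes are not retried.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical server state: every monitored descriptor and the largest one. */
struct select_server {
    int listenfd;
    int maxfd;
    fd_set all;
};

/* Listen on 127.0.0.1:port (port 0 = any free port). Returns 0 or -1. */
int select_server_init(struct select_server *s, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    s->listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (s->listenfd < 0)
        return -1;
    if (bind(s->listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(s->listenfd, 16) < 0) {
        close(s->listenfd);
        return -1;
    }
    FD_ZERO(&s->all);
    FD_SET(s->listenfd, &s->all);
    s->maxfd = s->listenfd;
    return 0;
}

/* Block in select() once, then handle every descriptor that is ready.
 * Returns the number of ready descriptors, or -1 on error. */
int select_server_round(struct select_server *s)
{
    fd_set rset = s->all;               /* select() overwrites its argument */
    int maxfd = s->maxfd;               /* snapshot: accept() may raise s->maxfd */
    int nready = select(maxfd + 1, &rset, NULL, NULL, NULL);
    if (nready < 0)
        return -1;

    for (int fd = 0; fd <= maxfd; fd++) {
        if (!FD_ISSET(fd, &rset))
            continue;
        if (fd == s->listenfd) {        /* a new connection is waiting */
            int connfd = accept(s->listenfd, NULL, NULL);
            if (connfd < 0)
                continue;
            if (connfd >= FD_SETSIZE) { /* does not fit in an fd_set */
                close(connfd);
                continue;
            }
            FD_SET(connfd, &s->all);
            if (connfd > s->maxfd)
                s->maxfd = connfd;
        } else {                        /* data or end of file from a client */
            char buf[512];
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0 || write(fd, buf, (size_t)n) < 0) {
                close(fd);
                FD_CLR(fd, &s->all);
            }
        }
    }
    return nready;
}
```

A real server would call select_server_round in a loop. The explicit FD_SETSIZE check reflects the descriptor limit that applies to every fd_set.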

Disadvantages of select:
  • The number of monitored file descriptors is limited (FD_SETSIZE, typically 1024).
  • The descriptor sets must be copied from user space to kernel space on every call, which is inefficient.
  • After select returns, you still have to traverse the sets to find out which descriptors changed.
  • It is awkward to code against.
Advantages of select:
  • Good portability; Windows also supports it.

poll function

poll provides functionality similar to select.

#include <poll.h>
int poll(struct pollfd *fdarray, unsigned long nfds, int timeout);

Function parameters

  1. fdarray – pointer to the first element of an array of structures. Each element is a pollfd structure that specifies a file descriptor to monitor and the conditions to test for it:
struct pollfd {
	int fd;        // File descriptor
	short events;  // Events to monitor (input)
	short revents; // Events that actually occurred (output)
};
  2. nfds – the number of elements in the fdarray array
  3. timeout – how long to wait, in milliseconds
    -1: block until a descriptor is ready
    0: return immediately
    >0: wait for the specified number of milliseconds. If the system clock is not fine-grained enough, the value is rounded up

Return value
Number of ready descriptors if any, 0 on timeout, and -1 on error

//todo: I'd like to supplement the code when I'm free
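
A minimal sketch of poll usage, assuming we only care about readability (POLLIN). The function name poll_read_once is hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for any descriptor in fds[0..n-1], then read one
 * chunk from each descriptor that reports POLLIN.
 * Returns the total number of bytes read, or -1 if poll() fails. */
long poll_read_once(struct pollfd *fds, nfds_t n, int timeout_ms)
{
    char buf[256];
    long total = 0;

    int nready = poll(fds, n, timeout_ms);
    if (nready < 0)
        return -1;

    for (nfds_t i = 0; i < n && nready > 0; i++) {
        if (fds[i].revents == 0)
            continue;                   /* nothing happened on this one */
        nready--;
        if (fds[i].revents & POLLIN) {
            ssize_t got = read(fds[i].fd, buf, sizeof(buf));
            if (got > 0)
                total += got;
        }
    }
    return total;
}
```

The kernel writes only to revents, so the same array can be passed to poll again without touching fd or events.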

poll is similar to select but has the following advantages: there is no limit of 1024 file descriptors; and the requested and returned events are kept separate (events and revents), so the monitored events do not have to be reset before every call. With select, the descriptor sets are value-result parameters, so the same variable carries both the request and the result.

epoll function

epoll consists of three APIs; the data structure used to hold the monitored descriptors is a red-black tree.

Basic API

  1. Create an epoll handle. The size parameter was originally a hint telling the kernel how many file descriptors would be monitored, so that it could size its internal structures. Since Linux 2.6.8 it is ignored, but it must still be greater than zero.
#include <sys/epoll.h>
int epoll_create(int size);
// size: hint for the number of descriptors to monitor (must be > 0)
// Return value: a non-negative file descriptor on success; -1 on failure
  2. Control the events monitored on a file descriptor by an epoll instance: registration, modification, deletion
    Registration adds the file descriptor to be monitored to the red-black tree (a red-black tree is very efficient at inserting and deleting elements)
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// epfd: the handle returned by epoll_create
// op: the action to perform, given by one of three macros
//     EPOLL_CTL_ADD (register a new fd with epfd)
//     EPOLL_CTL_MOD (modify the monitored events of a registered fd)
//     EPOLL_CTL_DEL (remove an fd from epfd)
// fd: the file descriptor to operate on
// event: tells the kernel which events to monitor

typedef union epoll_data {
	void *ptr;
	int fd;
	uint32_t u32;
	uint64_t u64;
} epoll_data_t;

struct epoll_event {
	uint32_t events; /* Epoll events */
	epoll_data_t data; /* User data variable */
};
  3. Wait for events on the monitored file descriptors (similar to calling select)
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
// events: output array in which the kernel stores the events that occurred
// maxevents: the capacity of the events array; it must be greater than 0
//     (older descriptions also cap it at the size passed to epoll_create(), which current kernels ignore)
// timeout: timeout in milliseconds
//     -1: block
//     0: return immediately, non-blocking
//     >0: wait for the specified number of milliseconds

// Return value: the number of ready file descriptors on success, 0 if the timeout expires, and -1 on error
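
The three calls fit together roughly as in the sketch below. The helper names watch_readable, unwatch_and_close and wait_ready are hypothetical, and the fixed batch size of 64 events is an arbitrary assumption.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd with the epoll instance epfd for readability events. */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;            /* handed back unchanged by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Remove fd from the epoll instance, then close it. */
int unwatch_and_close(int epfd, int fd)
{
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0)
        return -1;
    return close(fd);
}

/* Wait up to timeout_ms and copy the ready descriptors into out[0..cap-1].
 * Returns how many were stored, 0 on timeout, or -1 on error. */
int wait_ready(int epfd, int *out, int cap, int timeout_ms)
{
    struct epoll_event events[64];
    if (cap > 64)
        cap = 64;
    int n = epoll_wait(epfd, events, cap, timeout_ms);
    for (int i = 0; i < n; i++)
        out[i] = events[i].data.fd;
    return n;
}
```

Registration happens once per descriptor with epoll_ctl. After that, each epoll_wait call only receives an output array and reports the descriptors that are actually ready.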

epoll significantly improves CPU efficiency when a program holds a large number of connections of which only a few are active (for example, 1000 connections with only 10 of them active). It does not force the developer to rebuild the set of monitored descriptors before every wait: the registered set stays in the kernel, epoll_wait is passed an empty array of structures, and the kernel writes the descriptors on which events occurred into that array.
When retrieving events, you do not need to traverse the entire monitored descriptor set, only the descriptors that the kernel placed on the ready queue when their I/O events fired.
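
The "many connections, few active" case can be imitated with pipes standing in for connections. This is only a sketch: count_ready and MAX_CONN are hypothetical names, and a pipe is an assumption used here in place of a real socket.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

enum { MAX_CONN = 256 };    /* hypothetical upper bound for this sketch */

/* Open nconn pipes, register every read end with epoll, write one byte
 * to the pipes listed in active[0..nactive-1], and return how many
 * events a single non-blocking epoll_wait() reports (-1 on error). */
int count_ready(int nconn, const int *active, int nactive)
{
    int rd[MAX_CONN], wr[MAX_CONN];
    struct epoll_event events[MAX_CONN];
    int opened = 0, result = -1;

    if (nconn <= 0 || nconn > MAX_CONN)
        return -1;
    int epfd = epoll_create(nconn);
    if (epfd < 0)
        return -1;

    while (opened < nconn) {
        int p[2];
        if (pipe(p) < 0)
            goto out;
        rd[opened] = p[0];
        wr[opened] = p[1];
        opened++;                           /* cleanup now covers this pipe */

        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.u32 = (uint32_t)(opened - 1);   /* index of the "connection" */
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
            goto out;
    }

    for (int i = 0; i < nactive; i++) {
        if (active[i] < 0 || active[i] >= nconn ||
            write(wr[active[i]], "x", 1) != 1)
            goto out;
    }

    /* Only the ready descriptors come back, not all nconn of them. */
    result = epoll_wait(epfd, events, nconn, 0);

out:
    for (int i = 0; i < opened; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    close(epfd);
    return result;
}
```

With 100 registered pipes and 3 of them written to, the events array should contain 3 entries, so the caller loops over 3 items instead of scanning all 100 descriptors.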

One thread per request in the server

Posted by egiblock on Sun, 28 Nov 2021 13:28:56 -0800