The Linux select system call

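I/O multiplexing works as follows: the descriptors we care about (usually more than one) are collected into a descriptor set, and an I/O multiplexing function (select/poll/epoll) is called on it. If any descriptor in the set can perform I/O without blocking, the function returns; otherwise the calling process blocks inside the function until some descriptor becomes ready, at which point the process is woken and continues executing the function. When the function returns normally, the caller knows which descriptors can perform I/O without blocking.
Descriptors commonly used with I/O multiplexing include terminals/pseudo-terminals, pipes, and sockets.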

 

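Main steps of an I/O multiplexing function:
1. Traverse the descriptor set and check whether any descriptor can perform I/O without blocking (read, write, exception, etc.);
2. If some descriptors can, the multiplexing function reports those descriptors to the user process;
3. If none can, the multiplexing function blocks, with the process added to the poll wait queue of every descriptor in the set;
4. When a descriptor becomes able to perform I/O without blocking, the kernel wakes the processes blocked on that descriptor's poll wait queue. Once woken, the process resumes the multiplexing function and traverses the descriptor set again; before the function returns, the process is removed from the poll wait queues of all descriptors in the set.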

 

I.select
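select is one of the I/O multiplexing functions. Its prototype is:

int select(int nfds, fd_set *restrict readfds,
              fd_set *restrict writefds, fd_set *restrict errorfds,
              struct timeval *restrict timeout);

Input parameters:
1. readfds, writefds, errorfds: the sets of descriptors to check for non-blocking read, write, and exceptional conditions
2. nfds: the number of descriptors to examine, i.e. the highest-numbered descriptor in any of the three sets plus 1
3. timeout: how long to wait (forever, a given interval, or not at all); if no descriptor is ready when the timeout expires, select returns.
Output parameters:
1. readfds, writefds, errorfds: the descriptors that are ready for non-blocking read, write, and exceptional conditions
2. timeout: when a timeout was given, the time remaining, i.e. output timeout = input timeout - time spent in select
Return value:
0: timeout
-1: error
>0: the total number of bits set in the returned readfds, writefds, and errorfds; a file that is both readable and writable is counted twice

Notes:
  In the do_select code shown later, a descriptor is placed in an output set only if it was present in the corresponding input set. An error on a descriptor that appears in readfds or writefds but not in errorfds is therefore reported through readfds/writefds (the kernel's POLLIN_SET and POLLOUT_SET masks include POLLERR), and the multiplexing function returns immediately.
  Each output set is a subset of the corresponding input set: output readfds of input readfds, output writefds of input writefds, and output errorfds of input errorfds.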
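
As an illustration of the parameters described above, here is a minimal user-space sketch that waits for a single descriptor to become readable. The helper name wait_readable and its millisecond timeout argument are assumptions made for this example, not part of the select API.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns what select() returns: >0 ready, 0 timeout, -1 error.
 */
static int wait_readable(int fd, long timeout_ms)
{
        fd_set readfds;
        struct timeval tv;
        int ret;

        assert(fd >= 0 && fd < FD_SETSIZE);   /* an fd_set cannot hold larger fds */

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);                 /* input: the descriptor we care about */
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        /* nfds is the highest descriptor number plus one */
        ret = select(fd + 1, &readfds, NULL, NULL, &tv);

        /* output: readfds now contains only the ready descriptors */
        if (ret > 0 && !FD_ISSET(fd, &readfds))
                return -1;
        return ret;
}
```

With a freshly created pipe, calling wait_readable on the read end with a zero timeout should report a timeout, and it should report one ready descriptor once a byte has been written to the write end.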


II.Data structures

i.fd_set_bits
/*
 * Scaleable version of the fd_set.
 */

typedef struct {
        unsigned long *in, *out, *ex;
        unsigned long *res_in, *res_out, *res_ex;
} fd_set_bits;

/*
 * How many longwords for "nr" bits?
 */
#define FDS_BITPERLONG  (8*sizeof(long))
#define FDS_LONGS(nr)   (((nr)+FDS_BITPERLONG-1)/FDS_BITPERLONG)
#define FDS_BYTES(nr)   (FDS_LONGS(nr)*sizeof(long))

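fd_set_bits: records the start addresses of the kernel-memory bitmaps for the read, write, and exception descriptors (both the input and the result bitmaps). Each bitmap occupies FDS_BYTES(nfds) bytes, matching the nfds argument of select; FDS_BYTES rounds nfds up to a whole number of longs (i.e. to a multiple of 32 or 64 bits).
The figure below shows how fd_set_bits is used:
[Figure 1: fd_set_bits usage (image not available)]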
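
To make the rounding concrete, the following sketch recompiles the same macro arithmetic in user space. The helpers bitmap_bytes, fd_word, and fd_bit are hypothetical names introduced only for illustration; they are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as the kernel macros quoted above, rebuilt in user space. */
#define FDS_BITPERLONG  (8 * sizeof(long))
#define FDS_LONGS(nr)   (((nr) + FDS_BITPERLONG - 1) / FDS_BITPERLONG)
#define FDS_BYTES(nr)   (FDS_LONGS(nr) * sizeof(long))

/* Hypothetical helper: bytes needed by one bitmap covering nr descriptors. */
static size_t bitmap_bytes(int nr)
{
        assert(nr >= 0);
        return FDS_BYTES((size_t)nr);
}

/* Hypothetical helper: index of the long word that holds descriptor fd. */
static size_t fd_word(int fd)
{
        assert(fd >= 0);
        return (size_t)fd / FDS_BITPERLONG;
}

/* Hypothetical helper: mask of descriptor fd inside its long word. */
static unsigned long fd_bit(int fd)
{
        assert(fd >= 0);
        return 1UL << ((size_t)fd % FDS_BITPERLONG);
}
```

One descriptor already costs a full long per bitmap, and crossing a multiple of FDS_BITPERLONG adds another long.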
ii.poll_wqueues

typedef struct poll_table_struct {
        poll_queue_proc qproc;
        unsigned long key;
} poll_table;

struct poll_table_entry {
        struct file *filp;
        unsigned long key;
        wait_queue_t wait;
        wait_queue_head_t *wait_address;
};

/*
 * Structures and helpers for sys_poll/sys_poll
 */
struct poll_wqueues {
        poll_table pt;
        struct poll_table_page *table;
        struct task_struct *polling_task;
        int triggered;
        int error;
        int inline_index;
        struct poll_table_entry inline_entries[N_INLINE_POLL_ENTRIES];
};

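poll_table: passed to each file's poll method. It carries key, the mask of poll events whose non-blocking availability is being tested, and the callback qproc, which the poll method invokes (through poll_wait) to add the process to the file's poll wait queue so that it can be woken if the file is not yet ready
poll_table_entry: links the blocked process into a file's poll wait queue; one poll_table_entry is used per file
poll_wqueues: used by select/poll when the process may need to block; through it the process is added to the poll wait queues of all files in the descriptor set, so that it is woken as soon as any one of them can perform non-blocking I/O

iii.Relationship between processes, open files, and poll wait queues

[Figure 2: relationship between processes, open files, and poll wait queues (image not available)]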
 

III.Blocking and waking the multiplexing function
i.Initializing poll_wqueues

static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
{
        pt->qproc = qproc;
        pt->key   = ~0UL; /* all events enabled */
}

void poll_initwait(struct poll_wqueues *pwq)
{
        init_poll_funcptr(&pwq->pt, __pollwait);
        pwq->polling_task = current;
        pwq->triggered = 0;
        pwq->error = 0;
        pwq->table = NULL;
        pwq->inline_index = 0;
}

1. Sets the blocking callback to __pollwait
2. Sets the process to be blocked to the current process


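ii.File poll blocking

1.poll blocking
When poll is executed on a single file with a non-NULL poll_table, the file's poll method calls poll_wait, which adds the current process to that file's poll wait queue so that it can be woken if the file cannot yet perform the I/O identified by key without blocking
f_op->poll path for TCP: socket_file_ops->sock_poll->inet_stream_ops->tcp_poll->sock_poll_wait->poll_wait
f_op->poll path for pipes: write_pipefifo_fops->pipe_poll->poll_wait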

static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
        if (p && wait_address)
                p->qproc(filp, wait_address, p);
}

static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
                                poll_table *p)
{
        struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
        struct poll_table_entry *entry = poll_get_entry(pwq);
        if (!entry)
                return;
        get_file(filp);
        entry->filp = filp;
        entry->wait_address = wait_address;
        entry->key = p->key;
        init_waitqueue_func_entry(&entry->wait, pollwake);
        entry->wait.private = pwq;
        add_wait_queue(wait_address, &entry->wait);
}

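a. The qproc called in poll_wait was set to __pollwait by poll_initwait; if both poll_table and wait_address are non-NULL, __pollwait is called
b. The wake-up function of the waiter placed on the poll wait queue is set to pollwake
c. The waiter embedded in the poll_table_entry is added to the wait queue at wait_address (sock->sk_sleep for a socket, pipe_inode_info->wait for a pipe)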
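
The first line of __pollwait relies on container_of to get from the embedded poll_table back to the enclosing poll_wqueues. The sketch below shows the idea in user space; the model_ structures are simplified stand-ins invented for this example, and the macro is a plain offsetof-based approximation of the kernel's container_of.

```c
#include <assert.h>
#include <stddef.h>

/* User-space approximation of the kernel's container_of(). */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct model_poll_table {
        unsigned long key;
};

struct model_poll_wqueues {
        int triggered;
        struct model_poll_table pt;   /* embedded by value, not a pointer */
        int inline_index;
};

/* Mirrors the first line of __pollwait(): from the embedded pt to its owner. */
static struct model_poll_wqueues *owner_of(struct model_poll_table *p)
{
        assert(p != NULL);
        return container_of(p, struct model_poll_wqueues, pt);
}
```

Because pt is embedded by value, a file's poll method only needs the poll_table pointer; the per-call state (triggered, the entry storage) is recovered from it without any extra argument.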

 

2.poll_get_entry

#define POLL_TABLE_FULL(table) \
        ((unsigned long)((table)->entry+1) > PAGE_SIZE + (unsigned long)(table))

static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
{
        struct poll_table_page *table = p->table;

        if (p->inline_index < N_INLINE_POLL_ENTRIES)
                return p->inline_entries + p->inline_index++;

        if (!table || POLL_TABLE_FULL(table)) {
                struct poll_table_page *new_table;

                new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
                if (!new_table) {
                        p->error = -ENOMEM;
                        return NULL;
                }
                new_table->entry = new_table->entries;
                new_table->next = table;
                p->table = new_table;
                table = new_table;
        }

        return table->entry++;
}

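a. poll_get_entry obtains a poll_table_entry from poll_wqueues
b. If the inline area of poll_wqueues still has a free entry, the entry is taken from the inline area
c. If the inline area is exhausted and there is no page yet or the current page is full, a new page frame is allocated as a poll_table_page; the new page is inserted at the head of the list, so the next allocation only has to look at the first node to know whether a free entry exists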
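
The allocation strategy described above (inline slots first, then pages pushed onto the head of a list) can be modelled in user space as follows. This is only a sketch: malloc stands in for __get_free_page, and N_INLINE, PAGE_ENTRIES, and all structure names are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stdlib.h>

#define N_INLINE     4   /* assumption: small value for illustration */
#define PAGE_ENTRIES 8   /* assumption: entries per simulated page */

struct entry { int fd; };

struct page {
        struct page *next;
        struct entry *entry;                /* next free slot in this page */
        struct entry entries[PAGE_ENTRIES];
};

struct wqueues {
        struct page *table;                 /* newest page first */
        int error;
        int inline_index;
        struct entry inline_entries[N_INLINE];
};

static int page_full(const struct page *pg)
{
        return pg->entry == pg->entries + PAGE_ENTRIES;
}

static struct entry *get_entry(struct wqueues *p)
{
        struct page *table;

        assert(p != NULL);
        table = p->table;

        /* Fast path: use the storage embedded in the wqueues itself. */
        if (p->inline_index < N_INLINE)
                return p->inline_entries + p->inline_index++;

        /* Only the head page is inspected; older pages are always full. */
        if (!table || page_full(table)) {
                struct page *new_table = malloc(sizeof(*new_table));

                if (!new_table) {
                        p->error = -1;      /* the kernel stores -ENOMEM here */
                        return NULL;
                }
                new_table->entry = new_table->entries;
                new_table->next = table;    /* push onto the list head */
                p->table = new_table;
                table = new_table;
        }
        return table->entry++;
}

/* Release every simulated page (the model's counterpart of poll_freewait). */
static void free_pages(struct wqueues *p)
{
        while (p->table) {
                struct page *old = p->table;

                p->table = old->next;
                free(old);
        }
}
```

The inline array means that a call watching only a few descriptors never allocates; pages are needed only for larger sets.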

 

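iii.File poll wake-up
After the multiplexing function has blocked, when an asynchronous event makes a file able to perform the I/O identified by key without blocking, wake_up_interruptible_sync_poll is called to wake the blocked multiplexing function
TCP data reception waking the multiplexing function: tcp_protocol->tcp_v4_rcv->tcp_v4_do_rcv->tcp_rcv_established->sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll
Pipe write waking the multiplexing function: pipe_write->wake_up_interruptible_sync_poll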

#define wake_up_interruptible_sync(x)   __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, int wake_flags, void *key)
{
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
                unsigned flags = curr->flags;

                if (curr->func(curr, mode, wake_flags, key) &&
                                (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                        break;
        }
}

/**
 * __wake_up_sync_key - wake up threads blocked on a waitqueue.
 * @q: the waitqueue
 * @mode: which threads
 * @nr_exclusive: how many wake-one or wake-many threads to wake up
 * @key: opaque value to be passed to wakeup targets
 *
 * The sync wakeup differs that the waker knows that it will schedule
 * away soon, so while the target thread will be woken up, it will not
 * be migrated to another CPU - ie. the two threads are 'synchronized'
 * with each other. This can prevent needless bouncing between CPUs.
 *
 * On UP it can prevent extra preemption.
 *
 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.
 */
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
        unsigned long flags;
        int wake_flags = WF_SYNC;

        if (unlikely(!q))
                return;

        if (unlikely(!nr_exclusive))
                wake_flags = 0;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
        spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL_GPL(__wake_up_sync_key);

/*
 * __wake_up_sync - see __wake_up_sync_key()
 */
void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
{
        __wake_up_sync_key(q, mode, nr_exclusive, NULL);
}
EXPORT_SYMBOL_GPL(__wake_up_sync);      /* For internal use only */

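wake_up_interruptible_sync performs a synchronous wake-up with nr_exclusive = 1: all non-exclusive waiters and at most one exclusive waiter are woken. wake_up_interruptible_sync_poll takes the same path through __wake_up_sync_key, passing the poll event mask as key
The func called in __wake_up_common is pollwake, which was installed by __pollwait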

static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
        struct poll_wqueues *pwq = wait->private;
        DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);

        /*
         * Although this function is called under waitqueue lock, LOCK
         * doesn't imply write barrier and the users expect write
         * barrier semantics on wakeup functions.  The following
         * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
         * and is paired with set_mb() in poll_schedule_timeout.
         */
        smp_wmb();
        pwq->triggered = 1;

        /*
         * Perform the default wake up operation using a dummy
         * waitqueue.
         *
         * TODO: This is hacky but there currently is no interface to
         * pass in @sync.  @sync is scheduled to be removed and once
         * that happens, wake_up_process() can be used directly.
         */
        return default_wake_function(&dummy_wait, mode, sync, key);
}

static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
        struct poll_table_entry *entry;

        entry = container_of(wait, struct poll_table_entry, wait);
        if (key && !((unsigned long)key & entry->key))
                return 0;
        return __pollwake(wait, mode, sync, key);
}

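1. If the event that caused the wake-up is not one the multiplexing function is interested in, the function is not woken; if it is, __pollwake is called to wake it
2. triggered is set to 1. After traversing the whole descriptor set without finding a ready descriptor, the multiplexing function normally blocks the process. However, an asynchronous event may make an already-visited file ready while the traversal is still in progress; in that case the process must not block. The triggered flag is therefore checked before blocking, and the process blocks only if the flag is 0.
3. The process that called the multiplexing function is woken through default_wake_function->try_to_wake_up (which returns immediately if the process is already TASK_RUNNING)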
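
The key filtering in pollwake and the setting of triggered in __pollwake can be condensed into a small user-space model. The structures and the function model_pollwake below are hypothetical simplifications written for this example; the real functions additionally issue a memory barrier and wake the task.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for poll_wqueues and poll_table_entry. */
struct model_wqueues {
        int triggered;
};

struct model_entry {
        unsigned long key;             /* events registered via __pollwait() */
        struct model_wqueues *pwq;
};

/*
 * Model of pollwake() followed by __pollwake():
 * returns 0 if the event is filtered out, 1 if the waiter is triggered.
 */
static int model_pollwake(struct model_entry *entry, unsigned long key)
{
        assert(entry != NULL && entry->pwq != NULL);

        /* A zero key carries no event information, so it always wakes. */
        if (key && !(key & entry->key))
                return 0;              /* not an event this waiter asked for */

        entry->pwq->triggered = 1;     /* seen later by poll_schedule_timeout() */
        return 1;
}
```

A waiter registered for POLLIN ignores a wake-up keyed with POLLOUT, reacts to one keyed with POLLIN, and also reacts to a wake-up with no key at all.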

 

iv.Releasing poll_wqueues

static void free_poll_entry(struct poll_table_entry *entry)
{
        remove_wait_queue(entry->wait_address, &entry->wait);
        fput(entry->filp);
}

void poll_freewait(struct poll_wqueues *pwq)
{
        struct poll_table_page * p = pwq->table;
        int i;
        for (i = 0; i < pwq->inline_index; i++)
                free_poll_entry(pwq->inline_entries + i);
        while (p) {
                struct poll_table_entry * entry;
                struct poll_table_page *old;

                entry = p->entry;
                do {
                        entry--;
                        free_poll_entry(entry);
                } while (entry > p->entries);
                old = p;
                p = p->next;
                free_page((unsigned long) old);
        }
}

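1. While the files in the descriptor set are being traversed, it is not yet known whether the files not yet visited can perform the I/O identified by key without blocking. The process is therefore added to each file's poll wait queue as that file is polled; if it then turns out that no file is ready, the process can block without traversing the set a second time just to register on the wait queues.
2. Whether the process blocked and was woken (having been added to the poll wait queues of all files in the set) or never blocked (having been added only to the queues of the files polled up to the first ready one), poll_freewait is called when the multiplexing function exits. It removes every waiter in poll_wqueues from its file's wait queue and releases the corresponding file references and memory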

 

IV.select implementation
i.select
fs/select.c:

SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp,
                fd_set __user *, exp, struct timeval __user *, tvp)
{
        struct timespec end_time, *to = NULL;
        struct timeval tv;
        int ret;

        if (tvp) {
                if (copy_from_user(&tv, tvp, sizeof(tv)))
                        return -EFAULT;

                to = &end_time;
                if (poll_select_set_timeout(to,
                                tv.tv_sec + (tv.tv_usec / USEC_PER_SEC),
                                (tv.tv_usec % USEC_PER_SEC) * NSEC_PER_USEC))
                        return -EINVAL;
        }

        ret = core_sys_select(n, inp, outp, exp, to);
        ret = poll_select_copy_remaining(&end_time, tvp, 1, ret);

        return ret;
}

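1. Copies the timeout argument from user space to kernel space and converts it from a timeval into a timespec holding the absolute end time
2. Calls core_sys_select
3. Computes the remaining timeout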
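
The arguments passed to poll_select_set_timeout normalize a timeval whose tv_usec may exceed one second. A minimal sketch of just that arithmetic is shown below; the function name is hypothetical, the -1 return merely stands in for the kernel's -EINVAL, and the conversion to an absolute end time is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

#define USEC_PER_SEC  1000000L
#define NSEC_PER_USEC 1000L

/*
 * Hypothetical helper: normalize a relative timeval into a relative
 * timespec, using the same expressions as the select() entry point.
 * Returns 0 on success and -1 for negative input.
 */
static int timeval_to_timespec_interval(const struct timeval *tv,
                                        struct timespec *ts)
{
        assert(tv != NULL && ts != NULL);

        if (tv->tv_sec < 0 || tv->tv_usec < 0)
                return -1;

        /* Whole seconds hidden in tv_usec are carried into tv_sec. */
        ts->tv_sec = tv->tv_sec + tv->tv_usec / USEC_PER_SEC;
        ts->tv_nsec = (tv->tv_usec % USEC_PER_SEC) * NSEC_PER_USEC;
        return 0;
}
```

For example, an input of 1 second and 2500000 microseconds becomes 3 seconds and 500000000 nanoseconds.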

ii.core_sys_select

int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
                           fd_set __user *exp, struct timespec *end_time)
{
        fd_set_bits fds;
        void *bits;
        int ret, max_fds;
        unsigned int size;
        struct fdtable *fdt;
        /* Allocate small arguments on the stack to save memory and be faster */
        long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];

        ret = -EINVAL;
        if (n < 0)
                goto out_nofds;

        /* max_fds can increase, so grab it once to avoid race */
        rcu_read_lock();
        fdt = files_fdtable(current->files);
        max_fds = fdt->max_fds;
        rcu_read_unlock();
        if (n > max_fds)
                n = max_fds;

        /*
         * We need 6 bitmaps (in/out/ex for both incoming and outgoing),
         * since we used fdset we need to allocate memory in units of
         * long-words.
         */
        size = FDS_BYTES(n);
        bits = stack_fds;
        if (size > sizeof(stack_fds) / 6) {
                /* Not enough space in on-stack array; must use kmalloc */
                ret = -ENOMEM;
                bits = kmalloc(6 * size, GFP_KERNEL);
                if (!bits)
                        goto out_nofds;
        }
        fds.in      = bits;
        fds.out     = bits +   size;
        fds.ex      = bits + 2*size;
        fds.res_in  = bits + 3*size;
        fds.res_out = bits + 4*size;
        fds.res_ex  = bits + 5*size;

        if ((ret = get_fd_set(n, inp, fds.in)) ||
            (ret = get_fd_set(n, outp, fds.out)) ||
            (ret = get_fd_set(n, exp, fds.ex)))
                goto out;
        zero_fd_set(n, fds.res_in);
        zero_fd_set(n, fds.res_out);
        zero_fd_set(n, fds.res_ex);

        ret = do_select(n, &fds, end_time);

        if (ret < 0)
                goto out;
        if (!ret) {
                ret = -ERESTARTNOHAND;
                if (signal_pending(current))
                        goto out;
                ret = 0;
        }

        if (set_fd_set(n, inp, fds.res_in) ||
            set_fd_set(n, outp, fds.res_out) ||
            set_fd_set(n, exp, fds.res_ex))
                ret = -EFAULT;

out:
        if (bits != stack_fds)
                kfree(bits);
out_nofds:
        return ret;
}

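1. Checks the input parameter n (the size of the descriptor set); if it exceeds max_fds, the current capacity of the process's file descriptor table, it is clamped to max_fds
2. Copies the read, write, and exception bitmaps from user space to kernel space, with fd_set_bits recording the start of each bitmap; zeroes the result bitmaps
3. Calls do_select
4. If do_select returns 0 and a signal is pending, returns -ERESTARTNOHAND: the system call is restarted automatically when no handler runs for the signal, otherwise the caller sees EINTR
5. Copies the result bitmaps from kernel space back to user space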
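
The choice between the on-stack buffer and kmalloc, and the layout of the six bitmaps inside one buffer, can be checked with a short user-space calculation. This is a sketch: the value 256 for SELECT_STACK_ALLOC is an assumption about kernels of this era, and fits_on_stack and bitmap_offset are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define SELECT_STACK_ALLOC 256   /* assumption: not shown in the listing above */
#define FDS_BITPERLONG  (8 * sizeof(long))
#define FDS_LONGS(nr)   (((nr) + FDS_BITPERLONG - 1) / FDS_BITPERLONG)
#define FDS_BYTES(nr)   (FDS_LONGS(nr) * sizeof(long))

/* Returns 1 when six bitmaps for n descriptors fit in the on-stack buffer. */
static int fits_on_stack(int n)
{
        long stack_fds[SELECT_STACK_ALLOC / sizeof(long)];

        assert(n >= 0);
        /* Same comparison as core_sys_select(), with the sense inverted. */
        return FDS_BYTES((size_t)n) <= sizeof(stack_fds) / 6;
}

/*
 * Byte offset of bitmap idx inside the shared buffer:
 * 0 = in, 1 = out, 2 = ex, 3 = res_in, 4 = res_out, 5 = res_ex.
 */
static size_t bitmap_offset(int n, int idx)
{
        assert(n >= 0 && idx >= 0 && idx < 6);
        return (size_t)idx * FDS_BYTES((size_t)n);
}
```

Under the 256-byte assumption, each bitmap may use at most 42 bytes, so up to 320 descriptors fit on the stack on both 32-bit and 64-bit longs, while 321 descriptors already require a heap allocation.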

 

iii.do_select

int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
{
        ktime_t expire, *to = NULL;
        struct poll_wqueues table;
        poll_table *wait;
        int retval, i, timed_out = 0;
        unsigned long slack = 0;

        rcu_read_lock();
        retval = max_select_fd(n, fds);
        rcu_read_unlock();

        if (retval < 0)
                return retval;
        n = retval;

        poll_initwait(&table);
        wait = &table.pt;
        if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
                wait = NULL;
                timed_out = 1;
        }

        if (end_time && !timed_out)
                slack = estimate_accuracy(end_time);

        retval = 0;
        for (;;) {
                unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;

                inp = fds->in; outp = fds->out; exp = fds->ex;
                rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;

                for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
                        unsigned long in, out, ex, all_bits, bit = 1, mask, j;
                        unsigned long res_in = 0, res_out = 0, res_ex = 0;
                        const struct file_operations *f_op = NULL;
                        struct file *file = NULL;

                        in = *inp++; out = *outp++; ex = *exp++;
                        all_bits = in | out | ex;
                        if (all_bits == 0) {
                                i += __NFDBITS;
                                continue;
                        }

                        for (j = 0; j < __NFDBITS; ++j, ++i, bit <<= 1) {
                                int fput_needed;
                                if (i >= n)
                                        break;
                                if (!(bit & all_bits))
                                        continue;
                                file = fget_light(i, &fput_needed);
                                if (file) {
                                        f_op = file->f_op;
                                        mask = DEFAULT_POLLMASK;
                                        if (f_op && f_op->poll) {
                                                wait_key_set(wait, in, out, bit);
                                                mask = (*f_op->poll)(file, wait);
                                        }
                                        fput_light(file, fput_needed);
                                        if ((mask & POLLIN_SET) && (in & bit)) {
                                                res_in |= bit;
                                                retval++;
                                                wait = NULL;
                                        }
                                        if ((mask & POLLOUT_SET) && (out & bit)) {
                                                res_out |= bit;
                                                retval++;
                                                wait = NULL;
                                        }
                                        if ((mask & POLLEX_SET) && (ex & bit)) {
                                                res_ex |= bit;
                                                retval++;
                                                wait = NULL;
                                        }
                                }
                        }
                        if (res_in)
                                *rinp = res_in;
                        if (res_out)
                                *routp = res_out;
                        if (res_ex)
                                *rexp = res_ex;
                        cond_resched();
                }
                wait = NULL;
                if (retval || timed_out || signal_pending(current))
                        break;
                if (table.error) {
                        retval = table.error;
                        break;
                }

                /*
                 * If this is the first loop and we have a timeout
                 * given, then we convert to ktime_t and set the to
                 * pointer to the expiry value.
                 */
                if (end_time && !to) {
                        expire = timespec_to_ktime(*end_time);
                        to = &expire;
                }

                if (!poll_schedule_timeout(&table, TASK_INTERRUPTIBLE,
                                           to, slack))
                        timed_out = 1;
        }

        poll_freewait(&table);

        return retval;
}

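1. Finds the highest descriptor set in the read, write, and exception bitmaps and validates the bitmaps (checks that every descriptor set in them is open; if one is not, returns -EBADF)
2. Initializes poll_wqueues, including poll_table and its callback
3. Traverses the files in the bitmaps and polls each one
  a. If the file cannot perform the I/O identified by key without blocking, poll adds the process to the file's poll wait queue (see poll blocking);
  b. If the file can, poll returns the corresponding poll events, which are recorded in the result bitmaps. The poll_table pointer is then set to NULL so that the process is not added to any further poll wait queues; files polled later need no registration even if they are not ready, because the multiplexing function only needs one ready file in order to return
4. After the traversal
  A. If some file in the set can perform the I/O identified by key without blocking, go to step 5
  B. If no file can and the timeout has not expired
    a. triggered=1 (an asynchronous event such as data arrival made a file ready during the traversal): jump to step 3 and traverse the bitmaps again
    b. triggered=0: the process blocks; after being woken it jumps to step 3 and traverses the bitmaps again
  C. On timeout, exit and go to step 5 to release poll_wqueues
5. Releases poll_wqueues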
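
The two nested loops of step 3 (an outer loop over long words that skips empty words, and an inner loop over the bits of a word) can be sketched for a single bitmap as follows. The function scan_once, the ready_fn callback type, and the sample callback is_even are hypothetical; the callback stands in for calling f_op->poll and testing the returned mask.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical readiness test standing in for f_op->poll(). */
typedef int (*ready_fn)(int fd);

/* Sample callback for demonstrations: even descriptors count as ready. */
static int is_even(int fd)
{
        return fd % 2 == 0;
}

/*
 * Scan descriptors 0..n-1 of the input bitmap `in`, record the ready
 * ones in `res`, and return how many were ready.
 */
static int scan_once(int n, const unsigned long *in, unsigned long *res,
                     ready_fn ready)
{
        int i = 0, count = 0;
        size_t w, j;

        assert(n >= 0 && in != NULL && res != NULL && ready != NULL);

        for (w = 0; i < n; ++w) {
                unsigned long bits = in[w], bit = 1, found = 0;

                if (bits == 0) {               /* skip a whole empty word */
                        i += (int)BITS_PER_WORD;
                        continue;
                }
                for (j = 0; j < BITS_PER_WORD; ++j, ++i, bit <<= 1) {
                        if (i >= n)
                                break;
                        if (!(bits & bit))     /* descriptor not requested */
                                continue;
                        if (ready(i)) {
                                found |= bit;
                                count++;
                        }
                }
                if (found)
                        res[w] = found;        /* written once per word */
        }
        return count;
}
```

Skipping empty words keeps the scan cheap when the requested descriptors are sparse, and accumulating results in a local variable means each result word is written at most once per pass.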

iv.poll_schedule_timeout

int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
                          ktime_t *expires, unsigned long slack)
{
        int rc = -EINTR;

        set_current_state(state);
        if (!pwq->triggered)
                rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
        __set_current_state(TASK_RUNNING);

        /*
         * Prepare for the next iteration.
         *
         * The following set_mb() serves two purposes.  First, it's
         * the counterpart rmb of the wmb in pollwake() such that data
         * written before wake up is always visible after wake up.
         * Second, the full barrier guarantees that triggered clearing
         * doesn't pass event check of the next iteration.  Note that
         * this problem doesn't exist for the first iteration as
         * add_wait_queue() has full barrier semantics.
         */
        set_mb(pwq->triggered, 0);

        return rc;
}

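1. If pwq->triggered=0, the process blocks and waits until woken or until the timeout expires;
2. Otherwise a file has already triggered a waiter, so the process does not block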
