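I/O multiplexing works as follows: the descriptors we are interested in (usually more than one) are collected into a descriptor table and passed to an I/O multiplexing function (select/poll/epoll). If any descriptor in the table is ready for a non-blocking I/O operation, the function returns. Otherwise the calling process blocks inside the function until some descriptor becomes ready, at which point the process is woken up and continues executing the function. When the function returns normally, the caller knows which descriptors are ready for non-blocking I/O.
Descriptors used with I/O multiplexing typically include terminals/pseudo-terminals, pipes, and sockets.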
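The main steps of an I/O multiplexing function are:
1.Traverse the descriptor table and check whether any descriptor is ready for a non-blocking I/O operation (read, write, exception, etc.);
2.If some descriptors are ready, the function reports them to the user process;
3.If none are ready, the function blocks, and the process is added to the poll wait queue of every descriptor in the table;
4.When a descriptor becomes ready, the kernel wakes up the processes blocked on that descriptor's poll wait queue. The woken process continues executing the multiplexing function, which removes the process from the poll wait queues of all descriptors in the table and then traverses the table again.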
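Seen from user space, the steps above reduce to a single blocking call. The following is a minimal sketch, not taken from the original article: `wait_readable` is a hypothetical helper name, and it assumes a POSIX system where `select` is available. It puts one descriptor into the read set and lets `select` block until the descriptor is readable or the timeout expires.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): wait until fd is readable
 * or timeout_ms milliseconds have elapsed.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int ret;

    /* select() can only handle descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Blocks inside the kernel until fd is ready or the timeout expires. */
    ret = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ret <= 0)
        return ret;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Called on the read end of an empty pipe, the helper is expected to time out; once a byte has been written to the other end, it should report the descriptor as readable.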
I.select
select is one of the I/O multiplexing functions. Its prototype is:
int select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds, fd_set *restrict errorfds, struct timeval *restrict timeout);
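Input parameters:
1.readfds, writefds, errorfds: the sets of descriptors we want to monitor for non-blocking read, write, and exceptional conditions
2.nfds: the range of descriptors to check, i.e. the highest-numbered descriptor in any of the three sets plus 1
3.timeout: how long to wait (forever, a specified time, or not at all). If no descriptor is ready when the timeout expires, the function returns.
Output parameters:
1.readfds, writefds, errorfds: the sets of descriptors that are ready for non-blocking read, write, and exceptional conditions
2.timeout: when a timeout was given, the time remaining, i.e. output timeout = input timeout - time spent in select (this is the Linux behavior described here)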
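Return value:
0: timeout
-1: error
>0: the number of ready descriptors, counted as the total number of bits set to 1 in readfds, writefds, and errorfds; a file that is both readable and writable counts as 2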
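Notes:
A descriptor's bit is set in an output set only if it was set in the corresponding input set (see the `in & bit`, `out & bit`, and `ex & bit` checks in do_select). An error on a descriptor listed in readfds or writefds is reported through those sets: in this kernel version POLLERR belongs to both POLLIN_SET and POLLOUT_SET, so the descriptor is marked readable/writable and the multiplexing function returns immediately.
Output readfds is a subset of input readfds, output writefds is a subset of input writefds, and output errorfds is a subset of input errorfds.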
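The way the return value counts bits rather than files can be observed with a short experiment. This is a sketch under stated assumptions: `count_ready_bits` is a hypothetical name, and it assumes a Linux-like system with AF_UNIX socket pairs. One end of the pair is placed in both the read set and the write set after the peer has sent one byte.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical experiment (illustration only): watch one socket for
 * both reading and writing and return what select() reports.
 * Returns -1 if the socket pair cannot be prepared.
 */
static int count_ready_bits(void)
{
    int sv[2];
    fd_set readfds, writefds;
    struct timeval tv = { 1, 0 };   /* safety timeout of one second */
    int ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    assert(sv[0] < FD_SETSIZE);

    /* Make sv[0] readable; its send buffer is empty, so it is writable too. */
    if (write(sv[1], "x", 1) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    FD_ZERO(&readfds);
    FD_ZERO(&writefds);
    FD_SET(sv[0], &readfds);
    FD_SET(sv[0], &writefds);

    /* The same descriptor contributes one bit per set in which it is ready. */
    ret = select(sv[0] + 1, &readfds, &writefds, NULL, &tv);

    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

On Linux the call is expected to report 2 for this single socket, matching the "counts as 2" rule above.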
II.Data structures
i.fd_set_bits
    /*
     * Scaleable version of the fd_set.
     */

    typedef struct {
        unsigned long *in, *out, *ex;
        unsigned long *res_in, *res_out, *res_ex;
    } fd_set_bits;

    /*
     * How many longwords for "nr" bits?
     */
    #define FDS_BITPERLONG  (8*sizeof(long))
    #define FDS_LONGS(nr)   (((nr)+FDS_BITPERLONG-1)/FDS_BITPERLONG)
    #define FDS_BYTES(nr)   (FDS_LONGS(nr)*sizeof(long))
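fd_set_bits: records the start addresses of the kernel memory blocks that hold the readable, writable, and exception descriptor bitmaps (both input and output). Each block is FDS_BYTES(nfds) bytes, where nfds is the select argument; FDS_BYTES rounds nfds up to a whole number of longs (i.e. a multiple of 32 or 64 bits).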
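The rounding and the resulting six-bitmap layout can be checked in user space. The sketch below copies the three macros from the kernel listing; `struct fd_set_offsets` and `layout_for` are hypothetical names introduced only for illustration, and the offsets follow the `bits + k*size` assignments in core_sys_select.

```c
#include <assert.h>
#include <stddef.h>

/* Same arithmetic as the kernel macros quoted above, copied for user space. */
#define FDS_BITPERLONG (8 * sizeof(long))
#define FDS_LONGS(nr)  (((nr) + FDS_BITPERLONG - 1) / FDS_BITPERLONG)
#define FDS_BYTES(nr)  (FDS_LONGS(nr) * sizeof(long))

/* Hypothetical mirror of fd_set_bits that stores byte offsets, not pointers. */
struct fd_set_offsets {
    size_t in, out, ex;
    size_t res_in, res_out, res_ex;
    size_t total;   /* bytes needed for all six bitmaps */
};

/* Compute where each bitmap starts inside one contiguous buffer. */
static struct fd_set_offsets layout_for(size_t nfds)
{
    size_t size = FDS_BYTES(nfds);
    struct fd_set_offsets o;

    /* Every bitmap occupies a whole number of longs. */
    assert(size % sizeof(long) == 0);

    o.in = 0;
    o.out = size;
    o.ex = 2 * size;
    o.res_in = 3 * size;
    o.res_out = 4 * size;
    o.res_ex = 5 * size;
    o.total = 6 * size;
    return o;
}
```

For example, nfds = 1 still needs one full long per bitmap, so the six bitmaps together take 6 * sizeof(long) bytes.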
The following figure shows how fd_set_bits is used:
ii.poll_wqueues
    typedef struct poll_table_struct {
        poll_queue_proc qproc;
        unsigned long key;
    } poll_table;

    struct poll_table_entry {
        struct file *filp;
        unsigned long key;
        wait_queue_t wait;
        wait_queue_head_t *wait_address;
    };

    /*
     * Structures and helpers for sys_poll/sys_poll
     */
    struct poll_wqueues {
        poll_table pt;
        struct poll_table_page *table;
        struct task_struct *polling_task;
        int triggered;
        int error;
        int inline_index;
        struct poll_table_entry inline_entries[N_INLINE_POLL_ENTRIES];
    };
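poll_table: passed to each file's poll operation, which checks whether the I/O operations identified by key (a mask of poll events) can be performed without blocking; if not, the callback qproc is invoked to add the process to the file's poll wait queue
poll_table_entry: represents the blocked process on a file's poll wait queue; one poll_table_entry is used for each wait queue the process is added to, typically one per file
poll_wqueues: used by select/poll when the process may have to block. It keeps the entries that put the process on the poll wait queues of all files named by the descriptor table, so that the process is woken as soon as any one of them becomes ready for non-blocking I/O
iii.Diagram of the relationship between the process, open files, and poll wait queues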
III.Blocking and waking the multiplexing function
i.Initializing poll_wqueues
    static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
    {
        pt->qproc = qproc;
        pt->key   = ~0UL; /* all events enabled */
    }

    void poll_initwait(struct poll_wqueues *pwq)
    {
        init_poll_funcptr(&pwq->pt, __pollwait);
        pwq->polling_task = current;
        pwq->triggered = 0;
        pwq->error = 0;
        pwq->table = NULL;
        pwq->inline_index = 0;
    }
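1.The blocking callback is set to __pollwait
2.The polling task is set to the current process
ii.Blocking in a file's poll
1.Blocking in poll
When poll is executed on a single file and the file cannot perform the I/O operations identified by key without blocking, the current process is added to that file's poll wait queue
f_op->poll call chain for TCP: socket_file_ops->sock_poll->inet_stream_ops->tcp_poll->sock_poll_wait->poll_wait
f_op->poll call chain for a pipe: write_pipefifo_fops->pipe_poll->poll_wait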
    static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
    {
        if (p && wait_address)
            p->qproc(filp, wait_address, p);
    }

    static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
                           poll_table *p)
    {
        struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
        struct poll_table_entry *entry = poll_get_entry(pwq);
        if (!entry)
            return;
        get_file(filp);
        entry->filp = filp;
        entry->wait_address = wait_address;
        entry->key = p->key;
        init_waitqueue_func_entry(&entry->wait, pollwake);
        entry->wait.private = pwq;
        add_wait_queue(wait_address, &entry->wait);
    }
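a.qproc in poll_wait was set to __pollwait by poll_initwait; if both the poll_table and wait_address are non-NULL, __pollwait is called
b.The wake-up function of the waiter on the poll wait queue is set to pollwake
c.The waiter embedded in the poll_table_entry is added to the wait queue at wait_address (sock->sk_sleep for a socket, pipe_inode_info->wait for a pipe)
2.poll_get_entry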
    #define POLL_TABLE_FULL(table) \
        ((unsigned long)((table)->entry+1) > PAGE_SIZE + (unsigned long)(table))

    static struct poll_table_entry *poll_get_entry(struct poll_wqueues *p)
    {
        struct poll_table_page *table = p->table;

        if (p->inline_index < N_INLINE_POLL_ENTRIES)
            return p->inline_entries + p->inline_index++;

        if (!table || POLL_TABLE_FULL(table)) {
            struct poll_table_page *new_table;

            new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
            if (!new_table) {
                p->error = -ENOMEM;
                return NULL;
            }
            new_table->entry = new_table->entries;
            new_table->next = table;
            p->table = new_table;
            table = new_table;
        }

        return table->entry++;
    }
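a.poll_get_entry obtains a poll_table_entry from poll_wqueues
b.If the inline array in poll_wqueues still has a free entry, the entry is taken from there
c.If the inline array is exhausted, a new page frame is allocated as a poll_table_page whenever there is no page yet or the current page is full; the new poll_table_page is inserted at the head of the list, so the next allocation only has to look at the first node to know whether a free entry is available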
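The "inline array first, then head-inserted pages" strategy can be imitated in user space. The sketch below is a simplified analogue and not the kernel code: `struct entry_pool`, `pool_get_entry`, `pool_free`, and the sizes N_INLINE and CHUNK_ENTRIES are assumptions made for the illustration, with `malloc` standing in for `__get_free_page`.

```c
#include <assert.h>
#include <stdlib.h>

#define N_INLINE      4   /* assumption: small inline capacity for the demo */
#define CHUNK_ENTRIES 8   /* assumption: stands in for one page of entries */

struct entry { int fd; };

struct chunk {
    struct chunk *next;
    struct entry *free_slot;              /* next unused slot, like table->entry */
    struct entry entries[CHUNK_ENTRIES];
};

struct entry_pool {
    struct chunk *chunks;                 /* newest chunk is always first */
    int inline_index;
    int chunk_count;
    struct entry inline_entries[N_INLINE];
};

/* Hand out one entry: inline slots first, then the head chunk, then a new chunk. */
static struct entry *pool_get_entry(struct entry_pool *p)
{
    struct chunk *c;

    assert(p != NULL);
    c = p->chunks;

    if (p->inline_index < N_INLINE)
        return p->inline_entries + p->inline_index++;

    /* Only the head chunk is inspected; older chunks are full by construction. */
    if (!c || c->free_slot == c->entries + CHUNK_ENTRIES) {
        struct chunk *n = malloc(sizeof(*n));
        if (!n)
            return NULL;
        n->free_slot = n->entries;
        n->next = c;
        p->chunks = n;
        p->chunk_count++;
        c = n;
    }
    return c->free_slot++;
}

/* Release every chunk and reset the pool. */
static void pool_free(struct entry_pool *p)
{
    while (p->chunks) {
        struct chunk *old = p->chunks;
        p->chunks = old->next;
        free(old);
    }
    p->inline_index = 0;
    p->chunk_count = 0;
}
```

Because a new chunk is created only when the head chunk is full, checking the head alone is enough to decide whether to allocate, which is the point made in item c.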
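iii.Wake-up from a file's poll
After the multiplexing function has blocked, if an asynchronous event makes a file able to perform the I/O operations identified by key without blocking, wake_up_interruptible_sync_poll is called to wake the blocked multiplexing function
TCP data reception waking the multiplexing function: tcp_protocol->tcp_v4_rcv->tcp_v4_do_rcv->tcp_rcv_established->sk_data_ready->sock_def_readable->wake_up_interruptible_sync_poll
A pipe write waking the multiplexing function: pipe_write->wake_up_interruptible_sync_poll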
    #define wake_up_interruptible_sync(x)   __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)

    /*
     * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
     * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
     * number) then we wake all the non-exclusive tasks and one exclusive task.
     *
     * There are circumstances in which we can try to wake a task which has already
     * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
     * zero in this (rare) case, and we handle it by continuing to scan the queue.
     */
    static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                 int nr_exclusive, int wake_flags, void *key)
    {
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
            unsigned flags = curr->flags;

            if (curr->func(curr, mode, wake_flags, key) &&
                    (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                break;
        }
    }

    /**
     * __wake_up_sync_key - wake up threads blocked on a waitqueue.
     * @q: the waitqueue
     * @mode: which threads
     * @nr_exclusive: how many wake-one or wake-many threads to wake up
     * @key: opaque value to be passed to wakeup targets
     *
     * The sync wakeup differs that the waker knows that it will schedule
     * away soon, so while the target thread will be woken up, it will not
     * be migrated to another CPU - ie. the two threads are 'synchronized'
     * with each other. This can prevent needless bouncing between CPUs.
     *
     * On UP it can prevent extra preemption.
     *
     * It may be assumed that this function implies a write memory barrier before
     * changing the task state if and only if any tasks are woken up.
     */
    void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
                            int nr_exclusive, void *key)
    {
        unsigned long flags;
        int wake_flags = WF_SYNC;

        if (unlikely(!q))
            return;

        if (unlikely(!nr_exclusive))
            wake_flags = 0;

        spin_lock_irqsave(&q->lock, flags);
        __wake_up_common(q, mode, nr_exclusive, wake_flags, key);
        spin_unlock_irqrestore(&q->lock, flags);
    }
    EXPORT_SYMBOL_GPL(__wake_up_sync_key);

    /*
     * __wake_up_sync - see __wake_up_sync_key()
     */
    void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
    {
        __wake_up_sync_key(q, mode, nr_exclusive, NULL);
    }
    EXPORT_SYMBOL_GPL(__wake_up_sync);  /* For internal use only */
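wake_up_interruptible_sync performs a synchronous wake-up with nr_exclusive = 1: all non-exclusive waiters and at most one exclusive waiter are woken
func in __wake_up_common is pollwake, which was installed by __pollwait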
    static int __pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
    {
        struct poll_wqueues *pwq = wait->private;
        DECLARE_WAITQUEUE(dummy_wait, pwq->polling_task);

        /*
         * Although this function is called under waitqueue lock, LOCK
         * doesn't imply write barrier and the users expect write
         * barrier semantics on wakeup functions. The following
         * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
         * and is paired with set_mb() in poll_schedule_timeout.
         */
        smp_wmb();
        pwq->triggered = 1;

        /*
         * Perform the default wake up operation using a dummy
         * waitqueue.
         *
         * TODO: This is hacky but there currently is no interface to
         * pass in @sync. @sync is scheduled to be removed and once
         * that happens, wake_up_process() can be used directly.
         */
        return default_wake_function(&dummy_wait, mode, sync, key);
    }

    static int pollwake(wait_queue_t *wait, unsigned mode, int sync, void *key)
    {
        struct poll_table_entry *entry;

        entry = container_of(wait, struct poll_table_entry, wait);
        if (key && !((unsigned long)key & entry->key))
            return 0;
        return __pollwake(wait, mode, sync, key);
    }
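1.If the event that triggered the wake-up is not one the multiplexing function is interested in, the function is not woken; if it is, __pollwake is called to wake it
2.triggered is set to 1. After traversing the whole descriptor table without finding a ready descriptor, the function normally blocks the process. However, if an asynchronous event makes an already-visited file ready while the traversal is still in progress, the process must not block. The triggered flag is therefore checked before blocking, and the process blocks only if the flag is 0.
3.The process that called the multiplexing function is woken through default_wake_function->try_to_wake_up (which returns immediately if the process is already TASK_RUNNING)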
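The event filter in item 1 is a plain bitmask test. The sketch below restates the condition from pollwake in user space; `wakeup_matches` is a hypothetical name, and the POLLIN/POLLOUT constants from `<poll.h>` are used here only as convenient sample event bits.

```c
#include <assert.h>
#include <poll.h>

/*
 * Hypothetical user-space mirror of the test at the top of pollwake():
 * event_key is the mask passed by the waker, wanted_key is entry->key.
 * Returns 1 if the waiter should be woken, 0 if the event is ignored.
 */
static int wakeup_matches(unsigned long event_key, unsigned long wanted_key)
{
    /* A zero key carries no event information, so the waiter is always woken. */
    if (event_key && !(event_key & wanted_key))
        return 0;
    return 1;
}
```

A waiter interested in POLLIN is skipped when the waker reports only POLLOUT, but is woken when the waker passes no key at all.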
iv.Releasing poll_wqueues
    static void free_poll_entry(struct poll_table_entry *entry)
    {
        remove_wait_queue(entry->wait_address, &entry->wait);
        fput(entry->filp);
    }

    void poll_freewait(struct poll_wqueues *pwq)
    {
        struct poll_table_page * p = pwq->table;
        int i;
        for (i = 0; i < pwq->inline_index; i++)
            free_poll_entry(pwq->inline_entries + i);
        while (p) {
            struct poll_table_entry * entry;
            struct poll_table_page *old;

            entry = p->entry;
            do {
                entry--;
                free_poll_entry(entry);
            } while (entry > p->entries);
            old = p;
            p = p->next;
            free_page((unsigned long) old);
        }
    }
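1.While traversing the files in the descriptor table, it is not yet known whether the files not yet visited can perform the I/O operations identified by key without blocking. So whenever the current file cannot, the process is added to that file's poll wait queue right away; if the remaining files turn out not to be ready either, the table does not have to be traversed a second time just to add the process to the poll wait queues.
2.Whether the function blocked and was woken (the process is on the poll queues of all files in the table) or never blocked (the process is on the poll queues of all files that precede the first ready file), poll_freewait is called when the function exits. It removes every waiter in poll_wqueues from the files' wait queues and releases the file references and the memory.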
IV.select implementation
i.select
fs/select.c:
    SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp,
                    fd_set __user *, exp, struct timeval __user *, tvp)
    {
        struct timespec end_time, *to = NULL;
        struct timeval tv;
        int ret;

        if (tvp) {
            if (copy_from_user(&tv, tvp, sizeof(tv)))
                return -EFAULT;

            to = &end_time;
            if (poll_select_set_timeout(to,
                    tv.tv_sec + (tv.tv_usec / USEC_PER_SEC),
                    (tv.tv_usec % USEC_PER_SEC) * NSEC_PER_USEC))
                return -EINVAL;
        }

        ret = core_sys_select(n, inp, outp, exp, to);
        ret = poll_select_copy_remaining(&end_time, tvp, 1, ret);

        return ret;
    }
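1.The timeout argument is copied from user space to kernel space and converted from timeval to timespec
2.core_sys_select is called
3.The remaining time is computed and copied back to the user's timeout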
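The conversion in step 1 also normalizes a tv_usec value of one second or more by carrying it into the seconds field. The sketch below reproduces only that arithmetic in user space; `normalize_timeval` is a hypothetical name, USEC_PER_SEC and NSEC_PER_USEC are defined locally with the values their names imply, and the kernel's conversion to an absolute end time inside poll_select_set_timeout is not modeled.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Local definitions for the demo; the names follow the kernel listing. */
#define USEC_PER_SEC  1000000L
#define NSEC_PER_USEC 1000L

/*
 * Hypothetical helper: convert a relative timeval into a relative
 * timespec, carrying whole seconds out of tv_usec as the syscall does.
 */
static struct timespec normalize_timeval(struct timeval tv)
{
    struct timespec ts;

    /* Assumption of this sketch: negative timeouts are not handled. */
    assert(tv.tv_sec >= 0 && tv.tv_usec >= 0);

    ts.tv_sec = tv.tv_sec + (tv.tv_usec / USEC_PER_SEC);
    ts.tv_nsec = (tv.tv_usec % USEC_PER_SEC) * NSEC_PER_USEC;
    return ts;
}
```

For instance, a timeval of 1 s plus 2,500,000 us becomes 3 s plus 500,000,000 ns.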
ii.core_sys_select
    int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
                        fd_set __user *exp, struct timespec *end_time)
    {
        fd_set_bits fds;
        void *bits;
        int ret, max_fds;
        unsigned int size;
        struct fdtable *fdt;
        /* Allocate small arguments on the stack to save memory and be faster */
        long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];

        ret = -EINVAL;
        if (n < 0)
            goto out_nofds;

        /* max_fds can increase, so grab it once to avoid race */
        rcu_read_lock();
        fdt = files_fdtable(current->files);
        max_fds = fdt->max_fds;
        rcu_read_unlock();
        if (n > max_fds)
            n = max_fds;

        /*
         * We need 6 bitmaps (in/out/ex for both incoming and outgoing),
         * since we used fdset we need to allocate memory in units of
         * long-words.
         */
        size = FDS_BYTES(n);
        bits = stack_fds;
        if (size > sizeof(stack_fds) / 6) {
            /* Not enough space in on-stack array; must use kmalloc */
            ret = -ENOMEM;
            bits = kmalloc(6 * size, GFP_KERNEL);
            if (!bits)
                goto out_nofds;
        }
        fds.in      = bits;
        fds.out     = bits +   size;
        fds.ex      = bits + 2*size;
        fds.res_in  = bits + 3*size;
        fds.res_out = bits + 4*size;
        fds.res_ex  = bits + 5*size;

        if ((ret = get_fd_set(n, inp, fds.in)) ||
            (ret = get_fd_set(n, outp, fds.out)) ||
            (ret = get_fd_set(n, exp, fds.ex)))
            goto out;
        zero_fd_set(n, fds.res_in);
        zero_fd_set(n, fds.res_out);
        zero_fd_set(n, fds.res_ex);

        ret = do_select(n, &fds, end_time);

        if (ret < 0)
            goto out;
        if (!ret) {
            ret = -ERESTARTNOHAND;
            if (signal_pending(current))
                goto out;
            ret = 0;
        }

        if (set_fd_set(n, inp, fds.res_in) ||
            set_fd_set(n, outp, fds.res_out) ||
            set_fd_set(n, exp, fds.res_ex))
            ret = -EFAULT;

    out:
        if (bits != stack_fds)
            kfree(bits);
    out_nofds:
        return ret;
    }
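1.The input parameter n (descriptor table size) is checked: a negative value yields -EINVAL, and a value larger than max_fds of the process's file descriptor table is clamped to max_fds
2.The read, write, and exception bitmaps are copied from user space to kernel space, with fd_set_bits recording where each bitmap starts; the result bitmaps for read, write, and exception are zeroed
3.do_select is called
4.If do_select returns 0 and a signal is pending, -ERESTARTNOHAND is returned: the kernel restarts select automatically when the signal has no handler, otherwise the call fails with EINTR once the handler has run
5.The result bitmaps are copied from kernel space to user space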
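Step 4 is why user-space callers commonly retry select when it fails with EINTR. The sketch below shows one way to do that; `wait_readable_retry` is a hypothetical name, and rebuilding the descriptor set before every attempt is a conservative choice of this example rather than something the kernel code above requires.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (illustration only): block until fd is readable,
 * retrying whenever a signal handler interrupts select() with EINTR.
 * Returns 1 when fd is readable, -1 on any other error.
 */
static int wait_readable_retry(int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set readfds;
        int ret;

        /* Rebuild the set on every attempt so each call starts clean. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        ret = select(fd + 1, &readfds, NULL, NULL, NULL);
        if (ret > 0)
            return 1;
        if (ret < 0 && errno != EINTR)
            return -1;
        /* ret < 0 with EINTR: a handler ran, so try again. */
    }
}
```

With a NULL timeout the call waits indefinitely, so this wrapper only returns once the descriptor is readable or a non-EINTR error occurs.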
iii.do_select
    int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
    {
        ktime_t expire, *to = NULL;
        struct poll_wqueues table;
        poll_table *wait;
        int retval, i, timed_out = 0;
        unsigned long slack = 0;

        rcu_read_lock();
        retval = max_select_fd(n, fds);
        rcu_read_unlock();

        if (retval < 0)
            return retval;
        n = retval;

        poll_initwait(&table);
        wait = &table.pt;
        if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
            wait = NULL;
            timed_out = 1;
        }

        if (end_time && !timed_out)
            slack = estimate_accuracy(end_time);

        retval = 0;
        for (;;) {
            unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;

            inp = fds->in; outp = fds->out; exp = fds->ex;
            rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;

            for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
                unsigned long in, out, ex, all_bits, bit = 1, mask, j;
                unsigned long res_in = 0, res_out = 0, res_ex = 0;
                const struct file_operations *f_op = NULL;
                struct file *file = NULL;

                in = *inp++; out = *outp++; ex = *exp++;
                all_bits = in | out | ex;
                if (all_bits == 0) {
                    i += __NFDBITS;
                    continue;
                }

                for (j = 0; j < __NFDBITS; ++j, ++i, bit <<= 1) {
                    int fput_needed;
                    if (i >= n)
                        break;
                    if (!(bit & all_bits))
                        continue;
                    file = fget_light(i, &fput_needed);
                    if (file) {
                        f_op = file->f_op;
                        mask = DEFAULT_POLLMASK;
                        if (f_op && f_op->poll) {
                            wait_key_set(wait, in, out, bit);
                            mask = (*f_op->poll)(file, wait);
                        }
                        fput_light(file, fput_needed);
                        if ((mask & POLLIN_SET) && (in & bit)) {
                            res_in |= bit;
                            retval++;
                            wait = NULL;
                        }
                        if ((mask & POLLOUT_SET) && (out & bit)) {
                            res_out |= bit;
                            retval++;
                            wait = NULL;
                        }
                        if ((mask & POLLEX_SET) && (ex & bit)) {
                            res_ex |= bit;
                            retval++;
                            wait = NULL;
                        }
                    }
                }
                if (res_in)
                    *rinp = res_in;
                if (res_out)
                    *routp = res_out;
                if (res_ex)
                    *rexp = res_ex;
                cond_resched();
            }
            wait = NULL;
            if (retval || timed_out || signal_pending(current))
                break;
            if (table.error) {
                retval = table.error;
                break;
            }

            /*
             * If this is the first loop and we have a timeout
             * given, then we convert to ktime_t and set the to
             * pointer to the expiry value.
             */
            if (end_time && !to) {
                expire = timespec_to_ktime(*end_time);
                to = &expire;
            }

            if (!poll_schedule_timeout(&table, TASK_INTERRUPTIBLE,
                                       to, slack))
                timed_out = 1;
        }

        poll_freewait(&table);

        return retval;
    }
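1.Find the highest descriptor set in the read, write, and exception bitmaps and validate the bitmaps (every set bit must refer to an open file; otherwise -EBADF is returned)
2.Initialize poll_wqueues, including poll_table and its callback
3.Traverse the files in the bitmaps and run poll on each of them
a.If the file cannot perform the I/O operations identified by key without blocking, poll adds the process to the file's poll wait queue (see section III.ii);
b.If the file can, poll returns the corresponding poll events, which are recorded in the result bitmaps; poll_table is then set to NULL so that the process is no longer added to any poll wait queue. Even if later files are not ready, the process does not need to be on their wait queues, because the multiplexing function only needs one ready file in order to return
4.After the traversal
A.If some file in the table is ready, go to step 5
B.If no file is ready and the timeout has not expired
a.triggered=1 (an asynchronous event, such as incoming data, made a file ready during the traversal): go back to step 3 and traverse the bitmaps again
b.triggered=0: the process blocks; after being woken it goes back to step 3 and traverses the bitmaps again
C.On timeout or a pending signal, leave the loop and go to step 5 to release poll_wqueues
5.Release poll_wqueues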
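The word-at-a-time scan in step 3 can be tried out in user space without any kernel structures. The sketch below is a simplified analogue of the nested loops in do_select: `scan_bitmaps` and BITS_PER_WORD are hypothetical names, and instead of calling poll on each file it merely collects the descriptor numbers whose bit is set in any of the three bitmaps.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical analogue of the do_select() scan: visit descriptors
 * 0..n-1, skipping whole words in which no bitmap has a bit set.
 * The three arrays must hold at least ceil(n / BITS_PER_WORD) words.
 * Stores up to max_fds descriptor numbers in fds and returns how many
 * descriptors were found in total.
 */
static size_t scan_bitmaps(const unsigned long *in, const unsigned long *out,
                           const unsigned long *ex, size_t n,
                           int *fds, size_t max_fds)
{
    size_t found = 0;
    size_t i = 0;

    assert(in && out && ex && fds);

    while (i < n) {
        unsigned long all_bits = *in++ | *out++ | *ex++;
        unsigned long bit = 1;
        size_t j;

        /* Fast path: nothing requested in this word, skip it entirely. */
        if (all_bits == 0) {
            i += BITS_PER_WORD;
            continue;
        }

        for (j = 0; j < BITS_PER_WORD && i < n; ++j, ++i, bit <<= 1) {
            if (!(all_bits & bit))
                continue;
            if (found < max_fds)
                fds[found] = (int)i;
            found++;
        }
    }
    return found;
}
```

Skipping empty words is what keeps the scan cheap when only a few descriptors are set in a large table; the cost still grows with n, since every word up to n is examined on each pass.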
iv.poll_schedule_timeout
    int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
                              ktime_t *expires, unsigned long slack)
    {
        int rc = -EINTR;

        set_current_state(state);
        if (!pwq->triggered)
            rc = schedule_hrtimeout_range(expires, slack, HRTIMER_MODE_ABS);
        __set_current_state(TASK_RUNNING);

        /*
         * Prepare for the next iteration.
         *
         * The following set_mb() serves two purposes. First, it's
         * the counterpart rmb of the wmb in pollwake() such that data
         * written before wake up is always visible after wake up.
         * Second, the full barrier guarantees that triggered clearing
         * doesn't pass event check of the next iteration. Note that
         * this problem doesn't exist for the first iteration as
         * add_wait_queue() has full barrier semantics.
         */
        set_mb(pwq->triggered, 0);

        return rc;
    }
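1.If pwq->triggered is 0, the process blocks until it is woken or the timeout expires;
2.Otherwise a file has already triggered its waiter, and the process does not block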