System calls in system_call.s



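This is a key piece of code in system_call.s. system_call is the entry handler for the Linux int 0x80 system call, and the ret_from_sys_call code that follows it does the cleanup work after a system call has run.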

Linus Torvalds
 */

/*
 * system_call.s contains the system-call low-level handling routines.
 * This also contains the timer-interrupt handler, as some of the code is
 * the same. The hd- and flopppy-interrupts are also here.
 *
 * NOTE: This code handles signal-recognition, which happens every time
 * after a timer-interrupt and after each system call. Ordinary interrupts
 * don't handle signal-recognition, as that would clutter them up totally
 * unnecessarily.

bad_sys_call:
        pushl $-ENOSYS
        jmp ret_from_sys_call
.align 2
reschedule:
        pushl $ret_from_sys_call
        jmp _schedule
.align 2
_system_call:
        push %ds
        push %es
        push %fs
        pushl %eax              # save the orig_eax
        pushl %edx
        pushl %ecx              # push %ebx,%ecx,%edx as parameters
        pushl %ebx              # to the system call
        movl $0x10,%edx         # set up ds,es to kernel space
        mov %dx,%ds
        mov %dx,%es
        movl $0x17,%edx         # fs points to local data space
        mov %dx,%fs
        cmpl _NR_syscalls,%eax
        jae bad_sys_call
        call _sys_call_table(,%eax,4)
        pushl %eax
2:
        movl _current,%eax
        cmpl $0,state(%eax)             # state
        jne reschedule
        cmpl $0,counter(%eax)           # counter
        je reschedule
ret_from_sys_call:
        movl _current,%eax
        cmpl _task,%eax                 # task[0] cannot have signals
        je 3f
        cmpw $0x0f,CS(%esp)             # was old code segment supervisor ?
        jne 3f
        cmpw $0x17,OLDSS(%esp)          # was stack segment = 0x17 ?
        jne 3f
        movl signal(%eax),%ebx
        movl blocked(%eax),%ecx
        notl %ecx
        andl %ebx,%ecx
        bsfl %ecx,%ecx
        je 3f
        btrl %ecx,%ebx
        movl %ebx,signal(%eax)
        incl %ecx
        pushl %ecx
        call _do_signal
        popl %ecx
        testl %eax, %eax
        jne 2b          # see if we need to switch tasks, or do more signals
3:      popl %eax
        popl %ebx
        popl %ecx
        popl %edx
        addl $4, %esp   # skip orig_eax
        pop %fs
        pop %es
        pop %ds
        iret

The block comment at the top is Linus's own note on this code.

Below is Zhao Jiong's explanation:

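Note: this code handles signal recognition, which is performed after every timer interrupt and after every system call. Ordinary interrupt handlers do not perform it, because that would only clutter them up needlessly.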

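The "ordinary interrupts" in Linus's comment above are all interrupts other than the system call interrupt (int 0x80) and the timer interrupt (int 0x20). They can occur at arbitrary times, in either kernel mode or user mode. If their handlers also performed signal recognition, this could conflict with the signal handling done on the system call and timer interrupt paths, and it would violate the non-preemptive principle of the kernel code.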

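In a preemptive kernel, when interrupt handling detects that a higher-priority task has become ready, the kernel can switch to that task right away, even if the interrupted code was running in the kernel, instead of postponing scheduling until that code leaves the kernel.
In a non-preemptive kernel, interrupt handling never switches away from a task that is running in the kernel, even if its time slice has run out.
The Linux 0.11 kernel is non-preemptive: a process executing in the kernel keeps the CPU until it gives it up voluntarily, and nothing else can force it out.
The Linux 2.6 kernel is preemptible: if a timer interrupt arrives while a process is executing in the kernel and its time slice has expired, another process can be scheduled. That alone does not make Linux 2.6 a real-time system.
  In a real-time system, as soon as a higher-priority process becomes ready it preempts immediately, whether or not the process running in the kernel has used up its time slice.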

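I originally meant to explain system_call and ret_from_sys_call separately, but in the end it seemed more reasonable to read them together. Neither piece of code is hard to follow:

First, system_call:

When a process makes a system call (int 0x80), no matter what the outer interface looks like, the CPU ends up executing this code.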

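_system_call:
        push %ds
        push %es
        push %fs
        pushl %eax              # save the orig_eax
These instructions push the segment registers ds, es and fs and the original eax of the current process onto the stack. This preserves the process's user-mode context so that it can be restored when the system call returns and the interrupted program resumes.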

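        pushl %edx
        pushl %ecx              # push %ebx,%ecx,%edx as parameters
        pushl %ebx              # to the system call

A system call can take up to three parameters, or none at all. The three registers pushed here (ebx, ecx, edx) hold the arguments for the C function that implements the system call. Their push order is dictated by the GNU gcc calling convention: arguments are pushed right to left, so ebx, the first argument, is pushed last. eax holds the system call number, as the code further down shows.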

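        movl $0x10,%edx         # set up ds,es to kernel space
        mov %dx,%ds
        mov %dx,%es
        movl $0x17,%edx         # fs points to local data space
        mov %dx,%fs
By default the Linux kernel uses the segment registers ds and es for the kernel data segment and fs for the user data segment. While a system call runs, ds and es point to kernel data space, and fs is set to point to the user data space of the calling process.

The values 0x10 and 0x17 are segment selectors, which were covered earlier; see the material on segment selectors.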

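        cmpl _NR_syscalls,%eax
        jae bad_sys_call
These two instructions check whether the system call number is valid. What is NR_syscalls? Copying the source is probably the clearest way to explain it (include/linux/sys.h):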

fn_ptr sys_call_table[] = { sys_setup, sys_exit, sys_fork, sys_read,
sys_write, sys_open, sys_close, sys_waitpid, sys_creat, sys_link,
sys_unlink, sys_execve, sys_chdir, sys_time, sys_mknod, sys_chmod,
sys_chown, sys_break, sys_stat, sys_lseek, sys_getpid, sys_mount,
sys_umount, sys_setuid, sys_getuid, sys_stime, sys_ptrace, sys_alarm,
sys_fstat, sys_pause, sys_utime, sys_stty, sys_gtty, sys_access,
sys_nice, sys_ftime, sys_sync, sys_kill, sys_rename, sys_mkdir,
sys_rmdir, sys_dup, sys_pipe, sys_times, sys_prof, sys_brk, sys_setgid,
sys_getgid, sys_signal, sys_geteuid, sys_getegid, sys_acct, sys_phys,
sys_lock, sys_ioctl, sys_fcntl, sys_mpx, sys_setpgid, sys_ulimit,
sys_uname, sys_umask, sys_chroot, sys_ustat, sys_dup2, sys_getppid,
sys_getpgrp, sys_setsid, sys_sigaction, sys_sgetmask, sys_ssetmask,
sys_setreuid, sys_setregid, sys_sigsuspend, sys_sigpending, sys_sethostname,
sys_setrlimit, sys_getrlimit, sys_getrusage, sys_gettimeofday,
sys_settimeofday, sys_getgroups, sys_setgroups, sys_select, sys_symlink,
sys_lstat, sys_readlink, sys_uselib };

/* So we don't have to do any more manual updating.... */
int NR_syscalls = sizeof(sys_call_table)/sizeof(fn_ptr);
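sys_call_table[] is the system call table: each element is a pointer to the function that implements one system call, and NR_syscalls is the number of elements in the table. The number of system calls is not fixed across kernel versions: 0.11 has 72, and later versions have more (the table listed above contains 87 entries). With that, the two instructions are easy to read: eax holds the system call number, and the comparison checks whether the number requested by the running program is valid. Because jae is an unsigned comparison, any number greater than or equal to NR_syscalls is rejected, and control jumps to bad_sys_call.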

bad_sys_call:
        pushl $-ENOSYS
        jmp ret_from_sys_call
This is bad_sys_call. It pushes -ENOSYS as the return value and jumps to ret_from_sys_call, which is examined further down.

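        call _sys_call_table(,%eax,4)
        pushl %eax

This calls the function for the requested system call. The operand _sys_call_table(,%eax,4) addresses the table entry at _sys_call_table + eax * 4, since each entry is a 4-byte pointer; sys_call_table itself was explained above. The value the function returns in eax is then pushed onto the stack.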

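2:
        movl _current,%eax
        cmpl $0,state(%eax)             # state
        jne reschedule
        cmpl $0,counter(%eax)           # counter
        je reschedule
To follow these instructions you first need the task_struct structure (include/linux/sched.h): state is the task's run state, and counter is its remaining run time (its time slice).

 /* -1 unrunnable, 0 runnable, >0 stopped */ 0 is the runnable (ready) state.

These lines mean: if the task is not runnable (state is not 0), run the scheduler. If the task is runnable but its time slice is used up (counter is 0), run the scheduler as well. Taking the jump to reschedule is how a process in kernel mode gives up the CPU voluntarily so that another task can be scheduled. reschedule first pushes the address of ret_from_sys_call, so execution continues there once _schedule returns.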

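So why are these lines needed?

My understanding is this: while a system call runs, the task is in kernel mode, and the early Linux kernel is non-preemptive. During a system call the task therefore cannot be forced off the CPU; it leaves only when it gives the CPU up itself.

That is why schedule is called promptly once the system call has finished.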

Now for ret_from_sys_call:

        movl _current,%eax
        cmpl _task,%eax                 # task[0] cannot have signals
        je 3f
        cmpw $0x0f,CS(%esp)             # was old code segment supervisor ?
        jne 3f
        cmpw $0x17,OLDSS(%esp)          # was stack segment = 0x17 ?
        jne 3f



