The Linux execve system call

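The exec family of functions loads a new executable or script image and runs it. On success the call does not return; execution jumps to the entry point of the new image.
On Linux, every exec function is a wrapper around the execve system call. All process attributes are preserved across execve except the following:
1. Signals that have a handler installed are reset to the default disposition, SIG_DFL
2. Memory mappings (mmap) are not preserved
3. Attached SysV shared memory segments (shmat) are detached
4. POSIX shared memory regions (shm_open) are unmapped
5. Open POSIX message queue descriptors (mq_overview) are closed
6. Open POSIX named semaphores (sem_overview) are closed
7. Open directory streams (opendir) are closed
8. Memory locks (mlock/mlockall) are not preserved
9. Exit handlers (atexit/on_exit) are not preserved
10. The floating-point environment (fenv) is reset to the default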
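
To make the "does not return on success" behaviour concrete, here is a minimal user-space sketch. The helper name run_shell_exit and the use of /bin/sh as the target program are assumptions made for illustration only; they do not come from the kernel code discussed in this article.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: fork, replace the child image with /bin/sh
 * (path assumed to exist), and return the child's exit status.
 * Returns -1 on failure. */
int run_shell_exit(int code)
{
    assert(code >= 0 && code <= 255);

    char cmd[32];
    snprintf(cmd, sizeof(cmd), "exit %d", code);
    char *argv[] = { "sh", "-c", cmd, NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, environ);
        /* Reached only if execve failed: a successful call never returns. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_shell_exit(7) from a small main should yield the status of the shell that replaced the child, since the line after execve is never reached when the call succeeds.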


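execve does four main things:
1. Allocates a new address space for the process and copies the environment variables, the arguments to main, and so on onto the stack of that new address space;
2. Parses the executable and loads or maps its code and data into memory;
3. Sets up the new process environment, for example by closing files marked FD_CLOEXEC;
4. Sets the address at which execution resumes when execve returns to user mode: the entry point of the interpreter, or the entry point of the program itself.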
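
Step 3 can be observed from user space. The sketch below is only an illustration under stated assumptions: the function name fd_survives_exec, the choice of descriptor 5, and the /bin/sh path are all hypothetical and are not taken from the kernel source. The child opens /dev/null on descriptor 5, optionally marks it FD_CLOEXEC, and then execs a shell whose redirection to descriptor 5 fails if the kernel closed it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Returns 1 if descriptor 5 is still open in the exec'd shell, 0 if
 * execve closed it, -1 on error. */
int fd_survives_exec(int set_cloexec)
{
    assert(set_cloexec == 0 || set_cloexec == 1);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0 || dup2(fd, 5) < 0)
            _exit(126);
        /* A descriptor created by dup2 has FD_CLOEXEC clear; set it if asked. */
        if (set_cloexec && fcntl(5, F_SETFD, FD_CLOEXEC) < 0)
            _exit(126);
        /* The redirection to descriptor 5 fails if it is not open. */
        char *argv[] = { "sh", "-c", "true 2>/dev/null >&5", NULL };
        execve("/bin/sh", argv, environ);
        _exit(126);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 126)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

With the flag clear the descriptor is inherited by the new image; with the flag set it is gone by the time the shell starts.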

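I. do_execve
The execve system call is implemented by do_execve. Its main steps are:
1. The process allocates and switches to its own file descriptor table, so that it no longer uses a shared one.
2. A new address space (mm_struct) is allocated for the process, the context of the old mm_struct is copied into it, and a stack memory region one page in size is created.
3. The executable is opened and its permissions are checked; the first BINPRM_BUF_SIZE = 128 bytes of the file are read ahead into buf.
4. The environment variables and the arguments to main are counted, and the strings are copied onto the stack of the new address space.
5. The handler that matches the executable's format is looked up and carries out the rest of the work.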

The implementation of do_execve, from fs/exec.c:

/*
 * sys_execve() executes a new program.
 */
int do_execve(char * filename,
        char __user *__user *argv,
        char __user *__user *envp,
        struct pt_regs * regs)
{
        /* ... */

        retval = unshare_files(&displaced);
        if (retval)
                goto out_ret;

        /* ... */

        file = open_exec(filename);
        retval = PTR_ERR(file);
        if (IS_ERR(file))
                goto out_unmark;

        /* ... */

        retval = bprm_mm_init(bprm);
        if (retval)
                goto out_file;

        bprm->argc = count(argv, MAX_ARG_STRINGS);
        if ((retval = bprm->argc) < 0)
                goto out;

        bprm->envc = count(envp, MAX_ARG_STRINGS);
        if ((retval = bprm->envc) < 0)
                goto out;

        retval = prepare_binprm(bprm);
        if (retval < 0)
                goto out;

        retval = copy_strings_kernel(1, &bprm->filename, bprm);
        if (retval < 0)
                goto out;

        bprm->exec = bprm->p;
        retval = copy_strings(bprm->envc, envp, bprm);
        if (retval < 0)
                goto out;

        retval = copy_strings(bprm->argc, argv, bprm);
        if (retval < 0)
                goto out;

        current->flags &= ~PF_KTHREAD;
        retval = search_binary_handler(bprm,regs);
        if (retval < 0)
                goto out;

        /* ... */

        return retval;
}
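
The handler lookup in the last step works on the bytes that were read ahead into buf: each registered format handler inspects them and either accepts the image or declines. The following is a simplified user-space sketch of that idea, not the kernel's search_binary_handler; the enum and the function name classify_image are made up for this example, and only two well-known prefixes (the ELF magic and "#!") are checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum image_kind { IMAGE_UNKNOWN, IMAGE_ELF, IMAGE_SCRIPT };

/* Classify an image from its first bytes, in the spirit of the
 * per-format checks performed on the prefetched buffer. */
enum image_kind classify_image(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* ELF files start with 0x7f 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return IMAGE_ELF;
    /* Scripts start with "#!" followed by the interpreter path. */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return IMAGE_SCRIPT;
    return IMAGE_UNKNOWN;
}
```

An image that no handler recognizes is rejected, which is what the IMAGE_UNKNOWN branch stands in for here.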

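The string-copying function copy_strings:
1. Copies the strings from the current address space onto the stack of the new address space.
2. While the strings are being copied, the page tables of the old address space are still in use, so the stack addresses of the new address space cannot be written directly. Each stack page frame must first be mapped into kernel space, and the copy goes through that mapping.
3. Because a string may straddle a page boundary, a single string may need two page-frame mappings and two copies (more if the string is longer than a page).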
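
The page-crossing case in point 3 is plain arithmetic. As a rough sketch (the function name pages_touched and its parameters are invented for this illustration), the number of page frames that must be mapped for one string depends on where its first and last bytes fall:

```c
#include <assert.h>
#include <stddef.h>

/* Number of page frames touched by a string of `len` bytes
 * (terminator included) whose last byte sits just below offset `top`
 * in the new stack. */
size_t pages_touched(size_t top, size_t len, size_t page_size)
{
    assert(page_size > 0 && len > 0 && len <= top);

    size_t first = (top - len) / page_size;  /* page holding the first byte */
    size_t last = (top - 1) / page_size;     /* page holding the last byte */
    return last - first + 1;
}
```

A short string that ends exactly at a page boundary stays within one page, while one that ends a few bytes past the boundary needs two mappings.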

 

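II. Handling ELF files
ELF handling consists mainly of parsing the ELF file and loading the program's code and data into memory, then setting up the process environment from the information in the file.

ELF format:
[Figure 1: ELF file format]
An ELF file's bookkeeping information consists of three parts: the ELF header, the program header (segment) table, and the section header table. The address range of one segment usually covers one or more sections.
The data segment, for example, contains the data section, the BSS section, and so on.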
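
To see the ELF header and program header table from user space, the sketch below reads them with the standard types from <elf.h>. It is an illustration under assumptions: the function name count_load_segments is hypothetical, ElfW() from <link.h> is used to pick the 32-bit or 64-bit structures for the current build, and files of the other class are simply rejected.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>   /* ElfW(): selects Elf32_* or Elf64_* for this build */
#include <stdio.h>
#include <string.h>

/* Count PT_LOAD entries in the program header table of `path`.
 * Returns -1 if the file cannot be read or is not a native-class ELF. */
int count_load_segments(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    ElfW(Ehdr) eh;
    int count = -1;
    if (fread(&eh, sizeof(eh), 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_phentsize == sizeof(ElfW(Phdr))) {
        count = 0;
        for (int i = 0; i < eh.e_phnum; i++) {
            ElfW(Phdr) ph;
            long off = (long)eh.e_phoff + (long)i * eh.e_phentsize;
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof(ph), 1, f) != 1) {
                count = -1;
                break;
            }
            if (ph.p_type == PT_LOAD)
                count++;
        }
    }
    fclose(f);
    return count;
}
```

On Linux, passing "/proc/self/exe" inspects the running program itself, which avoids guessing the path of some other binary.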

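Linux handles ELF files in load_elf_binary. The main flow is:
i. Check the validity of the ELF file using the data in the ELF header.
ii. Read the program header table, using the table offset and the number of entries given in the ELF header.
iii. Using the interpreter's program header (PT_INTERP), read the header of the interpreter executable into buf for later processing.
iv. Tear down the old exec environment
  1. Thread handling
    a. Send SIGKILL to the other threads in the thread group, killing every thread except the caller
    b. If the caller is not the group leader, inherit the leader's information and make the caller the leader
    c. Delete the timers and flush pending timer signals
    d. If the signal handler table is shared with other processes, create a private copy of it
  2. Set the executable file handle to the new file
  3. Switch address spaces and start using the new one
v. Set up the new exec environment
  1. Set the process name
  2. Update the signal handler table: signals that are not ignored are reset to the default action, and the masks are cleared
  3. Close the files marked FD_CLOEXEC
vi. Finalize the stack memory region: update its flags and permissions; the region may be relocated and enlarged
vii. Map the PT_LOAD segments of the ELF file into memory
  1. Using the PT_LOAD program headers, map the code segment, data segment, and so on from the executable into memory as file mappings
  2. PT_LOAD segments generally comprise the code segment and the data segment. The data segment's size in memory is usually larger than its size in the file; the extra part includes the bss space. When a file-mapped segment is larger in memory than in the file, the extra part is zeroed (for the last segment, the extra part is usually provided by an anonymous mapping)
  3. Set start_code, end_code, start_data, end_data, elf_bss and elf_brk; in the usual case elf_bss = data segment virtual address + file size, and elf_brk = data segment virtual address + size in memory
viii. Create an anonymous mapping for the range between elf_bss and elf_brk (the BSS and so on).
ix. Set the address at which code runs once exec returns to user space. If there is an interpreter (for dynamically linked executables this is typically /lib/ld-linux.so.2, whose job is to load the shared libraries the executable needs), load it and use the interpreter's entry point; otherwise use the entry point of the image.
x. Set up the ELF tables
  1. Write the argument count of main, the addresses of the argument strings, and a terminating NULL onto the stack
  2. Write the addresses of the environment strings and a terminating NULL onto the stack
  3. Write the ELF auxiliary table, whose entries are (id, val) pairs, for example the program's entry address
xi. Set the process registers: the instruction pointer is changed to the new entry address and the stack is switched; then return to user space and run the interpreter's code or the program's code.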
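
The (id, val) pairs written in step x remain visible to the running program. As a small sketch, assuming a C library that provides getauxval() in <sys/auxv.h> (glibc 2.16 or later), a program can read back some of the values the kernel stored. The function names aux_entry and auxv_looks_sane are made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <sys/auxv.h>   /* getauxval() */
#include <unistd.h>

/* Return the value stored under `id` in the auxiliary vector,
 * or 0 if there is no such entry. */
unsigned long aux_entry(unsigned long id)
{
    assert(id != AT_NULL);   /* AT_NULL only terminates the table */
    return getauxval(id);
}

/* Cross-check a few entries: the page size must agree with sysconf(),
 * and the program entry address and program header count must be set. */
int auxv_looks_sane(void)
{
    unsigned long pagesz = aux_entry(AT_PAGESZ);
    unsigned long entry = aux_entry(AT_ENTRY);
    unsigned long phnum = aux_entry(AT_PHNUM);

    return pagesz == (unsigned long)sysconf(_SC_PAGESIZE) &&
           entry != 0 && phnum > 0;
}
```

AT_ENTRY holds the entry point of the program itself, which is how the interpreter knows where to jump once it has finished loading the shared libraries.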


The implementation of ELF file handling, from fs/binfmt_elf.c:

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
        struct file *interpreter = NULL; /* to shut gcc up */
        unsigned long load_addr = 0, load_bias = 0;
        int load_addr_set = 0;
        char * elf_interpreter = NULL;
        unsigned long error;
        struct elf_phdr *elf_ppnt, *elf_phdata;
        unsigned long elf_bss, elf_brk;
        int retval, i;
        unsigned int size;
        unsigned long elf_entry;
        unsigned long interp_load_addr = 0;
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long reloc_func_desc = 0;
        int executable_stack = EXSTACK_DEFAULT;
        unsigned long def_flags = 0;
        struct {
                struct elfhdr elf_ex;
                struct elfhdr interp_elf_ex;
        } *loc;

        loc = kmalloc(sizeof(*loc), GFP_KERNEL);
        if (!loc) {
                retval = -ENOMEM;
                goto out_ret;
        }

        /* Get the exec-header */
        loc->elf_ex = *((struct elfhdr *)bprm->buf);

        retval = -ENOEXEC;
        /* First of all, some simple consistency checks */
        if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
                goto out;

        if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
                goto out;
        if (!elf_check_arch(&loc->elf_ex))
                goto out;
        if (!bprm->file->f_op||!bprm->file->f_op->mmap)
                goto out;

        /* Now read in all of the header information */
        if (loc->elf_ex.e_phentsize != sizeof(struct elf_phdr))
                goto out;
        if (loc->elf_ex.e_phnum < 1 ||
                loc->elf_ex.e_phnum > 65536U / sizeof(struct elf_phdr))
                goto out;
        size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
        retval = -ENOMEM;
        elf_phdata = kmalloc(size, GFP_KERNEL);
        if (!elf_phdata)
                goto out;

        retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
                             (char *)elf_phdata, size);
        if (retval != size) {
                if (retval >= 0)
                        retval = -EIO;
                goto out_free_ph;
        }

        elf_ppnt = elf_phdata;
        elf_bss = 0;
        elf_brk = 0;

        start_code = ~0UL;
        end_code = 0;
        start_data = 0;
        end_data = 0;

        for (i = 0; i < loc->elf_ex.e_phnum; i++) {
                if (elf_ppnt->p_type == PT_INTERP) {
                        /* This is the program interpreter used for
                         * shared libraries - for now assume that this
                         * is an a.out format binary
                         */
                        retval = -ENOEXEC;
                        if (elf_ppnt->p_filesz > PATH_MAX ||
                            elf_ppnt->p_filesz < 2)
                                goto out_free_ph;

                        retval = -ENOMEM;
                        elf_interpreter = kmalloc(elf_ppnt->p_filesz,
                                                  GFP_KERNEL);
                        if (!elf_interpreter)
                                goto out_free_ph;

                        retval = kernel_read(bprm->file, elf_ppnt->p_offset,
                                             elf_interpreter,
                                             elf_ppnt->p_filesz);
                        if (retval != elf_ppnt->p_filesz) {
                                if (retval >= 0)
                                        retval = -EIO;
                                goto out_free_interp;
                        }
                        /* make sure path is NULL terminated */
                        retval = -ENOEXEC;
                        if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0')
                                goto out_free_interp;

                        interpreter = open_exec(elf_interpreter);
                        retval = PTR_ERR(interpreter);
                        if (IS_ERR(interpreter))
                                goto out_free_interp;

                        /*
                         * If the binary is not readable then enforce
                         * mm->dumpable = 0 regardless of the interpreter's
                         * permissions.
                         */
                        if (file_permission(interpreter, MAY_READ) < 0)
                                bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;

                        retval = kernel_read(interpreter, 0, bprm->buf,
                                             BINPRM_BUF_SIZE);
                        if (retval != BINPRM_BUF_SIZE) {
                                if (retval >= 0)
                                        retval = -EIO;
                                goto out_free_dentry;
                        }

                        /* Get the exec headers */
                        loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
                        break;
                }
                elf_ppnt++;
        }

        elf_ppnt = elf_phdata;
        for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
                if (elf_ppnt->p_type == PT_GNU_STACK) {
                        if (elf_ppnt->p_flags & PF_X)
                                executable_stack = EXSTACK_ENABLE_X;
                        else
                                executable_stack = EXSTACK_DISABLE_X;
                        break;
                }

        /* Some simple consistency checks for the interpreter */
        if (elf_interpreter) {
                retval = -ELIBBAD;
                /* Not an ELF interpreter */
                if (memcmp(loc->interp_elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
                        goto out_free_dentry;
                /* Verify the interpreter has a valid arch */
                if (!elf_check_arch(&loc->interp_elf_ex))
                        goto out_free_dentry;
        }

        /* Flush all traces of the currently running executable */
        retval = flush_old_exec(bprm);
        if (retval)
                goto out_free_dentry;

        /* OK, This is the point of no return */
        current->flags &= ~PF_FORKNOEXEC;
        current->mm->def_flags = def_flags;

        /* Do this immediately, since STACK_TOP as used in setup_arg_pages
           may depend on the personality.  */
        SET_PERSONALITY(loc->elf_ex);
        if (elf_read_implies_exec(loc->elf_ex, executable_stack))
                current->personality |= READ_IMPLIES_EXEC;

        if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
                current->flags |= PF_RANDOMIZE;

        setup_new_exec(bprm);

        /* Do this so that we can load the interpreter, if need be.  We will
           change some of these later */
        current->mm->free_area_cache = current->mm->mmap_base;
        current->mm->cached_hole_size = 0;
        retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                                 executable_stack);
        if (retval < 0) {
                send_sig(SIGKILL, current, 0);
                goto out_free_dentry;
        }

        current->mm->start_stack = bprm->p;

        /* Now we do a little grungy work by mmaping the ELF image into
           the correct location in memory. */
        for(i = 0, elf_ppnt = elf_phdata;
            i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
                int elf_prot = 0, elf_flags;
                unsigned long k, vaddr;

                if (elf_ppnt->p_type != PT_LOAD)
                        continue;

                if (unlikely (elf_brk > elf_bss)) {
                        unsigned long nbyte;

                        /* There was a PT_LOAD segment with p_memsz > p_filesz
                           before this one. Map anonymous pages, if needed,
                           and clear the area.  */
                        retval = set_brk (elf_bss + load_bias,
                                          elf_brk + load_bias);
                        if (retval) {
                                send_sig(SIGKILL, current, 0);
                                goto out_free_dentry;
                        }
                        nbyte = ELF_PAGEOFFSET(elf_bss);
                        if (nbyte) {
                                nbyte = ELF_MIN_ALIGN - nbyte;
                                if (nbyte > elf_brk - elf_bss)
                                        nbyte = elf_brk - elf_bss;
                                if (clear_user((void __user *)elf_bss +
                                                        load_bias, nbyte)) {
                                        /*
                                         * This bss-zeroing can fail if the ELF
                                         * file specifies odd protections. So
                                         * we don't check the return value
                                         */
                                }
                        }
                }

                if (elf_ppnt->p_flags & PF_R)
                        elf_prot |= PROT_READ;
                if (elf_ppnt->p_flags & PF_W)
                        elf_prot |= PROT_WRITE;
                if (elf_ppnt->p_flags & PF_X)
                        elf_prot |= PROT_EXEC;

                elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;

                vaddr = elf_ppnt->p_vaddr;
                if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
                        elf_flags |= MAP_FIXED;
                } else if (loc->elf_ex.e_type == ET_DYN) {
                        /* Try and get dynamic programs out of the way of the
                         * default mmap base, as well as whatever program they
                         * might try to exec.  This is because the brk will
                         * follow the loader, and is not movable.  */
#ifdef CONFIG_X86
                        load_bias = 0;
#else
                        load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
                }

                error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                                elf_prot, elf_flags, 0);
                if (BAD_ADDR(error)) {
                        send_sig(SIGKILL, current, 0);
                        retval = IS_ERR((void *)error) ?
                                PTR_ERR((void*)error) : -EINVAL;
                        goto out_free_dentry;
                }

                if (!load_addr_set) {
                        load_addr_set = 1;
                        load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
                        if (loc->elf_ex.e_type == ET_DYN) {
                                load_bias += error -
                                             ELF_PAGESTART(load_bias + vaddr);
                                load_addr += load_bias;
                                reloc_func_desc = load_bias;
                        }
                }
                k = elf_ppnt->p_vaddr;
                if (k < start_code)
                        start_code = k;
                if (start_data < k)
                        start_data = k;

                /*
                 * Check to see if the section's size will overflow the
                 * allowed task size. Note that p_filesz must always be
                 * <= p_memsz so it is only necessary to check p_memsz.
                 */
                if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
                    elf_ppnt->p_memsz > TASK_SIZE ||
                    TASK_SIZE - elf_ppnt->p_memsz < k) {
                        /* set_brk can never work. Avoid overflows. */
                        send_sig(SIGKILL, current, 0);
                        retval = -EINVAL;
                        goto out_free_dentry;
                }

                k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;

                if (k > elf_bss)
                        elf_bss = k;
                if ((elf_ppnt->p_flags & PF_X) && end_code < k)
                        end_code = k;
                if (end_data < k)
                        end_data = k;
                k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
                if (k > elf_brk)
                        elf_brk = k;
        }

        loc->elf_ex.e_entry += load_bias;
        elf_bss += load_bias;
        elf_brk += load_bias;
        start_code += load_bias;
        end_code += load_bias;
        start_data += load_bias;
        end_data += load_bias;

        /* Calling set_brk effectively mmaps the pages that we need
         * for the bss and break sections.  We must do this before
         * mapping in the interpreter, to make sure it doesn't wind
         * up getting placed where the bss needs to go.
         */
        retval = set_brk(elf_bss, elf_brk);
        if (retval) {
                send_sig(SIGKILL, current, 0);
                goto out_free_dentry;
        }
        if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
                send_sig(SIGSEGV, current, 0);
                retval = -EFAULT; /* Nobody gets to see this, but.. */
                goto out_free_dentry;
        }

        if (elf_interpreter) {
                unsigned long uninitialized_var(interp_map_addr);

                elf_entry = load_elf_interp(&loc->interp_elf_ex,
                                            interpreter,
                                            &interp_map_addr,
                                            load_bias);
                if (!IS_ERR((void *)elf_entry)) {
                        /*
                         * load_elf_interp() returns relocation
                         * adjustment
                         */
                        interp_load_addr = elf_entry;
                        elf_entry += loc->interp_elf_ex.e_entry;
                }
                if (BAD_ADDR(elf_entry)) {
                        force_sig(SIGSEGV, current);
                        retval = IS_ERR((void *)elf_entry) ?
                                        (int)elf_entry : -EINVAL;
                        goto out_free_dentry;
                }
                reloc_func_desc = interp_load_addr;

                allow_write_access(interpreter);
                fput(interpreter);
                kfree(elf_interpreter);
        } else {
                elf_entry = loc->elf_ex.e_entry;
                if (BAD_ADDR(elf_entry)) {
                        force_sig(SIGSEGV, current);
                        retval = -EINVAL;
                        goto out_free_dentry;
                }
        }

        kfree(elf_phdata);

        set_binfmt(&elf_format);

#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
        retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
        if (retval < 0) {
                send_sig(SIGKILL, current, 0);
                goto out;
        }
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */

        install_exec_creds(bprm);
        current->flags &= ~PF_FORKNOEXEC;
        retval = create_elf_tables(bprm, &loc->elf_ex,
                          load_addr, interp_load_addr);
        if (retval < 0) {
                send_sig(SIGKILL, current, 0);
                goto out;
        }
        /* N.B. passed_fileno might not be initialized? */
        current->mm->end_code = end_code;
        current->mm->start_code = start_code;
        current->mm->start_data = start_data;
        current->mm->end_data = end_data;
        current->mm->start_stack = bprm->p;

#ifdef arch_randomize_brk
        if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1))
                current->mm->brk = current->mm->start_brk =
                        arch_randomize_brk(current->mm);
#endif

        if (current->personality & MMAP_PAGE_ZERO) {
                /* Why this, you ask???  Well SVr4 maps page 0 as read-only,
                   and some applications "depend" upon this behavior.
                   Since we do not have the power to recompile these, we
                   emulate the SVr4 behavior. Sigh. */
                down_write(&current->mm->mmap_sem);
                error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
                                MAP_FIXED | MAP_PRIVATE, 0);
                up_write(&current->mm->mmap_sem);
        }

#ifdef ELF_PLAT_INIT
        /*
         * The ABI may specify that certain registers be set up in special
         * ways (on i386 %edx is the address of a DT_FINI function, for
         * example.  In addition, it may also specify (eg, PowerPC64 ELF)
         * that the e_entry field is the address of the function descriptor
         * for the startup routine, rather than the address of the startup
         * routine itself.  This macro performs whatever initialization to
         * the regs structure is required as well as any relocations to the
         * function descriptor entries when executing dynamically links apps.
         */
        ELF_PLAT_INIT(regs, reloc_func_desc);
#endif

        start_thread(regs, elf_entry, bprm->p);
        retval = 0;
out:
        kfree(loc);
out_ret:
        return retval;

        /* error cleanup */
out_free_dentry:
        allow_write_access(interpreter);
        if (interpreter)
                fput(interpreter);
out_free_interp:
        kfree(elf_interpreter);
out_free_ph:
        kfree(elf_phdata);
        goto out;
}
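
The bookkeeping of elf_bss and elf_brk in the mapping loop can be followed in isolation. The sketch below is a simplified user-space model, not kernel code: the struct and function names are invented, load_bias is ignored, and a fixed page size is passed in by the caller. It tracks the highest file-backed end and the highest in-memory end across the PT_LOAD segments, and computes how many bytes after elf_bss lie in the same page and therefore have to be cleared explicitly.

```c
#include <assert.h>

struct seg { unsigned long vaddr, filesz, memsz; };
struct bss_range { unsigned long elf_bss, elf_brk; };

/* Highest vaddr + filesz becomes elf_bss; highest vaddr + memsz
 * becomes elf_brk, mirroring the updates in the PT_LOAD loop. */
struct bss_range compute_bss_range(const struct seg *segs, int n)
{
    struct bss_range r = { 0, 0 };

    for (int i = 0; i < n; i++) {
        assert(segs[i].filesz <= segs[i].memsz);
        unsigned long k = segs[i].vaddr + segs[i].filesz;
        if (k > r.elf_bss)
            r.elf_bss = k;
        k = segs[i].vaddr + segs[i].memsz;
        if (k > r.elf_brk)
            r.elf_brk = k;
    }
    return r;
}

/* Bytes from elf_bss up to the end of its page: the tail of the last
 * file-mapped page that must be zeroed by hand. */
unsigned long padzero_bytes(unsigned long elf_bss, unsigned long page_size)
{
    unsigned long off = elf_bss & (page_size - 1);
    return off ? page_size - off : 0;
}
```

Pages that lie entirely between the rounded-up elf_bss and elf_brk need no clearing here, because they come from the anonymous mapping and are zero-filled on first access.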

How the stack changes during execve:

[Figure 2: stack changes during execve]


 

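III. Page faults
Dynamic memory allocation for a user-mode process differs from dynamic memory allocation inside the kernel. When the kernel requests memory, page frames are allocated immediately. When a user-mode process requests memory, it only obtains the right to use a range of addresses; no page frames are allocated yet.
The reason is that a process's memory request is not considered urgent (for example, after the code segment has been mapped, the process will not touch every code page), so the kernel can defer the allocation. Deferred allocation is implemented mainly through page faults.
During exec the kernel allocates almost no page frames for the new image (the stack pages that hold the copied argument and environment strings are the main exception); it only sets up memory mappings, that is, it grants the right to use the address ranges.

While no page frame is allocated, the Present flag in the page table entry is 0, and an access to that page raises a page fault. If the process has the right to use the address, the page fault handler allocates a page frame for it.

The mappings set up during exec are mainly: file mappings for the code and data segments, an anonymous mapping for the BSS, and an anonymous mapping for the stack.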
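
Deferred allocation can be observed from user space by counting page faults. The sketch below assumes a Linux system where getrusage() reports minor faults in ru_minflt; the function name faults_while_touching is hypothetical. It maps some anonymous pages, writes one byte to each, and returns how many minor faults occurred in between.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Map `npages` anonymous pages, touch each one, and return the number
 * of minor page faults taken while doing so (-1 on error). */
long faults_while_touching(size_t npages)
{
    assert(npages > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* mmap only reserved addresses; the first write to each page is
     * what makes the kernel allocate a frame for it. */
    volatile unsigned char *p = mem;
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap(mem, len);
    if (rc != 0)
        return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical system the result is expected to be close to npages, although the exact number depends on the kernel configuration (transparent huge pages, for instance, can lower it) and on how the environment accounts for faults.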

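i. Page faults on file mappings
The handler calls the mapping function supplied by the file to bring the page in.

ii. Page faults on the anonymous BSS mapping
Page frames allocated for the BSS must be all zero. The faults are handled as follows:
1. A fault caused by a read (Present=0): the anonymous mapping is given the zero page.
  do_page_fault -> handle_mm_fault -> handle_pte_fault -> do_anonymous_page -> pte_mkspecial
2. A fault caused by a write (Present=0): the anonymous mapping is given a new page frame whose contents are cleared to 0.
  do_page_fault -> handle_mm_fault -> handle_pte_fault -> do_anonymous_page -> alloc_zeroed_user_highpage_movable
3. A read first maps the zero page (Present=1); a later write then faults on the write protection and triggers COW. If the original frame is the zero page, COW allocates a new page and clears it to 0.
  do_page_fault -> handle_mm_fault -> handle_pte_fault -> do_wp_page -> is_zero_pfn && alloc_zeroed_user_highpage_movable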
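
The user-visible effect of these three paths is the same: anonymous memory always reads as zero until the program writes to it. A minimal sketch, with the hypothetical function name anon_page_is_zero_filled, exercises the read-then-write sequence of case 3 on a fresh anonymous page. Whether the kernel actually uses the shared zero page for the first read is an implementation detail that this code cannot observe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a fresh anonymous page reads as zero both before and
 * after the first write to it, 0 if not, -1 if mmap fails. */
int anon_page_is_zero_filled(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    volatile unsigned char *p = mem;
    int ok = 1;

    /* Read fault: every byte must read as zero. */
    for (size_t i = 0; i < page; i++)
        if (p[i] != 0)
            ok = 0;

    /* Write fault on the same page: the rest of the page must still
     * be zero in the private frame that now backs it. */
    p[0] = 0x5a;
    assert(p[0] == 0x5a);
    for (size_t i = 1; i < page; i++)
        if (p[i] != 0)
            ok = 0;

    munmap(mem, page);
    return ok;
}
```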


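iii. Page faults on the anonymous stack mapping
The stack is the only case in which a fault at an address outside the mapped range still results in a page frame being allocated.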

        vma = find_vma(mm, address);
        if (unlikely(!vma)) {
                bad_area(regs, error_code, address);
                return;
        }
        if (likely(vma->vm_start <= address))
                goto good_area;
        if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
                bad_area(regs, error_code, address);
                return;
        }
        if (error_code & PF_USER) {
                /*
                 * Accessing the stack below %sp is always a bug.
                 * The large cushion allows instructions like enter
                 * and pusha to work. ("enter $65535, $31" pushes
                 * 32 pointers and then decrements %sp by 65535.)
                 */
                if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
                        bad_area(regs, error_code, address);
                        return;
                }
        }
        if (unlikely(expand_stack(vma, address))) {
                bad_area(regs, error_code, address);
                return;
        }


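find_vma searches for a region whose end address is greater than address, so in the worst case the region it finds is the stack region (unless the stack grows upward, VM_GROWSUP).
If the fault was caused by a stack access, the stack is expanded automatically.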
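
The user-mode distance check in the excerpt is a single comparison and can be reproduced on its own. The function below is a sketch with a made-up name, stack_growth_allowed, that mirrors only that comparison. In the kernel it is just one of several conditions: the region found must also have VM_GROWSDOWN set and expand_stack must succeed.

```c
#include <assert.h>
#include <stdint.h>

/* A fault below the stack region is treated as stack growth only if
 * the faulting address is not more than the cushion below sp. */
int stack_growth_allowed(uintptr_t address, uintptr_t sp)
{
    const uintptr_t cushion = 65536 + 32 * sizeof(unsigned long);

    assert(address <= UINTPTR_MAX - cushion);  /* keep the sum from wrapping */
    return !(address + cushion < sp);
}
```

An access one page below sp passes the check, while an access 128 KiB below sp is rejected and would end in bad_area.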
