Linux 3.4.10 Kernel Memory Management Source Code Analysis 5: Buddy System Initialization


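    Legal notice: the series "Linux 3.4.10 Kernel Memory Management Source Code Analysis" is published by 陈晋飞 ([email protected]) at http://write.blog.csdn.net/postedit/8943026 and is distributed under the GPL. Reposting is welcome; please credit the author and include this notice.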

    

    

5 Buddy System Initialization

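         When a computer starts, it powers on, performs hardware detection, and then loads the boot loader.

    The boot loader loads the Linux kernel image into memory. On the traditional real-mode boot path, control first reaches the kernel's setup code, which calls the main function in arch/x86/boot/main.c.

         When main finishes, it switches to protected mode and jumps to the startup_32 label in arch/x86/boot/compressed/head_32.S, where the kernel is decompressed.

         After decompression, execution continues at the startup_32 label in arch/x86/kernel/head_32.S.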

         arch/x86/kernel/head_32.S calls i386_start_kernel in arch/x86/kernel/head32.c.

         i386_start_kernel calls start_kernel in init/main.c; start_kernel is the main function that brings up the kernel.

 

 

         start_kernel calls setup_arch in arch/x86/kernel/setup.c.

         setup_arch calls paging_init in arch/x86/mm/init_32.c.

 

         After initialization, the memory held by the boot-time memory allocator is released to the buddy system along this call path:

    start_kernel() at init/main.c:524

         mm_init() at init/main.c:458

         mem_init() at arch/x86/mm/init_32.c:752

         free_all_bootmem() at mm/nobootmem.c:168

         free_low_memory_core_early() at mm/nobootmem.c:130

         __free_memory_core() at mm/nobootmem.c:118

         __free_pages_memory() at mm/nobootmem.c:99

         __free_pages_bootmem() at mm/page_alloc.c:749

         __free_pages() at mm/page_alloc.c:2506

         In the end, __free_pages, the buddy system's page-freeing function, releases the boot-time allocator's memory into the buddy system.

 

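         In essence, initializing the buddy system means taking over memory management from the boot-time memory allocator. The initialization has two steps. The first step initializes the buddy system's structures and management data. The second step releases the free memory held by the boot-time allocator into the buddy system, after which the buddy system can be used to allocate memory. The first step is carried out mainly by zone_sizes_init and build_all_zonelists. The call path of the second step is listed above; its code is analyzed together with the implementation of the boot-time memory allocator.

           A NUMA system contains several nodes. Each node contains several zones, each zone contains several free areas, and each free area contains several migrate types, with one free list per migrate type. A free list links free blocks.

           The key parts of initialization are the nodes and the zones, because initializing a free area only means setting every list in its array of free lists to the empty list and setting its free block count to 0.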
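The containment hierarchy described above can be sketched with a few simplified C structures. This is only an illustration under stated assumptions: every `sketch_` name and every constant below is hypothetical, and the real kernel types (`pg_data_t`, `struct zone`, `struct free_area`) carry many more fields.

```c
/* Hypothetical, simplified constants; real values depend on the kernel configuration. */
#define SKETCH_MAX_ORDER     11
#define SKETCH_MIGRATE_TYPES 5
#define SKETCH_MAX_NR_ZONES  4

/* Minimal doubly linked list head, standing in for the kernel's struct list_head. */
struct sketch_list_head {
    struct sketch_list_head *next, *prev;
};

/* One free area per block order: one free list per migrate type plus a block count. */
struct sketch_free_area {
    struct sketch_list_head free_list[SKETCH_MIGRATE_TYPES];
    unsigned long nr_free;
};

/* A zone owns an array of free areas. */
struct sketch_zone {
    struct sketch_free_area free_area[SKETCH_MAX_ORDER];
    unsigned long spanned_pages;   /* pages spanned, holes included */
    unsigned long present_pages;   /* usable pages */
};

/* A node owns an array of zones. */
struct sketch_node {
    struct sketch_zone node_zones[SKETCH_MAX_NR_ZONES];
    unsigned long node_start_pfn;
};

/* An empty list is a head that points to itself. */
static void sketch_init_list(struct sketch_list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Free-area initialization: every list becomes empty, every counter becomes 0. */
void sketch_zone_init_free_lists(struct sketch_zone *zone)
{
    int order, type;

    for (order = 0; order < SKETCH_MAX_ORDER; order++) {
        for (type = 0; type < SKETCH_MIGRATE_TYPES; type++)
            sketch_init_list(&zone->free_area[order].free_list[type]);
        zone->free_area[order].nr_free = 0;
    }
}
```

The sketch shows why the free areas need so little work at this stage: nothing is linked into the lists until the boot-time allocator hands its pages over.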

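           The main parts of node initialization are finding the page frame number of the node's first usable page, computing the number of pages the node spans and the number of usable pages in it, and initializing the node's array of page structures.

           Zone initialization likewise centers on finding the first usable page frame and initializing the zone's spanned and usable page counts. In effect, a node's usable page count is the sum of the usable page counts of all its zones, and a node's page count is the sum of the pages contained in all its zones.

    To avoid nesting too deeply in the later analysis, we first look at a few helper functions: those that compute a zone's total and usable page counts, and the one that computes a node's page frame range: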

     

    

===============

    

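      The zone_spanned_pages_in_node function

         zone_spanned_pages_in_node computes the number of pages a zone spans, including any holes in the middle. Two factors enter the calculation:

    1: The system keeps an array arch_zone_lowest_possible_pfn that holds the lowest possible page frame number for each zone type, and another array arch_zone_highest_possible_pfn that holds the highest possible page frame number for each zone type.

    2: There is also the zone type ZONE_MOVABLE, which was introduced to reduce memory fragmentation. Zones of other types must not contain pages that belong to ZONE_MOVABLE. The zones of a node are laid out in order, and ZONE_MOVABLE sits at the top end of the node.

    zone_spanned_pages_in_node is implemented in mm/page_alloc.c as follows: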

    4090 static unsigned long __meminit zone_spanned_pages_in_node(int nid,
    4091                                         unsigned long zone_type,
    4092                                         unsigned long *ignored)
    4093 {
    4094         unsigned long node_start_pfn, node_end_pfn;
    4095         unsigned long zone_start_pfn, zone_end_pfn;
    4096
    4097         /* Get the start and end of the node and zone */
    4098         get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
    4099         zone_start_pfn = arch_zone_lowest_possible_pfn[zone_type];
    4100         zone_end_pfn = arch_zone_highest_possible_pfn[zone_type];
    4101         adjust_zone_range_for_zone_movable(nid, zone_type,
    4102                                 node_start_pfn, node_end_pfn,
    4103                                 &zone_start_pfn, &zone_end_pfn);
    4104
    4105         /* Check that this node has pages within the zone's required range */
    4106         if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
    4107                 return 0;
    4108
    4109         /* Move the zone boundaries inside the node if necessary */
    4110         zone_end_pfn = min(zone_end_pfn, node_end_pfn);
    4111         zone_start_pfn = max(zone_start_pfn, node_start_pfn);
    4112
    4113         /* Return the spanned pages */
    4114         return zone_end_pfn - zone_start_pfn;
    4115 }

 

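    Line 4098 calls get_pfn_range_for_nid, which walks every memory range registered with the boot-time allocator for this node and obtains the node's lowest and highest page frame numbers.

    Lines 4099-4100 fetch the lowest and highest page frame numbers the system allows for this zone type.

    The zone may overlap ZONE_MOVABLE; line 4101 calls adjust_zone_range_for_zone_movable to cut away the part that overlaps ZONE_MOVABLE.

    Lines 4106-4107 return 0 if the zone lies outside the page range of its node.

    Lines 4110-4111 clip the zone's page range so that it stays within the range of its node.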
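The clipping step can be tried out in user space. The following is a minimal sketch, not the kernel function: it takes the node and zone bounds as plain arguments instead of reading the global arrays, it leaves out the ZONE_MOVABLE adjustment, and the name `sketch_zone_spanned_pages` is made up for this example.

```c
static unsigned long sketch_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

static unsigned long sketch_max(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Number of pages of the zone range [zone_start_pfn, zone_end_pfn) that fall
 * inside the node range [node_start_pfn, node_end_pfn), holes included.
 */
unsigned long sketch_zone_spanned_pages(unsigned long node_start_pfn,
                                        unsigned long node_end_pfn,
                                        unsigned long zone_start_pfn,
                                        unsigned long zone_end_pfn)
{
    /* The node has no pages inside the zone's allowed range. */
    if (zone_end_pfn < node_start_pfn || zone_start_pfn > node_end_pfn)
        return 0;

    /* Move the zone boundaries inside the node. */
    zone_end_pfn = sketch_min(zone_end_pfn, node_end_pfn);
    zone_start_pfn = sketch_max(zone_start_pfn, node_start_pfn);

    return zone_end_pfn - zone_start_pfn;
}
```

For instance, a node covering page frames 0x10 to 0x20000 and a zone type allowed from 0x1000 to 0x38000 give a spanned size of 0x1f000 pages, because the zone is cut off at the end of the node.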

 

    

The adjust_zone_range_for_zone_movable function

         adjust_zone_range_for_zone_movable reserves pages for ZONE_MOVABLE by adjusting a zone's range. It is implemented in mm/page_alloc.c as follows:

    4060 static void __meminit adjust_zone_range_for_zone_movable(int nid,
    4061                                         unsigned long zone_type,
    4062                                         unsigned long node_start_pfn,
    4063                                         unsigned long node_end_pfn,
    4064                                         unsigned long *zone_start_pfn,
    4065                                         unsigned long *zone_end_pfn)
    4066 {
    4067         /* Only adjust if ZONE_MOVABLE is on this node */
    4068         if (zone_movable_pfn[nid]) {
    4069                 /* Size ZONE_MOVABLE */
    4070                 if (zone_type == ZONE_MOVABLE) {
    4071                         *zone_start_pfn = zone_movable_pfn[nid];
    4072                         *zone_end_pfn = min(node_end_pfn,
    4073                                 arch_zone_highest_possible_pfn[movable_zone]);
    4074
    4075                 /* Adjust for ZONE_MOVABLE starting within this range */
    4076                 } else if (*zone_start_pfn < zone_movable_pfn[nid] &&
    4077                                 *zone_end_pfn > zone_movable_pfn[nid]) {
    4078                         *zone_end_pfn = zone_movable_pfn[nid];
    4079
    4080                 /* Check if this whole range is within ZONE_MOVABLE */
    4081                 } else if (*zone_start_pfn >= zone_movable_pfn[nid])
    4082                         *zone_start_pfn = *zone_end_pfn;
    4083         }
    4084 }

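         Line 4068: ZONE_MOVABLE is only considered when the entry zone_movable_pfn[nid] is nonzero.

         Lines 4070-4073 handle the case where zone_type equals ZONE_MOVABLE. The ZONE_MOVABLE range is obtained as follows: the system has an array zone_movable_pfn which, indexed by node ID, gives the first page frame number of ZONE_MOVABLE on each node. In addition, the global variable movable_zone holds a zone type, meaning that the highest page frame number of ZONE_MOVABLE equals the highest page frame number of the movable_zone type.

         In other words, the highest page frame number of ZONE_MOVABLE equals that of one other zone on the same node. So when zone_type is not ZONE_MOVABLE, picture ZONE_MOVABLE growing downward from that highest page frame. Three cases arise: ZONE_MOVABLE starts inside the zone_type zone, the zone_type zone lies inside ZONE_MOVABLE, or the two do not intersect. Lines 4076-4078 handle the case where ZONE_MOVABLE starts inside the zone_type zone, and lines 4081-4082 handle the case where the zone_type zone lies inside ZONE_MOVABLE. When the two do not intersect, nothing needs to be adjusted.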
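The three cases can be exercised with a small stand-alone version of the function. This is a sketch under assumptions: the per-node values that the kernel reads from zone_movable_pfn[nid] and arch_zone_highest_possible_pfn[movable_zone] are passed in as the hypothetical parameters `movable_start_pfn` and `movable_zone_end_pfn`, and a flag replaces the zone type comparison.

```c
/*
 * Adjust [*zone_start_pfn, *zone_end_pfn) for a ZONE_MOVABLE that starts at
 * movable_start_pfn on this node (0 means the node has no ZONE_MOVABLE).
 */
void sketch_adjust_for_movable(int is_movable_zone,
                               unsigned long movable_start_pfn,
                               unsigned long movable_zone_end_pfn,
                               unsigned long node_end_pfn,
                               unsigned long *zone_start_pfn,
                               unsigned long *zone_end_pfn)
{
    if (!movable_start_pfn)
        return;

    if (is_movable_zone) {
        /* Size ZONE_MOVABLE itself. */
        *zone_start_pfn = movable_start_pfn;
        *zone_end_pfn = node_end_pfn < movable_zone_end_pfn ?
                        node_end_pfn : movable_zone_end_pfn;
    } else if (*zone_start_pfn < movable_start_pfn &&
               *zone_end_pfn > movable_start_pfn) {
        /* ZONE_MOVABLE starts inside this zone: cut the zone's tail off. */
        *zone_end_pfn = movable_start_pfn;
    } else if (*zone_start_pfn >= movable_start_pfn) {
        /* The whole zone lies inside ZONE_MOVABLE: make it empty. */
        *zone_start_pfn = *zone_end_pfn;
    }
    /* Otherwise the zone ends below ZONE_MOVABLE and stays unchanged. */
}
```

With ZONE_MOVABLE starting at page frame 300, a zone covering 100 to 500 shrinks to 100 to 300, a zone covering 400 to 500 becomes empty, and a zone covering 0 to 100 is left alone.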

 

    

The absent_pages_in_range function

    

The __absent_pages_in_range function

         absent_pages_in_range computes the number of unavailable pages in an interval. It is implemented by calling __absent_pages_in_range, which is implemented in mm/page_alloc.c as follows:

    4121 unsigned long __meminit __absent_pages_in_range(int nid,
    4122                                 unsigned long range_start_pfn,
    4123                                 unsigned long range_end_pfn)
    4124 {
    4125         unsigned long nr_absent = range_end_pfn - range_start_pfn;
    4126         unsigned long start_pfn, end_pfn;
    4127         int i;
    4128
    4129         for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
    4130                 start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
    4131                 end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
    4132                 nr_absent -= end_pfn - start_pfn;
    4133         }
    4134         return nr_absent;
    4135 }

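         Subtracting the number of pages actually present in the interval from the interval's total page count gives the number of unavailable pages. Line 4125 computes the interval's page count; lines 4129-4133 walk every memory range known to the boot-time allocator and subtract the pages of that range that fall inside the interval. clamp(val, lo, hi) limits val to the interval [lo, hi], that is, it returns the middle value of its three arguments.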
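The hole calculation is easy to check by hand with a small user-space version. This is a sketch under assumptions: the memory ranges come from a plain array instead of for_each_mem_pfn_range, and `struct sketch_range`, `sketch_clamp` and `sketch_absent_pages` are names invented for this example.

```c
/* One memory range [start_pfn, end_pfn), standing in for a memblock region. */
struct sketch_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Limit val to [lo, hi]; assumes lo <= hi. */
static unsigned long sketch_clamp(unsigned long val, unsigned long lo,
                                  unsigned long hi)
{
    if (val < lo)
        return lo;
    if (val > hi)
        return hi;
    return val;
}

/* Pages in [range_start_pfn, range_end_pfn) not covered by any memory range. */
unsigned long sketch_absent_pages(const struct sketch_range *mem, int nr_ranges,
                                  unsigned long range_start_pfn,
                                  unsigned long range_end_pfn)
{
    unsigned long nr_absent = range_end_pfn - range_start_pfn;
    int i;

    for (i = 0; i < nr_ranges; i++) {
        /* Clip the memory range to the interval, then subtract what is left. */
        unsigned long start = sketch_clamp(mem[i].start_pfn,
                                           range_start_pfn, range_end_pfn);
        unsigned long end = sketch_clamp(mem[i].end_pfn,
                                         range_start_pfn, range_end_pfn);
        nr_absent -= end - start;
    }
    return nr_absent;
}
```

With memory at page frames 0 to 100 and 150 to 300, the interval 50 to 200 contains 150 pages of which 100 are present, so 50 are absent. A memory range entirely outside the interval clamps to an empty range and subtracts nothing.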

 

    

The get_pfn_range_for_nid function

    get_pfn_range_for_nid obtains the page frame range of a node. It is implemented in mm/page_alloc.c as follows:

    4011 void __meminit get_pfn_range_for_nid(unsigned int nid,
    4012                         unsigned long *start_pfn, unsigned long *end_pfn)
    4013 {
    4014         unsigned long this_start_pfn, this_end_pfn;
    4015         int i;
    4016
    4017         *start_pfn = -1UL;
    4018         *end_pfn = 0;
    4019
    4020         for_each_mem_pfn_range(i, nid, &this_start_pfn, &this_end_pfn, NULL) {
    4021                 *start_pfn = min(*start_pfn, this_start_pfn);
    4022                 *end_pfn = max(*end_pfn, this_end_pfn);
    4023         }
    4024
    4025         if (*start_pfn == -1UL)
    4026                 *start_pfn = 0;
    4027 }

         get_pfn_range_for_nid walks every memory range that the memblock allocator has on the node and obtains the lowest and highest page frame numbers.
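The same min/max scan can be written against a plain array. This is a user-space sketch, not the kernel code: `struct sketch_range` and `sketch_pfn_range` are hypothetical names, and the array stands in for the memblock ranges of one node.

```c
/* One memory range [start_pfn, end_pfn) belonging to the node. */
struct sketch_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Find the lowest start and highest end over all ranges of a node. */
void sketch_pfn_range(const struct sketch_range *mem, int nr_ranges,
                      unsigned long *start_pfn, unsigned long *end_pfn)
{
    int i;

    *start_pfn = -1UL;          /* largest possible value, so any range is lower */
    *end_pfn = 0;

    for (i = 0; i < nr_ranges; i++) {
        if (mem[i].start_pfn < *start_pfn)
            *start_pfn = mem[i].start_pfn;
        if (mem[i].end_pfn > *end_pfn)
            *end_pfn = mem[i].end_pfn;
    }

    /* No ranges at all: report an empty node starting at 0. */
    if (*start_pfn == -1UL)
        *start_pfn = 0;
}
```

Starting the minimum at -1UL works as a sentinel, which is also why the final check is needed for a node without memory.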

 

 

    

The zone_absent_pages_in_node function

    zone_absent_pages_in_node obtains the number of unavailable pages in a zone:

    4151 static unsigned long __meminit zone_absent_pages_in_node(int nid,
    4152                                         unsigned long zone_type,
    4153                                         unsigned long *ignored)
    4154 {
    4155         unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
    4156         unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
    4157         unsigned long node_start_pfn, node_end_pfn;
    4158         unsigned long zone_start_pfn, zone_end_pfn;
    4159
    4160         get_pfn_range_for_nid(nid, &node_start_pfn, &node_end_pfn);
    4161         zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high);
    4162         zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high);
    4163
    4164         adjust_zone_range_for_zone_movable(nid, zone_type,
    4165                         node_start_pfn, node_end_pfn,
    4166                         &zone_start_pfn, &zone_end_pfn);
    4167         return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
    4168 }

 

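         Lines 4155-4156 read the zone's lowest page frame number from arch_zone_lowest_possible_pfn and its highest page frame number from arch_zone_highest_possible_pfn.

         Lines 4160-4162 restrict the range to the part of the zone that lies inside the node.

         Line 4167 calls __absent_pages_in_range to obtain the number of unavailable pages in the zone's range.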

 

 

    

=========================================

    

The zone_sizes_init function

         Most of the buddy system initialization is done in zone_sizes_init. The call path to zone_sizes_init is:

    start_kernel () at init/main.c:496

    setup_arch () at arch/x86/kernel/setup.c:972

    paging_init () at arch/x86/mm/init_32.c:700

    zone_sizes_init () at arch/x86/mm/init.c:398

         zone_sizes_init is implemented in arch/x86/mm/init.c as follows:

    397 void __init zone_sizes_init(void)
    398 {
    399         unsigned long max_zone_pfns[MAX_NR_ZONES];
    400
    401         memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
    402
    403 #ifdef CONFIG_ZONE_DMA
    404         max_zone_pfns[ZONE_DMA]         = MAX_DMA_PFN;
    405 #endif
    406 #ifdef CONFIG_ZONE_DMA32
    407         max_zone_pfns[ZONE_DMA32]       = MAX_DMA32_PFN;
    408 #endif
    409         max_zone_pfns[ZONE_NORMAL]      = max_low_pfn;
    410 #ifdef CONFIG_HIGHMEM
    411         max_zone_pfns[ZONE_HIGHMEM]     = max_pfn;
    412 #endif
    413
    414         free_area_init_nodes(max_zone_pfns);
    415 }

 

         max_zone_pfns is an array that defines the upper bound of each zone type's range. max_low_pfn and max_pfn have already been determined earlier.

 

    

The free_area_init_nodes function

         free_area_init_nodes is implemented in mm/page_alloc.c as follows:

    4734 void __init free_area_init_nodes(unsigned long *max_zone_pfn)
    4735 {
    4736         unsigned long start_pfn, end_pfn;
    4737         int i, nid;
    4738
    4739         /* Record where the zone boundaries are */
    4740         memset(arch_zone_lowest_possible_pfn, 0,
    4741                                 sizeof(arch_zone_lowest_possible_pfn));
    4742         memset(arch_zone_highest_possible_pfn, 0,
    4743                                 sizeof(arch_zone_highest_possible_pfn));
    4744         arch_zone_lowest_possible_pfn[0] = find_min_pfn_with_active_regions();
    4745         arch_zone_highest_possible_pfn[0] = max_zone_pfn[0];
    4746         for (i = 1; i < MAX_NR_ZONES; i++) {
    4747                 if (i == ZONE_MOVABLE)
    4748                         continue;
    4749                 arch_zone_lowest_possible_pfn[i] =
    4750                         arch_zone_highest_possible_pfn[i-1];
    4751                 arch_zone_highest_possible_pfn[i] =
    4752                         max(max_zone_pfn[i], arch_zone_lowest_possible_pfn[i]);
    4753         }
    4754         arch_zone_lowest_possible_pfn[ZONE_MOVABLE] = 0;
    4755         arch_zone_highest_possible_pfn[ZONE_MOVABLE] = 0;
    4756
    4757         /* Find the PFNs that ZONE_MOVABLE begins at in each node */
    4758         memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn));
    4759         find_zone_movable_pfns_for_nodes();
    4760
    4761         /* Print out the zone ranges */
    4762         printk("Zone PFN ranges:\n");
    4763         for (i = 0; i < MAX_NR_ZONES; i++) {
    4764                 if (i == ZONE_MOVABLE)
    4765                         continue;
    4766                 printk("  %-8s ", zone_names[i]);
    4767                 if (arch_zone_lowest_possible_pfn[i] ==
    4768                                 arch_zone_highest_possible_pfn[i])
    4769                         printk("empty\n");
    4770                 else
    4771                         printk("%0#10lx -> %0#10lx\n",
    4772                                 arch_zone_lowest_possible_pfn[i],
    4773                                 arch_zone_highest_possible_pfn[i]);
    4774         }
    4775
    4776         /* Print out the PFNs ZONE_MOVABLE begins at in each node */
    4777         printk("Movable zone start PFN for each node\n");
    4778         for (i = 0; i < MAX_NUMNODES; i++) {
    4779                 if (zone_movable_pfn[i])
    4780                         printk("  Node %d: %lu\n", i, zone_movable_pfn[i]);
    4781         }
    4782
    4783         /* Print out the early_node_map[] */
    4784         printk("Early memory PFN ranges\n");
    4785         for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
    4786                 printk("  %3d: %0#10lx -> %0#10lx\n", nid, start_pfn, end_pfn);
    4787
    4788         /* Initialise every node */
    4789         mminit_verify_pageflags_layout();
    4790         setup_nr_node_ids();
    4791         for_each_online_node(nid) {
    4792                 pg_data_t *pgdat = NODE_DATA(nid);
    4793                 free_area_init_node(nid, NULL,
    4794                                 find_min_pfn_for_node(nid), NULL);
    4795
    4796                 /* Any memory on that node */
    4797                 if (pgdat->node_present_pages)
    4798                         node_set_state(nid, N_HIGH_MEMORY);
    4799                 check_for_regular_memory(pgdat);
    4800         }
    4801 }

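         free_area_init_nodes is fairly long, but it does two main jobs: it determines the lower and upper bounds of every zone type, and then it initializes every node.

         Except for ZONE_MOVABLE, zone ranges are determined by two arrays: arch_zone_lowest_possible_pfn gives a zone's lowest page frame number and arch_zone_highest_possible_pfn gives its highest. A page frame number pfn in a zone must satisfy arch_zone_lowest_possible_pfn[zone_type] <= pfn < arch_zone_highest_possible_pfn[zone_type]. The range of ZONE_MOVABLE involves the array zone_movable_pfn and the variable movable_zone: on the node with ID nid, ZONE_MOVABLE covers the page frame numbers zone_movable_pfn[nid] <= pfn < arch_zone_highest_possible_pfn[movable_zone].

         Lines 4740-4753 compute the ranges of all zone types other than ZONE_MOVABLE. From the way the ranges are computed, it follows that the zones are laid out contiguously, one after another.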
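How the boundaries chain together can be seen in a stand-alone sketch. It rests on assumptions: only three zone types are modeled (`SKETCH_NR_ZONES` is a made-up constant), ZONE_MOVABLE is left out, and the lowest page frame of the system is passed in as `min_pfn` instead of being looked up.

```c
#define SKETCH_NR_ZONES 3   /* hypothetical: e.g. DMA, NORMAL, HIGHMEM */

/*
 * Derive [lowest[i], highest[i]) for every zone type from the per-zone upper
 * limits in max_zone_pfn. Each zone starts where the previous one ends.
 */
void sketch_zone_bounds(const unsigned long *max_zone_pfn,
                        unsigned long min_pfn,
                        unsigned long *lowest, unsigned long *highest)
{
    int i;

    lowest[0] = min_pfn;
    highest[0] = max_zone_pfn[0];

    for (i = 1; i < SKETCH_NR_ZONES; i++) {
        lowest[i] = highest[i - 1];
        /* A limit below the start yields an empty zone (lowest == highest). */
        highest[i] = max_zone_pfn[i] > lowest[i] ? max_zone_pfn[i] : lowest[i];
    }
}
```

With 4 KiB pages, limits of 4096, 229376 and 262144 page frames correspond to 16 MiB, 896 MiB and 1 GiB. If a limit is 0, as for a zone type that is not configured, the zone comes out empty and the next zone simply starts at the same page frame.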

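         Lines 4754-4759 compute the range of ZONE_MOVABLE. The key step is find_zone_movable_pfns_for_nodes, which is analyzed right after this function.

         Lines 4762-4786 print the zone range information.

         Line 4789: mminit_verify_pageflags_layout verifies the layout of the page flag bits and prints some debugging information.

         Line 4790 calls setup_nr_node_ids to set the total number of nodes, which is stored in the variable nr_node_ids.

         Lines 4791-4800 are a loop over every online node. Line 4793 calls free_area_init_node to initialize each node; free_area_init_node is analyzed later. Lines 4797-4799 record state information about whether the node has memory. The system defines an enumeration, enum node_states, that records whether a node is possible (N_POSSIBLE), online (N_ONLINE), has normal memory (N_NORMAL_MEMORY), has normal or high memory (N_HIGH_MEMORY), or has CPUs attached (N_CPU). mm/page_alloc.c contains an array of node masks, node_states[], with one node mask per enum node_states entry, which records this state.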
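The idea of one node mask per state can be sketched with plain bit masks. This is an illustration only: the kernel uses nodemask_t and its own helpers, whereas the sketch below stores each mask in a single unsigned long (so it only handles as many nodes as that type has bits), and all `sketch_`/`SK_` names are invented.

```c
/* Hypothetical mirror of the states listed in the text. */
enum sketch_node_state {
    SK_N_POSSIBLE,
    SK_N_ONLINE,
    SK_N_NORMAL_MEMORY,
    SK_N_HIGH_MEMORY,
    SK_N_CPU,
    SK_NR_NODE_STATES
};

/* One bit mask per state; bit nid is set when node nid has that state. */
static unsigned long sketch_node_states[SK_NR_NODE_STATES];

/* Mark node nid as having the given state. */
void sketch_node_set_state(int nid, enum sketch_node_state state)
{
    sketch_node_states[state] |= 1UL << nid;
}

/* Return 1 if node nid has the given state, 0 otherwise. */
int sketch_node_state(int nid, enum sketch_node_state state)
{
    return (int)((sketch_node_states[state] >> nid) & 1UL);
}
```

Keeping one mask per state makes questions such as "which nodes have memory" a matter of walking the set bits of a single mask, which is what loops like for_each_node_state rely on.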

    

The find_zone_movable_pfns_for_nodes function

         The job of find_zone_movable_pfns_for_nodes is to determine the range of ZONE_MOVABLE. It is implemented in mm/page_alloc.c as follows:

    4567 static void __init find_zone_movable_pfns_for_nodes(void)
    4568 {
    4569         int i, nid;
    4570         unsigned long usable_startpfn;
    4571         unsigned long kernelcore_node, kernelcore_remaining;
    4572         /* save the state before borrow the nodemask */
    4573         nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
    4574         unsigned long totalpages = early_calculate_totalpages();
    4575         int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
    4576
    4577         /*
    4578          * If movablecore was specified, calculate what size of
    4579          * kernelcore that corresponds so that memory usable for
    4580          * any allocation type is evenly spread. If both kernelcore
    4581          * and movablecore are specified, then the value of kernelcore
    4582          * will be used for required_kernelcore if it's greater than
    4583          * what movablecore would have allowed.
    4584          */
    4585         if (required_movablecore) {
    4586                 unsigned long corepages;
    4587
    4588                 /*
    4589                  * Round-up so that ZONE_MOVABLE is at least as large as what
    4590                  * was requested by the user
    4591                  */
    4592                 required_movablecore =
    4593                         roundup(required_movablecore, MAX_ORDER_NR_PAGES);
    4594                 corepages = totalpages - required_movablecore;
    4595
    4596                 required_kernelcore = max(required_kernelcore, corepages);
    4597         }
    4598
    4599         /* If kernelcore was not specified, there is no ZONE_MOVABLE */
    4600         if (!required_kernelcore)
    4601                 goto out;
    4602
    4603         /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
    4604         find_usable_zone_for_movable();
    4605         usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
    4606
    4607 restart:
    4608         /* Spread kernelcore memory as evenly as possible throughout nodes */
    4609         kernelcore_node = required_kernelcore / usable_nodes;
    4610         for_each_node_state(nid, N_HIGH_MEMORY) {
    4611                 unsigned long start_pfn, end_pfn;
    4612
    4613                 /*
    4614                  * Recalculate kernelcore_node if the division per node
    4615                  * now exceeds what is necessary to satisfy the requested
    4616                  * amount of memory for the kernel
    4617                  */
    4618                 if (required_kernelcore < kernelcore_node)
    4619                         kernelcore_node = required_kernelcore / usable_nodes;
    4620
    4621                 /*
    4622                  * As the map is walked, we track how much memory is usable
    4623                  * by the kernel using kernelcore_remaining. When it is
    4624                  * 0, the rest of the node is usable by ZONE_MOVABLE
    4625                  */
    4626                 kernelcore_remaining = kernelcore_node;
    4627
    4628                 /* Go through each range of PFNs within this node */
    4629                 for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
    4630                         unsigned long size_pages;
    4631
    4632                         start_pfn = max(start_pfn, zone_movable_pfn[nid]);
    4633                         if (start_pfn >= end_pfn)
    4634                                 continue;
    4635
    4636                         /* Account for what is only usable for kernelcore */
    4637                         if (start_pfn < usable_startpfn) {
    4638                                 unsigned long kernel_pages;
    4639                                 kernel_pages = min(end_pfn, usable_startpfn)
    4640                                                                 - start_pfn;
    4641
    4642                                 kernelcore_remaining -= min(kernel_pages,
    4643                                                         kernelcore_remaining);
    4644                                 required_kernelcore -= min(kernel_pages,
    4645                                                         required_kernelcore);
    4646
    4647                                 /* Continue if range is now fully accounted */
    4648                                 if (end_pfn <= usable_startpfn) {
    4649
    4650                                         /*
    4651                                          * Push zone_movable_pfn to the end so
    4652                                          * that if we have to rebalance
    4653                                          * kernelcore across nodes, we will
    4654                                          * not double account here
    4655                                          */
    4656                                         zone_movable_pfn[nid] = end_pfn;
    4657                                         continue;
    4658                                 }
    4659                                 start_pfn = usable_startpfn;
    4660                         }
    4661
    4662                         /*
    4663                          * The usable PFN range for ZONE_MOVABLE is from
    4664                          * start_pfn->end_pfn. Calculate size_pages as the
    4665                          * number of pages used as kernelcore
    4666                          */
    4667                         size_pages = end_pfn - start_pfn;
    4668                         if (size_pages > kernelcore_remaining)
    4669                                 size_pages = kernelcore_remaining;
    4670                         zone_movable_pfn[nid] = start_pfn + size_pages;
    4671
    4672                         /*
    4673                          * Some kernelcore has been met, update counts and
    4674                          * break if the kernelcore for this node has been
    4675                          * satisified
    4676                          */
    4677                         required_kernelcore -= min(required_kernelcore,
    4678                                                                 size_pages);
    4679                         kernelcore_remaining -= size_pages;
    4680                         if (!kernelcore_remaining)
    4681                                 break;
    4682                 }
    4683         }
    4684
    4685         /*
    4686          * If there is still required_kernelcore, we do another pass with one
    4687          * less node in the count. This will push zone_movable_pfn[nid] further
    4688          * along on the nodes that still have memory until kernelcore is
    4689          * satisified
    4690          */
    4691         usable_nodes--;
    4692         if (usable_nodes && required_kernelcore > usable_nodes)
    4693                 goto restart;
    4694
    4695         /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */
    4696         for (nid = 0; nid < MAX_NUMNODES; nid++)


    4697                 zone_movable_pfn[nid] =
    4698                         roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES);
    4699
    4700 out:
    4701         /* restore the node_state */
    4702         node_states[N_HIGH_MEMORY] = saved_node_state;
    4703 }

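         The purpose of this function is to compute the zone_movable_pfn array. The system has two variables, required_movablecore and required_kernelcore, whose values are passed in on the kernel command line: required_movablecore tells the kernel how many pages to reserve for ZONE_MOVABLE, and required_kernelcore is the number of pages to reserve for the non-ZONE_MOVABLE zones.

         Lines 4585-4601: from lines 4585 and 4600 we can see that if neither value was set on the command line, the function jumps straight to the out label, so ZONE_MOVABLE is empty. The variable totalpages is initialized by early_calculate_totalpages and is the total number of memory pages. roundup(x, y) is a macro that returns the smallest multiple of y that is greater than or equal to x. Lines 4592-4593 round required_movablecore up to a multiple of MAX_ORDER_NR_PAGES, and line 4596 sets required_kernelcore to the larger of the requested required_kernelcore and corepages, the pages left after taking required_movablecore out of the total. In other words, the pages requested for the non-ZONE_MOVABLE zones take priority.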
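The interplay of the two command-line values can be worked through with a small stand-alone function. This is a sketch under assumptions: `SKETCH_MAX_ORDER_NR_PAGES` is set to 1024 as a plausible example value, the function and helper names are invented, and, like the kernel code, it assumes required_movablecore does not exceed totalpages after rounding.

```c
/* Assumed example value; the real MAX_ORDER_NR_PAGES depends on the configuration. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/* Smallest multiple of y that is >= x. */
static unsigned long sketch_roundup(unsigned long x, unsigned long y)
{
    return ((x + y - 1) / y) * y;
}

/*
 * Return the number of pages to keep for non-ZONE_MOVABLE zones.
 * A result of 0 means that there will be no ZONE_MOVABLE at all.
 */
unsigned long sketch_required_kernelcore(unsigned long totalpages,
                                         unsigned long required_kernelcore,
                                         unsigned long required_movablecore)
{
    if (required_movablecore) {
        unsigned long corepages;

        /* ZONE_MOVABLE should be at least as large as requested. */
        required_movablecore = sketch_roundup(required_movablecore,
                                              SKETCH_MAX_ORDER_NR_PAGES);
        corepages = totalpages - required_movablecore;

        /* An explicit kernelcore request wins if it is larger. */
        if (corepages > required_kernelcore)
            required_kernelcore = corepages;
    }
    return required_kernelcore;
}
```

With 262144 total pages and a movablecore request of 1000 pages, the request is rounded up to 1024 and 261120 pages remain for the kernel. If kernelcore is also given as 262000, that larger value is used instead, so ZONE_MOVABLE ends up smaller than requested.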

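    After line 4602, required_movablecore does not appear again. The remaining code does two things: first it selects a zone, namely the highest non-empty zone other than ZONE_MOVABLE, and then it moves the start of ZONE_MOVABLE upward from the low end of that zone until the number of non-ZONE_MOVABLE pages reaches required_kernelcore.

         Line 4604 calls find_usable_zone_for_movable to set movable_zone, which becomes the highest non-empty zone other than ZONE_MOVABLE.

         Line 4605 sets usable_startpfn, the lowest page frame that can belong to ZONE_MOVABLE.

         Line 4609 sets kernelcore_node. usable_nodes is a divisor, initialized to the number of nodes that have memory. On the first pass kernelcore_node is set so that the non-ZONE_MOVABLE pages are spread evenly over the nodes, and usable_nodes is decremented after every pass. While zone_movable_pfn is being computed, a set of nodes is traversed, and kernelcore_node is the number of pages each node should reserve for non-ZONE_MOVABLE zones.

         Line 4610 scans every node that is set in the node mask node_states[N_HIGH_MEMORY].

         Lines 4618-4619 recompute kernelcore_node if required_kernelcore < kernelcore_node.

         Line 4626: kernelcore_remaining is the number of pages still to be reserved during the scan of this node; it is set to kernelcore_node.

         Line 4629 walks every memory range of the node in the boot-time allocator.

         Lines 4632-4634: zone_movable_pfn[nid] is the lowest ZONE_MOVABLE page frame number found so far for this node. If end_pfn <= zone_movable_pfn[nid] or end_pfn <= start_pfn, the range lies entirely below the current start of ZONE_MOVABLE or is empty, so the scan moves on to the next range.

         Line 4637: start_pfn is the first page frame of the range being scanned, and usable_startpfn is the lowest frame ZONE_MOVABLE is allowed to use. start_pfn < usable_startpfn means that the frames from start_pfn up to usable_startpfn belong to non-ZONE_MOVABLE zones. Lines 4638-4645 subtract the pages in this piece from the number of pages still to be reserved.

         Line 4648: end_pfn <= usable_startpfn means the whole range belongs to non-ZONE_MOVABLE zones. Line 4656 then sets zone_movable_pfn[nid] = end_pfn, so the end of the current range becomes the provisional first page frame of ZONE_MOVABLE on this node and the range is not counted twice if another pass is needed. Note that a range includes its first page frame start_pfn but not its end frame end_pfn.

         Reaching line 4659 means end_pfn > usable_startpfn; start_pfn = usable_startpfn makes the following code treat usable_startpfn -> end_pfn as the range.

         Lines 4667-4681: at this point the whole range is eligible for ZONE_MOVABLE. This code reserves pages for non-ZONE_MOVABLE zones out of this range and places the start of ZONE_MOVABLE right after them.

         Lines 4691-4693 decrement the divisor usable_nodes and test usable_nodes && required_kernelcore > usable_nodes. This avoids an infinite loop, and triggers another pass only while each remaining node would still have to reserve at least one page for non-ZONE_MOVABLE zones.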
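The multi-pass spreading is easier to follow on a reduced model. The sketch below is a deliberate simplification, not the kernel algorithm: each node has exactly one memory range, every page is assumed eligible for ZONE_MOVABLE (as if usable_startpfn were 0), the final MAX_ORDER_NR_PAGES alignment is omitted, and all names are hypothetical.

```c
/* One contiguous memory range [start_pfn, end_pfn) per node (simplification). */
struct sketch_node_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/*
 * Spread required_kernelcore pages over the nodes as evenly as possible and
 * store, per node, the page frame at which ZONE_MOVABLE would start.
 */
void sketch_spread_kernelcore(const struct sketch_node_range *nodes, int nr_nodes,
                              unsigned long required_kernelcore,
                              unsigned long *zone_movable_pfn)
{
    int usable_nodes = nr_nodes;
    int nid;

    for (nid = 0; nid < nr_nodes; nid++)
        zone_movable_pfn[nid] = 0;
    if (!required_kernelcore)
        return;                 /* no kernelcore request: no ZONE_MOVABLE */

    while (usable_nodes > 0) {
        unsigned long kernelcore_node = required_kernelcore / usable_nodes;

        for (nid = 0; nid < nr_nodes; nid++) {
            unsigned long start_pfn = nodes[nid].start_pfn;
            unsigned long end_pfn = nodes[nid].end_pfn;
            unsigned long size_pages;

            if (required_kernelcore < kernelcore_node)
                kernelcore_node = required_kernelcore / usable_nodes;

            /* Skip what earlier passes already reserved on this node. */
            if (start_pfn < zone_movable_pfn[nid])
                start_pfn = zone_movable_pfn[nid];
            if (start_pfn >= end_pfn)
                continue;       /* node is exhausted */

            size_pages = end_pfn - start_pfn;
            if (size_pages > kernelcore_node)
                size_pages = kernelcore_node;
            zone_movable_pfn[nid] = start_pfn + size_pages;

            required_kernelcore -= required_kernelcore < size_pages ?
                                   required_kernelcore : size_pages;
        }

        /* Retry with one node less if some kernelcore is still unmet. */
        usable_nodes--;
        if (!usable_nodes || required_kernelcore <= (unsigned long)usable_nodes)
            break;
    }
}
```

Take a small node covering page frames 0 to 100, a large node covering 100 to 1000, and a request of 400 pages. The first pass asks each node for 200 pages, but the small node can only give 100, so 100 pages remain unmet. The second pass divides by one node instead of two and takes those 100 pages from the large node, giving ZONE_MOVABLE start frames of 100 and 400.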

         Lines 4696-4698 align the first page frame of ZONE_MOVABLE on every node to MAX_ORDER_NR_PAGES.

         Line 4702 restores the node_states array.

 

    

The free_area_init_node function

         free_area_init_node initializes a node. It is implemented in mm/page_alloc.c as follows:

    4420 void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
    4421                 unsigned long node_start_pfn, unsigned long *zholes_size)
    4422 {
    4423         pg_data_t *pgdat = NODE_DATA(nid);
    4424
    4425         pgdat->node_id = nid;
    4426         pgdat->node_start_pfn = node_start_pfn;
    4427         calculate_node_totalpages(pgdat, zones_size, zholes_size);
    4428
    4429         alloc_node_mem_map(pgdat);
    4430 #ifdef CONFIG_FLAT_NODE_MEM_MAP
    4431         printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
    4432                 nid, (unsigned long)pgdat,
    4433                 (unsigned long)pgdat->node_mem_map);
    4434 #endif
    4435
    4436         free_area_init_core(pgdat, zones_size, zholes_size);
    4437 }

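         free_area_init_node calls calculate_node_totalpages to initialize the node's spanned page count and its total usable page count. calculate_node_totalpages is implemented with zone_spanned_pages_in_node and zone_absent_pages_in_node, both of which were analyzed above.

    alloc_node_mem_map sets up the node's page management data. The remaining initialization is done in free_area_init_core.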

 

 

    

The alloc_node_mem_map function

         alloc_node_mem_map allocates the memory for a node's array of page structures. It is implemented in mm/page_alloc.c as follows:

    4379 static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
    4380 {
    4381         /* Skip empty nodes */
    4382         if (!pgdat->node_spanned_pages)
    4383                 return;
    4384
    4385 #ifdef CONFIG_FLAT_NODE_MEM_MAP
    4386         /* ia64 gets its own node_mem_map, before this, without bootmem */
    4387         if (!pgdat->node_mem_map) {
    4388                 unsigned long size, start, end;
    4389                 struct page *map;
    4390
    4391                 /*
    4392                  * The zone's endpoints aren't required to be MAX_ORDER
    4393                  * aligned but the node_mem_map endpoints must be in order
    4394                  * for the buddy allocator to function correctly.
    4395                  */
    4396                 start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
    4397                 end = pgdat->node_start_pfn + pgdat->node_spanned_pages;
    4398                 end = ALIGN(end, MAX_ORDER_NR_PAGES);
    4399                 size =  (end - start) * sizeof(struct page);
    4400                 map = alloc_remap(pgdat->node_id, size);
    4401                 if (!map)
    4402                         map = alloc_bootmem_node_nopanic(pgdat, size);
    4403                 pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
    4404         }
    4405 #ifndef CONFIG_NEED_MULTIPLE_NODES
    4406         /*
    4407          * With no DISCONTIG, the global mem_map is just set as node 0's
    4408          */
    4409         if (pgdat == NODE_DATA(0)) {
    4410                 mem_map = NODE_DATA(0)->node_mem_map;
    4411 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
    4412                 if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
    4413                         mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
    4414 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
    4415         }
    4416 #endif
    4417 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
    4418 }

        

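         In the node structure pglist_data, the member node_start_pfn is the node's first page frame number, and node_spanned_pages is the node's length including unavailable pages in the middle. node_mem_map points into the node's array of page structures, at the page structure of the node's first page.

         Lines 4388-4403 work as follows: compute a page frame range that is the smallest range containing all of the node's pages and whose first and last frames are both aligned to the largest block size. Memory for the array of page structures is then allocated for that range. After the allocation (line 4403), node_mem_map is pointed at the page structure for page frame node_start_pfn.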
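The alignment arithmetic can be checked with concrete numbers. The sketch below only reproduces the index calculation and allocates nothing; `SKETCH_MAX_ORDER_NR_PAGES` is an assumed example value (it must be a power of two for the masking to work), and the structure and function names are made up.

```c
/* Assumed example value; must be a power of two. */
#define SKETCH_MAX_ORDER_NR_PAGES 1024UL

/* Result of the layout calculation for one node's page array. */
struct sketch_memmap_layout {
    unsigned long start;       /* first page frame covered by the array */
    unsigned long end;         /* one past the last page frame covered */
    unsigned long nr_entries;  /* number of page structures to allocate */
    unsigned long offset;      /* index of the entry for node_start_pfn */
};

struct sketch_memmap_layout
sketch_compute_memmap_layout(unsigned long node_start_pfn,
                             unsigned long node_spanned_pages)
{
    struct sketch_memmap_layout l;
    unsigned long mask = SKETCH_MAX_ORDER_NR_PAGES - 1;

    /* Round the start down and the end up to the largest block size. */
    l.start = node_start_pfn & ~mask;
    l.end = (node_start_pfn + node_spanned_pages + mask) & ~mask;

    l.nr_entries = l.end - l.start;
    l.offset = node_start_pfn - l.start;
    return l;
}
```

For a node that starts at page frame 1500 and spans 3000 pages, the covered range becomes 1024 to 5120, so 4096 page structures are needed and node_mem_map would point at entry 476 of the allocated array.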

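         The memory for the page array is allocated by alloc_remap and alloc_bootmem_node_nopanic; both are analyzed in the chapter on the boot-time memory allocator.

         Line 4410: in earlier versions the start address of the page management array was stored in the variable mem_map; now this variable points to the page management array of node 0.

         Lines 4412-4413 correct the conversion between page structure addresses and page frame numbers.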

    

The free_area_init_core function

         free_area_init_core is the core function of buddy system initialization. It is implemented in mm/page_alloc.c as follows:

    4291 static void __paginginitfree_area_init_core(struct pglist_data *pgdat,

    4292                 unsigned long *zones_size,unsigned long *zholes_size)

    4293 {

    4294        enum zone_type j;

    4295        int nid = pgdat->node_id;

    4296        unsigned long zone_start_pfn = pgdat->node_start_pfn;

    4297        int ret;

    4298

    4299        pgdat_resize_init(pgdat);

    4300        pgdat->nr_zones = 0;

    4301        init_waitqueue_head(&pgdat->kswapd_wait);

    4302        pgdat->kswapd_max_order = 0;

    4303        pgdat_page_cgroup_init(pgdat);

    4304

    4305        for (j = 0; j < MAX_NR_ZONES; j++) {

    4306                 struct zone *zone =pgdat->node_zones + j;

    4307                 unsigned long size, realsize,memmap_pages;

    4308                 enum lru_list lru;

    4309

    4310                 size = zone_spanned_pages_in_node(nid,j, zones_size);

    4311                 realsize = size -zone_absent_pages_in_node(nid, j,

    4312                                                                zholes_size);

    4313

    4314                 /*

    4315                  * Adjust realsize so that it accounts for how much memory

    4316                  * is used by this zone for memmap. This affects the watermark

    4317                  * and per-cpu initialisations

    4318                  */

    4319                 memmap_pages =

    4320                         PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;

    4321                 if (realsize >= memmap_pages) {

    4322                         realsize -= memmap_pages;

    4323                         if (memmap_pages)

    4324                                 printk(KERN_DEBUG

    4325                                       "  %s zone: %lu pages used for memmap\n",

    4326                                       zone_names[j], memmap_pages);

    4327                 } else

    4328                         printk(KERN_WARNING

    4329                                 "  %s zone: %lu pages exceeds realsize %lu\n",

    4330                                 zone_names[j], memmap_pages, realsize);

    4331

    4332                 /* Account for reserved pages */

    4333                 if (j == 0 && realsize > dma_reserve) {

    4334                         realsize -= dma_reserve;

    4335                         printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",

    4336                                        zone_names[0], dma_reserve);

    4337                 }

    4338

    4339                 if (!is_highmem_idx(j))

    4340                         nr_kernel_pages += realsize;

    4341                 nr_all_pages += realsize;

    4342

    4343                 zone->spanned_pages = size;

    4344                 zone->present_pages = realsize;

    4345 #ifdef CONFIG_NUMA

    4346                 zone->node = nid;

    4347                 zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)

    4348                                                / 100;

    4349                 zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;

    4350 #endif

    4351                 zone->name = zone_names[j];

    4352                 spin_lock_init(&zone->lock);

    4353                 spin_lock_init(&zone->lru_lock);

    4354                 zone_seqlock_init(zone);

    4355                 zone->zone_pgdat = pgdat;

    4356

    4357                 zone_pcp_init(zone);

    4358                 for_each_lru(lru)

    4359                         INIT_LIST_HEAD(&zone->lruvec.lists[lru]);

    4360                 zone->reclaim_stat.recent_rotated[0] = 0;

    4361                 zone->reclaim_stat.recent_rotated[1] = 0;

    4362                 zone->reclaim_stat.recent_scanned[0] = 0;

    4363                 zone->reclaim_stat.recent_scanned[1] = 0;

    4364                 zap_zone_vm_stats(zone);

    4365                 zone->flags = 0;

    4366                 if (!size)

    4367                         continue;

    4368

    4369                 set_pageblock_order(pageblock_default_order());

    4370                 setup_usemap(pgdat, zone, size);

    4371                 ret = init_currently_empty_zone(zone, zone_start_pfn,

    4372                                                size, MEMMAP_EARLY);

    4373                 BUG_ON(ret);

    4374                 memmap_init(size, nid, j, zone_start_pfn);

    4375                 zone_start_pfn += size;

    4376        }

    4377 }
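         As a rough illustration of the realsize adjustment at lines 4319-4322, the user-space sketch below redoes the arithmetic outside the kernel. The 4 KiB page size, the 32-byte struct page, and every sketch_ name are assumptions made for this example; the real values depend on the architecture and kernel configuration.

```c
#include <assert.h>

/* Assumed values for illustration only: 4 KiB pages, 32-byte struct page. */
#define SKETCH_PAGE_SHIFT        12UL
#define SKETCH_PAGE_SIZE         (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_STRUCT_PAGE_SIZE  32UL

/* Round a byte count up to a whole number of pages, like PAGE_ALIGN(). */
static unsigned long sketch_page_align(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Pages needed to store the page descriptors of a zone spanning `size` pages. */
static unsigned long sketch_memmap_pages(unsigned long size)
{
    return sketch_page_align(size * SKETCH_STRUCT_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
}

/* Usable pages: spanned pages minus holes, minus the memmap when it fits. */
static unsigned long sketch_realsize(unsigned long size, unsigned long absent)
{
    unsigned long realsize, memmap_pages;

    assert(absent <= size);
    realsize = size - absent;
    memmap_pages = sketch_memmap_pages(size);
    if (realsize >= memmap_pages)
        realsize -= memmap_pages;   /* same test as line 4321 */
    return realsize;
}
```

         Under these assumptions a zone spanning 4096 pages with no holes needs 32 pages for its memmap, so sketch_realsize(4096, 0) yields 4064.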

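         This function is long but simple: it only initializes some variables, locks, and lists. We will not analyze the function itself, but we will look at memmap_init, which it calls. memmap_init is a macro defined as follows:

    #define memmap_init(size, nid, zone, start_pfn) \
         memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)

    That is, it is simply a call to the memmap_init_zone function.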

    

The memmap_init_zone function

         memmap_init_zone initializes the page descriptors (struct page) of one zone. It is implemented in mm/page_alloc.c; the code is as follows:

    3619 * done. Non-atomic initialization, single-pass.

    3620 */

    3621 void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,

    3622                unsigned long start_pfn, enum memmap_context context)

    3623 {

    3624        struct page *page;

    3625        unsigned long end_pfn = start_pfn + size;

    3626        unsigned long pfn;

    3627        struct zone *z;

    3628

    3629        if (highest_memmap_pfn < end_pfn - 1)

    3630                 highest_memmap_pfn = end_pfn - 1;

    3631

    3632        z = &NODE_DATA(nid)->node_zones[zone];

    3633        for (pfn = start_pfn; pfn < end_pfn; pfn++) {

    3634                 /*

    3635                  * There can be holes in boot-time mem_map[]s

    3636                  * handed to this function.  They do not

    3637                  * exist on hotplugged memory.

    3638                  */

    3639                 if (context == MEMMAP_EARLY) {

    3640                         if (!early_pfn_valid(pfn))

    3641                                 continue;

    3642                         if (!early_pfn_in_nid(pfn, nid))

    3643                                 continue;

    3644                 }

    3645                 page = pfn_to_page(pfn);

    3646                 set_page_links(page, zone, nid, pfn);

    3647                 mminit_verify_page_links(page, zone, nid, pfn);

    3648                 init_page_count(page);

    3649                 reset_page_mapcount(page);

    3650                 SetPageReserved(page);

    3651                 /*

    3652                  * Mark the block movable so that blocks are reserved for

    3653                  * movable at startup. This will force kernel allocations

    3654                  * to reserve their blocks rather than leaking throughout

    3655                  * the address space during boot when many long-lived

    3656                  * kernel allocations are made. Later some blocks near

    3657                  * the start are marked MIGRATE_RESERVE by

    3658                  * setup_zone_migrate_reserve()

    3659                  *

    3660                  * bitmap is created for zone's valid pfn range. but memmap

    3661                  * can be created for invalid pages (for alignment)

    3662                  * check here not to call set_pageblock_migratetype() against

    3663                  * pfn out of zone.

    3664                  */

    3665                 if ((z->zone_start_pfn <= pfn)

    3666                     && (pfn < z->zone_start_pfn + z->spanned_pages)

    3667                     && !(pfn & (pageblock_nr_pages - 1)))

    3668                         set_pageblock_migratetype(page, MIGRATE_MOVABLE);

    3669

    3670                 INIT_LIST_HEAD(&page->lru);

    3671 #ifdef WANT_PAGE_VIRTUAL

    3672                 /* The shift won't overflow because ZONE_NORMAL is below 4G. */

    3673                 if (!is_highmem_idx(zone))

    3674                         set_page_address(page, __va(pfn << PAGE_SHIFT));

    3675 #endif

    3676        }

    3677 }

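         Lines 3629-3630: highest_memmap_pfn is the highest page frame number that has a page descriptor. If the highest frame number covered by this zone (end_pfn - 1) is greater than highest_memmap_pfn, highest_memmap_pfn is updated.

         Line 3632 gets the address of the zone structure.

         Line 3633 iterates over all page frames of the zone.

         Lines 3640-3641 check whether the page frame number is valid, that is, whether it is no larger than the largest frame number in the system and no smaller than the smallest frame number the system allows.

         Lines 3642-3643 check whether page frame pfn belongs to node nid.

         Line 3645 gets the address of the page descriptor for frame pfn.

         Line 3646 calls set_page_links to set the page's links: mainly the node the page belongs to, its zone type, and the section it belongs to. All of this information is stored in the flags member of struct page, with each item occupying a few bits. Line 3647 verifies the node, zone type, and section information just set, and prints debugging output if something is wrong.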
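         To make the idea of "a few bits of flags per item" concrete, here is a minimal user-space sketch of packing a zone index and a node id into the top bits of an unsigned long. The field widths, the layout, and all sk_ names are hypothetical and chosen for this example only; the kernel derives its own widths and shifts from the configuration and also stores a section number, which is omitted here.

```c
#include <assert.h>

/* Hypothetical field widths; the kernel computes its own from the config. */
#define SK_ZONES_WIDTH   2
#define SK_NODES_WIDTH   6
#define SK_BITS_PER_LONG ((int)(8 * sizeof(unsigned long)))

/* Node id sits in the topmost bits, zone index directly below it. */
#define SK_NODES_PGSHIFT (SK_BITS_PER_LONG - SK_NODES_WIDTH)
#define SK_ZONES_PGSHIFT (SK_NODES_PGSHIFT - SK_ZONES_WIDTH)
#define SK_ZONES_MASK    ((1UL << SK_ZONES_WIDTH) - 1)
#define SK_NODES_MASK    ((1UL << SK_NODES_WIDTH) - 1)

struct sk_page {
    unsigned long flags;
};

/* Store zone and node in flags without disturbing the other bits. */
static void sk_set_page_links(struct sk_page *page,
                              unsigned long zone, unsigned long nid)
{
    assert(zone <= SK_ZONES_MASK && nid <= SK_NODES_MASK);
    page->flags &= ~(SK_ZONES_MASK << SK_ZONES_PGSHIFT);
    page->flags |= zone << SK_ZONES_PGSHIFT;
    page->flags &= ~(SK_NODES_MASK << SK_NODES_PGSHIFT);
    page->flags |= nid << SK_NODES_PGSHIFT;
}

/* Recover the zone index from flags. */
static unsigned long sk_page_zonenum(const struct sk_page *page)
{
    return (page->flags >> SK_ZONES_PGSHIFT) & SK_ZONES_MASK;
}

/* Recover the node id from flags. */
static unsigned long sk_page_to_nid(const struct sk_page *page)
{
    return (page->flags >> SK_NODES_PGSHIFT) & SK_NODES_MASK;
}
```

         Because the location is encoded in the page descriptor itself, the zone and node of a page can be read back from the descriptor alone, and the low flag bits stay free for ordinary page state.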

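         Line 3648 initializes the reference count, line 3649 initializes the mapping count, and line 3650 marks the page as reserved.

         Lines 3665-3668: for the first frame of each pageblock that lies inside the zone, set_pageblock_migratetype is called to set the migrate type of the block to MIGRATE_MOVABLE. set_pageblock_migratetype is analyzed in the section on memory migration in the buddy system.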
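         The three-part condition at lines 3665-3667 can be tried out in user space. The sketch below assumes a pageblock of 1024 pages; that value and the sk_ names are assumptions for illustration, since the real pageblock_nr_pages depends on the configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pageblock size (a power of two); the real value is config-dependent. */
#define SK_PAGEBLOCK_NR_PAGES 1024UL

struct sk_zone {
    unsigned long zone_start_pfn;
    unsigned long spanned_pages;
};

/* True if pfn lies inside the zone and is the first frame of a pageblock. */
static bool sk_is_pageblock_head(const struct sk_zone *z, unsigned long pfn)
{
    /* The mask test below only works for power-of-two block sizes. */
    assert((SK_PAGEBLOCK_NR_PAGES & (SK_PAGEBLOCK_NR_PAGES - 1)) == 0);
    return z->zone_start_pfn <= pfn
        && pfn < z->zone_start_pfn + z->spanned_pages
        && !(pfn & (SK_PAGEBLOCK_NR_PAGES - 1));
}

/* Count how many frames in [start_pfn, end_pfn) would get a migrate type set. */
static unsigned long sk_count_pageblock_heads(const struct sk_zone *z,
                                              unsigned long start_pfn,
                                              unsigned long end_pfn)
{
    unsigned long pfn, count = 0;

    for (pfn = start_pfn; pfn < end_pfn; pfn++)
        if (sk_is_pageblock_head(z, pfn))
            count++;
    return count;
}
```

         With these assumed sizes, a zone that starts at pfn 4096 and spans 4096 pages contains four pageblock heads (4096, 5120, 6144, 7168), so the migrate type is set four times even though the loop visits every frame.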

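         Line 3674 sets the virtual address the page is mapped at. This is done only when WANT_PAGE_VIRTUAL is defined and the zone is not a high memory zone.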

        

        

    

      ====Zonelist initialization

    

The build_all_zonelists function

         Zonelist initialization is done by the build_all_zonelists function. The call path into build_all_zonelists is:

         start_kernel() at init/main.c:504

    build_all_zonelists() at mm/page_alloc.c:3409

 

    build_all_zonelists is implemented in mm/page_alloc.c; the code is as follows:

    3408 void __ref build_all_zonelists(void *data)

    3409 {

    3410         set_zonelist_order();

    3411

    3412         if (system_state == SYSTEM_BOOTING) {

    3413                 __build_all_zonelists(NULL);

    3414                 mminit_verify_zonelist();

    3415                 cpuset_init_current_mems_allowed();

    3416         } else {

    3417                 /* we have to stop all cpus to guarantee there is no user

    3418                    of zonelist */

    3419 #ifdef CONFIG_MEMORY_HOTPLUG

    3420                 if (data)

    3421                         setup_zone_pageset((struct zone *)data);

    3422 #endif

    3423                 stop_machine(__build_all_zonelists, NULL, NULL);

    3424                 /* cpuset refresh routine should be here */

    3425         }

    3426         vm_total_pages = nr_free_pagecache_pages();

    3427         /*

    3428          * Disable grouping by mobility if the number of pages in the

    3429          * system is too low to allow the mechanism to work. It would be

    3430          * more accurate, but expensive to check per-zone. This check is

    3431          * made on memory-hotadd so a system can start with mobility

    3432          * disabled and enable it later

    3433          */

    3434         if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))

    3435                 page_group_by_mobility_disabled = 1;

    3436         else

    3437                 page_group_by_mobility_disabled = 0;

    3438

    3439         printk("Built %i zonelists in %s order, mobility grouping %s.  "

    3440                 "Total pages: %ld\n",

    3441                         nr_online_nodes,

    3442                         zonelist_order_name[current_zonelist_order],

    3443                         page_group_by_mobility_disabled ? "off" : "on",

    3444                         vm_total_pages);

    3445 #ifdef CONFIG_NUMA

    3446         printk("Policy zone: %s\n", zone_names[policy_zone]);

    3447 #endif

    3448 }

 

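    During initialization, the function takes the branch at lines 3413-3415.

    Line 3413: the main work of zonelist initialization is done in __build_all_zonelists, which we analyze after this function.

    Line 3414 calls mminit_verify_zonelist to do some verification.

         In the section on buddy system memory allocation, we divided allocation into three phases; the main task of the first phase is to determine the zonelist and the node mask. The process structure has a member mems_allowed, a node mask recording the nodes from which the process may allocate memory. Memory is allocated from a node only if the node is in the process's mems_allowed and the memory policy also allows allocation from that node. At line 3415, cpuset_init_current_mems_allowed sets the current process's mems_allowed to include all nodes.

         Line 3426: nr_free_pagecache_pages returns the sum, over all zones, of the pages that remain after subtracting the high watermark from each zone's available pages. This value is used as the number of remaining usable pages.

         Line 3434: if the remaining usable pages are fewer than pageblock_nr_pages * MIGRATE_TYPES, that is, if there are not enough pages for every migrate type to own one pageblock, grouping by migrate type is disabled. Once it is disabled, every allocation is treated as MIGRATE_UNMOVABLE, the unmovable type.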
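         The two steps at lines 3426 and 3434 amount to a sum followed by a threshold test. The sketch below reproduces them in user space; the pageblock size of 1024 pages, the count of 5 migrate types, and the sk_ names are all assumptions for this example, not values taken from a particular build.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; both depend on the kernel configuration. */
#define SK_PAGEBLOCK_NR_PAGES 1024UL
#define SK_MIGRATE_TYPES      5UL

struct sk_zone {
    unsigned long present_pages;  /* usable pages in the zone */
    unsigned long high_wmark;     /* high watermark of the zone */
};

/* Sum, over all zones, of the pages above each zone's high watermark. */
static unsigned long sk_nr_free_above_high(const struct sk_zone *zones, int n)
{
    unsigned long sum = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (zones[i].present_pages > zones[i].high_wmark)
            sum += zones[i].present_pages - zones[i].high_wmark;
    return sum;
}

/* Mobility grouping is disabled when not every type can own one pageblock. */
static bool sk_mobility_grouping_disabled(unsigned long vm_total_pages)
{
    return vm_total_pages < SK_PAGEBLOCK_NR_PAGES * SK_MIGRATE_TYPES;
}
```

         Under these assumptions the threshold is 5120 pages, so a machine whose zones add up to 3850 pages above their high watermarks would boot with grouping off, while one with 7800 such pages would boot with it on.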

        

    

The __build_all_zonelists function

         __build_all_zonelists is implemented in mm/page_alloc.c; the code is as follows:

    3356 static __init_refok int __build_all_zonelists(void *data)

    3357 {

    3358         int nid;

    3359         int cpu;

    3360

    3361 #ifdef CONFIG_NUMA

    3362         memset(node_load, 0, sizeof(node_load));

    3363 #endif

    3364         for_each_online_node(nid) {

    3365                 pg_data_t *pgdat = NODE_DATA(nid);

    3366

    3367                 build_zonelists(pgdat);

    3368                 build_zonelist_cache(pgdat);

    3369         }

    3370

    3371         /*

    3372          * Initialize the boot_pagesets that are going to be used

    3373          * for bootstrapping processors. The real pagesets for

    3374          * each zone will be allocated later when the per cpu

    3375          * allocator is available.

    3376          *

    3377          * boot_pagesets are used also for bootstrapping offline

    3378          * cpus if the system is already booted because the pagesets

    3379          * are needed to initialize allocators on a specific cpu too.

    3380          * F.e. the percpu allocator needs the page allocator which

    3381          * needs the percpu allocator in order to allocate its pagesets

    3382          * (a chicken-egg dilemma).

    3383          */

    3384         for_each_possible_cpu(cpu) {

    3385                 setup_pageset(&per_cpu(boot_pageset, cpu), 0);

    3386

    3387 #ifdef CONFIG_HAVE_MEMORYLESS_NODES

    3388                 /*

    3389                  * We now know the "local memory node" for each node--

    3390                  * i.e., the node of the first zone in the generic zonelist.

    3391                  * Set up numa_mem percpu variable for on-line cpus.  During

    3392                  * boot, only the boot cpu should be on-line;  we'll init the

    3393                  * secondary cpus' numa_mem as they come on-line.  During

    3394                  * node/memory hotplug, we'll fixup all on-line cpus.

    3395                  */

    3396                 if (cpu_online(cpu))

    3397                         set_cpu_numa_mem(cpu, local_memory_node(cpu_to_node(cpu)));

    3398 #endif

    3399         }

    3400

    3401         return 0;

    3402 }

 

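    A zonelist is an ordered set of zones. Its purpose is to let the allocator pick a zone from the list and allocate memory from that zone.

    Several factors affect which zone is chosen:

    1: The order of the zones in the zonelist.

    2: The highest zone type specified by the allocation flags. Some allocations can only be satisfied from low memory, for example for device drivers that can only access low memory. When choosing a zone, its type must be considered: a zone is chosen only if its type is less than or equal to the highest zone type given by the flags.

    3: If the fast path fails to allocate from a zone, the slow path records in a cache that the zone is short of memory. Later allocations check this cached information, which affects the choice of zone.

    4: The node mask. Only zones on nodes contained in the node mask are chosen.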

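    With these factors in mind, we can explain the definition of the zonelist structure. Why does the list hold an array of zoneref rather than simply an array of zone pointers? A zoneref holds a zone pointer, zone, and zone_idx, the type of that zone. Considering the second factor: when we scan an entry of the zonelist, the zone type we need is available directly from the zone_idx member of the zoneref.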
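    A minimal sketch of how such a scan can use zone_idx follows. It is a simplified user-space model written for this article: the sk_ types and sk_first_allowed_zone are hypothetical names, the node mask (factor 4) is left out, and the NULL-zone terminator is the one described later for build_zonelists.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct zone and struct zoneref. */
struct sk_zone {
    const char *name;
};

struct sk_zoneref {
    struct sk_zone *zone;  /* NULL marks the end of the list */
    int zone_idx;          /* zone type, cached next to the pointer */
};

/*
 * Starting at z, return the first entry whose zone type does not exceed
 * highest_zoneidx. The terminating entry (zone == NULL) is returned when
 * no zone qualifies. Only zone_idx is read, never the zone itself.
 */
static struct sk_zoneref *sk_first_allowed_zone(struct sk_zoneref *z,
                                                int highest_zoneidx)
{
    assert(z != NULL);
    while (z->zone != NULL && z->zone_idx > highest_zoneidx)
        z++;
    return z;
}
```

    The point of the design is visible in the loop: rejecting an entry needs only the zoneref, so zones that are too high for the request are skipped without dereferencing their zone structures.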

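    The zlcache member of zonelist is a zonelist_cache structure, which records for each zone in the zonelist whether the zone is short of memory. Its member fullzones is a bitmap that parallels the zoneref array of the zonelist: each bit says whether the zone at the same array index is full. z_to_n maps an array index to a node number. These fields are used in the zlc_zone_worth_trying function.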
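    The following sketch models how the two fields can be combined into one cheap test. It is an assumption-laden simplification: the structure holds only 16 entries, the node mask is a plain unsigned long, and the sk_ names are invented for this example rather than taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_ZONES_PER_ZONELIST 16   /* assumed small size for the sketch */

struct sk_zonelist_cache {
    /* zonelist index -> node id of the zone at that index */
    unsigned short z_to_n[SK_MAX_ZONES_PER_ZONELIST];
    /* bit i set: the zone at zonelist index i was recently found full */
    unsigned long fullzones;
};

/* Record that the zone at index i failed to satisfy an allocation. */
static void sk_zlc_mark_zone_full(struct sk_zonelist_cache *zlc, int i)
{
    assert(i >= 0 && i < SK_MAX_ZONES_PER_ZONELIST);
    zlc->fullzones |= 1UL << i;
}

/*
 * A zone is worth trying if its node is in the allowed node mask
 * and it is not currently marked full.
 */
static bool sk_zlc_zone_worth_trying(const struct sk_zonelist_cache *zlc,
                                     int i, unsigned long allowed_nodes)
{
    unsigned short node;

    assert(i >= 0 && i < SK_MAX_ZONES_PER_ZONELIST);
    node = zlc->z_to_n[i];
    return ((allowed_nodes >> node) & 1UL) && !((zlc->fullzones >> i) & 1UL);
}
```

    Both lookups are indexed by the position in the zoneref array, which is why the cache is laid out in parallel with that array.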

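    The zlcache_ptr member of zonelist points to the zonelist_cache structure actually in use; it does not always point to the zonelist's own zlcache.

    Lines 3364-3369 iterate over all online nodes. build_zonelists is called to initialize the node's zonelists (each node has several zonelists), and build_zonelist_cache is called to initialize the node's cache of zone-full information.

    Lines 3384-3399 iterate over all possible CPUs and call setup_pageset to initialize the per-CPU page cache. At lines 3396-3397, for each online CPU, set_cpu_numa_mem sets the CPU's local memory node.

    Below we only analyze the flow of build_zonelists; build_zonelist_cache and the remaining parts are not analyzed.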

 

    

The build_zonelists function

         build_zonelists initializes the zonelist of one node. It is implemented in mm/page_alloc.c; the code is as follows:

 

    3286 static void build_zonelists(pg_data_t *pgdat)

    3287 {

    3288        int node, local_node;

    3289        enum zone_type j;

    3290        struct zonelist *zonelist;

    3291

    3292        local_node = pgdat->node_id;

    3293

    3294        zonelist = &pgdat->node_zonelists[0];

    3295        j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1);

    3296

    3297        /*

    3298          * Now we build the zonelist so that it contains the zones

    3299          * of all the other nodes.

    3300          * We don't want to pressure a particular node, so when

    3301          * building the zones for node N, we make sure that the

    3302          * zones coming right after the local ones are those from

    3303          * node N+1 (modulo N)

    3304          */

    3305        for (node = local_node + 1; node < MAX_NUMNODES; node++) {

    3306                 if (!node_online(node))

    3307                         continue;

    3308                 j = build_zonelists_node(NODE_DATA(node), zonelist, j,

    3309                                                        MAX_NR_ZONES - 1);

    3310        }

    3311        for (node = 0; node < local_node; node++) {

    3312                 if (!node_online(node))

    3313                         continue;

    3314                 j = build_zonelists_node(NODE_DATA(node), zonelist, j,

    3315                                                        MAX_NR_ZONES - 1);

    3316        }

    3317

    3318        zonelist->_zonerefs[j].zone = NULL;

    3319        zonelist->_zonerefs[j].zone_idx = 0;

    3320 }

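         The build_zonelists_node function adds the zones contained in one node to the zonelist.

         The key point of this function is the order in which the zonelist is filled. local_node is the number of the current node. From lines 3295, 3305, and 3311 we can see that the online nodes are processed in the order local_node, local_node+1, ..., MAX_NUMNODES - 1, 0, ..., local_node-1.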
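         The wrap-around order is easy to check with a small user-space model of the two loops. In the sketch below the limit of 8 nodes, the online[] array, and the sk_ names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_NUMNODES 8   /* assumed upper bound for the sketch */

/*
 * Fill order[] with node ids in the order their zones would be appended:
 * the local node first, then higher-numbered online nodes, then the
 * lower-numbered online nodes. Returns how many nodes were written.
 */
static int sk_node_visit_order(int local_node,
                               const bool online[SK_MAX_NUMNODES],
                               int order[SK_MAX_NUMNODES])
{
    int node, n = 0;

    assert(local_node >= 0 && local_node < SK_MAX_NUMNODES);
    order[n++] = local_node;
    for (node = local_node + 1; node < SK_MAX_NUMNODES; node++)
        if (online[node])
            order[n++] = node;
    for (node = 0; node < local_node; node++)
        if (online[node])
            order[n++] = node;
    return n;
}
```

         For example, with nodes 0, 1, and 3 online and local_node equal to 1, the visiting order is 1, 3, 0. Each node therefore falls back first to the node after it, which spreads fallback pressure instead of sending every node to node 0.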

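         Lines 3318-3319 show that the last zoneref entry refers to a NULL zone, while every entry before it points to a real zone. This is how the end of the zonelist is detected.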

    

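The build_zonelists_node function

    build_zonelists_node adds the zones of one node to a zonelist: the zones of node pgdat whose type is less than or equal to zone_type are written into zonelist starting at entry nr_zones. build_zonelists_node is implemented in mm/page_alloc.c; the code is as follows: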

    2860 static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,

    2861                                 int nr_zones, enum zone_type zone_type)

    2862 {

    2863        struct zone *zone;

    2864

    2865        BUG_ON(zone_type >= MAX_NR_ZONES);

    2866        zone_type++;

    2867

    2868        do {

    2869                 zone_type--;

    2870                 zone = pgdat->node_zones + zone_type;

    2871                 if (populated_zone(zone)) {

    2872                         zoneref_set_zone(zone,

    2873                                 &zonelist->_zonerefs[nr_zones++]);

    2874                         check_highest_zone(zone_type);

    2875                 }

    2876

    2877        } while (zone_type);

    2878        return nr_zones;

    2879 }

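         Zones are added in order of zone type, from the highest type (zone_type) down to 0. populated_zone tests whether a zone has usable pages: it returns true if it does and false otherwise. check_highest_zone updates the policy_zone variable, which holds the highest zone type usable in the system other than ZONE_MOVABLE.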
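         The descending walk and the populated test can be modelled in a few lines of user-space C. In this sketch the four-zone node, the use of present_pages as the populated test, and the sk_ names are assumptions for illustration; check_highest_zone is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_NR_ZONES 4   /* assumed number of zone types */

struct sk_zone {
    unsigned long present_pages;
};

struct sk_pgdat {
    struct sk_zone node_zones[SK_MAX_NR_ZONES];
};

struct sk_zoneref {
    struct sk_zone *zone;
    int zone_idx;
};

/*
 * Append the populated zones of pgdat with type <= zone_type to refs[],
 * highest type first, starting at index nr_zones. Returns the new count.
 */
static int sk_build_zonelists_node(struct sk_pgdat *pgdat,
                                   struct sk_zoneref *refs,
                                   int nr_zones, int zone_type)
{
    assert(zone_type >= 0 && zone_type < SK_MAX_NR_ZONES);
    for (; zone_type >= 0; zone_type--) {
        struct sk_zone *zone = &pgdat->node_zones[zone_type];

        if (zone->present_pages != 0) {   /* stands in for populated_zone() */
            refs[nr_zones].zone = zone;
            refs[nr_zones].zone_idx = zone_type;
            nr_zones++;
        }
    }
    return nr_zones;
}
```

         For a node whose zone 2 is empty, the result holds zone types 3, 1, 0 in that order. Putting the highest zone first means an allocation that may use any zone tries the least scarce memory first and only falls back to the lower zones when needed.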
