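Any architecture that wants to use the generic framework provided by the kernel must set the configuration option ARCH_POPULATES_NODE_MAP. Once all active memory regions have been registered, the rest of the work is done by generic kernel code.
An active memory region is a memory region that contains no holes. Regions must be registered in the global variable early_node_map with add_active_range. (This holds for 2.6.25; in 3.18.3 neither the variable nor the function can be found any more.)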
3.18.3 kernel:
start_kernel
->setup_arch
->machine_specific_memory_setup
->setup_memory
->paging_init
->zone_sizes_init();
2.6.25 kernel:
start_kernel
->setup_arch
->zone_sizes_init
->add_active_range
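The rest of this section mainly follows the 2.6.25 kernel.
The number of regions currently registered is kept in nr_nodemap_entries. The maximum number of distinct regions is given by MAX_ACTIVE_REGIONS.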
    /*
     * MAX_ACTIVE_REGIONS determines the maximum number of distinct
     * ranges of memory (RAM) that may be registered with add_active_range().
     * Ranges passed to add_active_range() will be merged if possible
     * so the number of times add_active_range() can be called is
     * related to the number of nodes and the number of holes
     */
    #ifdef CONFIG_MAX_ACTIVE_REGIONS
      /* Allow an architecture to set MAX_ACTIVE_REGIONS to save memory */
      #define MAX_ACTIVE_REGIONS CONFIG_MAX_ACTIVE_REGIONS
    #else
      #if MAX_NUMNODES >= 32
        /* If there can be many nodes, allow up to 50 holes per node */
        #define MAX_ACTIVE_REGIONS (MAX_NUMNODES*50)
      #else
        /* By default, allow up to 256 distinct regions */
        #define MAX_ACTIVE_REGIONS 256
      #endif
    #endif

    static struct node_active_region __meminitdata early_node_map[MAX_ACTIVE_REGIONS];

    void __init add_active_range(unsigned int nid, unsigned long start_pfn,
                                 unsigned long end_pfn);
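If the architecture does not set CONFIG_MAX_ACTIVE_REGIONS, the kernel by default allows 256 active regions to be registered in total (or 50 regions per NUMA node when MAX_NUMNODES is 32 or more). Each region is described by the following data structure: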
    #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
    struct node_active_region {
        unsigned long start_pfn;
        unsigned long end_pfn;
        int nid;
    };
    #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
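start_pfn and end_pfn give the page-frame bounds of a contiguous region (the callers in this section pass end_pfn as the first frame past the region, for example add_active_range(0, 0, max_low_pfn)). nid is the NUMA ID of the node the region belongs to; on UMA systems it is set to 0.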
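As a small illustration of how such entries can be read, the following user-space sketch sums the page frames registered for one node. The helper name active_pages_on_node is hypothetical and not a kernel function, and the sketch assumes that end_pfn is exclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the kernel structure quoted above. */
struct node_active_region {
    unsigned long start_pfn;
    unsigned long end_pfn;   /* assumed exclusive: first frame past the region */
    int nid;
};

/* Hypothetical helper: count the page frames registered for node `nid`. */
static unsigned long active_pages_on_node(const struct node_active_region *map,
                                          size_t entries, int nid)
{
    unsigned long pages = 0;

    for (size_t i = 0; i < entries; i++) {
        assert(map[i].start_pfn <= map[i].end_pfn);
        if (map[i].nid == nid)
            pages += map[i].end_pfn - map[i].start_pfn;
    }
    return pages;
}
```

The hole between two entries of the same node is simply never counted, which is the point of registering only hole-free regions.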
Active regions are registered with add_active_range:
    void __init add_active_range(unsigned int nid, unsigned long start_pfn,
                                 unsigned long end_pfn)
    {
        int i;

        printk(KERN_DEBUG "Entering add_active_range(%d, %lu, %lu) "
                          "%d entries of %d used\n",
                          nid, start_pfn, end_pfn,
                          nr_nodemap_entries, MAX_ACTIVE_REGIONS);

        /* Merge with existing active regions if possible */
        for (i = 0; i < nr_nodemap_entries; i++) {
            if (early_node_map[i].nid != nid)
                continue;

            /* Skip if an existing region covers this new one */
            if (start_pfn >= early_node_map[i].start_pfn &&
                end_pfn <= early_node_map[i].end_pfn)
                return;

            /* Merge forward if suitable */
            if (start_pfn <= early_node_map[i].end_pfn &&
                end_pfn > early_node_map[i].end_pfn) {
                early_node_map[i].end_pfn = end_pfn;
                return;
            }

            /* Merge backward if suitable */
            if (start_pfn < early_node_map[i].end_pfn &&
                end_pfn >= early_node_map[i].start_pfn) {
                early_node_map[i].start_pfn = start_pfn;
                return;
            }
        }

        /* Check that early_node_map is large enough */
        if (i >= MAX_ACTIVE_REGIONS) {
            printk(KERN_CRIT "More than %d memory regions, truncating\n",
                   MAX_ACTIVE_REGIONS);
            return;
        }

        early_node_map[i].nid = nid;
        early_node_map[i].start_pfn = start_pfn;
        early_node_map[i].end_pfn = end_pfn;
        nr_nodemap_entries = i + 1;
    }
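To try out the merge rules outside the kernel, the function above can be imitated in user space. The following is only a sketch: struct region, map, nr_entries, add_range and the limit of 8 entries are made-up names and values, and printk is replaced by fprintf. The three merge conditions are copied from the kernel code.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_REGIONS 8   /* hypothetical small limit for this sketch */

struct region { unsigned long start_pfn, end_pfn; int nid; };

static struct region map[MAX_REGIONS];
static int nr_entries;

/* User-space imitation of the merge rules in add_active_range(). */
static void add_range(int nid, unsigned long start_pfn, unsigned long end_pfn)
{
    int i;

    assert(start_pfn <= end_pfn);
    for (i = 0; i < nr_entries; i++) {
        if (map[i].nid != nid)
            continue;
        /* Already covered by an existing entry: nothing to do */
        if (start_pfn >= map[i].start_pfn && end_pfn <= map[i].end_pfn)
            return;
        /* New range touches or overlaps the upper end: extend upwards */
        if (start_pfn <= map[i].end_pfn && end_pfn > map[i].end_pfn) {
            map[i].end_pfn = end_pfn;
            return;
        }
        /* New range touches or overlaps the lower end: extend downwards */
        if (start_pfn < map[i].end_pfn && end_pfn >= map[i].start_pfn) {
            map[i].start_pfn = start_pfn;
            return;
        }
    }
    /* No merge possible: append, unless the table is full */
    if (i >= MAX_REGIONS) {
        fprintf(stderr, "more than %d regions, dropping\n", MAX_REGIONS);
        return;
    }
    map[i].nid = nid;
    map[i].start_pfn = start_pfn;
    map[i].end_pfn = end_pfn;
    nr_entries = i + 1;
}
```

Calling add_range(0, 0, 100) followed by add_range(0, 100, 200) leaves a single entry covering frames 0 to 200, while a range on another node or behind a hole gets its own entry.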
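When two adjacent regions are registered, add_active_range makes sure they are merged into one. Beyond that, the function offers no further features. On IA-32 it is called from zone_sizes_init, which stores the boundaries of the individual memory zones in units of page frames: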
    void __init zone_sizes_init(void)
    {
        unsigned long max_zone_pfns[MAX_NR_ZONES];

        memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
        max_zone_pfns[ZONE_DMA] =
            virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
        max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
    #ifdef CONFIG_HIGHMEM
        max_zone_pfns[ZONE_HIGHMEM] = highend_pfn;
        add_active_range(0, 0, highend_pfn);
    #else
        add_active_range(0, 0, max_low_pfn);
    #endif

        free_area_init_nodes(max_zone_pfns);
    }
MAX_DMA_ADDRESS is the highest memory address suitable for DMA operations. On IA-32 the constant is defined as:
    #define MAX_DMA_ADDRESS (PAGE_OFFSET + 0x1000000)
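Physical pages are mapped into the virtual address space starting at PAGE_OFFSET, and the first 16 MB of physical memory are suitable for DMA, which is 0x1000000 bytes in hexadecimal. virt_to_phys converts the virtual address back into a physical address, and the right shift by PAGE_SHIFT bits divides by the page size, so the result is the number of pages usable for DMA. Unsurprisingly, on an IA-32 system with 4 KB pages this is 4096 pages.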
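The arithmetic can be checked with a few lines of user-space C. This is a sketch under stated assumptions: PAGE_OFFSET is taken to be 0xC0000000 (the usual 3:1 split), PAGE_SHIFT is 12, and sketch_virt_to_phys is a stand-in that only subtracts PAGE_OFFSET, which is valid for the directly mapped region and is not the kernel's actual virt_to_phys.

```c
#include <assert.h>

#define PAGE_SHIFT      12                       /* 4 KB pages */
#define PAGE_OFFSET     0xC0000000UL             /* assumed 3:1 split */
#define MAX_DMA_ADDRESS (PAGE_OFFSET + 0x1000000UL)

/* Stand-in for virt_to_phys() inside the direct mapping. */
static unsigned long sketch_virt_to_phys(unsigned long vaddr)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* Upper page-frame bound of ZONE_DMA, as computed in zone_sizes_init(). */
static unsigned long dma_limit_pfn(void)
{
    return sketch_virt_to_phys(MAX_DMA_ADDRESS) >> PAGE_SHIFT;
}
```

The physical address is 0x1000000 (16 MB), and shifting it right by 12 bits yields 0x1000, that is 4096 page frames.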
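max_low_pfn and highend_pfn are global variables that hold the highest page frame number of low memory (with the usual 3:1 split, at most 896 MB) and of high memory, respectively.
free_area_init_nodes combines the information in early_node_map and max_zone_pfns: for each zone it selects the active regions that fall into it and builds the architecture-independent data structures.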
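The idea of combining both sources can be sketched as an interval intersection. The code below is not the kernel implementation; it assumes a single node, the three IA-32 zones, exclusive upper bounds, and made-up helper names (overlap, pages_per_zone). Each zone covers the frames from the previous zone's limit up to its own entry in max_zone_pfn, and only frames inside a registered region count as present.

```c
#include <assert.h>

enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, NR_ZONES };

struct region { unsigned long start_pfn, end_pfn; };

/* Number of frames of region r that lie inside [lo, hi). */
static unsigned long overlap(struct region r, unsigned long lo, unsigned long hi)
{
    assert(r.start_pfn <= r.end_pfn);
    unsigned long s = r.start_pfn > lo ? r.start_pfn : lo;
    unsigned long e = r.end_pfn < hi ? r.end_pfn : hi;
    return e > s ? e - s : 0;
}

/* Hypothetical helper: present pages per zone for one node. */
static void pages_per_zone(const struct region *map, int entries,
                           const unsigned long max_zone_pfn[NR_ZONES],
                           unsigned long present[NR_ZONES])
{
    unsigned long lo = 0;

    for (int z = 0; z < NR_ZONES; z++) {
        /* An unset (0) or too small limit yields an empty zone */
        unsigned long hi = max_zone_pfn[z] > lo ? max_zone_pfn[z] : lo;

        present[z] = 0;
        for (int i = 0; i < entries; i++)
            present[z] += overlap(map[i], lo, hi);
        lo = hi;
    }
}
```

For 1 GB of RAM with a hole between frames 160 and 256 and limits of 4096, 229376 and 262144 frames, the sketch reports 4000 frames in ZONE_DMA, 225280 in ZONE_NORMAL and 32768 in ZONE_HIGHMEM.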
On AMD64, memory regions are registered with the following function:
    /* Walk the e820 map and register active regions within a node */
    void __init e820_register_active_regions(int nid, unsigned long start_pfn,
                                             unsigned long end_pfn)
    {
        unsigned long ei_startpfn;
        unsigned long ei_endpfn;
        int i;

        for (i = 0; i < e820.nr_map; i++)
            if (e820_find_active_region(&e820.map[i],
                                        start_pfn, end_pfn,
                                        &ei_startpfn, &ei_endpfn))
                add_active_range(nid, ei_startpfn, ei_endpfn);
    }
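In essence, this code walks all memory regions reported by the BIOS and determines the active region within each of them. Unlike on IA-32, add_active_range may therefore be called many times.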
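What "finding the active region" of a single map entry might look like is sketched below. This is a simplified user-space approximation and not the kernel's e820_find_active_region: the structure and function names carry a sketch prefix or are made up, 4 KB pages are assumed, and type value 1 is taken to mean usable RAM. The entry is shrunk to whole pages and then clamped to the requested frame window.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define SKETCH_E820_RAM 1   /* usable RAM */

struct sketch_e820entry { uint64_t addr; uint64_t size; uint32_t type; };

/*
 * Returns 1 and stores the clamped frame range if the entry contains
 * usable RAM inside [start_pfn, end_pfn), otherwise returns 0.
 */
static int find_active_region(const struct sketch_e820entry *ei,
                              unsigned long start_pfn, unsigned long end_pfn,
                              unsigned long *ei_startpfn,
                              unsigned long *ei_endpfn)
{
    assert(start_pfn <= end_pfn);

    /* Round the start up and the end down to whole page frames */
    unsigned long s = (unsigned long)((ei->addr + PAGE_SIZE - 1) >> PAGE_SHIFT);
    unsigned long e = (unsigned long)((ei->addr + ei->size) >> PAGE_SHIFT);

    if (ei->type != SKETCH_E820_RAM || s >= e)
        return 0;                       /* not RAM, or less than one page */
    if (e <= start_pfn || s >= end_pfn)
        return 0;                       /* completely outside the window */

    if (s < start_pfn)
        s = start_pfn;
    if (e > end_pfn)
        e = end_pfn;
    *ei_startpfn = s;
    *ei_endpfn = e;
    return 1;
}
```

A typical low-memory entry with addr 0 and size 0x9fc00 yields frames 0 to 159, because the trailing partial page is dropped.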
The values in max_zone_pfns are set up by paging_init:
    #ifndef CONFIG_NUMA
    void __init paging_init(void)
    {
        unsigned long max_zone_pfns[MAX_NR_ZONES];

        memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
        max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
        max_zone_pfns[ZONE_DMA32] = MAX_DMA32_PFN;
        max_zone_pfns[ZONE_NORMAL] = end_pfn;

        memory_present(0, 0, end_pfn);
        sparse_init();
        free_area_init_nodes(max_zone_pfns);
    }
    #endif
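The page-frame boundaries of the two DMA zones (ZONE_DMA, limited to 16 MB, i.e. 24-bit addresses, and ZONE_DMA32, limited to 4 GB, i.e. 32-bit addresses) are stored in preprocessor symbols that hold these limits converted to page frames.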
    #if defined(CONFIG_SGI_IP22) || defined(CONFIG_SGI_IP28)
    /* don't care; ISA bus master won't work, ISA slave DMA supports 32bit addr */
    #define MAX_DMA_ADDRESS PAGE_OFFSET
    #else
    #define MAX_DMA_ADDRESS (PAGE_OFFSET + 0x01000000)
    #endif
    #define MAX_DMA_PFN     PFN_DOWN(virt_to_phys((void *)MAX_DMA_ADDRESS))
    #define MAX_DMA32_PFN   (1UL << (32 - PAGE_SHIFT))