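During boot, the kernel uses three memory allocators in turn: early_res, bootmem, and the zone allocator. Once a later allocator comes online, the earlier one is no longer used.
early_res is the first memory allocator the kernel uses.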
arch/x86/kernel/e820.c:
    /*
     * Handle the memory map.
     * The functions here do the job until bootmem takes over.
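How the kernel obtains memory information
1. In real mode, the kernel uses the interrupt service provided by the BIOS to obtain the physical memory information and stores it in the e820_map field of boot_params.
2. The real-mode boot_params is copied into the protected-mode boot_params.
3. Reserved memory ranges are recorded in early_res, which holds at most 20 entries.
4. The e820_map in boot_params is copied into e820.
5. bootmem is initialized with all memory marked as in use.
6. bootmem takes the usable memory from the e820 map and registers it with itself.
7. bootmem marks as in use any memory that falls within an early_res reserved range.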
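1. Obtaining the BIOS-provided physical memory information in real mode
boot/header.S->main->detect_memory->detect_memory_e820
This calls the BIOS interrupt 0x15 service to obtain the physical memory information.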
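2. Copying the real-mode boot_params into the protected-mode boot_params
The real-mode data reaches the kernel as follows:
a. Switch to protected mode: boot/header.S->main.c->pmjump.S
b. Decompress the kernel: boot/compressed/head_32.S
c. Copy the parameters: kernel/head_32.S
On the jump to the compressed kernel, the boot_params pointer is passed in esi. esi is left unchanged during decompression, so the pointer reaches startup_32, the entry function of the decompressed kernel.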
arch/x86/kernel/head_32.S:
    /*
     * Copy bootup parameters out of the way.
     * Note: %esi still has the pointer to the real-mode data.
     * With the kexec as boot loader, parameter segment might be loaded beyond
     * kernel image and might not even be addressable by early boot page tables.
     * (kexec on panic case). Hence copy out the parameters before initializing
     * page tables.
     */
        movl $pa(boot_params),%edi
        movl $(PARAM_SIZE/4),%ecx
        cld
        rep
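3. Reserving memory ranges
Before bootmem is up, the kernel records reserved memory ranges in early_res; when bootmem starts, these reserved ranges are transferred to it.
The main users of early_res are the kernel code and data, the page tables, and BOOTMAP (the bitmap used by bootmem).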
arch/x86/kernel/head32.c:
    reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
4. e820 takes the e820map data from boot_params
start_kernel->setup_arch->setup_memory_map
dmesg | less
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fef0000 (usable)
 BIOS-e820: 000000007fef0000 - 000000007feff000 (ACPI data)
 BIOS-e820: 000000007feff000 - 000000007ff00000 (ACPI NVS)
 BIOS-e820: 000000007ff00000 - 0000000080000000 (usable)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
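To make the map above concrete, here is a minimal userspace sketch that transcribes it into a table and adds up the ranges marked usable. The names map_entry, ENT, sample_map and usable_bytes are made up for this illustration and are not kernel identifiers; only the type codes 1 to 4 follow the e820 interface.

```c
#include <stddef.h>
#include <stdint.h>

/* e820 type codes as reported by the BIOS interface. */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

/* Hypothetical userspace mirror of one map entry covering [addr, addr + size). */
struct map_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Build an entry from the "start - end" form that dmesg prints. */
#define ENT(start, end, type) { (start), (end) - (start), (type) }

/* The map from the dmesg output above, transcribed by hand. */
const struct map_entry sample_map[] = {
    ENT(0x0000000000000000ULL, 0x000000000009f800ULL, E820_RAM),
    ENT(0x000000000009f800ULL, 0x00000000000a0000ULL, E820_RESERVED),
    ENT(0x00000000000ca000ULL, 0x00000000000cc000ULL, E820_RESERVED),
    ENT(0x00000000000dc000ULL, 0x0000000000100000ULL, E820_RESERVED),
    ENT(0x0000000000100000ULL, 0x000000007fef0000ULL, E820_RAM),
    ENT(0x000000007fef0000ULL, 0x000000007feff000ULL, E820_ACPI),
    ENT(0x000000007feff000ULL, 0x000000007ff00000ULL, E820_NVS),
    ENT(0x000000007ff00000ULL, 0x0000000080000000ULL, E820_RAM),
    ENT(0x00000000e0000000ULL, 0x00000000f0000000ULL, E820_RESERVED),
    ENT(0x00000000fec00000ULL, 0x00000000fec10000ULL, E820_RESERVED),
    ENT(0x00000000fee00000ULL, 0x00000000fee01000ULL, E820_RESERVED),
    ENT(0x00000000fffe0000ULL, 0x0000000100000000ULL, E820_RESERVED),
};
const size_t sample_map_len = sizeof(sample_map) / sizeof(sample_map[0]);

/* Sum the sizes of all entries whose type is E820_RAM ("usable"). */
uint64_t usable_bytes(const struct map_entry *map, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i].type == E820_RAM)
            total += map[i].size;
    return total;
}
```

Calling usable_bytes(sample_map, sample_map_len) gives the amount of RAM that bootmem can later be handed in step 6, before the early reservations are subtracted again.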
5. bootmem initialization
start_kernel->setup_arch->initmem_init
6. bootmem registers the usable memory
a. The usable memory in e820 is placed into active_regions (start_kernel->setup_arch->initmem_init->e820_register_active_regions)
b. The usable memory in active_regions is freed into bootmem (start_kernel->setup_arch->initmem_init->setup_bootmem_allocator->setup_node_bootmem->free_bootmem_with_active_regions)
7. bootmem keeps the reserved memory ranges
start_kernel->setup_arch->initmem_init->setup_bootmem_allocator->setup_node_bootmem->early_res_to_bootmem
dmesg | less
mapped low ram: 0 - 375fe000
low ram: 0 - 375fe000
node 0 low ram: 00000000 - 375fe000
node 0 bootmap 00014000 - 0001aec0
(9 early reservations) ==> bootmem [0000000000 - 00375fe000]
  #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000001000 - 0000002000] EX TRAMPOLINE ==> [0000001000 - 0000002000]
  #2 [0000006000 - 0000007000] TRAMPOLINE ==> [0000006000 - 0000007000]
  #3 [0000400000 - 0000befe50] TEXT DATA BSS ==> [0000400000 - 0000befe50]
  #4 [000009f800 - 0000100000] BIOS reserved ==> [000009f800 - 0000100000]
  #5 [0000bf0000 - 0000bf81c8] BRK ==> [0000bf0000 - 0000bf81c8]
  #6 [0000010000 - 0000014000] PGTABLE ==> [0000010000 - 0000014000]
  #7 [0000bf9000 - 0001941182] NEW RAMDISK ==> [0000bf9000 - 0001941182]
  #8 [0000014000 - 000001b000] BOOTMAP ==> [0000014000 - 000001b000]
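Steps 5 to 7 can be pictured with a toy bitmap: everything starts as "in use", usable ranges are cleared, and early reservations are set again. This is only a sketch under simplifying assumptions (a 64-page node, one bit per page); the sketch_* names are invented here and the real bootmem code tracks considerably more state.

```c
#include <string.h>

#define SKETCH_PAGES 64                         /* hypothetical node size, in pages */

unsigned char sketch_bitmap[SKETCH_PAGES / 8];  /* one bit per page, 1 = in use */

/* Step 5: start with every page marked as in use. */
void sketch_bootmem_init(void)
{
    memset(sketch_bitmap, 0xff, sizeof(sketch_bitmap));
}

/* Step 6: clear the bits of a usable page range [start_pfn, end_pfn). */
void sketch_bootmem_free(unsigned start_pfn, unsigned end_pfn)
{
    for (unsigned pfn = start_pfn; pfn < end_pfn && pfn < SKETCH_PAGES; pfn++)
        sketch_bitmap[pfn / 8] &= (unsigned char)~(1u << (pfn % 8));
}

/* Step 7: set the bits again for an early reservation [start_pfn, end_pfn). */
void sketch_bootmem_reserve(unsigned start_pfn, unsigned end_pfn)
{
    for (unsigned pfn = start_pfn; pfn < end_pfn && pfn < SKETCH_PAGES; pfn++)
        sketch_bitmap[pfn / 8] |= (unsigned char)(1u << (pfn % 8));
}

/* A page can be handed out only if its bit is clear. */
int sketch_page_is_free(unsigned pfn)
{
    return pfn < SKETCH_PAGES &&
           !(sketch_bitmap[pfn / 8] & (1u << (pfn % 8)));
}
```

Starting from "all in use" is the conservative choice: a page becomes allocatable only if the e820 map says it is usable and no early reservation covers it.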
=======================================================
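e820map: the physical memory map provided by the BIOS
I. Handling overlaps in e820
sanitize_e820_map resolves overlapping entries in the e820map: each overlapping portion is assigned to the highest-numbered type (the highest priority) and removed from the lower-priority ranges. Entries may be merged or split in the process.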
    /*
     * Sanitize the BIOS e820 map.
     *
     * Some e820 responses include overlapping entries. The following
     * replaces the original e820 map with a new one, removing overlaps,
     * and resolving conflicting memory types in favor of highest
     * numbered type.
     *
     * The input parameter biosmap points to an array of 'struct
     * e820entry' which on entry has elements in the range [0, *pnr_map)
     * valid, and which has space for up to max_nr_map entries.
     * On return, the resulting sanitized e820 map entries will be in
     * overwritten in the same location, starting at biosmap.
     *
     * The integer pointed to by pnr_map must be valid on entry (the
     * current number of valid entries located at biosmap) and will
     * be updated on return, with the new number of valid entries
     * (something no more than max_nr_map.)
     *
     * The return value from sanitize_e820_map() is zero if it
     * successfully 'sanitized' the map entries passed in, and is -1
     * if it did nothing, which can happen if either of (1) it was
     * only passed one map entry, or (2) any of the input map entries
     * were invalid (start + size < start, meaning that the size was
     * so big the described memory range wrapped around through zero.)
     *
     *    Visually we're performing the following
     *    (1,2,3,4 = memory types)...
     *
     *    Sample memory map (w/overlaps):
     *       ____22__________________
     *       ______________________4_
     *       ____1111________________
     *       _44_____________________
     *       11111111________________
     *       ____________________33__
     *       ___________44___________
     *       __________33333_________
     *       ______________22________
     *       ___________________2222_
     *       _________111111111______
     *       _____________________11_
     *       _________________4______
     *
     *    Sanitized equivalent (no overlap):
     *       1_______________________
     *       _44_____________________
     *       ___1____________________
     *       ____22__________________
     *       ______11________________
     *       _________1______________
     *       __________3_____________
     *       ___________44___________
     *       _____________33_________
     *       _______________2________
     *       ________________1_______
     *       _________________4______
     *       ___________________2____
     *       ____________________33__
     *       ______________________4_
     */

    int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
                                 u32 *pnr_map)
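The rule in the comment above (overlaps go to the highest-numbered type, neighbours of the same type are merged) can be tried out in userspace. The following is a simplified sketch, not the kernel's implementation: it cuts the address space at every range boundary and picks the highest type per piece, which is easier to read than the kernel's own bookkeeping. struct range, sanitize_sketch and sample_ranges are names invented for this example, and each column of the diagram is treated as one address unit.

```c
#include <stdlib.h>

/* Hypothetical half-open range [start, end) with an e820-style type. */
struct range {
    unsigned long long start, end;
    int type;
};

/* The 13 overlapping rows of the "Sample memory map" diagram. */
const struct range sample_ranges[13] = {
    { 4, 6, 2 },   { 22, 23, 4 }, { 4, 8, 1 },   { 1, 3, 4 },
    { 0, 8, 1 },   { 20, 22, 3 }, { 11, 13, 4 }, { 10, 15, 3 },
    { 14, 16, 2 }, { 19, 23, 2 }, { 9, 18, 1 },  { 21, 23, 1 },
    { 17, 18, 4 },
};

static int cmp_ull(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;

    return (x > y) - (x < y);
}

/*
 * Write the non-overlapping equivalent of in[0..n) to out[], which must
 * have room for 2 * n entries. Returns the number of entries written,
 * or -1 if memory could not be allocated.
 */
int sanitize_sketch(const struct range *in, int n, struct range *out)
{
    unsigned long long *pts = malloc(2 * (size_t)n * sizeof(*pts));
    int npts = 0, nout = 0;

    if (!pts)
        return -1;

    /* Every start and end is a point where the winning type may change. */
    for (int i = 0; i < n; i++) {
        pts[npts++] = in[i].start;
        pts[npts++] = in[i].end;
    }
    qsort(pts, (size_t)npts, sizeof(*pts), cmp_ull);

    for (int i = 0; i + 1 < npts; i++) {
        unsigned long long s = pts[i], e = pts[i + 1];
        int type = 0;

        if (s == e)
            continue;               /* duplicate boundary */

        /* Highest-numbered type among the ranges covering [s, e). */
        for (int j = 0; j < n; j++)
            if (in[j].start <= s && e <= in[j].end && in[j].type > type)
                type = in[j].type;
        if (!type)
            continue;               /* a hole: no range covers this piece */

        /* Merge with the previous entry if contiguous and same type. */
        if (nout && out[nout - 1].end == s && out[nout - 1].type == type) {
            out[nout - 1].end = e;
        } else {
            out[nout].start = s;
            out[nout].end = e;
            out[nout].type = type;
            nout++;
        }
    }

    free(pts);
    return nout;
}
```

Fed with sample_ranges, the sketch yields the 15 rows of the "Sanitized equivalent" diagram in address order, for example [13, 15) with type 3 where the type 3 range wins over the type 1 and type 2 ranges beneath it.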
=======================================================
early_res: reserved memory ranges
I. The early_res data structure
arch/x86/kernel/e820.c
    /*
     * Early reserved memory areas.
     */
    #define MAX_EARLY_RES 20

    struct early_res {
        u64 start, end;
        char name[16];
        char overlap_ok;
    };
    static struct early_res early_res[MAX_EARLY_RES] __initdata = {
        { 0, PAGE_SIZE, "BIOS data page" },    /* BIOS data page */
        {}
    };
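start: start address of the reserved range
end: end address of the reserved range (exclusive); an end of 0 marks the end of the array
name: name of the reserved range
overlap_ok: whether later reservations may overlap this range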
II. Finding an overlapping memory range:
arch/x86/kernel/e820.c
    static int __init find_overlapped_early(u64 start, u64 end)
    {
        int i;
        struct early_res *r;

        for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
            r = &early_res[i];
            if (end > r->start && start < r->end)
                break;
        }

        return i;
    }
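Scans the reserved-range array. If an entry overlaps the range [start, end), its index is returned; otherwise the index of the first unused slot is returned, which is MAX_EARLY_RES when the array is full.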
III. Reserving memory ranges
1. Reserving a range
    static void __init __reserve_early(u64 start, u64 end, char *name,
                                       int overlap_ok)
    {
        int i;
        struct early_res *r;

        i = find_overlapped_early(start, end);
        if (i >= MAX_EARLY_RES)
            panic("Too many early reservations");
        r = &early_res[i];
        if (r->end)
            panic("Overlapping early reservations "
                  "%llx-%llx %s to %llx-%llx %s\n",
                  start, end - 1, name?name:"", r->start,
                  r->end - 1, r->name);
        r->start = start;
        r->end = end;
        r->overlap_ok = overlap_ok;
        if (name)
            strncpy(r->name, name, sizeof(r->name) - 1);
    }
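Appends a new range at the end of the early_res array. The kernel panics if the new range overlaps an existing one or if the array already holds MAX_EARLY_RES entries.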
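The two failure cases are easy to reproduce in a userspace model. The sketch below mirrors the lookup and append logic but returns error codes where the kernel would panic; the table is shrunk to 4 slots so that the "too many" case is reachable quickly. All names (SKETCH_MAX_RES, struct res, res_table, sketch_find_overlap, sketch_try_reserve) are hypothetical.

```c
#include <string.h>

#define SKETCH_MAX_RES 4        /* hypothetical small table; the kernel uses 20 */

struct res {
    unsigned long long start, end;   /* half-open [start, end); end == 0: unused */
    char name[16];
};

struct res res_table[SKETCH_MAX_RES];   /* zero-initialised: all slots unused */

/* Index of the first entry overlapping [start, end), else the first free slot. */
int sketch_find_overlap(unsigned long long start, unsigned long long end)
{
    int i;

    for (i = 0; i < SKETCH_MAX_RES && res_table[i].end; i++)
        if (end > res_table[i].start && start < res_table[i].end)
            break;
    return i;
}

/* Returns 0 on success, -1 if the table is full, -2 on an overlap. */
int sketch_try_reserve(unsigned long long start, unsigned long long end,
                       const char *name)
{
    int i = sketch_find_overlap(start, end);

    if (i >= SKETCH_MAX_RES)
        return -1;              /* the kernel panics: too many reservations */
    if (res_table[i].end)
        return -2;              /* the kernel panics: overlapping reservations */

    res_table[i].start = start;
    res_table[i].end = end;
    strncpy(res_table[i].name, name, sizeof(res_table[i].name) - 1);
    res_table[i].name[sizeof(res_table[i].name) - 1] = '\0';
    return 0;
}
```

Because the ranges are half-open, 0x0-0x1000 followed by 0x1000-0x2000 does not count as an overlap, while 0x1800-0x3000 would be rejected against the second of them.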
2. Removing the overlapping part of overlap_ok ranges
    /*
     * Split any existing ranges that:
     *  1) are marked 'overlap_ok', and
     *  2) overlap with the stated range [start, end)
     * into whatever portion (if any) of the existing range is entirely
     * below or entirely above the stated range. Drop the portion
     * of the existing range that overlaps with the stated range,
     * which will allow the caller of this routine to then add that
     * stated range without conflicting with any existing range.
     */
    static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
    {
        int i;
        struct early_res *r;
        u64 lower_start, lower_end;
        u64 upper_start, upper_end;
        char name[16];

        for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
            r = &early_res[i];

            /* Continue past non-overlapping ranges */
            if (end <= r->start || start >= r->end)
                continue;

            /*
             * Leave non-ok overlaps as is; let caller
             * panic "Overlapping early reservations"
             * when it hits this overlap.
             */
            if (!r->overlap_ok)
                return;

            /*
             * We have an ok overlap. We will drop it from the early
             * reservation map, and add back in any non-overlapping
             * portions (lower or upper) as separate, overlap_ok,
             * non-overlapping ranges.
             */

            /* 1. Note any non-overlapping (lower or upper) ranges. */
            strncpy(name, r->name, sizeof(name) - 1);

            lower_start = lower_end = 0;
            upper_start = upper_end = 0;
            if (r->start < start) {
                lower_start = r->start;
                lower_end = start;
            }
            if (r->end > end) {
                upper_start = end;
                upper_end = r->end;
            }

            /* 2. Drop the original ok overlapping range */
            drop_range(i);

            i--;        /* resume for-loop on copied down entry */

            /* 3. Add back in any non-overlapping ranges. */
            if (lower_end)
                reserve_early_overlap_ok(lower_start, lower_end, name);
            if (upper_end)
                reserve_early_overlap_ok(upper_start, upper_end, name);
        }
    }
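a. Work out the non-overlapping lower part and upper part of the existing range.
b. Drop the original range.
c. Add the non-overlapping lower and upper parts back to the reserved ranges as overlap_ok ranges.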
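Step a is the only arithmetic in this routine, so it may help to see it in isolation. The sketch below computes which pieces of an existing range survive once a new range is cut out of it; struct span and split_around are invented names, and unlike the kernel loop this helper also reports the untouched range when there is no overlap at all.

```c
/* Hypothetical half-open range [start, end). */
struct span {
    unsigned long long start, end;
};

/*
 * Cut 'cut' out of 'old' and store what is left in out[]:
 * the part below the cut first, then the part above it.
 * Returns the number of pieces written (0, 1 or 2).
 */
int split_around(struct span old, struct span cut, struct span out[2])
{
    int n = 0;

    /* No overlap: the old range survives unchanged. */
    if (cut.end <= old.start || cut.start >= old.end) {
        out[n++] = old;
        return n;
    }

    /* Lower part: from the old start up to where the cut begins. */
    if (old.start < cut.start) {
        out[n].start = old.start;
        out[n].end = cut.start;
        n++;
    }

    /* Upper part: from where the cut ends up to the old end. */
    if (old.end > cut.end) {
        out[n].start = cut.end;
        out[n].end = old.end;
        n++;
    }

    return n;
}
```

Cutting 0x2000-0x3000 out of 0x1000-0x5000 leaves 0x1000-0x2000 and 0x3000-0x5000, which is why one overlap_ok entry can turn into two and consume an extra slot in the table.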
3. Reserving an overlap_ok range
    /*
     * A few early reservtations come here.
     *
     * The 'overlap_ok' in the name of this routine does -not- mean it
     * is ok for these reservations to overlap an earlier reservation.
     * Rather it means that it is ok for subsequent reservations to
     * overlap this one.
     *
     * Use this entry point to reserve early ranges when you are doing
     * so out of "Paranoia", reserving perhaps more memory than you need,
     * just in case, and don't mind a subsequent overlapping reservation
     * that is known to be needed.
     *
     * The drop_overlaps_that_are_ok() call here isn't really needed.
     * It would be needed if we had two colliding 'overlap_ok'
     * reservations, so that the second such would not panic on the
     * overlap with the first. We don't have any such as of this
     * writing, but might as well tolerate such if it happens in
     * the future.
     */
    void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
    {
        drop_overlaps_that_are_ok(start, end);
        __reserve_early(start, end, name, 1);
    }
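First drops the overlapping part of any existing overlap_ok ranges, then reserves the new range as overlap_ok. Here overlap_ok means that later reservations may overlap this range, not that this range may overlap earlier ones.
4. Reserving a non-overlappable range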
    /*
     * Most early reservations come here.
     *
     * We first have drop_overlaps_that_are_ok() drop any pre-existing
     * 'overlap_ok' ranges, so that we can then reserve this memory
     * range without risk of panic'ing on an overlapping overlap_ok
     * early reservation.
     */
    void __init reserve_early(u64 start, u64 end, char *name)
    {
        if (start >= end)
            return;

        drop_overlaps_that_are_ok(start, end);
        __reserve_early(start, end, name, 0);
    }
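First drops the overlapping part of any existing overlap_ok ranges, then reserves the new range as non-overlappable. An empty range (start >= end) is silently ignored.
IV. Releasing a memory range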
    /*
     * Drop the i-th range from the early reservation map,
     * by copying any higher ranges down one over it, and
     * clearing what had been the last slot.
     */
    static void __init drop_range(int i)
    {
        int j;

        for (j = i + 1; j < MAX_EARLY_RES && early_res[j].end; j++)
            ;

        memmove(&early_res[i], &early_res[i + 1],
                (j - 1 - i) * sizeof(struct early_res));

        early_res[j - 1].end = 0;
    }
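Drops the i-th range by moving every later entry down one slot (sizeof(struct early_res) bytes each), then clears the end field of what used to be the last slot so that it marks the new end of the array.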
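The same compaction can be run on a small userspace array to see which slots move. This is a sketch with invented names (SKETCH_SLOTS, struct slot, slots, sketch_drop, sketch_count) and a 5-slot table pre-filled with three ranges; the logic follows the kernel routine above.

```c
#include <string.h>

#define SKETCH_SLOTS 5          /* hypothetical table size */

struct slot {
    unsigned long long start, end;   /* end == 0 marks the end of the table */
};

/* Three ranges in use; the remaining slots are zero-initialised. */
struct slot slots[SKETCH_SLOTS] = {
    { 0x0000, 0x1000 },
    { 0x1000, 0x2000 },
    { 0x6000, 0x7000 },
};

/* Remove entry i by sliding the entries after it down one slot. */
void sketch_drop(int i)
{
    int j;

    /* Find the first unused slot after i (or the end of the table). */
    for (j = i + 1; j < SKETCH_SLOTS && slots[j].end; j++)
        ;

    memmove(&slots[i], &slots[i + 1], (size_t)(j - 1 - i) * sizeof(struct slot));

    /* The former last entry is now duplicated one slot lower; clear it. */
    slots[j - 1].end = 0;
}

/* Number of entries in use, i.e. the index of the first end == 0 slot. */
int sketch_count(void)
{
    int n = 0;

    while (n < SKETCH_SLOTS && slots[n].end)
        n++;
    return n;
}
```

After sketch_drop(1) the table holds two entries, with 0x6000-0x7000 now at index 1. Only the end field of the vacated slot is reset, which is sufficient because every scan of the table stops at the first entry whose end is 0.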