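During boot, the kernel uses three memory allocators in succession: early_res, bootmem, and the zone allocator. Once a later allocator is brought up, the earlier one is no longer used.
early_res is the earliest memory allocator the kernel uses.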
arch/x86/kernel/e820.c:
/*
 * Handle the memory map.
 * The functions here do the job until bootmem takes over.
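How the kernel obtains memory information
1. In real mode, the kernel uses the interrupt service provided by the BIOS to obtain physical memory information and stores it in the e820_map field of boot_params.
2. The real-mode boot_params is copied into the protected-mode boot_params.
3. Reserved memory ranges are recorded in early_res (at most 20 entries).
4. The e820_map value in boot_params is copied into e820.
5. bootmem is initialized with all memory marked as in use.
6. bootmem takes the usable memory from the e820 map and registers it in bootmem.
7. bootmem marks the memory covered by early_res reserved ranges as in use.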
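Steps 5 to 7 can be illustrated with a toy bitmap. This is a minimal sketch, not the kernel's bootmem code: the names `toy_bootmem_init`, `toy_free_range`, `toy_reserve_range` and `toy_count_free`, the 4 KiB page size, and the 64-page machine are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_NR_PAGES   64	/* hypothetical tiny machine: 256 KiB */

/* One byte per page for readability; 1 = in use, 0 = free. */
static unsigned char toy_bitmap[TOY_NR_PAGES];

/* Step 5: start with every page marked as in use. */
static void toy_bootmem_init(void)
{
	memset(toy_bitmap, 1, sizeof(toy_bitmap));
}

/* Step 6: free only the pages that lie entirely inside [start, end). */
static void toy_free_range(uint64_t start, uint64_t end)
{
	uint64_t first = (start + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
	uint64_t last = end >> TOY_PAGE_SHIFT;	/* exclusive */
	uint64_t p;

	assert(start <= end);
	for (p = first; p < last && p < TOY_NR_PAGES; p++)
		toy_bitmap[p] = 0;
}

/* Step 7: mark every page touched by [start, end) as in use again. */
static void toy_reserve_range(uint64_t start, uint64_t end)
{
	uint64_t first = start >> TOY_PAGE_SHIFT;
	uint64_t last = (end + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
	uint64_t p;

	assert(start <= end);
	for (p = first; p < last && p < TOY_NR_PAGES; p++)
		toy_bitmap[p] = 1;
}

/* Number of pages currently free. */
static unsigned toy_count_free(void)
{
	unsigned n = 0, p;

	for (p = 0; p < TOY_NR_PAGES; p++)
		n += !toy_bitmap[p];
	return n;
}
```

Freeing rounds inward and reserving rounds outward, so a partially covered page is never handed out. Starting from "everything in use" means any memory that nobody explicitly registers as usable stays unavailable.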
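1. Obtaining the physical memory information provided by the BIOS in real mode
boot/header.S->main->detect_memory->detect_memory_e820
This calls BIOS interrupt 0x15 (function 0xE820) to obtain the physical memory information.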
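2. Copying the real-mode boot_params to the protected-mode boot_params
The real-mode data reaches the kernel as follows:
a. Switch to protected mode: boot/header.S->main.c->pmjump.S
b. Decompress the kernel: boot/compressed/head_32.S
c. Copy the parameters: kernel/head_32.S
The boot_params pointer passed when jumping to the compressed kernel is held in esi. esi is left unchanged during decompression, so it is handed on to startup_32, the entry point of the decompressed kernel.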
arch/x86/kernel/head_32.S:
/*
 * Copy bootup parameters out of the way.
 * Note: %esi still has the pointer to the real-mode data.
 * With the kexec as boot loader, parameter segment might be loaded beyond
 * kernel image and might not even be addressable by early boot page tables.
 * (kexec on panic case). Hence copy out the parameters before initializing
 * page tables.
 */
	movl $pa(boot_params),%edi
	movl $(PARAM_SIZE/4),%ecx
	cld
	rep
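3. Reserving memory ranges
Before bootmem is up, the kernel records reserved memory ranges in early_res; when bootmem starts, these reserved ranges are transferred into bootmem.
The main ranges reserved through early_res are the kernel code and data, the page tables, and BOOTMAP (the bitmap used by bootmem).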
arch/x86/kernel/head32.c:
	reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
4. e820 takes the e820 map data out of boot_params
start_kernel->setup_arch->setup_memory_map
dmesg | less
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fef0000 (usable)
BIOS-e820: 000000007fef0000 - 000000007feff000 (ACPI data)
BIOS-e820: 000000007feff000 - 000000007ff00000 (ACPI NVS)
BIOS-e820: 000000007ff00000 - 0000000080000000 (usable)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
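To make the map above concrete, the following sketch adds up the ranges marked (usable). It is only an illustration: `toy_e820_range`, `toy_map` and `toy_usable_bytes` are hypothetical names, and the table stores start/end pairs as printed by dmesg rather than the address/size pairs a real e820 entry holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes for the four kinds of range printed in the log above. */
enum toy_e820_type { TOY_RAM = 1, TOY_RESERVED = 2, TOY_ACPI = 3, TOY_NVS = 4 };

/* Half-open range [start, end), copied from the dmesg output. */
struct toy_e820_range {
	uint64_t start, end;
	enum toy_e820_type type;
};

static const struct toy_e820_range toy_map[] = {
	{ 0x00000000ULL, 0x0009f800ULL, TOY_RAM },
	{ 0x0009f800ULL, 0x000a0000ULL, TOY_RESERVED },
	{ 0x000ca000ULL, 0x000cc000ULL, TOY_RESERVED },
	{ 0x000dc000ULL, 0x00100000ULL, TOY_RESERVED },
	{ 0x00100000ULL, 0x7fef0000ULL, TOY_RAM },
	{ 0x7fef0000ULL, 0x7feff000ULL, TOY_ACPI },
	{ 0x7feff000ULL, 0x7ff00000ULL, TOY_NVS },
	{ 0x7ff00000ULL, 0x80000000ULL, TOY_RAM },
	{ 0xe0000000ULL, 0xf0000000ULL, TOY_RESERVED },
	{ 0xfec00000ULL, 0xfec10000ULL, TOY_RESERVED },
	{ 0xfee00000ULL, 0xfee01000ULL, TOY_RESERVED },
	{ 0xfffe0000ULL, 0x100000000ULL, TOY_RESERVED },
};
#define TOY_MAP_LEN (sizeof(toy_map) / sizeof(toy_map[0]))

/* Sum the sizes of all ranges whose type is usable RAM. */
static uint64_t toy_usable_bytes(const struct toy_e820_range *map, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(map[i].start < map[i].end);
		if (map[i].type == TOY_RAM)
			total += map[i].end - map[i].start;
	}
	return total;
}
```

For this particular map the three usable ranges add up to 0x7ff8f800 bytes, a little under 2 GiB.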
5. bootmem initialization
start_kernel->setup_arch->initmem_init
6. bootmem registers the usable memory
a. Put the usable memory from e820 into active_regions (start_kernel->setup_arch->initmem_init->e820_register_active_regions)
b. Free the usable memory in active_regions into bootmem (start_kernel->setup_arch->initmem_init->setup_bootmem_allocator->setup_node_bootmem->free_bootmem_with_active_regions)
7. bootmem reserves the early_res ranges
start_kernel->setup_arch->initmem_init->setup_bootmem_allocator->setup_node_bootmem->early_res_to_bootmem
dmesg | less
mapped low ram: 0 - 375fe000
low ram: 0 - 375fe000
node 0 low ram: 00000000 - 375fe000
node 0 bootmap 00014000 - 0001aec0
(9 early reservations) ==> bootmem [0000000000 - 00375fe000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000001000 - 0000002000] EX TRAMPOLINE ==> [0000001000 - 0000002000]
#2 [0000006000 - 0000007000] TRAMPOLINE ==> [0000006000 - 0000007000]
#3 [0000400000 - 0000befe50] TEXT DATA BSS ==> [0000400000 - 0000befe50]
#4 [000009f800 - 0000100000] BIOS reserved ==> [000009f800 - 0000100000]
#5 [0000bf0000 - 0000bf81c8] BRK ==> [0000bf0000 - 0000bf81c8]
#6 [0000010000 - 0000014000] PGTABLE ==> [0000010000 - 0000014000]
#7 [0000bf9000 - 0001941182] NEW RAMDISK ==> [0000bf9000 - 0001941182]
#8 [0000014000 - 000001b000] BOOTMAP ==> [0000014000 - 000001b000]
=======================================================
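e820map: the physical memory map provided by the BIOS
I. Handling overlaps in e820
sanitize_e820_map processes the overlapping entries in the e820 map: an overlapping region is assigned to the highest-priority (highest-numbered) type and removed from the lower-priority ranges. Entries may be merged or split in the process.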
/*
 * Sanitize the BIOS e820 map.
 *
 * Some e820 responses include overlapping entries. The following
 * replaces the original e820 map with a new one, removing overlaps,
 * and resolving conflicting memory types in favor of highest
 * numbered type.
 *
 * The input parameter biosmap points to an array of 'struct
 * e820entry' which on entry has elements in the range [0, *pnr_map)
 * valid, and which has space for up to max_nr_map entries.
 * On return, the resulting sanitized e820 map entries will be in
 * overwritten in the same location, starting at biosmap.
 *
 * The integer pointed to by pnr_map must be valid on entry (the
 * current number of valid entries located at biosmap) and will
 * be updated on return, with the new number of valid entries
 * (something no more than max_nr_map.)
 *
 * The return value from sanitize_e820_map() is zero if it
 * successfully 'sanitized' the map entries passed in, and is -1
 * if it did nothing, which can happen if either of (1) it was
 * only passed one map entry, or (2) any of the input map entries
 * were invalid (start + size < start, meaning that the size was
 * so big the described memory range wrapped around through zero.)
 *
 * Visually we're performing the following
 * (1,2,3,4 = memory types)...
 *
 * Sample memory map (w/overlaps):
 * ____22__________________
 * ______________________4_
 * ____1111________________
 * _44_____________________
 * 11111111________________
 * ____________________33__
 * ___________44___________
 * __________33333_________
 * ______________22________
 * ___________________2222_
 * _________111111111______
 * _____________________11_
 * _________________4______
 *
 * Sanitized equivalent (no overlap):
 * 1_______________________
 * _44_____________________
 * ___1____________________
 * ____22__________________
 * ______11________________
 * _________1______________
 * __________3_____________
 * ___________44___________
 * _____________33_________
 * _______________2________
 * ________________1_______
 * _________________4______
 * ___________________2____
 * ____________________33__
 * ______________________4_
 */

int __init sanitize_e820_map(struct e820entry *biosmap, int max_nr_map,
			     u32 *pnr_map)
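The rule "the highest-numbered type wins on overlap" can be tried out with a much simpler routine than the kernel's. The sketch below is not the kernel algorithm: it sorts all range boundaries, picks the highest type covering each elementary interval, and merges neighbours of equal type. `toy_range`, `toy_sanitize` and `TOY_MAX` are hypothetical names, ranges are half-open [start, end), type 0 is assumed to be unused, and unlike the kernel function it writes to a separate output array and accepts a single entry.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX 32	/* assumed upper bound on input entries */

struct toy_range {
	uint64_t start, end;	/* half-open [start, end) */
	uint32_t type;		/* 1..4; higher number wins */
};

static int toy_cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Returns the number of sanitized entries written to out, or -1 on error. */
static int toy_sanitize(const struct toy_range *in, int n,
			struct toy_range *out, int max_out)
{
	uint64_t pts[2 * TOY_MAX];
	int npts = 0, nout = 0, i, k;

	if (n < 1 || n > TOY_MAX)
		return -1;
	for (i = 0; i < n; i++) {
		if (in[i].start >= in[i].end)
			return -1;	/* empty or wrapped range */
		pts[npts++] = in[i].start;
		pts[npts++] = in[i].end;
	}
	qsort(pts, npts, sizeof(pts[0]), toy_cmp_u64);

	for (k = 0; k + 1 < npts; k++) {
		uint64_t lo = pts[k], hi = pts[k + 1];
		uint32_t best = 0;

		if (lo == hi)
			continue;	/* duplicate boundary */
		/* Highest type among the inputs covering [lo, hi). */
		for (i = 0; i < n; i++)
			if (in[i].start <= lo && in[i].end >= hi &&
			    in[i].type > best)
				best = in[i].type;
		if (!best)
			continue;	/* hole: no input covers it */
		/* Extend the previous entry when contiguous and same type. */
		if (nout && out[nout - 1].end == lo &&
		    out[nout - 1].type == best) {
			out[nout - 1].end = hi;
			continue;
		}
		if (nout == max_out)
			return -1;
		out[nout].start = lo;
		out[nout].end = hi;
		out[nout].type = best;
		nout++;
	}
	return nout;
}
```

Feeding it the four leftmost entries of the sample map (columns 0 to 7 of the diagram) gives five entries, matching the first five rows of the sanitized picture.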
=======================================================
early_res reserved memory ranges
I. The early_res data structure
arch/x86/kernel/e820.c
/*
 * Early reserved memory areas.
 */
#define MAX_EARLY_RES 20

struct early_res {
	u64 start, end;
	char name[16];
	char overlap_ok;
};
static struct early_res early_res[MAX_EARLY_RES] __initdata = {
	{ 0, PAGE_SIZE, "BIOS data page" },	/* BIOS data page */
	{}
};
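start: start address of the reserved range
end: end address of the reserved range (exclusive); 0 marks the end of the array
name: name of the reserved range
overlap_ok: whether later reservations may overlap this range
II. Finding an overlapping memory range: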
arch/x86/kernel/e820.c
static int __init find_overlapped_early(u64 start, u64 end)
{
	int i;
	struct early_res *r;

	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
		r = &early_res[i];
		if (end > r->start && start < r->end)
			break;
	}

	return i;
}
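This searches the reserved-range array. If an entry overlaps the range [start, end), it returns the index of that entry; if nothing overlaps, it returns the index of the terminating entry (the first free slot, or MAX_EARLY_RES when the array is full).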
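The return-value convention is easy to get wrong, so here is a small user-space sketch of it. `toy_res`, `toy_find_overlap` and `TOY_MAX_RES` are hypothetical stand-ins for the kernel names, and the table is passed in as a parameter instead of being a global.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RES 4	/* assumed table size for the sketch */

struct toy_res {
	uint64_t start, end;	/* half-open [start, end); end == 0 terminates */
};

/*
 * Returns the index of the first entry overlapping [start, end),
 * or the index of the terminator (TOY_MAX_RES if the table is full).
 */
static int toy_find_overlap(const struct toy_res *tab,
			    uint64_t start, uint64_t end)
{
	int i;

	assert(start < end);
	for (i = 0; i < TOY_MAX_RES && tab[i].end; i++)
		if (end > tab[i].start && start < tab[i].end)
			break;
	return i;
}
```

Because ranges are half-open, two ranges that merely touch (one ends where the other starts) do not count as overlapping. A caller tells the two outcomes apart by checking whether the entry at the returned index has a non-zero end.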
III. Reserving memory ranges
1. Reserving a memory range
static void __init __reserve_early(u64 start, u64 end, char *name,
				   int overlap_ok)
{
	int i;
	struct early_res *r;

	i = find_overlapped_early(start, end);
	if (i >= MAX_EARLY_RES)
		panic("Too many early reservations");
	r = &early_res[i];
	if (r->end)
		panic("Overlapping early reservations "
		      "%llx-%llx %s to %llx-%llx %s\n",
		      start, end - 1, name?name:"", r->start,
		      r->end - 1, r->name);
	r->start = start;
	r->end = end;
	r->overlap_ok = overlap_ok;
	if (name)
		strncpy(r->name, name, sizeof(r->name) - 1);
}
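This appends a new range at the tail of the early_res array. If the range overlaps an existing one, or the number of reservations exceeds MAX_EARLY_RES, the kernel panics.
2. Removing the overlapping part of overlap-ok ranges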
/*
 * Split any existing ranges that:
 *  1) are marked 'overlap_ok', and
 *  2) overlap with the stated range [start, end)
 * into whatever portion (if any) of the existing range is entirely
 * below or entirely above the stated range. Drop the portion
 * of the existing range that overlaps with the stated range,
 * which will allow the caller of this routine to then add that
 * stated range without conflicting with any existing range.
 */
static void __init drop_overlaps_that_are_ok(u64 start, u64 end)
{
	int i;
	struct early_res *r;
	u64 lower_start, lower_end;
	u64 upper_start, upper_end;
	char name[16];

	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
		r = &early_res[i];

		/* Continue past non-overlapping ranges */
		if (end <= r->start || start >= r->end)
			continue;

		/*
		 * Leave non-ok overlaps as is; let caller
		 * panic "Overlapping early reservations"
		 * when it hits this overlap.
		 */
		if (!r->overlap_ok)
			return;

		/*
		 * We have an ok overlap. We will drop it from the early
		 * reservation map, and add back in any non-overlapping
		 * portions (lower or upper) as separate, overlap_ok,
		 * non-overlapping ranges.
		 */

		/* 1. Note any non-overlapping (lower or upper) ranges. */
		strncpy(name, r->name, sizeof(name) - 1);

		lower_start = lower_end = 0;
		upper_start = upper_end = 0;
		if (r->start < start) {
			lower_start = r->start;
			lower_end = start;
		}
		if (r->end > end) {
			upper_start = end;
			upper_end = r->end;
		}

		/* 2. Drop the original ok overlapping range */
		drop_range(i);

		i--;		/* resume for-loop on copied down entry */

		/* 3. Add back in any non-overlapping ranges. */
		if (lower_end)
			reserve_early_overlap_ok(lower_start, lower_end, name);
		if (upper_end)
			reserve_early_overlap_ok(upper_start, upper_end, name);
	}
}
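a. Compute the non-overlapping lower part and upper part of the existing range.
b. Drop the original range.
c. Add the non-overlapping lower and upper parts back to the reserved ranges as overlap-ok entries.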
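Step a is the only arithmetic in the routine, so it is worth seeing in isolation. The following sketch reproduces just that computation in user space; `toy_span` and `toy_split` are hypothetical names, and the return value (number of leftover pieces, or -1 when the ranges do not overlap) is a convention invented for this example.

```c
#include <assert.h>
#include <stdint.h>

struct toy_span {
	uint64_t start, end;	/* half-open [start, end); end == 0 means "none" */
};

/*
 * Given an existing range 'old' and a new range [start, end),
 * compute the parts of 'old' that lie below and above the new range.
 * Returns how many leftover pieces exist (0, 1 or 2), or -1 if the
 * two ranges do not overlap at all.
 */
static int toy_split(struct toy_span old, uint64_t start, uint64_t end,
		     struct toy_span *lower, struct toy_span *upper)
{
	assert(start < end && old.start < old.end);

	lower->start = lower->end = 0;
	upper->start = upper->end = 0;

	if (end <= old.start || start >= old.end)
		return -1;	/* no overlap, nothing to split */

	if (old.start < start) {	/* piece below the new range */
		lower->start = old.start;
		lower->end = start;
	}
	if (old.end > end) {		/* piece above the new range */
		upper->start = end;
		upper->end = old.end;
	}
	return (lower->end != 0) + (upper->end != 0);
}
```

For example, carving [0x2000, 0x3000) out of an existing [0x1000, 0x5000) leaves [0x1000, 0x2000) below and [0x3000, 0x5000) above, while a new range that fully covers the old one leaves nothing.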
3. Reserving an overlap-ok range
/*
 * A few early reservtations come here.
 *
 * The 'overlap_ok' in the name of this routine does -not- mean it
 * is ok for these reservations to overlap an earlier reservation.
 * Rather it means that it is ok for subsequent reservations to
 * overlap this one.
 *
 * Use this entry point to reserve early ranges when you are doing
 * so out of "Paranoia", reserving perhaps more memory than you need,
 * just in case, and don't mind a subsequent overlapping reservation
 * that is known to be needed.
 *
 * The drop_overlaps_that_are_ok() call here isn't really needed.
 * It would be needed if we had two colliding 'overlap_ok'
 * reservations, so that the second such would not panic on the
 * overlap with the first. We don't have any such as of this
 * writing, but might as well tolerate such if it happens in
 * the future.
 */
void __init reserve_early_overlap_ok(u64 start, u64 end, char *name)
{
	drop_overlaps_that_are_ok(start, end);
	__reserve_early(start, end, name, 1);
}
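This first drops the overlapping parts of any existing overlap-ok ranges, then reserves the new range as overlap-ok. "Overlap-ok" means that later reservations may overlap this range, not that this range may overlap earlier ones.
4. Reserving a non-overlappable range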
/*
 * Most early reservations come here.
 *
 * We first have drop_overlaps_that_are_ok() drop any pre-existing
 * 'overlap_ok' ranges, so that we can then reserve this memory
 * range without risk of panic'ing on an overlapping overlap_ok
 * early reservation.
 */
void __init reserve_early(u64 start, u64 end, char *name)
{
	if (start >= end)
		return;

	drop_overlaps_that_are_ok(start, end);
	__reserve_early(start, end, name, 0);
}
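This first drops the overlapping parts of any existing overlap-ok ranges, then reserves the new range as non-overlappable.
IV. Dropping a memory range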
/*
 * Drop the i-th range from the early reservation map,
 * by copying any higher ranges down one over it, and
 * clearing what had been the last slot.
 */
static void __init drop_range(int i)
{
	int j;

	for (j = i + 1; j < MAX_EARLY_RES && early_res[j].end; j++)
		;

	memmove(&early_res[i], &early_res[i + 1],
	       (j - 1 - i) * sizeof(struct early_res));

	early_res[j - 1].end = 0;
}
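This drops the i-th range: every range after it is moved down by one slot (sizeof(struct early_res) bytes), and the slot that used to be last is cleared so that it becomes the new terminator.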
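The same compaction can be exercised outside the kernel. The sketch below mirrors the logic of drop_range on a small table passed in by the caller; `toy_slot`, `toy_drop_range` and `TOY_SLOTS` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 4	/* assumed table size for the sketch */

struct toy_slot {
	uint64_t start, end;	/* end == 0 marks the end of the table */
};

/* Remove entry i by sliding the later entries down one slot. */
static void toy_drop_range(struct toy_slot *tab, int i)
{
	int j;

	assert(i >= 0 && i < TOY_SLOTS && tab[i].end);

	/* Find the index just past the last used entry. */
	for (j = i + 1; j < TOY_SLOTS && tab[j].end; j++)
		;

	/* Entries i+1 .. j-1 move to i .. j-2. */
	memmove(&tab[i], &tab[i + 1], (j - 1 - i) * sizeof(tab[0]));

	/* The old last slot becomes the new terminator. */
	tab[j - 1].end = 0;
}
```

memmove is used rather than memcpy because the source and destination regions overlap. When i is already the last used entry, the copy length is zero and only the terminator is written.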