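Last time we saw that physical memory is described by a three-level structure: node, zone, and page. How many nodes exist depends on whether the system is NUMA or UMA. Assuming a UMA architecture here, there is only one node.
This section focuses on the zone, mainly its data structure. That structure shows how a zone manages its pages, which is where the buddy allocator comes in.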
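/* abridged: only the fields discussed here */
struct zone {
    /* zone watermarks, indexed by enum zone_watermarks */
    unsigned long _watermark[NR_WMARK];
    unsigned long watermark_boost;
    unsigned long nr_reserved_highatomic;
    long lowmem_reserve[MAX_NR_ZONES];
    const char *name;
    /* free blocks of the buddy allocator, one free_area per order */
    struct free_area free_area[MAX_ORDER];
    unsigned long flags;
};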
enum zone_watermarks {
    WMARK_MIN,
    WMARK_LOW,
    WMARK_HIGH,
    NR_WMARK
};
struct free_area {
    struct list_head free_list[MIGRATE_TYPES];
    unsigned long nr_free;
};
Each order is in turn split into several groups by migrate type:
enum migratetype {
    MIGRATE_UNMOVABLE,
    MIGRATE_MOVABLE,
    MIGRATE_RECLAIMABLE,
#ifdef CONFIG_CMA
    MIGRATE_CMA,
#endif
    MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
    MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE, /* can't allocate from here */
#endif
    MIGRATE_TYPES
};
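To make the free_area[order].free_list[migratetype] layout more concrete, here is a small user-space toy model. It is only a sketch under loud assumptions: every toy_ name is made up for illustration, plain counters stand in for the kernel's list_head lists, and there is no fallback to other migrate types and no merging on free, both of which the real buddy allocator does. It only shows the splitting idea: if the requested order is empty, take a block from a higher order and put the unused halves back on the lower orders.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ORDER 11 /* orders 0..10, as in the output shown in this article */

/* Hypothetical, reduced set of migrate types for the toy model. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_TYPES };

/* Toy stand-in for struct free_area: a counter replaces each free_list. */
struct toy_free_area {
    unsigned long nr_blocks[TOY_TYPES]; /* free blocks of this order, per type */
    unsigned long nr_free;              /* free blocks of this order, all types */
};

struct toy_zone {
    struct toy_free_area free_area[TOY_MAX_ORDER];
};

/*
 * Take one block of 2^order pages of migrate type mt.
 * Returns the order at which a free block was found, or -1 if the
 * toy zone has no block of that type that is large enough.
 */
int toy_alloc(struct toy_zone *z, unsigned int order, enum toy_migratetype mt)
{
    assert(z != NULL && order < TOY_MAX_ORDER && mt < TOY_TYPES);

    for (unsigned int cur = order; cur < TOY_MAX_ORDER; cur++) {
        struct toy_free_area *area = &z->free_area[cur];

        if (area->nr_blocks[mt] == 0)
            continue; /* nothing free at this order, try the next larger one */

        /* Remove the block we found from its order. */
        area->nr_blocks[mt]--;
        area->nr_free--;

        /* Split down: each halving leaves one free buddy one order lower. */
        for (unsigned int o = cur; o > order; o--) {
            z->free_area[o - 1].nr_blocks[mt]++;
            z->free_area[o - 1].nr_free++;
        }
        return (int)cur;
    }
    return -1;
}
```

For example, with a single free order-3 Movable block, toy_alloc(&z, 0, TOY_MOVABLE) finds the block at order 3 and leaves one free block each at orders 2, 1 and 0.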
The figure below shows how the zone structures relate to one another.
On my current device, the per-page-type information can be inspected with cat /proc/pagetypeinfo:
root:/ # cat /proc/pagetypeinfo
Page block order: 10
Pages per block: 1024

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable    801    290     24      5      4      7      2      0      1      1      0
Node    0, zone   Normal, type      Movable    288    296     69     18     87     34     14      5      1      1     55
Node    0, zone   Normal, type  Reclaimable      0      3      1      1      1      1      0      1      1      0      0
Node    0, zone   Normal, type          CMA     12      6      5      3      3      2      1      3      2      2     57
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type      Movable   1980   1089    913    347     35      8      4      2      3      0    653
Node    0, zone  Movable, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone  Movable, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type      Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate
Node 0, zone   Normal            535          498           49           72            0            0
Node 0, zone  Movable              0          768            0            0            0            0

Number of mixed blocks     Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate
Node 0, zone   Normal              0           15            1            0            0            0
Node 0, zone  Movable              0            0            0            0            0            0
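This shows clearly how many free blocks remain at each order, broken down by zone and by migrate type. cat /proc/buddyinfo also shows the free blocks per order, this time aggregated across migrate types: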
root:/ # cat /proc/buddyinfo
Node 0, zone Normal 338 355 130 37 99 43 17 9 5 4 112
Node 0, zone Movable 1604 1204 912 348 35 8 4 2 3 0 653
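Each column of buddyinfo is a count of free blocks of 2^order pages, so the number of free pages in a zone is the sum of count << order over all columns. The helpers below are a minimal sketch of that arithmetic; the function names are my own and are not part of any kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>

/* Total free pages: counts[i] blocks of order i hold counts[i] * 2^i pages. */
unsigned long buddy_total_pages(const unsigned long *counts, size_t nr_orders)
{
    assert(counts != NULL);

    unsigned long total = 0;
    for (size_t order = 0; order < nr_orders; order++)
        total += counts[order] << order;
    return total;
}

/* Free pages that sit in blocks of at least min_order. */
unsigned long buddy_pages_at_or_above(const unsigned long *counts,
                                      size_t nr_orders, size_t min_order)
{
    assert(counts != NULL);

    unsigned long total = 0;
    for (size_t order = min_order; order < nr_orders; order++)
        total += counts[order] << order;
    return total;
}
```

Fed with the Normal line above (338 355 130 37 99 43 17 9 5 4 112), buddy_total_pages gives 125080 pages, of which 114688 sit in order-10 blocks. The Movable line gives 681212 pages.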
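Over time, the highest-order blocks are gradually split into lower-order blocks. Once a request for a large contiguous block can no longer be satisfied, the kernel performs memory compaction (defragmentation).
The details of each zone can also be viewed with cat /proc/zoneinfo: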
root:/ # cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     126204
        min      1251
        low      9254
        high     9566
        spanned  1308544
        present  1180543
        managed  1136476
        protection: (0, 24576)
      nr_free_pages 126204
      nr_zone_inactive_anon 984
      nr_zone_active_anon 61238
      nr_zone_inactive_file 423539
      nr_zone_active_file 122889
      nr_zone_unevictable 987
      nr_zone_write_pending 288
      nr_mlock     987
      nr_page_table_pages 13969
      nr_kernel_stack 36784
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  60532
Node 0, zone  Movable
  pages free     680267
        min      866
        low      6404
        high     6620
        spanned  786432
        present  786432
        managed  786432
        protection: (0, 0)
      nr_free_pages 680267
      nr_zone_inactive_anon 0
      nr_zone_active_anon 104777
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 121
      nr_zone_write_pending 0
      nr_mlock     121
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
We can see that the system currently has a single node with two zones, Normal and Movable, along with the details of each watermark.
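To read the min, low and high values, it helps to compare the free page count against them. The sketch below is a deliberately simplified classification with made-up names (toy_pressure, toy_watermarks, toy_classify). The kernel's real watermark check also takes lowmem_reserve, the allocation order and other factors into account, so treat this as an illustration only. The usual reading is that kswapd is woken when free pages drop below low and keeps reclaiming until high is reached, while allocations below min have to reclaim directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for where free_pages sits relative to the watermarks. */
enum toy_pressure {
    TOY_ABOVE_HIGH,       /* plenty of free memory, no background reclaim needed */
    TOY_BETWEEN_LOW_HIGH, /* a running kswapd keeps going until high is reached */
    TOY_BETWEEN_MIN_LOW,  /* below low: background reclaim gets woken up */
    TOY_BELOW_MIN         /* below min: the allocating task reclaims directly */
};

/* The three values printed as min/low/high in /proc/zoneinfo, in pages. */
struct toy_watermarks {
    unsigned long min;
    unsigned long low;
    unsigned long high;
};

enum toy_pressure toy_classify(unsigned long free_pages,
                               const struct toy_watermarks *wm)
{
    assert(wm != NULL && wm->min <= wm->low && wm->low <= wm->high);

    if (free_pages > wm->high)
        return TOY_ABOVE_HIGH;
    if (free_pages > wm->low)
        return TOY_BETWEEN_LOW_HIGH;
    if (free_pages > wm->min)
        return TOY_BETWEEN_MIN_LOW;
    return TOY_BELOW_MIN;
}
```

With the Normal zone values above (free 126204, min 1251, low 9254, high 9566) this returns TOY_ABOVE_HIGH, so the zone is far from any memory pressure.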
Zones are managed through struct pglist_data, and there is one pglist_data per node: on a NUMA machine each node has its own pglist_data, while on a UMA machine a single pglist_data describes the whole memory.
/*
* On NUMA machines, each NUMA node would have a pg_data_t to describe
* it's memory layout. On UMA machines there is a single pglist_data which
* describes the whole memory.
*
* Memory statistics and page replacement data structures are maintained on a
* per-zone basis.
*/
typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];
    struct zonelist node_zonelists[MAX_ZONELISTS];
    int nr_zones;
    /*
     * This is a per-node reserve of pages that are not available
     * to userspace allocations.
     */
    unsigned long totalreserve_pages;
    /* Fields commonly accessed by the page reclaim scanner */
    struct lruvec lruvec;
    unsigned long flags;
} pg_data_t;
node_zonelists can hold two kinds of zonelist, ZONELIST_FALLBACK and ZONELIST_NOFALLBACK. On a UMA system only one exists, ZONELIST_FALLBACK.
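The idea behind a fallback zonelist is that an allocation walks the zones in preference order and takes the first one that can serve it. The sketch below models just that walk. The names toy_zone_ref and toy_first_fit are hypothetical, and the fit rule (the request must leave the zone above its low watermark) is a simplification I chose for illustration, not the kernel's actual check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry of a toy zonelist: a zone name plus the numbers we test. */
struct toy_zone_ref {
    const char *name;
    unsigned long free_pages;
    unsigned long low_wmark;
};

/*
 * Walk the list in order and return the index of the first zone that can
 * hand out nr_pages while staying above its low watermark, or -1 if none can.
 */
int toy_first_fit(const struct toy_zone_ref *list, size_t n,
                  unsigned long nr_pages)
{
    assert(list != NULL);

    for (size_t i = 0; i < n; i++) {
        if (list[i].free_pages >= nr_pages &&
            list[i].free_pages - nr_pages > list[i].low_wmark)
            return (int)i;
    }
    return -1;
}
```

Using the zoneinfo numbers from this article with Movable placed before Normal (an assumed preference order for a movable allocation), a 1024-page request is served by the first entry. If the first entry were nearly empty, the walk would fall back to Normal.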
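The lruvec keeps a separate list for each LRU type. The common ones are active anonymous pages, inactive anonymous pages, active file pages, and inactive file pages.
With the pglist_data structure, the memory layout of a node can be described completely.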