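On machines with a large amount of memory, two things are worth tuning: how dirty pages are scheduled for flushing, and the page tables that map virtual memory to physical memory.
1. Background flushing: adjust the threshold, wake-up interval, and expiry age used by the background flusher (how much dirty data triggers flushing, how often the flusher wakes up to check the amount of dirty data, and how old a dirty page must be before it is written out).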
vm.dirty_background_bytes = 4096000000
vm.dirty_background_ratio = 0
vm.dirty_expire_centisecs = 6000
vm.dirty_writeback_centisecs = 100
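2. Foreground (user process) flushing: once dirty pages exceed this threshold, a user process that needs more memory is stalled and has to help write dirty pages out itself.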
vm.dirty_bytes = 0
vm.dirty_ratio = 80
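As a rough illustration of the background threshold above, the following sketch compares the `Dirty` figure from meminfo-formatted text against `vm.dirty_background_bytes`. The function names `dirty_kb` and `over_background_threshold` and the sample numbers are made up for this example; the kernel's own accounting is more involved than a single comparison.

```shell
#!/bin/sh
# Print the "Dirty" value (in kB) from meminfo-formatted text on stdin.
dirty_kb() {
    awk '$1 == "Dirty:" { print $2 }'
}

# Succeed when a dirty size given in kB exceeds a threshold given in bytes.
over_background_threshold() {
    dirty_kb_value=$1
    threshold_bytes=$2
    [ $(( dirty_kb_value * 1024 )) -gt "$threshold_bytes" ]
}

# Hypothetical sample input; on a real system use: dirty_kb < /proc/meminfo
sample='MemTotal:       536870912 kB
Dirty:            5242880 kB
Writeback:              0 kB'

d=$(printf '%s\n' "$sample" | dirty_kb)
if over_background_threshold "$d" 4096000000; then
    echo "Dirty=${d} kB: above vm.dirty_background_bytes"
else
    echo "Dirty=${d} kB: below vm.dirty_background_bytes"
fi
```

Reading the sample from a variable keeps the sketch runnable anywhere; pointing it at /proc/meminfo shows the live value.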
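See "Operating System Kernel Parameters Every DBA Should Know" (《DBA不可不知的操作系统内核参数》).
The second aspect comes from virtual memory management: Linux has to maintain the mapping between virtual addresses and physical memory. To speed up address translation, these mappings should ideally stay cached on the CPU. The larger the page, the smaller the mapping table, so using huge pages reduces the size of the page tables.
The default page size can be obtained like this: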
# getconf PAGESIZE
4096
https://en.wikipedia.org/wiki/Page_table
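To make "the larger the page, the smaller the mapping table" concrete, here is a back-of-the-envelope sketch for a 128GB region (the shared buffer size used later in this article). It assumes 8 bytes per page-table entry, which is typical for x86-64 but is an assumption of this example, and it counts only last-level entries. The helper name `pte_bytes` is hypothetical.

```shell
#!/bin/sh
# Estimate the bytes of last-level page-table entries needed to map a region.
# Assumes 8-byte entries (an assumption of this sketch, typical for x86-64).
pte_bytes() {
    region_bytes=$1
    page_bytes=$2
    echo $(( region_bytes / page_bytes * 8 ))
}

region=$(( 128 * 1024 * 1024 * 1024 ))              # 128GB

small=$(pte_bytes "$region" 4096)                   # 4KB pages
huge=$(pte_bytes "$region" $(( 2 * 1024 * 1024 )))  # 2MB pages

echo "4KB pages: $(( small / 1024 / 1024 )) MB of page-table entries"
echo "2MB pages: $(( huge / 1024 )) KB of page-table entries"
```

Under these assumptions the 4KB case needs 256 MB of entries and the 2MB case 512 KB, and that cost can be paid by each process that maps the whole region.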
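Another reason to use huge pages: they stay resident in memory and are never swapped out, which is something memory-heavy applications (databases included) value highly.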
In a virtual memory system, the tables store the mappings between virtual addresses and physical addresses. When the system needs to access a virtual memory location, it uses the page tables to translate the virtual address to a physical address. Using huge pages means that the system needs to load fewer such mappings into the Translation Lookaside Buffer (TLB), which is the cache of page tables on a CPU that speeds up the translation of virtual addresses to physical addresses. Enabling the HugePages feature allows the kernel to use hugetlb entries in the TLB that point to huge pages. The hugetlb entries mean that the TLB entries can cover a larger address space, requiring many fewer entries to map the SGA, and releasing entries that can map other portions of the address space.
With HugePages enabled, the system uses fewer page tables, reducing the overhead for maintaining and accessing them. Huge pages remain pinned in memory and are not replaced, so the kernel swap daemon has no work to do in managing them, and the kernel does not need to perform page table lookups for them. The smaller number of pages reduces the overhead involved in performing memory operations, and also reduces the likelihood of a bottleneck when accessing page tables.
1. Check the Linux huge page size
# grep Hugepage /proc/meminfo
Hugepagesize: 2048 kB
2. Decide how large shared_buffers should be. Suppose the machine has 512GB of memory and we want 128GB of shared buffers.
vi postgresql.conf
shared_buffers='128GB'
3. Calculate how many huge pages are needed
128GB/2MB=65536
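The division in step 3 can be scripted so that it adapts to whatever Hugepagesize the system reports. This is a minimal sketch; the function name `huge_pages_needed` is made up for illustration, and it covers only the shared buffer size itself, not any other shared memory the database allocates.

```shell
#!/bin/sh
# Number of huge pages needed to hold size_gb gigabytes, rounded up.
# hugepage_kb is the Hugepagesize value from /proc/meminfo (2048 in step 1).
huge_pages_needed() {
    size_gb=$1
    hugepage_kb=$2
    size_kb=$(( size_gb * 1024 * 1024 ))
    echo $(( (size_kb + hugepage_kb - 1) / hugepage_kb ))
}

huge_pages_needed 128 2048   # prints 65536
```

Rounding up matters when the size is not an exact multiple of the huge page size; for 128GB and 2MB pages the division is exact.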
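4. Set the number of Linux huge pages (a little above the computed figure, because PostgreSQL's shared memory segment holds more than shared_buffers alone)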
sysctl -w vm.nr_hugepages=67537
5. Enable huge pages in PostgreSQL.
vi $PGDATA/postgresql.conf
huge_pages = on # on, off, or try
# With try, PostgreSQL first attempts to use huge pages; if the required number cannot be allocated at startup, it falls back to normal pages
6. Start the database
pg_ctl start
7. Check how many huge pages are currently in use
cat /proc/meminfo |grep -i huge
AnonHugePages: 6144 kB
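HugePages_Total: 67537 ## the number of huge pages configured
HugePages_Free: 66117 ## currently free, but far fewer are really available, because PG has reserved 65708 of them
HugePages_Rsvd: 65708 ## huge pages reserved by PG at startup but not yet touched
HugePages_Surp: 0
Hugepagesize: 2048 kB ## the huge page size is 2MB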
8. Run some queries. Free goes down as PG actually starts using the pages it reserved.
cat /proc/meminfo |grep -i huge
AnonHugePages: 6144 kB
HugePages_Total: 67537
HugePages_Free: 57482
HugePages_Rsvd: 57073
HugePages_Surp: 0
Hugepagesize: 2048 kB
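The counters in steps 7 and 8 are easier to read once two derived figures are computed: pages actually in use (Total minus Free) and pages not promised to anyone (Free minus Rsvd). A minimal sketch follows; the function name `hugepage_summary` is made up for this example, and the input values are copied from the step 7 output.

```shell
#!/bin/sh
# Summarize huge page usage from meminfo-formatted text on stdin.
#   in_use    = Total - Free  (pages already touched)
#   available = Free - Rsvd   (pages not reserved by any process)
hugepage_summary() {
    awk '
        $1 == "HugePages_Total:" { total = $2 }
        $1 == "HugePages_Free:"  { free  = $2 }
        $1 == "HugePages_Rsvd:"  { rsvd  = $2 }
        END { printf "in_use=%d available=%d\n", total - free, free - rsvd }
    '
}

# Values copied from the step 7 output.
after_start='HugePages_Total: 67537
HugePages_Free: 66117
HugePages_Rsvd: 65708'

printf '%s\n' "$after_start" | hugepage_summary
# On a live system: hugepage_summary < /proc/meminfo
```

With the step 7 values this reports in_use=1420 and available=409; with the step 8 values in_use rises to 10055 while available stays at 409, which matches the observation that running queries moves pages from reserved to used.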
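Oracle is also a memory-heavy application; when the SGA is configured to be large, huge pages are likewise recommended.
Oracle recommends using huge pages when the SGA is 8GB or larger.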
https://yq.aliyun.com/articles/582870?spm=a2c4e.11153940.blogcont694164.17.4762dfedJxr3Rr