How to manage a swap partition

Title: How to disable and enable swap in Linux

The following applies to any Linux system.

How to create a swap partition?

First, create a new partition; there is no need to format it with a file system. Assume the new partition is /dev/sda2 and is 2G in size.
Then run 'mkswap' and 'swapon' as root:

mkswap /dev/sda2
swapon /dev/sda2

Then use 'free' to check your memory and look at the Swap line, normally the third line of the output.

free -h

You can also use 'swapon -s' to list the swap partitions and swap files currently in use:

swapon -s
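As a rough sketch, the same information can be read from /proc/swaps, which 'swapon -s' is equivalent to printing. The helper name swap_total_kib and the sample table are made up for illustration; the Size column is in KiB.

```shell
# Sum the Size column (KiB) of a /proc/swaps-style table.
# swap_total_kib is a hypothetical helper name; it reads /proc/swaps by default.
swap_total_kib() {
    awk 'NR > 1 { sum += $3 } END { print sum + 0 }' "${1:-/proc/swaps}"
}

# Demo on made-up sample data, so it runs even with no swap configured.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Filename                 Type        Size      Used  Priority
/dev/sda2                partition   2097148   0     -2
/TheFileUsedForSwap      file        2097148   0     -3
EOF
swap_total_kib "$sample"   # prints 4194296
rm -f "$sample"
```

On a real system, calling swap_total_kib with no argument reads /proc/swaps directly.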

After a reboot, however, the system will not use this new swap partition automatically.

To have the system enable the swap partition at boot, first find the partition's UUID with 'blkid':

blkid /dev/sda2
/dev/sda2: UUID="a8d7e14a-3fba-4d07-b77a-735489796b4b" TYPE="swap" PARTLABEL="primary" PARTUUID="2b9587c4-3a02-433a-94aa-fba7055d2170" 

Now we know its UUID is a8d7e14a-3fba-4d07-b77a-735489796b4b.

Then edit /etc/fstab and add this line to it:

UUID=a8d7e14a-3fba-4d07-b77a-735489796b4b none swap defaults 0 0

You can also skip the UUID and write '/dev/sda2' directly, although device names can change between boots, which makes the UUID form more robust:

/dev/sda2 none swap defaults 0 0

The next time you reboot, the system will use this swap partition automatically.
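A minimal sketch of building the fstab entry from the blkid output shown earlier. The helper names uuid_from_blkid and fstab_swap_line are hypothetical, and the sample line is the one from this article, so nothing here touches a real disk.

```shell
# Extract the UUID from one line of blkid output.
# The leading space in the pattern keeps PARTUUID from matching.
uuid_from_blkid() {
    printf '%s\n' "$1" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}

# Build the /etc/fstab entry for a swap area with the given UUID.
fstab_swap_line() {
    printf 'UUID=%s none swap defaults 0 0\n' "$1"
}

# Sample line copied from the blkid output in this article.
line='/dev/sda2: UUID="a8d7e14a-3fba-4d07-b77a-735489796b4b" TYPE="swap" PARTLABEL="primary" PARTUUID="2b9587c4-3a02-433a-94aa-fba7055d2170"'
fstab_swap_line "$(uuid_from_blkid "$line")"
# prints: UUID=a8d7e14a-3fba-4d07-b77a-735489796b4b none swap defaults 0 0
```

On a real system, 'blkid -s UUID -o value /dev/sda2' prints just the UUID, which can be passed to fstab_swap_line directly.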

How to use a normal file as swap memory?

First, create a file 2G in size:

dd if=/dev/zero of=TheFileUsedForSwap bs=1024 count=2097152
  • The output file is 'TheFileUsedForSwap'.
  • bs=1024 means each write is 1024 bytes.
  • count=2097152 means 2097152 such writes are made.
  • 2097152 = 1024 * 1024 * 2,
    so bs * count = 2 * 1024 * 1024 * 1024 bytes = 2G.
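The arithmetic above can be checked in the shell itself. This is only a sketch; dd_count is a hypothetical helper that works backwards from a desired size in GiB to the count argument.

```shell
# Size in bytes of the file written by dd: block size times block count.
bs=1024
count=2097152
bytes=$((bs * count))
echo "$bytes bytes"                           # 2147483648 bytes
echo "$((bytes / 1024 / 1024 / 1024)) GiB"    # 2 GiB

# Hypothetical helper: block count needed for SIZE_GIB at block size BS.
# usage: dd_count SIZE_GIB BS
dd_count() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}
dd_count 2 1024    # prints 2097152
```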

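Then restrict the file's permissions and run 'mkswap' and 'swapon' as root:

chmod 600 TheFileUsedForSwap
mkswap TheFileUsedForSwap
swapon TheFileUsedForSwap

To have the system use this file automatically at boot, add a line to /etc/fstab. The entry must use the file's absolute path; /path/to below is a placeholder for the directory where you created the file:

/path/to/TheFileUsedForSwap none swap defaults 0 0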

From here on, 'swap memory' refers to either a swap partition or a swap file.

How to disable swap memory

swapoff /dev/sda2

or

swapoff TheFileUsedForSwap

or, to disable all active swap memory at once:

swapoff -a

How to control when Linux uses swap memory?

Since Linux 2.6 there is a kernel parameter called swappiness.
Its value ranges from 0 to 100 (kernels 5.8 and later accept values up to 200). The lower it is, the less willing Linux is to use swap memory; the higher it is, the more willing.

You can change it by writing to /proc/sys/vm/swappiness:

echo 100 | sudo tee /proc/sys/vm/swappiness

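or

sudo sysctl vm.swappiness=100

Note that, as a regular user, you cannot do it like this:

sudo echo 100 > /proc/sys/vm/swappiness

You will get a "Permission denied" error, because the redirection is performed by your own unprivileged shell, not by sudo. The same redirection without sudo does work in a root shell.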

How to check the current swappiness?

cat /proc/sys/vm/swappiness

or

sysctl vm.swappiness

How to modify swappiness permanently?
Edit /etc/sysctl.conf and add the line below. It takes effect at the next boot, or immediately if you run 'sysctl -p' as root:

vm.swappiness=100
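Before writing a value into /etc/sysctl.conf it can help to validate it. The sketch below assumes the 0 to 100 range described in this article; swappiness_conf_line is a hypothetical helper name.

```shell
# Print a sysctl.conf line for a swappiness value.
# Rejects anything that is not an integer in the range 0..100.
swappiness_conf_line() {
    case "$1" in
        ''|*[!0-9]*)
            echo "not a non-negative integer: $1" >&2
            return 1
            ;;
    esac
    if [ "$1" -gt 100 ]; then
        echo "out of range (0..100): $1" >&2
        return 1
    fi
    printf 'vm.swappiness=%s\n' "$1"
}

swappiness_conf_line 10    # prints vm.swappiness=10
```

To apply it, the output can be appended as root, for example with 'swappiness_conf_line 10 | sudo tee -a /etc/sysctl.conf'.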

Let's talk about swappiness

The following is based on
https://www.howtogeek.com/449691/what-is-swapiness-on-linux-and-how-to-change-it/

Note that the Linux swappiness value has nothing to do with how much RAM is used before swapping starts.

Swapping is a technique where data in Random Access Memory (RAM) is written to a special location on your hard disk, either a swap partition or a swap file, to free up RAM.

Linux has a setting called the swappiness value. There is a lot of confusion about what this setting controls. The most common incorrect description of swappiness is that it sets a threshold of RAM usage, and when the amount of used RAM hits that threshold, swapping starts.

This is a misconception that has been repeated so often that it is now received wisdom. If (almost) everyone else tells you that's exactly how swappiness works, why should you believe us when we say it isn't?

Simple, we're going to prove it.

Linux doesn't think of your RAM as one big homogeneous pool of memory. It considers it to be divided into a number of different regions called zones. Which zones are present on your computer depends on whether it is 32-bit or 64-bit. Here's a simplified description of the possible zones on an x86 architecture computer.

  • Direct Memory Access (DMA): This is the low 16MB of memory. The zone gets its name because, a long time ago, there were computers that could only do direct memory access into this area of physical memory.

  • Direct Memory Access 32: Despite its name, Direct Memory Access 32 (DMA32) is a zone only found in 64-bit Linux. It's the low 4GB of memory. Linux running on 32-bit computers can only do DMA to this amount of RAM (unless they are using the Physical Address Extension (PAE) kernel), which is how the zone got its name. On 32-bit computers, though, this memory is called HighMem.

  • Normal: On 64-bit computers, normal memory is all of the RAM above 4GB (roughly). On 32-bit machines, it is RAM between 16 MB and 896 MB.

  • HighMem: This only exists on 32-bit Linux computers. It is all RAM above 896 MB, including RAM above 4 GB on sufficiently large machines.

What is PAGESIZE?
RAM is allocated in pages, which are of a fixed size. That size is determined by the kernel at boot time by detecting the architecture of the computer. Typically the page size on a Linux computer is 4K bytes.

You can see your page size using the getconf command:

getconf PAGESIZE

Zones are attached to nodes
Zones are attached to nodes. Nodes are associated with a Central Processing Unit (CPU). The kernel will try to allocate memory for a process running on a CPU from the node associated with that CPU.

The concept of nodes being tied to CPUs allows mixed memory types to be installed in specialist multi-CPU computers, using the Non-Uniform Memory Access (NUMA) architecture.

That's all very high-end. The average Linux computer will have a single node, called node zero. All zones will belong to that node.

To see the nodes and zones in your computer, look inside the /proc/buddyinfo file. We'll use less to do so:

less /proc/buddyinfo

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3 
Node 0, zone    DMA32     13     12      9      5      4      4      2      1      0      4     77 

Let's look at the second line of the output. After the node and zone names, the n-th column (counting from 0) is the number of free blocks of 2^n contiguous pages, so each block in that column is 2^n * PAGESIZE bytes:

  • Node 0: node number 0.
  • zone DMA32: the DMA32 memory zone.
  • 13: 13 free blocks of 2^0 pages each.
  • 12: 12 free blocks of 2^1 pages each.
  • 9: 9 free blocks of 2^2 pages each.
  • 5: 5 free blocks of 2^3 pages each.
  • 4: 4 free blocks of 2^4 pages each.
  • 4: 4 free blocks of 2^5 pages each.
  • 2: 2 free blocks of 2^6 pages each.
  • 1: 1 free block of 2^7 pages.
  • 0: 0 free blocks of 2^8 pages each.
  • 4: 4 free blocks of 2^9 pages each.
  • 77: 77 free blocks of 2^10 pages each.
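To make the column meaning concrete, here is a small sketch that adds up the free pages per zone as count * 2^order over all columns. The helper name buddy_free_pages is hypothetical, and the demo uses the two sample lines from this article rather than the live file.

```shell
# For each line of buddyinfo-style input, print the zone name and the
# total number of free pages: the sum over columns of count * 2^order.
# buddy_free_pages is a hypothetical helper; it reads /proc/buddyinfo by default.
buddy_free_pages() {
    awk '{
        total = 0
        for (i = 5; i <= NF; i++)
            total += $i * 2 ^ (i - 5)
        print $4, total
    }' "${1:-/proc/buddyinfo}"
}

# Demo on the two sample lines from this article.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32     13     12      9      5      4      4      2      1      0      4     77
EOF
buddy_free_pages "$sample"
# DMA 3976
# DMA32 81457
rm -f "$sample"
```

With a 4096-byte page size, the 81457 free pages in DMA32 come to about 318 MiB.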

What are file pages and anonymous pages?

Memory mapping uses sets of page table entries to record which memory pages are used, and for what.

Memory mappings can be:

File backed: File backed mappings contain data that has been read from a file. It can be any kind of file. The important thing to note is that if the system freed this memory and needed to obtain that data again, it can be read from the file once more. But, if the data has been changed in memory, those changes will need to be written to the file on the hard drive before the memory can be freed. If that didn't happen, the changes would be lost.

Anonymous: Anonymous memory is a memory mapping with no file or device backing it. These pages may contain memory requested on-the-fly by programs to hold data, or for such things as the stack and the heap. Because there is no file behind this type of data, a special place must be set aside for the storage of anonymous data. That place is the swap partition or swap file. Anonymous data is written to swap before anonymous pages are freed.

Device backed: Devices are addressed through block device files that can be treated as though they were files. Data can be read from them and written to them. A device backed memory mapping has data from a device stored in it.

Shared: Multiple page table entries can map to the same page of RAM. Accessing the memory locations through any of the mappings will show the same data. Different processes can communicate with one another in a very efficient way by changing the data in these jointly-watched memory locations. Shared writable mappings are a common means of achieving high-performance inter-process communications.

Copy on write: Copy on write is a lazy allocation technique. If a copy of a resource already in memory is requested, the request is satisfied by returning a mapping to the original resource. If one of the processes "sharing" the resource tries to write to it, the resource must be truly replicated in memory to allow the changes to be made to the new copy. So the memory allocation only takes place on the first write command.

For swappiness, we need only concern ourselves with the first two in the list: file pages and anonymous pages.
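To see how much memory currently falls into each of these two categories, the Active/Inactive (anon) and (file) counters in /proc/meminfo can be summed. This is a sketch only: anon_file_kib is a hypothetical helper name, and the demo values are made up.

```shell
# Sum anonymous and file-backed page counters (KiB) from meminfo-style input.
# anon_file_kib is a hypothetical helper; it reads /proc/meminfo by default.
anon_file_kib() {
    awk '
        $1 == "Active(anon):" || $1 == "Inactive(anon):" { anon += $2 }
        $1 == "Active(file):" || $1 == "Inactive(file):" { file += $2 }
        END { print "anon_kib=" (anon + 0), "file_kib=" (file + 0) }
    ' "${1:-/proc/meminfo}"
}

# Demo on made-up values in the /proc/meminfo layout.
sample=$(mktemp)
cat > "$sample" <<'EOF'
MemTotal:        8000000 kB
Active(anon):     100000 kB
Inactive(anon):   200000 kB
Active(file):     300000 kB
Inactive(file):   400000 kB
EOF
anon_file_kib "$sample"    # prints anon_kib=300000 file_kib=700000
rm -f "$sample"
```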

Here's the description of swappiness from the Linux documentation on GitHub:

"This control is used to define how aggressive (sic) the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.

The default value is 60."

That sounds like swappiness turns swap up or down in intensity. Interestingly, it states that setting swappiness to zero doesn't turn off swap. It instructs the kernel not to swap until certain conditions are met. But swapping can still occur.

Let's dig deeper. Here's the definition and default value of vm_swappiness in the kernel source code file vmscan.c:

/*
 * From 0 .. 100.  Higher means more swappy.
 */
int vm_swappiness = 60;

The swappiness value can range from 0 to 100. Again, the comment certainly sounds like the swappiness value has a bearing on how much swapping takes place, with a higher figure leading to more swapping.

Further on in the source code file, we can see that a new variable called swappiness is assigned a value that is returned by the function mem_cgroup_swappiness(). Some more tracing through the source code will show that the value returned by this function is vm_swappiness. So now, the variable swappiness is set to equal whatever value vm_swappiness was set to.

int swappiness = mem_cgroup_swappiness(memcg);

And a little further down in the same source code file, we see this:

/*
 * With swappiness at 100, anonymous and file have the same priority.
 * This scanning priority is essentially the inverse of IO cost.
 */
anon_prio = swappiness;
file_prio = 200 - anon_prio;

That's interesting. Two distinct values are derived from swappiness. The anon_prio and file_prio variables hold these values. As one increases, the other decreases, and vice versa.

The Linux swappiness value actually sets the ratio between two values: anon_prio and file_prio.
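The relationship in the kernel snippet can be tried out with a few sample values. This is just the arithmetic from the snippet re-expressed in shell; scan_priorities is a hypothetical helper name.

```shell
# Derive the two scan priorities from a swappiness value, following the
# formula anon_prio = swappiness, file_prio = 200 - anon_prio.
scan_priorities() {
    anon_prio=$1
    file_prio=$((200 - anon_prio))
    echo "anon_prio=$anon_prio file_prio=$file_prio"
}

for s in 0 60 100; do
    scan_priorities "$s"
done
# anon_prio=0 file_prio=200
# anon_prio=60 file_prio=140
# anon_prio=100 file_prio=100
```

With the default of 60, file pages get a priority of 140 against 60 for anonymous pages.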

The Golden Ratio

File pages hold data that can be easily retrieved if that memory is freed. Linux can just read the file again. As we've seen, if the file data has been changed in RAM, those changes must be written to the file before the file page can be freed. But, either way, the file page in RAM can be repopulated by reading data from the file. So why bother adding these pages to the swap partition or swap file? If you need that data again, you might as well read it back from the original file instead of a redundant copy in the swap space. So file pages are not stored in swap. They're "stored" back in the original file.

With anonymous pages, there is no underlying file associated with the values in memory. The values in those pages have been dynamically arrived at. You can't simply read them back in from a file. The only way anonymous page memory values can be recovered is to store the data somewhere before freeing the memory. And that is what swap holds: anonymous pages that you are going to need to reference again.

But note that for both file pages and for anonymous pages, freeing up the memory may require a hard drive write. If the file page data or the anonymous page data has changed since it was last written to the file or to swap, a file system write is required. To retrieve the data will require a file system read. Both types of page reclaim are costly. Trying to reduce hard drive input and output by minimizing the swapping of anonymous pages only increases the amount of hard drive input and output that is required to deal with file pages being written to, and read from, files.

As you can see from the last code snippet, there are two variables. One called file_prio for "file priority", and one called anon_prio for "anonymous priority".

The anon_prio variable is set to the Linux swappiness value.
The file_prio value is set to 200 minus the anon_prio value.

These variables hold values that work in tandem. If swappiness is 100, both are 100 and the two page types have equal priority. As swappiness decreases from 100 towards 0, anon_prio decreases with it and file_prio increases from 100 towards 200. The two values feed into a complicated algorithm that determines whether the Linux kernel runs with a preference for reclaiming (freeing up) file pages or anonymous pages.

You can think of file_prio as the system's willingness to free up file pages and anon_prio as the system's willingness to free anonymous pages. What these values don't do is set any kind of trigger or threshold for when swap is going to be used. That's decided elsewhere.

But, when memory needs to be freed, these two variables, and the ratio between them, are taken into consideration by the reclamation and swap algorithms to determine which page types are preferentially considered for freeing up. And that dictates whether the associated hard drive activity will be processing files for file pages or swap space for anonymous pages.

When Does Swap Actually Cut In?

We've established that the Linux swappiness value sets a preference for the type of memory pages that will be scanned for potential reclamation. That's fine, but something must decide when swap is going to cut in.

Each memory zone has a high water mark and a low water mark. These are system derived values. They are percentages of the RAM in each zone. It is these values that are used as the swap trigger thresholds.

To check what your high and low water marks are, look inside the /proc/zoneinfo file with this command:

cat /proc/zoneinfo

Each of the zones will have a set of memory values measured in pages. Here are the values for the DMA32 zone on the test machine. The low water mark is 6393 pages, and the high water mark is 7671 pages:

Node 0, zone    DMA32
pages free     88426
    min      5115
    low      6393
    high     7671
    spanned  1044480
    present  626500
    managed  610080
    protection: (0, 0, 5441, 5441, 5441)

If your PAGESIZE is 4096 bytes, the low water mark is 6393 * 4096 bytes, just under 25M, and the high water mark is 7671 * 4096 bytes, just under 30M.
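The conversion from pages to a size can be sketched as follows. The page size of 4096 is assumed here; pages_to_kib is a hypothetical helper name, and the page counts are the ones from the zoneinfo sample.

```shell
# Convert a page count to KiB for a given page size in bytes.
# usage: pages_to_kib PAGES PAGE_SIZE
pages_to_kib() {
    echo $(( $1 * $2 / 1024 ))
}

page_size=4096   # assumed; on a real system use: page_size=$(getconf PAGESIZE)
echo "low:  $(pages_to_kib 6393 "$page_size") KiB"   # low:  25572 KiB
echo "high: $(pages_to_kib 7671 "$page_size") KiB"   # high: 30684 KiB
```

Dividing by 1024 again gives roughly 24.97 MiB and 29.96 MiB, which truncate to 24 and 29 when only whole numbers are kept.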

  • In normal running conditions, when free memory in a zone drops below the zone's low water mark, the swap algorithm starts scanning memory pages looking for memory that it can reclaim, taking into account the relative values of anon_prio and file_prio.

  • If the Linux swappiness value is set to zero, swap occurs when the combined value of file pages and free pages is less than the high water mark.
    (Here "combined value" means the sum: the number of file pages plus the number of free pages in the zone.)
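The two rules can be written down as a simplified model. This is not kernel code: should_reclaim is a hypothetical helper, all counts are in pages, and the file page counts in the demo are made up because the zoneinfo sample does not list them.

```shell
# Simplified model of the two rules; returns success when the threshold is crossed.
# usage: should_reclaim SWAPPINESS FREE FILE LOW HIGH
should_reclaim() {
    swappiness=$1 free=$2 file=$3 low=$4 high=$5
    if [ "$swappiness" -eq 0 ]; then
        # swappiness 0: only once free + file pages drop below the high water mark
        [ $((free + file)) -lt "$high" ]
    else
        # normal case: once free pages drop below the low water mark
        [ "$free" -lt "$low" ]
    fi
}

# Free count and water marks from the DMA32 sample: far above the low mark.
if should_reclaim 60 88426 0 6393 7671; then echo "reclaim"; else echo "no reclaim"; fi
# prints: no reclaim

# Made-up low-memory case with swappiness 0: 3000 + 4000 is below 7671.
if should_reclaim 0 3000 4000 6393 7671; then echo "reclaim"; else echo "no reclaim"; fi
# prints: reclaim
```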

So you can see that you cannot use the Linux swappiness value to influence swap's behavior with respect to RAM usage. It just doesn't work like that.
