2.6 swapping behavior

http://lwn.net/Articles/83588/

There has, recently, been a new round of complaints about how the 2.6 kernel swaps out memory. Some users have been very vocal in their belief that, if they have sufficient physical memory, their applications should never be swapped out. These people get annoyed when they sit down at their display in the morning and find that their office suite or web browser is unresponsive, and stays that way for some time. They get even more annoyed when they look and see how much memory the kernel is using for caching file contents rather than process memory. The obvious question to ask is: couldn't the kernel cut back a bit on the file caches and keep applications in memory?

The answer is that the kernel can be made to behave that way by tweaking a runtime parameter, but it is not necessarily a good idea. Before getting into that, however, it's worth noting that recent 2.6 kernels have a memory management problem which can cause serious problems after an application which reads through entire filesystems (updatedb, say, or a backup) has run. The problem is the slab cache's tendency to request allocations of multiple, contiguous pages; these allocations, when done at the behest of filesystem code, can bring the system to a halt. A patch has been merged which fixes this particular problem for 2.6.6.

The bigger issue remains, however: should the kernel swap out user applications in order to cache more file contents? There are plenty of arguments in favor of this behavior. Quite a few large applications set up big areas of memory which they rarely, if ever, use. If application memory is occasionally forced to disk, the unused parts will remain there, and that much physical memory will be freed for more useful contents. Without swapping application memory to disk and seeing what gets faulted back in, it is almost impossible to figure out which pages are not really needed. A large file cache is also a performance enhancer. The speedups that come from having frequently-accessed data in memory are harder to see than the slowdowns caused by having to fault in a large application, but they can lead to better system throughput overall.

Still, there are users who insist that, for example, a system backup should never force OpenOffice out to disk. They don't care how quickly a system maintenance application runs at 3:00 in the morning, but they care a lot about how the system responds when they are at the keyboard. This wish was expressed repeatedly until Andrew Morton exclaimed:

I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do".

This helped quiet the debate as the parties involved looked more closely at this particular parameter. Or, perhaps, it was just fear of Andrew's singing. Either way, it has become clear that most people are unaware of what the "swappiness" parameter does; the fact that it has never been documented may have something to do with that.

So... swappiness, which is exported to /proc/sys/vm/swappiness, is a parameter which sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The reclaim code works (in a very simplified way) by calculating a few numbers:

  • The "distress" value is a measure of how much trouble the kernel is having freeing memory. The first time the kernel decides it needs to start reclaiming pages, distress will be zero; if more attempts are required, that value goes up, approaching a high value of 100.

  • mapped_ratio is an approximate percentage of how much of the system's total memory is mapped (i.e. is part of a process's address space) within a given memory zone.

  • vm_swappiness is the swappiness parameter, which is set to 60 by default.

With those numbers in hand, the kernel calculates its "swap tendency":

	swap_tendency = mapped_ratio/2 + distress + vm_swappiness;

If swap_tendency is below 100, the kernel will only reclaim page cache pages. Once it goes above that value, however, pages which are part of some process's address space will also be considered for reclaim. So, if life is easy, swappiness is set to 60, and distress is zero, the system will not swap process memory until mapped memory reaches 80% of the total. Users who would like to never see application memory swapped out can set swappiness to zero; that setting will cause the kernel to ignore process memory until the distress value gets quite high.
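The arithmetic is easy to try out in a few lines of Python. What follows is only a sketch of the simplified formula given above, not the kernel's reclaim code: the function names are made up for illustration, plain division stands in for the kernel's integer arithmetic, and treating a value of exactly 100 as enough to consider process memory is an assumption chosen to match the 80% example.

```python
def swap_tendency(mapped_ratio, distress, vm_swappiness=60):
    """Simplified swap tendency from the formula in the text."""
    return mapped_ratio / 2 + distress + vm_swappiness


def considers_process_memory(mapped_ratio, distress, vm_swappiness=60):
    """True once mapped pages become candidates for reclaim.

    Assumption: a tendency of exactly 100 counts as crossing the line.
    """
    return swap_tendency(mapped_ratio, distress, vm_swappiness) >= 100


def mapped_ratio_threshold(distress, vm_swappiness=60):
    """Smallest mapped_ratio at which process memory is considered.

    Solves mapped_ratio/2 + distress + vm_swappiness = 100.
    A result above 100 can never be reached, since mapped_ratio
    is a percentage.
    """
    return max(0, 2 * (100 - distress - vm_swappiness))


if __name__ == "__main__":
    # Default swappiness, no distress: the 80% figure from the text.
    print(mapped_ratio_threshold(distress=0, vm_swappiness=60))
    # Swappiness of zero: the threshold is out of reach until
    # distress climbs to 50 or more.
    print(mapped_ratio_threshold(distress=0, vm_swappiness=0))
    print(considers_process_memory(100, 50, vm_swappiness=0))
```

With swappiness at zero, the threshold comes out at 200% with no distress, which is another way of saying that process memory is left alone until the kernel is having real trouble freeing pages.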

The swappiness parameter should do what a lot of users want, but it does not solve the whole problem. Swappiness is a global parameter; it affects every process on the system in the same way. What a number of people would like to see, however, is a way to single out individual applications for special treatment. Possible approaches include using the process's "nice" value to control memory behavior; a low-priority process would not be able to push out significant amounts of a high-priority process's memory. Alternatively, the VM subsystem and the scheduler could become more tightly integrated. The scheduler already makes an effort to detect "interactive" processes; those processes could be given the benefit of a larger working set in memory. That sort of thing is 2.7 work, however; in the meantime, people who are unhappy with the kernel's swap behavior may want to try playing with the knobs which have been provided.
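For those who want to experiment, the knob is an ordinary file under /proc. Here is a minimal Python sketch for reading and setting it; the helper names are hypothetical, the 0 to 100 range check is an assumption based on 2.6-era kernels rather than anything stated above, and writing to the real file requires root.

```python
from pathlib import Path

# Path named in the article; overridable so the helpers can be
# tried against an ordinary file.
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS_PATH):
    """Return the current swappiness as an int, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def write_swappiness(value, path=SWAPPINESS_PATH):
    """Write a new swappiness value.

    Assumption: valid values run from 0 to 100, as on 2.6 kernels.
    """
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    print("current swappiness:", read_swappiness())
```

Calling write_swappiness(0) as root would correspond to the "set swappiness to zero" experiment Andrew Morton was asking for; the setting lasts only until the next reboot unless it is also made persistent by other means.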

