The answer is that the kernel can be made to behave that way by tweaking a runtime parameter, but it is not necessarily a good idea. Before getting into that, however, it's worth noting that recent 2.6 kernels have a memory management problem which can cause serious problems after an application which reads through entire filesystems (updatedb, say, or a backup) has run. The problem is the slab cache's tendency to request allocations of multiple, contiguous pages; these allocations, when done at the behest of filesystem code, can bring the system to a halt. A patch has been merged which fixes this particular problem for 2.6.6.
The bigger issue remains, however: should the kernel swap out user applications in order to cache more file contents? There are plenty of arguments in favor of this behavior. Quite a few large applications set up big areas of memory which they rarely, if ever, use. If application memory is occasionally forced to disk, the unused parts will remain there, and that much physical memory will be freed for more useful contents. Without swapping application memory to disk and seeing what gets faulted back in, it is almost impossible to figure out which pages are not really needed. A large file cache is also a performance enhancer. The speedups that come from having frequently-accessed data in memory are harder to see than the slowdowns caused by having to fault in a large application, but they can lead to better system throughput overall.
Still, there are users who insist that, for example, a system backup should never force OpenOffice out to disk. They don't care how quickly a system maintenance application runs at 3:00 in the morning, but they care a lot about how the system responds when they are at the keyboard. This wish was expressed repeatedly until Andrew Morton exclaimed:
This helped quiet the debate as the parties involved looked more closely at this particular parameter. Or, perhaps, it was just fear of Andrew's singing. Either way, it has become clear that most people are unaware of what the "swappiness" parameter does; the fact that it has never been documented may have something to do with that.
So... swappiness, which is exported to /proc/sys/vm/swappiness, is a parameter which sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The reclaim code works (in a very simplified way) by calculating a few numbers:
The "distress" value is a measure of how much trouble the kernel is having freeing memory; it is zero when reclaim is going easily and rises toward 100 as reclaim gets harder.
mapped_ratio is the approximate percentage of the system's total memory which is mapped into process address spaces.
With those numbers in hand, the kernel calculates its "swap tendency":
swap_tendency = mapped_ratio/2 + distress + vm_swappiness;
If swap_tendency is below 100, the kernel will only reclaim page cache pages. Once it reaches that value, however, pages which are part of some process's address space will also be considered for reclaim. So, if life is easy, swappiness is set to 60, and distress is zero, the system will not swap process memory until mapped memory reaches 80% of the total (at that point mapped_ratio/2 is 40, and 40 + 60 = 100). Users who would like to never see application memory swapped out can set swappiness to zero; that setting will cause the kernel to ignore process memory until the distress value gets quite high.
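The arithmetic above can be checked with a small sketch. This is an illustration in Python, not the kernel's code: the function names are hypothetical, integer division stands in for the C expression's `mapped_ratio/2`, and treating a value of exactly 100 as the point where mapped pages become eligible is an assumption drawn from the "below 100" wording.

```python
from typing import Optional

THRESHOLD = 100  # assumed cutoff: below this, only page cache pages are reclaimed


def swap_tendency(mapped_ratio: int, distress: int, swappiness: int = 60) -> int:
    """Mirror the article's formula, using integer division as C would."""
    return mapped_ratio // 2 + distress + swappiness


def reclaims_mapped(mapped_ratio: int, distress: int, swappiness: int = 60) -> bool:
    """True when process (mapped) pages would also be considered for reclaim."""
    return swap_tendency(mapped_ratio, distress, swappiness) >= THRESHOLD


def min_mapped_ratio(distress: int, swappiness: int) -> Optional[int]:
    """Smallest mapped percentage (0..100) at which mapped pages are considered.

    Returns None when no percentage in that range reaches the threshold.
    """
    for ratio in range(0, 101):
        if reclaims_mapped(ratio, distress, swappiness):
            return ratio
    return None
```

With distress at zero, `min_mapped_ratio(0, 60)` gives 80, matching the 80% figure, while `min_mapped_ratio(0, 0)` gives `None`: with swappiness at zero, mapped memory alone never pushes the sum to 100, so only a rising distress value can.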
The swappiness parameter should do what a lot of users want, but it does not solve the whole problem. Swappiness is a global parameter; it affects every process on the system in the same way. What a number of people would like to see, however, is a way to single out individual applications for special treatment. Possible approaches include using the process's "nice" value to control memory behavior; a low-priority process would not be able to push out significant amounts of a high-priority process's memory. Alternatively, the VM subsystem and the scheduler could become more tightly integrated. The scheduler already makes an effort to detect "interactive" processes; those processes could be given the benefit of a larger working set in memory. That sort of thing is 2.7 work, however; in the meantime, people who are unhappy with the kernel's swap behavior may want to try playing with the knobs which have been provided.
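For those who want to try the knob, here is a minimal sketch of reading and setting it through the /proc file named above. The helper names are hypothetical, the 0 to 100 range check is an assumption about the kernels discussed here, and writing to the real file requires root privileges.

```python
from pathlib import Path

# Location given in the article; the helpers accept another path for testing.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS_PATH) -> int:
    """Return the current swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def write_swappiness(value: int, path=SWAPPINESS_PATH) -> None:
    """Set swappiness; the 0..100 bound is an assumed valid range."""
    if not 0 <= value <= 100:
        raise ValueError(f"swappiness out of assumed range 0..100: {value}")
    Path(path).write_text(f"{value}\n")
```

A change made this way lasts only until the next reboot; calling `write_swappiness(0)` as root would request the "never swap applications unless distress is high" behavior described earlier.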