It’s a common dilemma: I’ve just got a brand new Linux machine, put in a lot of expensive RAM, and left it running for a day. Now it’s out of memory, or it’s using swap! It definitely has enough RAM, so there must be something wrong with Linux!
I assure you, in most cases, your system has plenty of RAM. But where does all that RAM go? Why does Linux use all of it? How can I really tell that I’m out of RAM? Unfortunately, Linux can make these questions very difficult to answer. This article explains in more detail how Linux uses RAM for things other than user data and how you can tell when your system really is running out of RAM.
Linux follows one basic rule: an empty page of RAM is a wasted page of RAM. RAM is used for much more than user application data. It also stores data for the kernel itself and, most importantly, can mirror data stored on disk for very fast access. These in-memory mirrors matter because RAM is far faster to access than a hard drive. Have you ever noticed how long it takes to start your web browser for the first time after the system boots? Have you ever started it a second time and had it pop up almost immediately? The start time drops so much because parts of the data on disk are already mirrored in memory. Linux keeps several kinds of these mirrors, so let’s examine each one. Most of this data is reported in the /proc/meminfo file, and we will refer to its contents regularly.
This is the output from the author’s 2GB laptop, which runs kernel version 2.6.20:
MemTotal: 2073564 kB
MemFree: 1259628 kB
Buffers: 27924 kB
Cached: 176764 kB
SwapCached: 285188 kB
Active: 562120 kB
Inactive: 145592 kB
HighTotal: 1179008 kB
HighFree: 562948 kB
LowTotal: 894556 kB
LowFree: 696680 kB
SwapTotal: 1992052 kB
SwapFree: 1167632 kB
Dirty: 9052 kB
Writeback: 0 kB
AnonPages: 437520 kB
Mapped: 49800 kB
Slab: 91332 kB
SReclaimable: 64816 kB
SUnreclaim: 26516 kB
PageTables: 4872 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 3028832 kB
Committed_AS: 2402708 kB
VmallocTotal: 114680 kB
VmallocUsed: 6112 kB
VmallocChunk: 108044 kB
Your /proc/meminfo file may contain entries other than those above. The kernel developers have gradually added to this file over the years, and it has grown. Some distributions also add their own custom entries to it. Do not worry if your file differs slightly from this one.
The Linux page cache (the “Cached:” entry in meminfo) is the largest single consumer of RAM on most systems. Whenever you do a read() from a file on disk, that data is read into memory and goes into the page cache. After the read() completes, the kernel has the option of simply throwing the page away, since it is no longer in use. However, if you read the same area of the file a second time, the data will be read directly out of memory with no trip to the disk. This is an enormous speedup, and it is why Linux uses its page cache so heavily: it is betting that once you have accessed a page on disk, you will access it again soon.
The same is true for mmap()’d files (the “Mapped:” entry in meminfo). The first time the mmap()’d area is accessed, the page is brought in from disk and mapped into memory. The kernel could choose to discard that page as soon as the access that needed it has completed. However, the kernel makes the same bet that it did for simple read()s of the file: it keeps the page mapped in memory and bets that you will access it again soon. This behavior can manifest in confusing ways.
You might think that mmap()’d memory is not “cached” because it is actively in use, and that “cached” in plain English means “not being used at all right now”. However, Linux does not define it that way. The Linux definition of “cached” is closer to “this is a copy of data from the disk that we keep here to save you time”. It implies nothing about how the page is actually being used. This is why meminfo has both “Cached:” and “Mapped:”. All “Mapped:” memory is “Cached:”, but not all “Cached:” memory is “Mapped:”.
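The relationship between “Cached:” and “Mapped:” can be made concrete with a small Python sketch. It parses meminfo-style text and splits the page cache into its mapped and unmapped parts. The function names are hypothetical, the sample values are copied from the listing above, and the subtraction relies on the article's statement that all mapped memory is also counted as cached.

```python
# Sketch: split the page cache into mapped and unmapped parts.
# Sample values are copied from the meminfo listing in this article.
SAMPLE_MEMINFO = """\
MemTotal: 2073564 kB
MemFree: 1259628 kB
Buffers: 27924 kB
Cached: 176764 kB
Mapped: 49800 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip anything that is not a "Name: value" pair
        fields[name.strip()] = int(rest.split()[0])
    return fields


def cache_breakdown(fields):
    """Split Cached into mapped/unmapped, assuming Mapped is a subset of Cached."""
    cached = fields["Cached"]
    mapped = fields["Mapped"]
    return {"cached": cached, "mapped": mapped, "unmapped": cached - mapped}


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    for key, value in cache_breakdown(info).items():
        print(f"{key:>9}: {value} kB")
```

On a live system, the same functions could be fed the contents of /proc/meminfo instead of the sample string.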
dentry/inode caches
Every time you do an ‘ls’ (or any other operation: open(), stat(), etc.) on a filesystem, the kernel needs data that lives on disk. The kernel parses this on-disk data and puts it into filesystem-independent structures so that it can be handled the same way across all filesystems. As with the page cache in the examples above, the kernel has the option of throwing these structures away once the ‘ls’ has completed. However, it makes the same bet as before: if you read it once, you are likely to read it again. The kernel stores this information in several “caches” called the dentry and inode caches. The dentry cache is common to all filesystems, but each filesystem has its own cache for inodes. You can view the different caches and their sizes by running this command:
head -2 /proc/slabinfo; cat /proc/slabinfo | egrep 'dentry|inode'
(This RAM is a component of “Slab:” in meminfo)
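To turn that slabinfo output into rough sizes, the object count can be multiplied by the object size. The Python sketch below does this for cache names matching dentry or inode. The sample rows are invented for illustration and are not taken from the author's machine. The column order assumed here is the slabinfo 2.x layout (name, active_objs, num_objs, objsize, ...), and the result ignores per-slab overhead, so treat it as an estimate only.

```python
import re

# Hypothetical sample in the slabinfo 2.x layout; the numbers are made up.
SAMPLE_SLABINFO = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ext3_inode_cache 40000 40000 500 8 1 : tunables 54 27 8 : slabdata 5000 5000 0
dentry_cache 60000 60000 128 30 1 : tunables 120 60 8 : slabdata 2000 2000 0
proc_inode_cache 100 120 400 10 1 : tunables 54 27 8 : slabdata 12 12 0
size-64 5000 5000 64 59 1 : tunables 120 60 8 : slabdata 85 85 0
"""


def cache_sizes_kb(text, pattern=r"dentry|inode"):
    """Estimate the size in kB of each slab cache whose name matches pattern."""
    sizes = {}
    for line in text.splitlines():
        if line.startswith(("slabinfo", "#")):
            continue  # skip the two header lines
        parts = line.split()
        if len(parts) < 4 or not re.search(pattern, parts[0]):
            continue
        num_objs, objsize = int(parts[2]), int(parts[3])
        # Allocated objects times object size; per-slab overhead is ignored.
        sizes[parts[0]] = num_objs * objsize // 1024
    return sizes


if __name__ == "__main__":
    for name, size_kb in cache_sizes_kb(SAMPLE_SLABINFO).items():
        print(f"{name:<20} ~{size_kb} kB")
```

Reading the real /proc/slabinfo usually requires root privileges.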
Older kernels (around 2.6.9) left some structures in the slab cache longer than newer kernels do. That means that even though they may no longer be in use, they stay in memory until there is memory pressure. This happens especially with proc_inodes. /proc inodes also appear to pin task_structs, which means that each one can effectively occupy over 2 KBytes of RAM. This RAM does not show up as “Cached:” and can look like a kernel memory leak. On a system with only about 100 tasks (and little memory pressure) there could be hundreds of thousands of them lying around.
They are harmless, but on the surface they can look like a kernel memory leak. To be sure, try clearing the dentry and inode caches. If the number of task_struct and proc_inode_cache objects decreases, there is no actual leak.
The buffer cache (the “Buffers:” entry in meminfo) is closely related to the dentry/inode caches. The dentries and inodes in memory represent structures on disk, but are laid out very differently. This may be because the in-memory copy contains kernel structures, such as pointers, that the on-disk copy does not. It can also happen when the on-disk format has a different endianness from the CPU.
In any case, when we need to fetch an inode or dentry to populate the caches, we must first bring in the page from disk on which those structures are represented. This cannot be part of the page cache because it is not actually the contents of a file, but rather the raw contents of the disk. A page in the buffer cache may hold dozens of on-disk inodes, even though we only create an in-memory inode for one of them. The buffer cache is, again, a bet that the kernel will need another inode from the same group and can save a trip to the disk by keeping this buffer page in memory.
Out of memory
Now that you know all these great uses for your unused RAM, you have to ask yourself: what happens when there is no more unused RAM? If I have no free memory and I need a page for the page cache, inode cache, or dentry cache, where do I get it?
First of all, the kernel tries never to let you get close to 0 bytes of free RAM. This is because freeing RAM usually requires allocating a little more RAM first. Have you ever sat down to start a big project at your desk and realized that you needed to clear some space before you could get to work? The kernel needs the same kind of “workspace” to do its own housekeeping.
Based on the amount of RAM (and a few other factors), the kernel comes up with a heuristic for the amount of memory it is comfortable keeping as its workspace (this value is exposed to user space in /proc/sys/vm/min_free_kbytes). The value is translated and stored per memory zone on the system. When free memory in any given zone falls to this level, the kernel begins to reclaim memory from the various uses described above.
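As a rough illustration of how such a default can scale with RAM, the Python sketch below reproduces the square-root heuristic used by 2.6-era kernels to initialize min_free_kbytes: the integer square root of low memory (in kB) times 16, clamped between 128 kB and 65536 kB. This is an approximation written from memory of that kernel code, not something stated in this article. The function name is hypothetical, newer kernels may use different limits, and the kernel's own notion of low memory is not exactly the same as LowTotal or MemTotal.

```python
import math


def default_min_free_kbytes(lowmem_kbytes):
    """Approximate a 2.6-era default: isqrt(lowmem_kbytes * 16), clamped.

    Assumption: the clamp range [128, 65536] kB matches kernels of that
    period; the value on a running system should be read from
    /proc/sys/vm/min_free_kbytes instead of being computed.
    """
    value = math.isqrt(lowmem_kbytes * 16)
    return max(128, min(value, 65536))


if __name__ == "__main__":
    for mem_mb in (16, 256, 1024, 2048):
        kb = default_min_free_kbytes(mem_mb * 1024)
        print(f"{mem_mb:>5} MB of low memory -> about {kb} kB kept free")
```

The point of the sketch is the shape of the curve: the reserve grows much more slowly than RAM itself, so a machine with sixteen times the memory keeps only about four times as much free.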
The relevant entries in the meminfo file:
SwapTotal: 1992052 kB
SwapFree: 1167632 kB
When the kernel decides not to take memory from any of the other sources described so far, it starts to swap. In the process, it takes user application data and writes it to a special place (or places) on disk. You might think this is only a last resort, used once we absolutely cannot free any other type of RAM. However, the kernel does not work this way. Why?
Consider an application like /sbin/init. It has some extremely important jobs, such as setting up the system at startup and respawning login prompts if they die. But how much of its data is actually used during normal system runtime? If the system is at its limit and running out of RAM, should we swap out a completely unused page of /sbin/init’s startup data and use that page for the page cache? Or should we keep /sbin/init entirely in memory and force potential page cache users to go to disk?
The kernel will typically choose to swap out /sbin/init’s data in favor of the current needs of the applications that are actually running. For this reason, even a system with a large amount of RAM (even when properly tuned) can swap. Many pages of memory hold user application data that is rarely used. All of these are candidates to be swapped out in favor of other uses for the RAM.
But if the mere presence of used swap is not evidence that a system has too little RAM for its workload, what is? As you can see, swap is used most effectively for data that will not be accessed for a long time. If data in swap is being accessed continuously, then swap is not being used effectively. We can monitor the amount of data going in and out of swap with the vmstat command. The following produces output every 5 seconds:
$ vmstat 5
procs -----------memory---------- ---swap--- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 0 833704 54824 25196 328672 10 0 343 18 510 1382 96 4 0 0
6 0 833704 54556 25092 324584 0 0 333 22 504 1180 93 7 0 0
4 0 833704 51516 25112 320856 33 0 315 19 508 1234 95 5 0 0
3 0 833704 54836 24984 314404 6 0 223 27 498 1191 95 5 0 0
3 0 833704 53072 24944 307844 4 0 216 22 518 1375 96 4 0 0
5 0 833704 53928 24888 304076 6 0 262 18 548 1665 94 6 0 0
3 4 843964 50192 184 58064 16 2416 16 2464 570 1451 78 22 0 0
3 7 908244 48756 224 47760 118 13645 149 13664 730 1245 76 16 0 8
3 2 922064 54280 340 49228 1470 2838 1817 2865 711 1481 88 12 0 0
4 2 932644 54068 424 52204 1972 2195 2596 2211 678 1388 90 10 0 0
2 3 944012 56304 492 52292 2986 2591 3063 2615 735 1562 89 11 0 0
2 4 957304 54604 572 51964 4042 3414 4096 3438 852 1808 88 12 0 0
The columns we are most interested in are “si” and “so”, which are abbreviations for “swap in” and “swap out”. You can interpret them this way:
- A small “swap in” and a small “swap out” are normal. They indicate that there is little demand for the application data currently swapped out, and that new memory needs are being met by means other than swapping out application data.
- A large “swap in” with a small “swap out” usually indicates that an application that was swapped out is now starting to run again and needs to get its data back from disk.
- A large “swap out” with a small “swap in” usually indicates that an application needs RAM of some kind (possibly for any of the caches, or for application data) and is swapping out application data to get that RAM.
- A large “swap out” with a large “swap in” is generally the condition you want to avoid. It means the system is thrashing: it needs new RAM as fast as it can swap out application data. This usually means that the application needing RAM has already pushed out all the truly old data and has started forcing actively used data out to swap. That actively used data is immediately read back in from swap, so both “swap in” and “swap out” rise and become nearly equal.
The vmstat example above shows a system running normally, followed by the startup of a very, very large memory-hungry application.
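The four cases in the list above can be sketched as a small Python classifier over vmstat output. The article does not say where “small” ends and “large” begins, so the 1000 kB/s cutoff here is purely an assumption, as are the function names and labels. The sample rows are copied from the vmstat listing above.

```python
# Three rows copied from the vmstat listing in this article.
SAMPLE_VMSTAT = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
3 0 833704 54824 25196 328672 10 0 343 18 510 1382 96 4 0 0
3 4 843964 50192 184 58064 16 2416 16 2464 570 1451 78 22 0 0
2 4 957304 54604 572 51964 4042 3414 4096 3438 852 1808 88 12 0 0
"""

LARGE_KB_PER_S = 1000  # hypothetical cutoff between "small" and "large"


def classify(si, so, large=LARGE_KB_PER_S):
    """Map one si/so pair onto the four cases described in the text."""
    big_in, big_out = si >= large, so >= large
    if big_in and big_out:
        return "thrashing"
    if big_in:
        return "swapping in"
    if big_out:
        return "swapping out"
    return "normal"


def classify_vmstat(text):
    """Return (si, so, label) for every data row in vmstat output."""
    results = []
    columns = None
    for line in text.splitlines():
        parts = line.split()
        if "si" in parts and "so" in parts:
            columns = parts  # the header row names the columns
            continue
        if columns is None or len(parts) != len(columns):
            continue
        if not all(p.isdigit() for p in parts):
            continue  # skip banner lines and anything non-numeric
        row = dict(zip(columns, map(int, parts)))
        results.append((row["si"], row["so"], classify(row["si"], row["so"])))
    return results


if __name__ == "__main__":
    for si, so, label in classify_vmstat(SAMPLE_VMSTAT):
        print(f"si={si:<6} so={so:<6} {label}")
```

A single sample proves little either way; the pattern over several consecutive intervals is what matters, so a real monitor would probably want to look at a sliding window rather than one row.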
The swap cache is very similar in concept to the page cache. A page of user application data written to disk is very similar to a page of file data on disk. Whenever a page is read in from swap (“si” in vmstat), it is placed in the swap cache. Like the page cache, this is a bet on the kernel’s part: a bet that we might need to swap this page out again. If that need arises, we can detect that a copy is already on disk and simply throw the in-memory page away immediately. This saves us the cost of rewriting the page to disk.
The swap cache is really only useful when we read data back from swap and never write to it. If we write to the page, the copy on disk is no longer in sync with the copy in memory. When that happens, we have to write the page out to disk again to swap it, just as we did the first time. However, the savings from avoiding a disk write are large, so as long as only a small portion of the swap cache is ever written to, the system will perform better.
Another operation that takes place when we run low on memory is writing dirty data (“Dirty:” in meminfo) out to disk. Dirty data is page cache to which a write has occurred. Before we can free that page cache, we must first update the original copy on disk with the written data. When the system drops below its min_free_kbytes value, it tries to free page cache. While freeing page cache, it is common to find such dirty pages, and the kernel initiates their writeout whenever it finds them. You can see this happening when “Dirty:” decreases at the same time that “bo” (blocks written out) in vmstat goes up.
The kernel may request that many pages be written to disk in parallel. This improves disk performance by batching the writes together or spreading them across several disks. While the kernel is actively trying to update the on-disk data for a page, that page is counted in meminfo’s “Writeback:” entry.
The “sync” command forces all dirty data to be written out, bringing “Dirty:” down to a very low value momentarily.
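One way to watch this happen is to compare “Dirty:” and “Writeback:” across two meminfo snapshots. The Python sketch below does that on text input. The function names are hypothetical, the “before” values come from the listing earlier in this article, and the “after” values are invented to stand in for a snapshot taken just after a sync.

```python
BEFORE = """\
Dirty: 9052 kB
Writeback: 0 kB
"""

# Invented values standing in for a snapshot taken right after a sync.
AFTER = """\
Dirty: 12 kB
Writeback: 0 kB
"""


def dirty_and_writeback(meminfo_text):
    """Return (Dirty, Writeback) in kB from meminfo-style text."""
    values = {"Dirty": 0, "Writeback": 0}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        name = name.strip()
        if name in values and rest.strip():
            values[name] = int(rest.split()[0])
    return values["Dirty"], values["Writeback"]


def dirty_written_out_kb(before_text, after_text):
    """Return how many kB of dirty data disappeared between two snapshots."""
    dirty_before, _ = dirty_and_writeback(before_text)
    dirty_after, _ = dirty_and_writeback(after_text)
    return dirty_before - dirty_after


if __name__ == "__main__":
    print("before:", dirty_and_writeback(BEFORE))
    print("after: ", dirty_and_writeback(AFTER))
    print("written out:", dirty_written_out_kb(BEFORE, AFTER), "kB")
```

On a live Linux system the two snapshots could be taken by reading /proc/meminfo before and after a call to os.sync(). A negative result simply means that more data was dirtied than written out during the interval.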