Source: http://stackoverflow.com/questions/21617500/understanding-load-average-vs-cpu-usage
First, top does not calculate load average itself. It just reads load average from the /proc/loadavg file (strace shows that top opens /proc/loadavg and then reads from it periodically). man proc says this for /proc/loadavg:
/proc/loadavg
The first three fields in this file are load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1, 5, and 15 minutes. They are the same as the load average numbers given by uptime(1) and other programs. The fourth field consists of two numbers separated by a slash (/). The first of these is the number of currently executing kernel scheduling entities (processes, threads); this will be less than or equal to the number of CPUs. The value after the slash is the number of kernel scheduling entities that currently exist on the system. The fifth field is the PID of the process that was most recently created on the system.
So the load average shows the average number of jobs that are in the run queue or waiting for disk I/O. top shows you only the first three values from /proc/loadavg. If you run cat /proc/loadavg you will see all values in the file.
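To make the field layout concrete, here is a minimal sketch of a parser for a line in the /proc/loadavg format, following the man page description quoted above. The LoadAvg struct and the function names parse_loadavg and read_loadavg are my own, made up for illustration; they are not part of any library.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical struct for illustration; the field names are not kernel names.
struct LoadAvg {
    double avg1 = 0, avg5 = 0, avg15 = 0;  // fields 1-3: 1, 5 and 15 minute averages
    long runnable = 0;                     // field 4, number before the slash
    long total = 0;                        // field 4, number after the slash
    long last_pid = 0;                     // field 5: most recently created PID
};

// Parses one line in the /proc/loadavg format.
// Returns false if the line does not match the expected layout.
bool parse_loadavg(const std::string& line, LoadAvg& out) {
    std::istringstream in(line);
    char slash = 0;
    if (!(in >> out.avg1 >> out.avg5 >> out.avg15
             >> out.runnable >> slash >> out.total >> out.last_pid)) {
        return false;
    }
    return slash == '/';
}

// Reads and parses the live file (Linux only).
bool read_loadavg(LoadAvg& out) {
    std::ifstream file("/proc/loadavg");
    std::string line;
    return std::getline(file, line) && parse_loadavg(line, out);
}
```

Calling read_loadavg(la) on a Linux machine fills in all five fields, while top only displays avg1, avg5 and avg15.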
By the way, there seems to be an error in the description of the fourth field. I have written a simple test program in C++ that takes an integer N as its argument and then creates N threads, each of which just runs an infinite loop. I ran my program and asked it to create 256 threads on a machine that has only 8 processors with HT. Yet I see this in the file:
>cat /proc/loadavg
74.44 21.04 10.59 259/931 17293
Clearly 259 is bigger than the number of CPUs on my machine. This post http://juliano.info/en/Blog:Memory_Leak/Understanding_the_Linux_load_average says the same thing: there is an error in the man page description. This is a quote:
It is worth noting that the current explanation in proc(5) manual page (as of man-pages version 3.21, March 2009) is wrong. It reports the first number of the forth field as the number of currently executing scheduling entities, and so predicts it can't be greater than the number of CPUs. That doesn't match the real implementation, where this value reports the current number of runnable threads
So, answering your questions:
If the load average is at 7, with 4 hyper-threaded processors, shouldn't that means that the CPU is working to about 7/8 capacity?
No, it just means that on average you have 7 processes running or waiting in the run queue.
Why, then was it showing 50.0%id? How can it be idle half the time?
Because load average does not mean "% of CPU capacity". Your threads are simply on the CPU only about 50% of the time and spend the other 50% doing something else (for example, waiting).
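The idle percentage comes from a different source than the load average. As far as I know, tools like top derive it from the cumulative tick counters on the aggregate "cpu" line of /proc/stat, sampled twice. Below is a rough sketch of that calculation. It assumes the usual column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice), and the names CpuTicks, parse_cpu_line and idle_percent are hypothetical, chosen just for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper type: cumulative counters taken from one "cpu" line.
struct CpuTicks {
    unsigned long long idle = 0;   // ticks spent idle
    unsigned long long total = 0;  // ticks spent in all states
};

// Parses the aggregate "cpu ..." line of /proc/stat (not the per-CPU "cpu0" lines).
// Assumption: columns are user nice system idle iowait irq softirq steal ...
bool parse_cpu_line(const std::string& line, CpuTicks& out) {
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu") return false;
    out = CpuTicks{};
    unsigned long long value = 0;
    // Sum only the first eight columns: guest time is already counted in user/nice.
    for (int col = 0; col < 8 && (in >> value); ++col) {
        if (col == 3) out.idle = value;
        out.total += value;
    }
    return out.total > 0;
}

// Idle share, in percent, of the interval between two samples.
double idle_percent(const CpuTicks& earlier, const CpuTicks& later) {
    assert(later.total >= earlier.total && later.idle >= earlier.idle);
    const unsigned long long elapsed = later.total - earlier.total;
    if (elapsed == 0) return 0.0;
    return 100.0 * static_cast<double>(later.idle - earlier.idle) / elapsed;
}
```

Nothing in this calculation looks at the run queue, which is why a load average of 7 and 50% idle can show up at the same time.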
And finally, below is my simple test. To build it, use g++ -pthread my_test.cpp -o my_test. Run ./my_test 8 and look at your idle time when the threads run constantly and do not spend time waiting for anything. Or run ./my_test 128 to see that the load average can be much bigger than the number of CPUs.
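#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Thread body: spin forever so that the thread is always runnable.
static void* __attribute__ ((noinline))
my_thread_func(void * arg)
{
    (void)arg;  // unused
    printf("Thread %lld:\n", (long long)pthread_self());
    volatile long long i = 0;  // volatile keeps the loop from being optimized away
    while (1) {
        ++i;
    }
    return 0;
}

int
main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s num_threads\n", argv[0]);
        return 1;
    }
    int num_threads = atoi(argv[1]);
    if (num_threads <= 0) {
        fprintf(stderr, "num_threads must be a positive integer\n");
        return 1;
    }
    pthread_t *my_threads = new pthread_t[num_threads];
    for (int tnum = 0; tnum < num_threads; tnum++) {
        pthread_create(&my_threads[tnum], NULL, &my_thread_func, NULL);
    }
    // Let the threads spin for 10 minutes; returning from main ends the process.
    sleep(600);
    return 0;
}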