来自 《Linux® Debugging and Performance Tuning: Tips and Techniques》ch1
重点如下:
1) 编译时 使用 -pg 产生profile gmon.out
2) gprof -b a.out
3) 编译时使用-pg -fprofile-arcs -ftest-coverage 支持产生调用图
4) 运行程序后,使用gcov *.c产生调用图*.c.gcov
5) 使用kprof查看图形界面
gcc Option Needed for gprofBefore programs can be profiled using gprof, they must be compiled with the-pg gcc option. To get complete information about gprof, you can use the commandinfo gprof or man gprof. Listing 1.5 shows the benefits that profiling can have on a small program. The sample1 program prints the prime numbers up to 50,000. You can use the output from gprof to increase this program's performance by changing the program to sample2, shown later in Listing 1.8. Listing 1.5. sample1.c1 #include <stdlib.h> 2 #include <stdio.h> 3 4 int prime (int num); 5 6 int main() 7 { 8 int i; 9 int colcnt = 0; 10 for (i=2; i <= 50000; i++) 11 if (prime(i)) { 12 colcnt++; 13 if (colcnt%9 == 0) { 14 printf("%5d\n",i); 15 colcnt = 0; 16 } 17 else 18 printf("%5d ", i); 19 } 20 putchar('\n'); 21 return 0; 22 } 23 24 int prime (int num) { 25 /* check to see if the number is a prime? */ 26 int i; 27 for (i=2; i < num; i++) 28 if (num %i == 0) 29 return 0; 30 return 1; 31 } Building the sample1 Program and Using gprofThe sample1.c program needs to be compiled with the option -pg to have profile data generated, as shown in Figure 1.11. Figure 1.11. Building and running sample1.When the sample1 program is run, the gmon.out file is created. To view the profiling data, the gprof utility must be on your system. If your system isrpm-based, the rpm command shows the version of gprof, as shown inFigure 1.12. Figure 1.12. The version of gprof.gprof is in the binutils package. For you to use the utility, the package must be installed on your system. One useful gprof option is-b. The -b option eliminates the text output that explains the data output provided by gprof: # gprof -b ./sample1 The output shown in Listing 1.6 from gprof gives some high-level information like the total running time, which is 103.74 seconds. The main routine running time is 0.07 seconds, and the prime routine running time is 103.67 seconds. The prime routine is called 49,999 times. Listing 1.6. Output from gprof for sample1Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 99.93 103.67 103.67 49999 2.07 2.07 prime 0.07 103.74 0.07 main Call graph granularity: each sample hit covers 4 byte(s) for 0.01% of 103.74 seconds index % time self children called name <spontaneous> [1] 100.0 0.07 103.67 main [1] 103.67 0.00 49999/49999 prime [2] ------------------------------------------------ 103.67 0.00 49999/49999 main [1] [2] 99.9 103.67 0.00 49999 prime [2] ------------------------------------------------ Index by function name [1] main [2] prime Next we can use the gcov program to look at the actual number of times each line of the program was executed. (SeeChapter 2, "Code Coverage," for more about gcov.) We will build the sample1 program with two additional options-fprofile-arcs and-ftest-coverage, as shown in Figure 1.13. These options let you look at the program using gcov, as shown inFigure 1.14. Figure 1.13. Building sample1 with gcov options.
[View full size image]
Figure 1.14. Running sample1 and creating gcov output.Running gcov on the source code produces the file sample1.c.gcov. It shows the actual number of times each line of the program was executed.Listing 1.7 is the output of gcov on sample1. Listing 1.7. Output from gcov for sample1-: 0:Source:sample1.c -: 0:Graph:sample1.bbg -: 0:Data:sample1.da -: 1:#include <stdlib.h> -: 2:#include <stdio.h> -: 3: -: 4:int prime (int num); -: 5: -: 6:int main() 1: 7: { 1: 8: int i; 1: 9: int colcnt = 0; 50000: 10: for (i=2; i <= 50000; i++) 49999: 11: if (prime(i)) { 5133: 12: colcnt++; 5133: 13: if (colcnt%9 == 0) { 570: 14: printf("%5d\n",i); 570: 15: colcnt = 0; -: 16: } -: 17: else 4563: 18: printf("%5d ", i); -: 19: } 1: 20: putchar('\n'); 1: 21: return 0; -: 22: } -: 23: 49999: 24:int prime (int num) { -: 25: /* check to see if the number is a prime? */ 49999: 26: int i; 121337004: 27: for (i=2; i < num; i++) 121331871: 28: if (num %i == 0) 44866: 29: return 0; 5133: 30: return 1; -: 31: } -: 32: There are 5,133 prime numbers. The expensive operations in the routine prime are thefor loop (line 27) and the if statement (line 28). The "hot spots" are the loop and the if test inside the prime routine. This is where we will work to increase the program's performance. One change that will help this program is to use thesqrt() function, which returns the nonnegative square root function of the number passed in. sample2, shown inListing 1.8, has been changed to use the sqrt function in the newly created function called faster. Listing 1.8. sample2.c1 #include <stdlib.h> 2 #include <stdio.h> 3 #include <math.h> 4 5 int prime (int num); 6 int faster (int num); 7 8 int main() 9 { 10 int i; 11 int colcnt = 0; 12 for (i=2; i <= 50000; i++) 13 if (prime(i)) { 14 colcnt++; 15 if (colcnt%9 == 0) { 16 printf("%5d\n",i); 17 colcnt = 0; 18 } 19 else 20 printf("%5d ", i); 21 } 22 putchar('\n'); 23 return 0; 24 } 25 26 int prime (int num) { 27 /* check to see if the number is a prime? */ 28 int i; 29 for (i=2; i <= faster(num); i++) 30 if (num %i == 0) 31 return 0; 32 return 1; 33 } 34 35 int faster (int num) 36 { 37 return (int) sqrt( (float) num); 38 } Now you can build the sample2 program (see Figure 1.15) and use gprof to check how long the program will take to run (seeFigure 1.16). Also, the gcov output shows the reduced number of times each line needs to be executed. InListing 1.9, the total running time has been reduced from 103.74 seconds to 2.80 seconds. Figure 1.15. Building and running sample2.Figure 1.16. Using gprof on sample2.Listing 1.9 shows the output of gprof for the sample2 program. Listing 1.9. Output from gprof for sample2Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 52.68 1.48 1.48 1061109 1.39 1.39 faster 46.61 2.78 1.30 49999 26.10 55.60 prime 0.71 2.80 0.02 main Call graph granularity: each sample hit covers 4 byte(s) for 0.36% of 2.80 seconds index % time self children called name <spontaneous> [1] 100.0 0.02 2.78 main [1] 1.30 1.48 49999/49999 prime [2] --------------------------------------------------- 1.30 1.48 49999/49999 main [1] [2] 99.3 1.30 1.48 49999 prime [2] 1.48 0.00 1061109/1061109 faster [3] --------------------------------------------------- 1.48 0.00 1061109/1061109 prime [2] [3] 52.7 1.48 0.00 1061109 faster [3] --------------------------------------------------- Index by function name [3] faster [1] main [2] prime Now we'll run gcov on the sample2 program, as shown in Figures 1.17 and 1.18. Figure 1.17. Building sample2 with gcov and running sample2.
[View full size image]
Figure 1.18. Running sample2 and getting gcov output.Listing 1.10 shows gcov output for the sample2 program. Listing 1.10. Output of sample2.c.gcov -: 0:Source:sample2.c
-: 0:Graph:sample2.bbg
-: 0:Data:sample2.da
-: 1:#include <stdlib.h>
-: 2:#include <stdio.h>
-: 3:#include <math.h>
-: 4:
-: 5:int prime (int num);
-: 6:int faster (int num);
-: 7:
-: 8:int main()
1: 9:{
1: 10: int i;
1: 11: int colcnt = 0;
50000: 12: for (i=2; i <= 50000; i++)
49999: 13: if (prime(i)) {
5133: 14: colcnt++;
5133: 15: if (colcnt%9 == 0) {
570: 16:printf("%5d\n",i);
570: 17:colcnt = 0;
-: 18: }
-: 19: else
4563: 20: printf("%5d ", i);
-: 21: }
1: 22: putchar('\n');
1: 23: return 0;
-: 24: }
-: 25:
49999: 26:int prime (int num) {
-: 27: /* check to see if the number is a
prime? */
49999: 28: int i;
1061109: 29: for (i=2; i <= faster(num); i++)
1055976: 30: if (num %i == 0)
44866: 31: return 0;
5133: 32: return 1;
-: 33: }
-: 34:
-: 35:int faster (int num)
1061109: 36: {
1061109: 37: return (int) sqrt( (float) num);
-: 38: }
-: 39: The for loop in the prime routine has been reduced from 121 million executions to 1 million executions. Therefore, the total time has been reduced from 103.74 seconds to 2.80 seconds. The tools gprof and gcov helped find the "hot spots" in this sample program. After the "hot spots" were found, the program was changed to increase its overall performance. It is interesting how changing a few lines of code can have a great impact on a program's performance. Listing 1.11, sample3.cpp, has three different functions (1, 2, and 3). It shows a more complex use of profiling, with both flat and graphic profiles. We'll also use kprof, which can use gprof output. It presents the information in list or tree views, which make the information easier to understand when programs are more complicated. Let's start by building the sample3.cpp program and displaying the flat and graphic profiles and then displaying the data using kprof. Listing 1.11. sample3.cpp1 #include <iostream> 2 3 void function1(){ 4 for(int i=0;i<1000000;i++); 5 } 6 7 void function2(){ 8 function1(); 9 for (int i=0;i<2000000;i++); 10 } 11 12 void function3(){ 13 function1(); 14 function2(); 15 for (int i=0;i<3000000;i++); 16 function1(); 17 } 18 19 int main(){ 20 for(int i=0;i<10;i++) 21 function1(); 22 23 for (int i=0;i<5000000;i++); 24 25 for(int i=0;i<10;i++) 26 function2(); 27 for(int i=0; i<13;i++); 28 { 29 function3(); 30 function2(); 31 function1(); 32 } 33 } Figure 1.19 shows the commands used to build and run the sample3 program. gprof is also run on sample3 to get the profile data from sample3. Figure 1.19. Building and capturing gprof output for sample3.We won't use the -b option on the gprof output on the sample3 program so that we can see all the descriptive information that gprof can display. The sample3.gprof should look similar to this: Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 43.36 4.21 4.21 12 0.35 0.52 function2() 42.84 8.37 4.16 25 0.17 0.17 function1() 8.65 9.21 0.84 main 5.15 9.71 0.50 1 0.50 1.35 function3() 0.00 9.71 0.00 1 0.00 0.00 global constructors keyed to function1() 0.00 9.71 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
Call graph (explanation follows) granularity: each sample hit covers 4 byte(s) for 0.10% of 9.71 seconds index % time self children called name <spontaneous> [1] 100.0 0.84 8.87 main [1] 3.86 1.83 11/12 function2() [2] 1.83 0.00 11/25 function1() [3] 0.50 0.85 1/1 function3() [4] ------------------------------------ 0.35 0.17 1/12 function3() [4] 3.86 1.83 11/12 main [1] [2] 63.9 4.21 2.00 12 function2() [2] 2.00 0.00 12/25 function1() [3] ------------------------------------ 0.33 0.00 2/25 function3() [4] 1.83 0.00 11/25 main [1] 2.00 0.00 12/25 function2() [2] [3] 42.8 4.16 0.00 25 function1() [3] ------------------------------------ 0.50 0.85 1/1 main [1] [4] 13.9 0.50 0.85 1 function3() [4] 0.35 0.17 1/12 function2() [2] 0.33 0.00 2/25 function1() [3] ------------------------------------ 0.00 0.00 1/1 __do_global_ctors_aux [13] [11] 0.0 0.00 0.00 1 global constructors keyed to function1() [11] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [12] ------------------------------------ 0.00 0.00 1/1 global constructors keyed to function1() [11] [12] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [12] ------------------------------------ This table describes the program's call tree. It is sorted by the total amount of time spent in each function and its children. Each entry in this table consists of several lines. The line with the index number at the left margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. You see the following:
For the function's parents, the fields have the following meanings:
If the function's parents cannot be determined, the word <spontaneous> is printed in the name field, and all the other fields are blank. For the function's children, the fields have the following meanings:
If the call graph has any cycles (circles), there is an entry for the cycle as a whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children). The + recursive calls entry shows how many function calls were internal to the cycle. The calls entry for each member shows, for that member, how many times it was called from other members of the cycle. Index by function name [11] global constructors keyed to function1() [3] function1() [4] function3() [12] __static_initialization_and_destruction_0(int, int) [2] function2() [1] main |
|