VTune Call Graph Utilization Report
The call graph collector of the VTune(TM) Performance Analyzer collects information about an application's program flow: how many times each function calls other functions, and how much time each function spends executing its own code and calling other functions.
1. Precondition
The application to be optimized must be linked with the /FIXED:NO option; otherwise, call graph profiling cannot be used, because instrumentation needs the relocation information that /FIXED:NO preserves.
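To check whether an existing build meets this precondition, you can inspect its PE header. A module linked with /FIXED has no relocation section and has the IMAGE_FILE_RELOCS_STRIPPED flag (0x0001) set in the Characteristics field of its COFF file header. The Python sketch below reads that flag. It assumes a well-formed PE image and performs only minimal validation; the function name `relocs_stripped` is our own and is not part of any VTune tool.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # COFF Characteristics bit set when linking with /FIXED

def relocs_stripped(image: bytes) -> bool:
    """Return True if the PE image has its relocation information stripped."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew, the file offset of the PE signature, is a little-endian DWORD at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the 4-byte signature;
    # Characteristics is its last WORD, at offset 18 within the header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED)
```

For example, `relocs_stripped(pathlib.Path("app.exe").read_bytes())` (with a hypothetical `app.exe`) returning True would suggest the module should be relinked with /FIXED:NO.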
2. Instrumentation for Call Graph Profiling
Instrumentation is the process of modifying a program so that dynamic information is recorded during program execution. Data collection routines invoked at specific points in the execution of the target program record run-time information. These routines provide information about time spent in each function, and the call sequence that leads to a specific function. By default, the VTune Performance Analyzer instruments all application functions and system-level exports.
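VTune inserts its data collection routines into the binary modules themselves. Purely as a source-level analogy (this is not how VTune is implemented), the Python sketch below wraps functions with entry and exit hooks that append timestamped records to a trace. The `instrument` decorator, the `trace` list, and the sample functions are hypothetical.

```python
import functools
import time

trace = []  # hypothetical in-memory log of (event, function, time_us) records

def instrument(func):
    """Wrap func so that its entry and exit are logged, like inserted probes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append(("enter", func.__name__, time.perf_counter_ns() // 1000))
        try:
            return func(*args, **kwargs)
        finally:
            # Logged in finally so the exit is recorded even if func raises.
            trace.append(("exit", func.__name__, time.perf_counter_ns() // 1000))
    return wrapper

@instrument
def decode(n):
    return sum(range(n))

@instrument
def main():
    return decode(1000) + decode(2000)

main()
print([(event, name) for event, name, _ in trace])
```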
Note: By default, VTune instruments the exports of so many dynamic libraries that the application may fail to start. In that case, modify the configuration of the Activity: in the Configure Call Graph window, click the Advanced button and set the instrumentation level of System DLL and User EXE to Minimal.
3. Process of Call Graph Data Collection
When you select an Activity with the call graph collector in the Tuning Browser and click Run Activity to begin performance data collection, the VTune Performance Analyzer performs the following steps:
1) Instruments the application and/or modules of interest defined during Activity creation.
2) Launches and profiles the instrumented application and/or modules of interest until the application terminates or you stop it. During the run, it tracks function entry and exit points, records the number of times each function is called, establishes the relationship between each caller (parent) and callee (child) function, and stores this data.
3) Analyzes the profile data, generates a new Activity result that is stored in the Tuning Browser, and displays the call graph data in tabular and graph views.
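Steps 2 and 3 can be illustrated with a small analysis routine. The sketch below assumes a hypothetical trace of ("enter" or "exit", function, time in microseconds) records, such as an entry/exit hook would produce, and derives the quantities VTune reports: call counts, caller-callee edges, Self Time, and Total Time. It is not VTune's own algorithm, and for recursive functions this simple approach counts Total Time once per active frame.

```python
from collections import defaultdict

def analyze(trace):
    """Build call graph data from properly nested (event, function, time_us) records.

    Returns (calls, edges, self_us, total_us) as dictionaries.
    """
    calls = defaultdict(int)     # function -> number of calls
    edges = defaultdict(int)     # (caller, callee) -> number of calls
    self_us = defaultdict(int)   # time in the function body only
    total_us = defaultdict(int)  # time in the function and all of its callees
    stack = []                   # active frames: [function, enter_time, callee_time]
    for event, func, t in trace:
        if event == "enter":
            caller = stack[-1][0] if stack else "<root>"
            calls[func] += 1
            edges[(caller, func)] += 1
            stack.append([func, t, 0])
        elif event == "exit":
            if not stack or stack[-1][0] != func:
                raise ValueError(f"unbalanced exit for {func}")
            _, start, callee_time = stack.pop()
            elapsed = t - start
            total_us[func] += elapsed
            self_us[func] += elapsed - callee_time
            if stack:
                stack[-1][2] += elapsed  # charge this call to the parent's callee time
        else:
            raise ValueError(f"unknown event {event!r}")
    return calls, edges, self_us, total_us

# Hypothetical trace: main calls decode twice.
trace = [
    ("enter", "main", 0),
    ("enter", "decode", 10), ("exit", "decode", 40),
    ("enter", "decode", 50), ("exit", "decode", 70),
    ("exit", "main", 100),
]
calls, edges, self_us, total_us = analyze(trace)
print(dict(calls))    # {'main': 1, 'decode': 2}
print(dict(self_us))  # {'decode': 50, 'main': 50}
```

Here main has a Total Time of 100 µs, of which 50 µs is spent in its two calls to decode, leaving 50 µs of Self Time.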
4. Viewing Call Graph Data
After collecting call graph data with the VTune Performance Analyzer, you can view the profiling information in the following views:
Graph: provides a graphical presentation of the application's execution flow.
Call list: provides full information on the selected function and its callers (parents) and callees (children) in table format.
Function summary: provides full information on all instrumented functions of the application in table format.
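The Call list view can be pictured as a lookup over the caller-callee pairs recorded during collection. The sketch below, using hypothetical function names and call counts, returns the callers (parents) and callees (children) of a selected function.

```python
def call_list(edges, selected):
    """Return (callers, callees) of `selected` as sorted (name, calls) lists.

    edges: dict mapping (caller, callee) -> number of calls.
    """
    callers = sorted((caller, n) for (caller, callee), n in edges.items() if callee == selected)
    callees = sorted((callee, n) for (caller, callee), n in edges.items() if caller == selected)
    return callers, callees

# Hypothetical call counts
edges = {("main", "decode"): 2, ("decode", "parse"): 5, ("main", "parse"): 1}
print(call_list(edges, "decode"))  # ([('main', 2)], [('parse', 5)])
```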
The upper section of the call graph window displays the function information in table format. In the function summary, rows are shown with different background colors according to their position in the hierarchy. By default, the view shows the following four types of data first:
Module
Thread
Class (optional)
Function
The function summary view provides several columns; the most important ones for our tests are Self Time (microseconds), Total Time (microseconds), and % in function.
Self Time: time (microseconds) spent in the function itself.
Total Time: time (microseconds) spent in the function and in all of its callees.
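% in function: the ratio of the time spent in the function itself to the function's total time. You can calculate it using the following formula:
% in function = (Self Time / Total Time) * 100
In the tables in Section 5, this column is displayed as a fraction rather than a percentage: for example, pal_PreDecodeKFE in Table 1 shows 0.23 for 17869 / 78826, which is about 22.7%.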
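As a worked example, the formula can be applied to the pal_PreDecodeKFE row of Table 1 (Self Time 17869 µs, Total Time 78826 µs). The Python sketch below computes the percentage at full precision and also rounds the ratio to two decimal places, the way the column appears in the tables. Returning 0 when Total Time is 0 is our assumption, chosen to match the rows that show 0 in every column.

```python
def percent_in_function(self_us: float, total_us: float) -> float:
    """% in function = (Self Time / Total Time) * 100."""
    if total_us == 0:
        return 0.0  # assumption: report 0 for functions with no measured time
    return self_us / total_us * 100

# pal_PreDecodeKFE, Table 1: Self Time 17869 us, Total Time 78826 us
pct = percent_in_function(17869, 78826)
print(f"{pct:.4f} %")      # full-precision percentage
print(f"{pct / 100:.2f}")  # two-decimal ratio, as shown in the table: 0.23
```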
Even in the same environment, these three metrics may vary slightly from run to run, so we use the average of several measurements.
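For example, the Self Time of pal_PreDecodeKFE was 17869 µs in Table 1 and 17837 µs in Table 2. The sketch below uses Python's standard `statistics` module to compute the mean and sample standard deviation of such repeated runs. With only two runs the standard deviation is a rough indication at best; more runs give a more reliable average.

```python
from statistics import mean, stdev

def summarize(runs):
    """Mean and sample standard deviation of repeated measurements (needs at least 2 runs)."""
    return mean(runs), stdev(runs)

# Self Time (us) of pal_PreDecodeKFE in Tables 1 and 2
avg, spread = summarize([17869, 17837])
print(f"mean {avg:.1f} us, stdev {spread:.1f} us ({spread / avg:.2%} of the mean)")
```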
Note: the % in function column is displayed with only two decimal places. When more precision is needed, calculate the ratio from Self Time and Total Time directly (for example, with a calculator).
5. Examples
The key call graph data are shown below. Tables 1 and 2 come from two runs of the unoptimized SCCP library in the same environment; Table 3 comes from a run with the optimized SCCP library.
Table 1. Unoptimized SCCP, run 1
Module | Function | Self Time (µs) | Total Time (µs) | % in function
3gpp_r99_sccp_DLL.dll – Total | | 29060 | |
3gpp_r99_sccp_DLL.dll | | 0 | |
3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | | 29060 | |
3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17869 | 78826 | 0.23
3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11191 | 32389 | 0.35
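In Table 1, the Self Time of the module's Total row equals the sum of the Self Time values of its function rows (17869 + 11191 = 29060), and the same holds for Tables 2 and 3. The sketch below checks this for Table 1 and prints each function's share of the module's self time, a quick way to spot the hotspot; the dictionary simply transcribes the table.

```python
# Self Time (us) of the function rows in Table 1 (unoptimized SCCP, run 1)
table1_self_us = {
    "pa_DLLGetPtr": 0,
    "pa_DLLGetTable": 0,
    "pal_Initialize": 0,
    "pal_Terminate": 0,
    "pal_PreDecodeKFE": 17869,
    "pal_PreDecodeTO": 11191,
}
module_total_us = 29060  # "3gpp_r99_sccp_DLL.dll - Total" row

assert sum(table1_self_us.values()) == module_total_us

# Share of the module's self time per function, largest first
for name, t in sorted(table1_self_us.items(), key=lambda kv: kv[1], reverse=True):
    if t > 0:
        print(f"{name}: {t / module_total_us:.1%}")
```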
Table 2. Unoptimized SCCP, run 2
Module | Function | Self Time (µs) | Total Time (µs) | % in function
3gpp_r99_sccp_DLL.dll – Total | | 28931 | |
3gpp_r99_sccp_DLL.dll | | 0 | |
3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | | 28931 | |
3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17837 | 78844 | 0.23
3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11094 | 32234 | 0.34
Table 3. Optimized SCCP
Module | Function | Self Time (µs) | Total Time (µs) | % in function
3gpp_r99_sccp_DLL.dll – Total | | 22636 | |
3gpp_r99_sccp_DLL.dll | | 1 | |
3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 1 | 1 | 1
3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | pal_InitKFE | 0 | 0 | 0
3gpp_r99_sccp_DLL.dll | | 22635 | |
3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 16823 | 77956 | 0.22
3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 5812 | 40065 | 0.15
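To quantify the effect of the optimization, the averaged self times of the two unoptimized runs can be compared with the optimized run. The sketch below transcribes the relevant Self Time values from Tables 1 to 3. It compares self time only, and only one optimized run is available, so the result is subject to the run-to-run variation discussed in Section 4.

```python
from statistics import mean

# Self Time (us) transcribed from Tables 1 and 2 (two unoptimized runs)
unoptimized_runs = {
    "pal_PreDecodeKFE": [17869, 17837],
    "pal_PreDecodeTO": [11191, 11094],
    "module total": [29060, 28931],
}
# Self Time (us) transcribed from Table 3 (optimized run)
optimized = {
    "pal_PreDecodeKFE": 16823,
    "pal_PreDecodeTO": 5812,
    "module total": 22636,
}

def reduction(before: float, after: float) -> float:
    """Fractional reduction in self time: (before - after) / before."""
    return (before - after) / before

for name, runs in unoptimized_runs.items():
    before = mean(runs)
    after = optimized[name]
    print(f"{name}: {before:.1f} -> {after} us ({reduction(before, after):.1%} less)")
```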