VTune Call Graph Utilization Report

The call graph collector of the VTune(TM) Performance Analyzer collects information about the program flow of an application: how many times each function calls other functions, and how much time each function spends executing its own code and/or calling other functions.

1. Pre-Condition

The application to be optimized must be linked with the /FIXED:NO linker option, so that the binary keeps the relocation information the call graph instrumentation needs; otherwise call graph profiling cannot be used.

2. Instrumentation for Call Graph Profiling

Instrumentation is the process of modifying a program so that dynamic information is recorded during program execution. Data collection routines invoked at specific points in the execution of the target program record run-time information. These routines provide information about time spent in each function, and the call sequence that leads to a specific function. By default, the VTune Performance Analyzer instruments all application functions and system-level exports.

Note: by default VTune instruments so many dynamic libraries that the application may fail to start, so the Activity configuration should be modified. In the Configure Call Graph window, click the Advanced button and set the instrumentation level of System DLL and User EXE to Minimal.

This process does not change the functionality of the program, but it does slow the program down at run time. During the run, the VTune analyzer keeps track of function entry and exit points, records the number of times each function was called, establishes the relationship between each caller (parent) and callee (child) function, and stores this data.

3. Process of Call Graph Data Collection

When you select an Activity with the call graph collector in the Tuning Browser and click Run Activity to begin performance data collection, the VTune Performance Analyzer performs the following steps:

1) Instruments the application and/or modules of interest defined during Activity creation.

2) Launches and profiles the instrumented application and/or modules of interest until the application terminates, or until you stop running the application. It keeps track of the exit and entry points, records the number of times each function was called, establishes a relationship between the caller (parent) and callee (child) function, and stores this data.

3) Analyzes the profile data, generates a new Activity result, which is stored in the Tuning Browser, and displays the call graph data in tabular and graph views.
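Step 3 (analyzing the profile data) can be sketched as post-processing of an ordered trace of entry and exit events. The trace format, the `analyze` function and the timestamps are hypothetical; the sketch only shows how call counts, Self Time and Total Time follow from matched entry/exit pairs.

```python
from collections import defaultdict

def analyze(trace):
    """Turn an ordered list of ("enter" | "exit", function, time_us) events
    into per-function call counts, self time and total time (microseconds)."""
    calls = defaultdict(int)
    self_us = defaultdict(int)
    total_us = defaultdict(int)
    stack = []  # open frames: [function, enter_time, time_spent_in_children]
    for event, func, t in trace:
        if event == "enter":
            calls[func] += 1
            stack.append([func, t, 0])
        else:
            name, start, child = stack.pop()
            if name != func:
                raise ValueError(f"unbalanced trace: exit {func} while in {name}")
            elapsed = t - start
            total_us[name] += elapsed
            self_us[name] += elapsed - child
            if stack:
                stack[-1][2] += elapsed  # parent's time spent in callees
    return dict(calls), dict(self_us), dict(total_us)

# Hypothetical trace: A runs for 100 us and calls B twice (20 us and 30 us).
trace = [
    ("enter", "A", 0),
    ("enter", "B", 10), ("exit", "B", 30),
    ("enter", "B", 50), ("exit", "B", 80),
    ("exit", "A", 100),
]
calls, self_us, total_us = analyze(trace)
print(calls)     # {'A': 1, 'B': 2}
print(self_us)   # {'B': 50, 'A': 50}
print(total_us)  # {'B': 50, 'A': 100}
```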

4. Viewing Call Graph Data

After collecting call graph data with the VTune Performance Analyzer, you can view the call graph profiling information in the following views:

Graph: provides a graphical presentation of the application execution.

Call list: provides full information on the selected function and its callers (parents) and callees (children) in table format.

Function summary: provides full information on all instrumented functions of the application in table format.
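The caller/callee relationship shown in the Call list view can be sketched as a lookup over call edges. The `call_list` function, the edge list, the function names and the counts below are hypothetical and are not VTune output.

```python
def call_list(edges, function):
    """Return (callers, callees) of `function` as {name: call_count} dicts.

    `edges` is a list of (caller, callee, count) tuples, one entry per pair.
    """
    callers = {caller: n for caller, callee, n in edges if callee == function}
    callees = {callee: n for caller, callee, n in edges if caller == function}
    return callers, callees

# Hypothetical call edges, not VTune output.
edges = [
    ("main", "decode", 1),
    ("retry", "decode", 2),
    ("decode", "parse_header", 4),
    ("decode", "parse_body", 4),
]
callers, callees = call_list(edges, "decode")
print(callers)  # {'main': 1, 'retry': 2}
print(callees)  # {'parse_header': 4, 'parse_body': 4}
```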

The upper section of the call graph window displays function information in table format. In the function summary, rows are shown with different background colors according to their hierarchical position. By default, the view shows the following four types of data:

Module

Thread

Class (optional)

Function

The function summary view provides several columns; the most important ones for our test are Self Time (microseconds), Total Time (microseconds) and % in function.

Self Time: Time (microseconds) spent in the function itself.

Total Time: Time (microseconds) spent in the function and in all the callees it called.

% in function: The share of the function's Total Time that was spent in the function itself, calculated with the following formula:

% in function = (Self Time/Total Time)* 100
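The formula can be checked with a small Python helper (`percent_in_function` is a hypothetical name). Applied to pal_PreDecodeKFE from Table 1 below, it gives 22.67%, a more precise value than the column displays.

```python
def percent_in_function(self_time_us, total_time_us):
    """% in function = Self Time / Total Time * 100 (0 when Total Time is 0)."""
    if total_time_us == 0:
        return 0.0
    return self_time_us / total_time_us * 100

# Self Time and Total Time (microseconds) taken from Table 1.
print(round(percent_in_function(17869, 78826), 2))  # 22.67 (pal_PreDecodeKFE)
print(round(percent_in_function(11191, 32389), 2))  # 34.55 (pal_PreDecodeTO)
```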


In the same environment, the three benchmark may change slightly, we can choose the average value calculated by several measurements.
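A minimal sketch of averaging repeated measurements, using the pal_PreDecodeKFE values from Tables 1 and 2 below. The `summarize` helper is hypothetical; it also reports the sample standard deviation as a rough measure of run-to-run variation, which requires at least two runs.

```python
from statistics import mean, stdev

# pal_PreDecodeKFE measurements from two runs in the same environment
# (Self Time and Total Time in microseconds, copied from Tables 1 and 2).
runs = [
    {"self": 17869, "total": 78826},
    {"self": 17837, "total": 78844},
]

def summarize(runs, key):
    """Mean and sample standard deviation of one metric across runs."""
    values = [r[key] for r in runs]
    return mean(values), stdev(values)

for key in ("self", "total"):
    avg, sd = summarize(runs, key)
    print(f"{key}: mean={avg:.1f} us, stdev={sd:.1f} us")
# self: mean=17853.0 us, stdev=22.6 us
# total: mean=78835.0 us, stdev=12.7 us
```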

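Note: the % in function column is shown with only two decimal places (and, in the tables below, as a fraction: 0.23 means 23%), so when more precision is needed, compute Self Time / Total Time yourself, for example with a calculator.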

In principle, when we substitute one of the protocol decode libraries, an increase or decrease in its Self Time and % in function can be taken as a change in the library's performance.
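One way to decide whether a change in Self Time reflects the library rather than measurement noise is to compare it with the spread of repeated baseline runs. The sketch below assumes a hypothetical rule of thumb (a change counts only if it exceeds twice the baseline spread); the baseline values are the pal_PreDecodeKFE Self Times from Tables 1 and 2, while the candidate value 16000 is a made-up number, not a measurement.

```python
def relative_change(baseline, candidate):
    """Percentage change of candidate relative to baseline (negative means less time)."""
    return (candidate - baseline) / baseline * 100

def is_significant(baseline_runs, candidate, noise_factor=2.0):
    """Hypothetical rule of thumb: accept a change only if it is larger than
    noise_factor times the spread (max - min) of the baseline runs."""
    baseline = sum(baseline_runs) / len(baseline_runs)
    spread = max(baseline_runs) - min(baseline_runs)
    return abs(candidate - baseline) > noise_factor * spread

# pal_PreDecodeKFE Self Time (microseconds) in Tables 1 and 2.
baseline_runs = [17869, 17837]
print(round(relative_change(17869, 17837), 2))  # -0.18: run-to-run noise
# 16000 is a made-up candidate value, not a measurement.
print(is_significant(baseline_runs, 16000))      # True
print(is_significant(baseline_runs, 17870))      # False
```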

5. Examples

The key call graph data are shown below. The first two tables come from two runs in the same environment with the unoptimized SCCP library; the third uses the optimized SCCP library.

Table 1 Unoptimized SCCP, run 1

Module                         Function           Self Time (µs)  Total Time (µs)  % in function
3gpp_r99_sccp_DLL.dll – Total                     29060
3gpp_r99_sccp_DLL.dll                             0
3gpp_r99_sccp_DLL.dll          pa_DLLGetPtr       0               0                0
3gpp_r99_sccp_DLL.dll          pa_DLLGetTable     0               0                0
3gpp_r99_sccp_DLL.dll          pal_Initialize     0               0                0
3gpp_r99_sccp_DLL.dll          pal_Terminate      0               0                0
3gpp_r99_sccp_DLL.dll                             29060
3gpp_r99_sccp_DLL.dll          pal_PreDecodeKFE   17869           78826            0.23
3gpp_r99_sccp_DLL.dll          pal_PreDecodeTO    11191           32389            0.35
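As a consistency check on Table 1, the module Total row equals the sum of the Self Times of its functions (17869 + 11191 = 29060), and the same holds for the two unnamed 3gpp_r99_sccp_DLL.dll sub-rows. The group labels "A" and "B" in this sketch are hypothetical names for those sub-rows.

```python
from collections import defaultdict

# Function rows of Table 1 as (group, function, self_time_us).
# "A" and "B" are hypothetical labels for the two unnamed
# 3gpp_r99_sccp_DLL.dll sub-rows (Self Time 0 and 29060).
rows = [
    ("A", "pa_DLLGetPtr", 0),
    ("A", "pa_DLLGetTable", 0),
    ("A", "pal_Initialize", 0),
    ("A", "pal_Terminate", 0),
    ("B", "pal_PreDecodeKFE", 17869),
    ("B", "pal_PreDecodeTO", 11191),
]

group_self = defaultdict(int)
for group, _function, self_us in rows:
    group_self[group] += self_us          # sub-row value = sum of its functions

module_total = sum(group_self.values())   # module "Total" row
print(dict(group_self))  # {'A': 0, 'B': 29060}
print(module_total)      # 29060
```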

Table 2 Unoptimized SCCP, run 2

Module                         Function           Self Time (µs)  Total Time (µs)  % in function
3gpp_r99_sccp_DLL.dll – Total                     28931
3gpp_r99_sccp_DLL.dll                             0
3gpp_r99_sccp_DLL.dll          pa_DLLGetPtr       0               0                0
3gpp_r99_sccp_DLL.dll          pa_DLLGetTable     0               0                0
3gpp_r99_sccp_DLL.dll          pal_Initialize     0               0                0
3gpp_r99_sccp_DLL.dll          pal_Terminate      0               0                0
3gpp_r99_sccp_DLL.dll                             28931
3gpp_r99_sccp_DLL.dll          pal_PreDecodeKFE   17837           78844            0.23
3gpp_r99_sccp_DLL.dll          pal_PreDecodeTO    11094           32234            0.34

Table 3 Optimized SCCP

winzenghua