VTune

 

 

Intel Thread Profiler registration URL

https://registrationcenter.intel.com/RegCenter/RegisterSNInfo.aspx?sn=VNPR-PZ9VPFHC&EmailID=wisage%40gmail.com&Sequence=1064324

 

 

VTune Call Graph Utilization Report

The call graph collector of the VTune(TM) Performance Analyzer collects information about the program flow of an application: how many times each function calls other functions, and how much time each function spends executing its own code and calling other functions.

1.       Pre-Condition

The application to be optimized must be built with the linker option /FIXED:NO; otherwise the call graph feature cannot be used.
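As a rough way to check this pre-condition before profiling, the sketch below reads the PE header of a binary and tests the IMAGE_FILE_RELOCS_STRIPPED flag in the COFF Characteristics field. It assumes that an image linked with /FIXED:NO keeps its relocation information, so the flag is clear. The function names and the file path in the usage note are hypothetical, and this is not part of VTune itself.

```python
import struct

IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # COFF Characteristics flag from the PE format


def relocs_stripped(image: bytes) -> bool:
    """Return True if the PE image has its base relocations stripped."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2-byte field of the 20-byte COFF header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED)


def check_file(path: str) -> bool:
    """Read a binary from disk and report whether relocations were stripped."""
    with open(path, "rb") as f:
        return relocs_stripped(f.read())
```

If `check_file("app.exe")` (a hypothetical path) returns True, the binary was probably linked as a fixed image and would need to be relinked with /FIXED:NO.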

2.       Instrumentation for Call Graph Profiling

                Instrumentation is the process of modifying a program so that dynamic information is recorded during program execution. Data collection routines invoked at specific points in the execution of the target program record run-time information. These routines provide information about time spent in each function, and the call sequence that leads to a specific function. By default, the VTune Performance Analyzer instruments all application functions and system-level exports.

 

Note: By default VTune instruments so many dynamic libraries that the application may fail to start, so the configuration of the Activity should be modified. In the Configure Call Graph window, click the Advanced button and set the instrumentation level of System DLL and User EXE to Minimal.

                This process does not change the functionality of the program. However, during runtime, it slows performance down. The VTune analyzer keeps track of the exit and entry points, records the number of times each function was called, establishes a relationship between the caller (parent) and callee (child) function, and stores this data.
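To make the idea concrete, here is a toy illustration of instrumentation in Python. It is not how VTune works internally (VTune instruments binaries); it only mimics the bookkeeping described above: entry and exit points, call counts, caller/callee relationships, and time per function. The class name `CallGraphRecorder` and all of its members are made up for this sketch, and recursive functions would have their total time counted more than once.

```python
import time
from collections import defaultdict
from functools import wraps


class CallGraphRecorder:
    """Toy call graph collector: counts calls, caller/callee edges, and times."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.calls = defaultdict(int)         # function name -> number of calls
        self.edges = defaultdict(int)         # (caller, callee) -> number of calls
        self.total_time = defaultdict(float)  # time in function plus its callees
        self.self_time = defaultdict(float)   # time in function excluding callees
        self._stack = []                      # active frames: [name, time in callees]

    def instrument(self, func):
        """Wrap func so that its entry and exit points are recorded."""
        name = func.__name__

        @wraps(func)
        def wrapper(*args, **kwargs):
            caller = self._stack[-1][0] if self._stack else "<root>"
            self.calls[name] += 1
            self.edges[(caller, name)] += 1
            frame = [name, 0.0]
            self._stack.append(frame)
            start = self.clock()                # entry point
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = self.clock() - start  # exit point
                self._stack.pop()
                self.total_time[name] += elapsed
                self.self_time[name] += elapsed - frame[1]
                if self._stack:
                    # Charge this call's time to the caller's callee bucket.
                    self._stack[-1][1] += elapsed

        return wrapper
```

Decorating functions with `recorder.instrument` leaves their results unchanged but adds overhead on every call, which is the same trade-off the paragraph above describes.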

 

 

3.       Process of Call Graph Data Collection

            When you select an Activity with the call graph collector in the Tuning Browser and click Run Activity to begin performance data collection, the VTune Performance Analyzer performs the following steps:

1)      Instruments the application and/or modules of interest defined during Activity creation.

2)      Launches and profiles the instrumented application and/or modules of interest until the application terminates, or until you stop running the application. It keeps track of the exit and entry points, records the number of times each function was called, establishes a relationship between the caller (parent) and callee (child) function, and stores this data.

3)      Analyzes the profile data, generates a new Activity result, which is stored in the Tuning Browser, and displays the call graph data in tabular and graph view.

4.      Viewing Call Graph Data

After collecting call graph data with the VTune Performance Analyzer, you can view the call graph profiling information in the following views:

 

Graph: provides a graphical presentation of the application's execution.

Call list: provides full information on the selected function, its callers (parents), and its callees (children) in table format.

Function summary: provides full information on all instrumented functions of the application in table format.

            The upper section of the call graph window displays the function information in a table format. The rows in the function summary display functions with different background colors according to the hierarchical position. The default view shows the first four types of data as follows:

 

Module

    Thread

        Class(optional)

            Function

 

The function summary view provides several columns. The most important columns for our test are Self Time (microseconds), Total Time (microseconds), and % in function.

Self Time: Time (microseconds) spent in the function itself.

Total Time: Time (microseconds) spent in the function and in all the callees it called.

% in function: Ratio displaying how much time was spent in the function itself. You can calculate the ratio using the following formula:

 

% in function = (Self Time / Total Time) * 100
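The formula is easy to evaluate by hand or with a few lines of code. The sketch below applies it to the pal_PreDecodeKFE row of Table 1 in the Examples section. The function name is hypothetical, and returning 0 when Total Time is 0 is an assumption of this sketch rather than documented VTune behavior.

```python
def percent_in_function(self_time_us: float, total_time_us: float) -> float:
    """Return Self Time as a percentage of Total Time (both in microseconds)."""
    if total_time_us == 0:
        return 0.0  # assumption: report 0 for functions with no recorded time
    return self_time_us / total_time_us * 100


# pal_PreDecodeKFE from Table 1: Self Time 17869, Total Time 78826.
print(f"{percent_in_function(17869, 78826):.4f} %")
```

This gives roughly 22.67 %. The tables in the Examples section appear to list the same ratio without the multiplication by 100 (0.23 for this row).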

 

 

 

 

Even in the same environment, these three metrics may vary slightly from run to run, so we can use the average over several measurements.
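One possible way to do the averaging is sketched below, using the pal_PreDecodeTO values from the two unoptimized runs in the Examples section. The helper name and the dictionary keys are hypothetical. Computing the ratio from the averaged times, rather than averaging the per-run ratios, is a choice made for this sketch.

```python
from statistics import mean


def average_runs(runs: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over several measurements of the same function."""
    metrics = runs[0].keys()
    return {m: mean(run[m] for run in runs) for m in metrics}


# pal_PreDecodeTO in the two unoptimized runs (Table 1 and Table 2).
runs = [
    {"self_time": 11191, "total_time": 32389},
    {"self_time": 11094, "total_time": 32234},
]
avg = average_runs(runs)
avg["percent_in_function"] = avg["self_time"] / avg["total_time"] * 100
print(avg)
```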

 

Note: The % in function column is displayed with limited precision. If you need more decimal places, compute the value yourself from Self Time and Total Time, for example with a calculator.

In theory, when we replace one of the protocol decode libraries, an increase or decrease in Self Time and % in function can be taken as a change in the performance of that library.

 

 

 

5.      Examples

The key data from the call graph are shown below. The first two tables come from runs in the same environment, and the third uses the optimized SCCP library.

Table 1 Unoptimized SCCP, run 1

| Module | Function | Self Time | Total Time | % in function |
|---|---|---|---|---|
| 3gpp_r99_sccp_DLL.dll - Total | | 29060 | | |
| 3gpp_r99_sccp_DLL.dll | | 0 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 29060 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17869 | 78826 | 0.23 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11191 | 32389 | 0.35 |

           

 

Table 2 Unoptimized SCCP, run 2

| Module | Function | Self Time | Total Time | % in function |
|---|---|---|---|---|
| 3gpp_r99_sccp_DLL.dll - Total | | 28931 | | |
| 3gpp_r99_sccp_DLL.dll | | 0 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 28931 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17837 | 78844 | 0.23 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11094 | 32234 | 0.34 |


Table 3 Optimized SCCP

| Module | Function | Self Time | Total Time | % in function |
|---|---|---|---|---|
| 3gpp_r99_sccp_DLL.dll - Total | | 22636 | | |
| 3gpp_r99_sccp_DLL.dll | | 1 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 1 | 1 | 1 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_InitKFE | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 22635 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 16823 | 77956 | 0.22 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 5812 | 40065 | 0.15 |
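
To compare the optimized library against the unoptimized baseline, one option is to average the two unoptimized runs and compute the relative change in Self Time. The sketch below does that with the Self Time values copied from the three tables. The helper name and variable names are hypothetical.

```python
def relative_change(baseline: float, optimized: float) -> float:
    """Return the change from baseline to optimized as a percentage."""
    return (optimized - baseline) / baseline * 100


# Self Time (microseconds) copied from Tables 1 to 3.
unoptimized = {
    "pal_PreDecodeKFE": [17869, 17837],
    "pal_PreDecodeTO": [11191, 11094],
}
optimized = {"pal_PreDecodeKFE": 16823, "pal_PreDecodeTO": 5812}

changes = {}
for name, samples in unoptimized.items():
    baseline = sum(samples) / len(samples)  # average of the two unoptimized runs
    changes[name] = relative_change(baseline, optimized[name])
    print(f"{name}: {baseline:.1f} -> {optimized[name]} ({changes[name]:+.1f} %)")
```

With these numbers, Self Time drops by about 6 % for pal_PreDecodeKFE and by about 48 % for pal_PreDecodeTO.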

 

 

Posted on 2009-11-13 18:30:00

 

 
