Likwid: An Indispensable Toolbox for High-Performance Server Development

Original article. When reposting, please credit the source: 非业余研究


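When building a high-performance server, knowing how to write high-performance code is one thing; whether the resulting system actually performs well is quite another.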

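We usually need to understand the system's CPU topology, memory usage, the readings of the various CPU performance counters, how each level of CPU cache is being used, hit rates, and so on. Only when this information is combined effectively can we accurately pinpoint the weaknesses in our programs and find better optimization targets. This information is typically scattered all over the system, and it is hard for an ordinary developer to pull it together into a coherent picture.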

Fortunately, the Germans, famous for their meticulousness, have come to the rescue once again: meet Likwid.


The Likwid project is hosted here. According to the description on its homepage:

Likwid stands for Like I knew what I am doing. This project contributes easy to use command line tools for Linux to support programmers in developing high performance multi threaded programs.

It contains the following tools:

likwid-topology: Show the thread and cache topology
likwid-perfctr: Measure hardware performance counters on Intel and AMD processors
likwid-features: Show and Toggle hardware prefetch control bits on Intel Core 2 processors
likwid-pin: Pin your threaded application without touching your code (supports pthreads, Intel OpenMP and gcc OpenMP)
likwid-bench: Benchmarking framework allowing rapid prototyping of threaded assembly kernels
likwid-mpirun: Script enabling simple and flexible pinning of MPI and MPI/threaded hybrid applications
likwid-perfscope: Frontend for likwid-perfctr timeline mode. Allows live plotting of performance metrics.
likwid-powermeter: Tool for accessing RAPL counters and query Turbo mode steps on Intel processor.
likwid-memsweeper: Tool to cleanup ccNUMA memory domains.

Likwid stands out because:

No kernel patching, any vanilla linux 2.6 or newer kernel works
Transparent, always clear which events are chosen, event tags have the same naming as in documentation
Lightweight, LIKWID tries to add no overhead and keeps out of your way.
Easy to use, simple to build, no need to touch your code, configurable from outside. Clear CLI interface.
Multiplatform, likwid supports Intel and AMD processors
Up to date, likwid tries to fully support new processors as soon as possible
Extensible, you can add functionality by means of simple text files

Its documentation is also very well done; the usage guide is here.

I won't belabor the details of how to use it, since the documentation covers everything. Here I'll just show off what it can do:


[likwid-3.0]$ sudo ./likwid-topology
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:        2 
Cores per socket:       4 
Threads per core:       2 
-------------------------------------------------------------
HWThread        Thread          Core            Socket
0               0               0               1
1               0               1               1
2               0               9               1
3               0               10              1
4               0               0               0
5               0               1               0
6               0               9               0
7               0               10              0
8               1               0               1
9               1               1               1
10              1               9               1
11              1               10              1
12              1               0               0
13              1               1               0
14              1               9               0
15              1               10              0
-------------------------------------------------------------
Socket 0: ( 4 12 5 13 6 14 7 15 )
Socket 1: ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
Cache Topology
*************************************************************
Level:  1
Size:   32 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  2
Size:   256 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  3
Size:   12 MB
Cache groups:   ( 4 12 5 13 6 14 7 15 ) ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2 
-------------------------------------------------------------
Domain 0:
Processors:  4 5 6 7 12 13 14 15
Relative distance to nodes:  10 20
Memory: 16222.4 MB free of total 24567.1 MB
-------------------------------------------------------------
Domain 1:
Processors:  0 1 2 3 8 9 10 11
Relative distance to nodes:  20 10
Memory: 5424.19 MB free of total 24576 MB
-------------------------------------------------------------
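To make the topology output a bit more tangible, here is a minimal Python sketch that turns the two "Socket N: ( ... )" lines above into per-socket lists of hardware threads and then keeps one hardware thread per physical core. The names `parse_sockets` and `one_thread_per_core` are hypothetical helpers written for this post, not part of Likwid, and the "every second entry" rule assumes that sibling threads are printed next to each other, as they are in this particular output.

```python
import re

# Two lines copied verbatim from the likwid-topology output above.
TOPOLOGY_EXCERPT = """\
Socket 0: ( 4 12 5 13 6 14 7 15 )
Socket 1: ( 0 8 1 9 2 10 3 11 )
"""

SOCKET_LINE = re.compile(r"^Socket (\d+):\s*\(([\d\s]+)\)", re.MULTILINE)


def parse_sockets(text):
    """Map each socket id to its hardware thread ids, in printed order."""
    return {
        int(m.group(1)): [int(t) for t in m.group(2).split()]
        for m in SOCKET_LINE.finditer(text)
    }


def one_thread_per_core(hwthreads):
    """Keep the first hardware thread of each sibling pair.

    Assumption: siblings are printed next to each other, two per core,
    as in the output above ("4 12", "5 13", ...).
    """
    return hwthreads[::2]


sockets = parse_sockets(TOPOLOGY_EXCERPT)
for socket_id, hwthreads in sorted(sockets.items()):
    print(socket_id, one_thread_per_core(hwthreads))
```

On this machine, hardware threads 0 to 3 come out as the four physical cores of socket 1, which matches the `-C 0-3` selection used in the likwid-perfctr run that follows.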



$ sudo ./likwid-perfctr  -C 0-3 -g MEM sleep 10
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
CPU clock:      2.13 GHz 
Measuring group MEM
-------------------------------------------------------------
sleep 10
Status: 0x400000000 
Status: 0x0 
Status: 0x0 
Status: 0x0 
+--------------------------------+-------------+-------------+-------------+-------------+
|             Event              |   core 0    |   core 1    |   core 2    |   core 3    |
+--------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY        | 1.15794e+08 | 3.30559e+08 | 9.21383e+07 | 6.13907e+07 |
|     CPU_CLK_UNHALTED_CORE      | 2.16557e+08 | 5.36794e+08 | 1.60588e+08 | 1.07672e+08 |
|      CPU_CLK_UNHALTED_REF      | 2.1624e+08  | 5.15724e+08 | 1.55415e+08 | 1.0452e+08  |
|    UNC_QMC_NORMAL_READS_ANY    | 1.42469e+07 |      0      |      0      |      0      |
|    UNC_QMC_WRITES_FULL_ANY     | 3.3378e+06  |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_READS  | 5.95875e+06 |      0      |      0      |      0      |
|  UNC_QHL_REQUESTS_LOCAL_READS  | 9.16778e+06 |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_WRITES |   163766    |      0      |      0      |      0      |
+--------------------------------+-------------+-------------+-------------+-------------+
+-------------------------------------+-------------+-------------+-------------+-------------+
|                Event                |     Sum     |     Max     |     Min     |     Avg     |
+-------------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY STAT        | 5.99881e+08 | 3.30559e+08 | 6.13907e+07 | 1.4997e+08  |
|     CPU_CLK_UNHALTED_CORE STAT      | 1.02161e+09 | 5.36794e+08 | 1.07672e+08 | 2.55403e+08 |
|      CPU_CLK_UNHALTED_REF STAT      | 9.91899e+08 | 5.15724e+08 | 1.0452e+08  | 2.47975e+08 |
|    UNC_QMC_NORMAL_READS_ANY STAT    | 1.42469e+07 | 1.42469e+07 |      0      | 3.56173e+06 |
|    UNC_QMC_WRITES_FULL_ANY STAT     | 3.3378e+06  | 3.3378e+06  |      0      |   834449    |
| UNC_QHL_REQUESTS_REMOTE_READS STAT  | 5.95875e+06 | 5.95875e+06 |      0      | 1.48969e+06 |
|  UNC_QHL_REQUESTS_LOCAL_READS STAT  | 9.16778e+06 | 9.16778e+06 |      0      | 2.29194e+06 |
| UNC_QHL_REQUESTS_REMOTE_WRITES STAT |   163766    |   163766    |      0      |   40941.5   |
+-------------------------------------+-------------+-------------+-------------+-------------+
+-----------------------------+----------+----------+-----------+-----------+
|           Metric            |  core 0  |  core 1  |  core 2   |  core 3   |
+-----------------------------+----------+----------+-----------+-----------+
|     Runtime (RDTSC) [s]     | 10.0024  | 10.0024  |  10.0024  |  10.0024  |
|    Runtime unhalted [s]     | 0.101511 | 0.251623 | 0.0752758 | 0.0504714 |
|         Clock [MHz]         | 2136.45  | 2220.49  |  2204.33  |  2197.66  |
|             CPI             |  1.8702  |  1.6239  |  1.7429   |  1.75388  |
| Memory bandwidth [MBytes/s] | 112.515  |    0     |     0     |     0     |
| Memory data volume [GBytes] | 1.12542  |    0     |     0     |     0     |
|  Remote Read BW [MBytes/s]  | 38.1267  |    0     |     0     |     0     |
| Remote Write BW [MBytes/s]  | 1.04785  |    0     |     0     |     0     |
|    Remote BW [MBytes/s]     | 39.1746  |    0     |     0     |     0     |
+-----------------------------+----------+----------+-----------+-----------+
+----------------------------------+----------+----------+-----------+----------+
|              Metric              |   Sum    |   Max    |    Min    |   Avg    |
+----------------------------------+----------+----------+-----------+----------+
|     Runtime (RDTSC) [s] STAT     | 40.0097  | 10.0024  |  10.0024  | 10.0024  |
|    Runtime unhalted [s] STAT     | 0.478882 | 0.251623 | 0.0504714 | 0.11972  |
|         Clock [MHz] STAT         | 8758.93  | 2220.49  |  2136.45  | 2189.73  |
|             CPI STAT             | 1.70302  |  1.8702  |  1.6239   | 0.425755 |
| Memory bandwidth [MBytes/s] STAT | 112.515  | 112.515  |     0     | 28.1287  |
| Memory data volume [GBytes] STAT | 1.12542  | 1.12542  |     0     | 0.281355 |
|  Remote Read BW [MBytes/s] STAT  | 38.1267  | 38.1267  |     0     | 9.53168  |
| Remote Write BW [MBytes/s] STAT  | 1.04785  | 1.04785  |     0     | 0.261962 |
|    Remote BW [MBytes/s] STAT     | 39.1746  | 39.1746  |     0     | 9.79365  |
+----------------------------------+----------+----------+-----------+----------+
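The derived metrics in the tables above can be reproduced from the raw event counts. The following Python sketch recomputes CPI, memory data volume, and memory bandwidth for core 0. The formulas are my reading of the numbers, not taken from the Likwid sources: CPI is assumed to be unhalted core cycles divided by retired instructions, and each memory-controller read or write is assumed to move one 64-byte cache line. Both assumptions agree with the printed values for this run.

```python
# Raw core-0 event counts copied from the likwid-perfctr table above.
CORE0 = {
    "INSTR_RETIRED_ANY": 1.15794e+08,
    "CPU_CLK_UNHALTED_CORE": 2.16557e+08,
    "UNC_QMC_NORMAL_READS_ANY": 1.42469e+07,
    "UNC_QMC_WRITES_FULL_ANY": 3.3378e+06,
}
RUNTIME_S = 10.0024      # "Runtime (RDTSC) [s]" from the metric table
CACHE_LINE_BYTES = 64    # assumption: one cache line per counted access


def cpi(counters):
    """Cycles per instruction: unhalted core cycles / retired instructions."""
    return counters["CPU_CLK_UNHALTED_CORE"] / counters["INSTR_RETIRED_ANY"]


def memory_volume_gbytes(counters):
    """Bytes moved through the memory controller, in GBytes (1e9 bytes)."""
    accesses = (counters["UNC_QMC_NORMAL_READS_ANY"]
                + counters["UNC_QMC_WRITES_FULL_ANY"])
    return accesses * CACHE_LINE_BYTES / 1e9


def memory_bandwidth_mbytes_per_s(counters, runtime_s):
    """Average memory bandwidth over the measured runtime, in MBytes/s."""
    return memory_volume_gbytes(counters) * 1e3 / runtime_s


print("CPI: %.4f" % cpi(CORE0))
print("Memory data volume [GBytes]: %.5f" % memory_volume_gbytes(CORE0))
print("Memory bandwidth [MBytes/s]: %.3f"
      % memory_bandwidth_mbytes_per_s(CORE0, RUNTIME_S))
```

The results can be compared against the core 0 column of the metric table. Note that the memory-controller counters appear only under core 0 because they are uncore events, counted once per socket rather than per core.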

All sorts of information, right at your fingertips.

Have fun!

