Measuring a Program's FLOPs with Perf

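FLOPs (floating-point operations) is a key metric of the computational workload of a scientific computing program: the number of floating-point operations the program performs over a complete run. Here I use Perf, the Linux performance analysis tool, which reads the CPU's hardware performance counters, to measure a program's FLOPs.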

Installing Perf

Ubuntu/Debian

apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

CentOS

yum install perf

Finding the supported events and their codes

Installing libpfm4

git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4
cd libpfm4
make

Listing events

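Go into the examples directory and run showevtinfo to see which events relate to FLOPs. On my machine (an Intel Skylake CPU, as the PMU name below shows), I found the following events: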

IDX	 : 419430470
PMU name : skl (Intel Skylake)
Name     : FP_ARITH_INST_RETIRED
Equiv	 : None
Flags    : None
Desc     : Floating-point instructions retired
Code     : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX	 : 419430469
PMU name : skl (Intel Skylake)
Name     : FP_ARITH
Equiv	 : FP_ARITH_INST_RETIRED
Flags    : None
Desc     : Floating-point instructions retired
Code     : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX	 : 419430414
PMU name : skl (Intel Skylake)
Name     : FP_ASSIST
Equiv	 : None
Flags    : None
Desc     : X87 floating-point assists
Code     : 0xca
Umask-00 : 0x1001e : PMU : [ANY] : [default] : Cycles with any input/output SEE or FP assists
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
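Combining the multipliers in the FP_ARITH_INST_RETIRED umask descriptions above gives the program's total FLOPs. In the formula below, N_X is notation introduced here for the count of umask X (SD = SCALAR_DOUBLE, SS = SCALAR_SINGLE, 128PD = 128B_PACKED_DOUBLE, and so on). According to Intel's event documentation, FMA instructions increment these counters twice, so the multipliers already account for both the multiply and the add.

$$
\mathrm{FLOPs} = N_{\mathrm{SD}} + N_{\mathrm{SS}} + 2N_{\mathrm{128PD}} + 4\left(N_{\mathrm{128PS}} + N_{\mathrm{256PD}}\right) + 8\left(N_{\mathrm{256PS}} + N_{\mathrm{512PD}}\right) + 16N_{\mathrm{512PS}}
$$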

Getting the event codes

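In the same directory, run check_events to get the code for each event. Its arguments are the Name and Umask values from the previous step, written as NAME:UMASK (the umask can be omitted to use the event's default, as with FP_ASSIST). My command was: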

./check_events FP_ARITH_INST_RETIRED:SCALAR_SINGLE FP_ARITH:SCALAR_SINGLE FP_ASSIST

The output is:

Requested Event: FP_ARITH_INST_RETIRED:SCALAR_SINGLE
Actual    Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430470
Codes          : 0x5302c7
Requested Event: FP_ARITH:SCALAR_SINGLE
Actual    Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430470
Codes          : 0x5302c7
Requested Event: FP_ASSIST
Actual    Event: skl::FP_ASSIST:ANY:k=1:u=1:e=0:i=0:c=1:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430414
Codes          : 0x1531eca

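The Codes field is the event code we need. FP_ARITH is an alias of FP_ARITH_INST_RETIRED (see its Equiv line), so both requests resolve to the same code, 0x5302c7.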
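To see what the digits in these codes mean, here is a small Python sketch that splits a code into the fields of Intel's IA32_PERFEVTSELx register: event select in bits 0-7, umask in bits 8-15, control flags in bits 16-23, and counter mask in bits 24-31. That check_events prints values in this layout is an assumption on my part, but it agrees with the Actual Event lines above (event 0xc7 with umask 0x02, k=1 and u=1, and c=1 for FP_ASSIST). The function name decode_perfevtsel is hypothetical.

```python
# Control-flag bits of Intel's IA32_PERFEVTSELx register.
# Assumption: check_events "Codes" use this register layout.
FLAG_BITS = {
    16: "usr",   # count in user mode (libpfm u=1)
    17: "os",    # count in kernel mode (libpfm k=1)
    18: "edge",  # edge detect (libpfm e)
    19: "pc",    # pin control
    20: "int",   # APIC interrupt on counter overflow
    21: "any",   # count events of any thread on the core (libpfm t)
    22: "en",    # counter enable
    23: "inv",   # invert the counter-mask comparison (libpfm i)
}


def decode_perfevtsel(code: int) -> dict:
    """Split a raw event code into event, umask, flags and counter mask."""
    return {
        "event": code & 0xFF,          # bits 0-7: event select
        "umask": (code >> 8) & 0xFF,   # bits 8-15: unit mask
        "flags": [name for bit, name in sorted(FLAG_BITS.items())
                  if (code >> bit) & 1],
        "cmask": (code >> 24) & 0xFF,  # bits 24-31: counter mask (libpfm c)
    }


for code in (0x5302C7, 0x1531ECA):
    print(hex(code), decode_perfevtsel(code))
```

Both codes share the 0x53 flag byte (usr, os, int, en); only the event, umask and, for FP_ASSIST, the counter mask of 1 differ.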

Measuring the program's FLOPs

Run the program to be measured under perf stat, passing each code as a raw event in the form -e r<code>. For example:

sudo perf stat -e r5302c7 -e r1531eca ./example.py
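The command above counts only scalar single-precision instructions. To count all eight FP_ARITH_INST_RETIRED umasks, the Python sketch below builds a raw code for each one by reusing the 0x53 flag byte that check_events produced for r5302c7 and swapping in each umask from showevtinfo. These other codes are derived by analogy rather than taken from check_events output, so it is worth confirming them with check_events; the helper names are hypothetical. When more events are requested than the CPU has general-purpose counters, perf time-multiplexes them and scales the counts, so the values become estimates. The 512-bit umasks only count on CPUs with AVX-512.

```python
import shlex

EVENT = 0xC7   # FP_ARITH_INST_RETIRED event number (from showevtinfo)
FLAGS = 0x53   # flag byte check_events produced for 0x5302c7 (assumed reusable)

# Umasks of FP_ARITH_INST_RETIRED, copied from the showevtinfo output.
FP_ARITH_UMASKS = {
    "SCALAR_DOUBLE": 0x01,
    "SCALAR_SINGLE": 0x02,
    "128B_PACKED_DOUBLE": 0x04,
    "128B_PACKED_SINGLE": 0x08,
    "256B_PACKED_DOUBLE": 0x10,
    "256B_PACKED_SINGLE": 0x20,
    "512B_PACKED_DOUBLE": 0x40,
    "512B_PACKED_SINGLE": 0x80,
}


def raw_code(umask: int) -> str:
    """Build a perf raw event name such as 'r5302c7'."""
    return f"r{(FLAGS << 16) | (umask << 8) | EVENT:x}"


def perf_stat_command(program: list[str]) -> list[str]:
    """Return a perf stat command line counting every FP_ARITH umask."""
    cmd = ["sudo", "perf", "stat"]
    for umask in FP_ARITH_UMASKS.values():
        cmd += ["-e", raw_code(umask)]
    return cmd + program


print(shlex.join(perf_stat_command(["./example.py"])))
```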

The result is:

 Performance counter stats for './example.py':

        13,061,638      r5302c7
                 1      r1531eca

       1.834101748 seconds time elapsed

       1.888016000 seconds user
       0.231023000 seconds sys

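Here the value of r5302c7 (13,061,638) is the number of scalar single-precision floating-point operations. It equals the program's total FLOPs only if the program executes no other kind of floating-point arithmetic instruction. In general, the total is the sum of the counts of all eight FP_ARITH_INST_RETIRED umasks, each multiplied by the factor given in its description (1 for scalar, up to 16 for 512-bit packed single precision). r1531eca (FP_ASSIST) counts cycles with floating-point assists, not floating-point operations, so it is not part of the FLOPs.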
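A minimal Python sketch of that weighted sum, assuming the counts were collected with the eight raw codes r5301c7 through r5380c7 (the 0x53 prefix of r5302c7 with each umask from showevtinfo) and saved in perf's CSV mode, for example by adding -x, -o stats.csv to the perf stat command. It also assumes the usual CSV field order (counter value, unit, event name, ...). The function name total_flops is hypothetical; the sample line reuses the count above purely for illustration.

```python
# FLOPs per counted instruction for each FP_ARITH_INST_RETIRED umask,
# keyed by the raw event name used on the perf command line (assumed codes).
WEIGHTS = {
    "r5301c7": 1,   # SCALAR_DOUBLE
    "r5302c7": 1,   # SCALAR_SINGLE
    "r5304c7": 2,   # 128B_PACKED_DOUBLE
    "r5308c7": 4,   # 128B_PACKED_SINGLE
    "r5310c7": 4,   # 256B_PACKED_DOUBLE
    "r5320c7": 8,   # 256B_PACKED_SINGLE
    "r5340c7": 8,   # 512B_PACKED_DOUBLE
    "r5380c7": 16,  # 512B_PACKED_SINGLE
}


def total_flops(perf_csv: str) -> int:
    """Sum weighted FP_ARITH counts from `perf stat -x,` output."""
    total = 0
    for line in perf_csv.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or fields[2] not in WEIGHTS:
            continue  # comment lines, blank lines, or unrelated events
        value = fields[0]
        if not value.isdigit():
            continue  # e.g. "<not counted>" or "<not supported>"
        total += int(value) * WEIGHTS[fields[2]]
    return total


sample = "13061638,,r5302c7,1834101748,100.00,,\n"
print(total_flops(sample))  # prints 13061638
```

With a real run, total_flops(open("stats.csv").read()) gives the weighted total; lines for other events such as r1531eca are ignored.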
