perf stat Output Explained

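This post walks through the perf stat counters in detail, cross-checking them against the kernel source and the Intel SDM. The aim is to be authoritative and to reduce misunderstandings.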

Hardware events are ultimately counted by the CPU PMU.


These events belong to the PERF_TYPE_HARDWARE class of the perf_event_open() interface.

| Option | Encoding (event, umask) | Intel SDM description | Notes |
|---|---|---|---|
| cycles | 0x3c, 0x00 | Counts core clock cycles whenever the logical processor is in C0 state (not halted). The frequency of this event varies with state transitions in the core. | |
| instructions | 0xc0, 0x00 | Counts when the last uop of an instruction retires. | |
| cache-references | 0x2e, 0x4f | Accesses to the LLC, in which the data is present (hit) or not present (miss). | |
| cache-misses | 0x2e, 0x41 | Accesses to the LLC in which the data is not present (miss). | |
| branches | 0xc4, 0x00 | Counts when the last uop of a branch instruction retires. | |
| branch-misses | 0xc5, 0x00 | Counts when the last uop of a branch instruction retires which corrected misprediction of the branch prediction hardware at execution time. | |
| bus-cycles | 0x3c, 0x01 | Counts at a fixed frequency whenever the logical processor is in C0 state (not halted). Current implementations count at core crystal clock, TSC, or bus clock frequency. | |
| stalled-cycles-frontend | microarchitecture-dependent | Increments each cycle the # of Uops issued by the RAT to RS. Set Cmask = 1, Inv = 1, Any = 1 to count stalled cycles of this core. | |
| stalled-cycles-backend | microarchitecture-dependent | Counts total number of uops to be executed per-thread each cycle. Set Cmask = 1, INV = 1 to count stall cycles. | |
| ref-cycles | 0x00, 0x03 | This event counts the number of reference core cpu cycles. Reference means that the event increments at a constant rate which is not subject to core CPU frequency adjustments. The event may not count when the processor is in halted (low power) state. As such, it may not be equivalent to wall clock time. However, when the processor is not halted state, the event keeps a constant correlation with wall clock time. | |
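The event/umask pairs in the table can be packed into a single raw config value, which is also the form perf accepts as `-e rNNNN`. The following is a minimal sketch, assuming the Intel IA32_PERFEVTSELx bit layout (event select in bits 0-7, unit mask in bits 8-15, Any in bit 21, Inv in bit 23, Cmask in bits 24-31). The helper names `raw_config` and `perf_raw_event` are hypothetical and are not part of perf or the kernel.

```python
def raw_config(event: int, umask: int, inv: bool = False,
               cmask: int = 0, any_thread: bool = False) -> int:
    """Pack PMU fields into an x86 raw event config (hypothetical helper).

    Assumed layout: event bits 0-7, umask bits 8-15, Any bit 21,
    Inv bit 23, Cmask bits 24-31.
    """
    for name, value in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    return (event
            | (umask << 8)
            | (int(any_thread) << 21)
            | (int(inv) << 23)
            | (cmask << 24))


def perf_raw_event(event: int, umask: int) -> str:
    """Format an event/umask pair in the rNNNN raw-event syntax."""
    return f"r{raw_config(event, umask):04x}"


# Encodings copied from the table above.
HW_EVENTS = {
    "cycles": (0x3C, 0x00),
    "instructions": (0xC0, 0x00),
    "cache-references": (0x2E, 0x4F),
    "cache-misses": (0x2E, 0x41),
    "branches": (0xC4, 0x00),
    "branch-misses": (0xC5, 0x00),
    "bus-cycles": (0x3C, 0x01),
}

if __name__ == "__main__":
    for name, (event, umask) in HW_EVENTS.items():
        print(f"{name:18s} {perf_raw_event(event, umask)}")
```

For example, `raw_config(0x2e, 0x4f)` yields `0x4f2e`, so cache-references corresponds to the raw event `r4f2e`. The `inv`, `cmask`, and `any_thread` parameters correspond to the Cmask/Inv/Any settings mentioned in the stalled-cycles rows.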

Hardware cache events are likewise ultimately counted by the CPU PMU.


These events belong to the PERF_TYPE_HW_CACHE class of the perf_event_open() interface. They are combined with the cache_type, cache_op, and cache_result attributes to form the different counters.
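As a sketch of how these three attributes combine, the perf_event_open man page describes the PERF_TYPE_HW_CACHE `config` as the cache id in the low byte, the op id shifted left by 8, and the result id shifted left by 16. The enum values below mirror the `perf_hw_cache_id`, `perf_hw_cache_op_id`, and `perf_hw_cache_op_result_id` enums in the kernel UAPI header as I understand them; the Python class names and the `hw_cache_config` helper are hypothetical.

```python
from enum import IntEnum


class CacheId(IntEnum):
    """Mirrors enum perf_hw_cache_id (values assumed from the UAPI header)."""
    L1D = 0
    L1I = 1
    LL = 2
    DTLB = 3
    ITLB = 4
    BPU = 5
    NODE = 6


class CacheOp(IntEnum):
    """Mirrors enum perf_hw_cache_op_id."""
    READ = 0
    WRITE = 1
    PREFETCH = 2


class CacheResult(IntEnum):
    """Mirrors enum perf_hw_cache_op_result_id."""
    ACCESS = 0
    MISS = 1


def hw_cache_config(cache: CacheId, op: CacheOp, result: CacheResult) -> int:
    """Build perf_event_attr.config for a PERF_TYPE_HW_CACHE event."""
    return int(cache) | (int(op) << 8) | (int(result) << 16)


if __name__ == "__main__":
    examples = {
        "L1-dcache-loads": (CacheId.L1D, CacheOp.READ, CacheResult.ACCESS),
        "L1-dcache-load-misses": (CacheId.L1D, CacheOp.READ, CacheResult.MISS),
        "L1-dcache-stores": (CacheId.L1D, CacheOp.WRITE, CacheResult.ACCESS),
        "LLC-load-misses": (CacheId.LL, CacheOp.READ, CacheResult.MISS),
    }
    for name, triple in examples.items():
        print(f"{name:24s} config={hw_cache_config(*triple):#08x}")
```

This generic `config` is what user space passes in; the kernel then translates it into a microarchitecture-specific event encoding.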

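Because cache_type, cache_op, and cache_result are microarchitecture-dependent, the Linux kernel stores the values for each specific architecture in the hw_cache_event_ids and hw_cache_extra_regs arrays. How the kernel assigns these values per CPU microarchitecture is a useful reference. After reading this against the Intel SDM, my impression is that the perf_event_open man page cannot be trusted completely: what each counter measures differs across CPU microarchitectures.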

Skylake microarchitecture:

| Option | Encoding (event, umask) | Intel SDM event name | Intel SDM description | Notes |
|---|---|---|---|---|
| L1-dcache-loads | 0xd0, 0x81 | MEM_INST_RETIRED.ALL_LOADS | All retired load instructions. | |
| L1-dcache-load-misses | 0x51, 0x01 | L1D.REPLACEMENT | Counts the number of lines brought into the L1 data cache. | |
| L1-dcache-stores | 0xd0, 0x82 | MEM_INST_RETIRED.ALL_STORES | All retired store instructions. | |
| L1-icache-load-misses | 0x83, 0x02 | ICACHE_64B.IFTAG_MISS | Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. | |
| LLC-loads | 0xb7, 0x01 | | | |
| LLC-load-misses | 0xb7, 0x01 | | | |
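The Skylake rows above can be modeled as a lookup keyed by (cache, op, result), loosely mirroring how hw_cache_event_ids is indexed. This is an illustrative sketch only: the dictionary and helper names are hypothetical, and the values are just the event/umask pairs from the table. It also shows that LLC-loads and LLC-load-misses share one event/umask encoding, which is presumably why the kernel needs the additional hw_cache_extra_regs values to tell them apart.

```python
# Keys are (cache, op, result); values are (event, umask) from the Skylake table.
SKYLAKE_HW_CACHE = {
    ("L1D", "read", "access"): (0xD0, 0x81),   # L1-dcache-loads
    ("L1D", "read", "miss"): (0x51, 0x01),     # L1-dcache-load-misses
    ("L1D", "write", "access"): (0xD0, 0x82),  # L1-dcache-stores
    ("L1I", "read", "miss"): (0x83, 0x02),     # L1-icache-load-misses
    ("LL", "read", "access"): (0xB7, 0x01),    # LLC-loads
    ("LL", "read", "miss"): (0xB7, 0x01),      # LLC-load-misses
}


def lookup_skylake(cache: str, op: str, result: str):
    """Return (event, umask) for a cache event, or None if not in the table."""
    return SKYLAKE_HW_CACHE.get((cache, op, result))


def shared_encodings():
    """Group table keys by encoding and keep encodings used more than once."""
    groups = {}
    for key, encoding in SKYLAKE_HW_CACHE.items():
        groups.setdefault(encoding, []).append(key)
    return {enc: keys for enc, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    print(lookup_skylake("L1D", "read", "miss"))
    for (event, umask), keys in shared_encodings().items():
        print(f"event={event:#04x} umask={umask:#04x} shared by {keys}")
```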
