perf_event_open: problems I ran into and some thoughts

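My previous post covered how to use perf_event_open to monitor performance counters. I also looked for examples that monitor several counters at once. Some open multiple independent counters and others create an event group, for example https://stackoverflow.com/questions/42088515/perf-event-open-how-to-monitoring-multiple-events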

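They all share one problem, though: when I set type to PERF_TYPE_HARDWARE and config to PERF_COUNT_HW_CPU_CYCLES or PERF_COUNT_HW_INSTRUCTIONS (whether as two separate counters or as a group), the data I get back is wrong. The errors vary, for example: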

Everything is 0:

cpu cycles: 0
instructin: 0
page faults: 0

Inaccurate results:

cpu cycles: 763916370

The perf stat command, by contrast, reports 911,975,116 cycles for the same run, so my reading is about 16% lower. That is a sizable gap.

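After searching Google and GitHub, I finally found one project that does things differently, and in my tests its results are accurate. The link is: https://github.com/castl/easyperf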

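The difference is that it sets type to PERF_TYPE_RAW. According to the perf_event_open documentation, raw events require you to look up the event codes in the processor manual yourself. Let me walk through easyperf from GitHub.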

First, the header file:

#ifndef __EASYPERF_H__
#define __EASYPERF_H__

#ifdef __cplusplus
extern "C" {
#endif

#include <stdint.h>

// Extra options for each event
#define PERFMON_EVENTSEL_OS     (1 << 17)
#define PERFMON_EVENTSEL_USR    (1 << 16)

int perf_init(unsigned int num_ctrs, ...);
void perf_close();

uint64_t perf_read(unsigned int ctr);
void perf_read_all(uint64_t* vals);

// microarch neutral
#define EV_CYCLES          (0x3C | (0x0 << 8))
#define EV_REF_CYCLES      (0x3C | (0x1 << 8))
#define EV_INSTR           (0xC0 | (0x0 << 8))
#define EV_BRANCH          (0xC4 | (0x1 << 8))
#define EV_BRANCH_MISS     (0xC5 | (0x1 << 8))

// microarch specific
#define I7_L3_REFS      (0x2e | (0x4f << 8))
#define I7_L3_MISS      (0x2e | (0x41 << 8))

#define I7_L2_REFS      (0x24 | (0xff << 8))
#define I7_L2_MISS      (0x24 | (0xaa << 8))

#define I7_ICACHE_HITS  (0x80 | (0x01 << 8))
#define I7_ICACHE_MISS  (0x80 | (0x02 << 8))

#define I7_DL1_REFS     (0x43 | (0x01 << 8))

#define I7_LOADS        (0x0b | (0x01 << 8))
#define I7_STORES       (0x0b | (0x02 << 8))

#define I7_L2_DTLB_MISS (0x49 | (0x01 << 8))
#define I7_L2_ITLB_MISS (0x85 | (0x01 << 8))

#define I7_IO_TXNS      (0x6c | (0x01 << 8))
#define I7_DRAM_REFS    (0x0f | (0x20 << 8))

#ifdef __cplusplus
}
#endif

#endif // __EASYPERF_H__

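As you can see, the header mainly defines values for programming the performance event select registers. I'll explain them alongside the register diagram. First, the register layout: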

[Figure 1: performance event select register layout]

The two lines below are bits 16 and 17, which select whether events are counted in user mode or in operating system mode:

#define PERFMON_EVENTSEL_OS     (1 << 17)
#define PERFMON_EVENTSEL_USR    (1 << 16)
  • USR (user mode) flag (bit 16) — Specifies that the selected microarchitectural condition is counted when the logical processor is operating at privilege levels 1, 2 or 3. This flag can be used with the OS flag.
  • OS (operating system mode) flag (bit 17) — Specifies that the selected microarchitectural condition is counted when the logical processor is operating at privilege level 0. This flag can be used with the USR flag.

The remaining definitions occupy the low 16 bits:

#define EV_CYCLES          (0x3C | (0x0 << 8))
#define EV_REF_CYCLES      (0x3C | (0x1 << 8))

The detailed explanation is:

  • Unit mask (UMASK) field (bits 8 through 15) — These bits qualify the condition that the selected event logic unit detects. Valid UMASK values for each event logic unit are specific to the unit. For each architectural performance event, its corresponding UMASK value defines a specific microarchitectural condition. A pre-defined microarchitectural condition associated with an architectural event may not be applicable to a given processor. The processor then reports only a subset of pre-defined architectural events. Pre-defined architectural events are listed in Table 18-1; support for pre-defined architectural events is enumerated using CPUID.0AH:EBX. Architectural performance events available in the initial implementation are listed in Table 19-1.
  • Event select field (bits 0 through 7) — Selects the event logic unit used to detect microarchitectural conditions (see Table 18-1, for a list of architectural events and their 8-bit codes). The set of values for this field is defined architecturally; each value corresponds to an event logic unit for use with an architectural performance event. The number of architectural events is queried using CPUID.0AH:EAX. A processor may support only a subset of pre-defined values.

The umask and event select values can be looked up in the corresponding tables (one is shown here; there are many more):

[Figure 2: example table of event select and umask values]

However, my main target is arm64, so I still need to track down the ARM technical manuals. Onward!
